Hadoop is one of the most important tools for analyzing and using Big Data. It lets enterprises work with even the most complex data sets and initiatives without incurring huge expenses. It’s highly scalable and flexible, and many organizations have built multi-million-dollar products on top of the basic package.
However, this versatility and breadth of options has also created some confusion for many companies, chiefly over whether Hadoop should be implemented on-site or in the cloud. Several primary considerations go into this decision, and they are discussed in detail in the following paragraphs.
At the end of the day, each enterprise must make its choice based on what is most beneficial for it as a company.
Factors to consider when implementing Hadoop
1. Security
Over the years, the cloud has been more prone to security breaches and hacks. The penetration of Apple’s iCloud earlier this year showed that no company, whatever its status, is completely invincible against hackers. In the same vein, data used in Hadoop and stored in the cloud is never 100% safe.
Now, that’s not to say that there is absolutely no safety online. In truth, the chances of having your data hacked in the cloud are very slim. The majority of security systems are continually updated to stay ahead of cyber-criminals, whose methods of attack are also improving thanks to technological advancements available to everybody.
Far more cyber-attacks are thwarted than succeed. As a matter of fact, the Apple iCloud breach resulted from weak individual passwords rather than from vulnerabilities in the company’s security systems as a whole.
Point of breach is not in the cloud
Data accessed and run from the cloud is least secure not while it is stored in the cloud, but at the point of connection, when it is being uploaded or downloaded. The GCHQ and NSA have many times been able to obtain data by targeting this point, where the data has left its source but hasn’t yet reached its destination.
Given these facts, on-site Hadoop implementation provides a stronger security system, especially where data is used within a closed system. By limiting the number and identity of persons that can access an internal database, you reduce your chances of falling victim to a cyber-criminal or losing your data.
That being said, it is worth mentioning that with an on-site system your internal IT team bears the larger burden for information security, which means upgrades to your security systems will be more complex and more expensive for you as an organization.
The Verdict: Provided that Hadoop is maintained in a closed network, on-site solutions are more secure than in-cloud implementations. This is, however, not assured where the system is open.
2. Cost
On-site installation, maintenance and upgrades - On-site Hadoop implementation costs far more than cloud implementation. Enterprises have to allocate substantial resources to acquiring servers to store the data. These servers must also have enough processing power to run queries effectively.
In addition, new, more complex servers demand more maintenance time from your IT team, which means you may have to hire a few new people to take charge of security and system upkeep.
System upgrades are also costly. Whenever performance or storage volume must be increased, buying the required equipment can cost thousands of dollars. The extra physical space needed to house the new machines, along with the physical security systems to protect them, adds further to the bill.
Cloud implementation - Cloud-based Hadoop implementation doesn’t incur any of the above costs, since many service providers charge subscribers a monthly fee according to their usage requirements. The monthly subscription increases or decreases according to the amount of system resources each client uses.
This means easy scalability: you need only purchase the next suitable package, and no changes to your in-house team, if you have one, are required. If you are using Remote DBA Support, you simply subscribe to the next level of management according to your new needs. Should your needs change again, you can unsubscribe and return to your normal package with ease. There is no need for a large investment in anything at the outset.
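The two cost structures described above (a large up-front purchase plus ongoing upkeep on-site, versus a usage-based monthly fee in the cloud) can be compared with some simple arithmetic. The following is only a minimal sketch: the function names and every figure in it are hypothetical placeholders chosen for illustration, not real prices, and the outcome depends entirely on the numbers you plug in from your own quotes.

```python
from typing import Optional


def on_site_cost(months: int, upfront: float, monthly_upkeep: float) -> float:
    """Cumulative on-site cost: one-off hardware purchase plus monthly upkeep."""
    return upfront + monthly_upkeep * months


def cloud_cost(months: int, monthly_fee: float) -> float:
    """Cumulative cloud cost: a monthly subscription with nothing paid up front."""
    return monthly_fee * months


def break_even_month(upfront: float, monthly_upkeep: float,
                     monthly_fee: float, horizon: int = 120) -> Optional[int]:
    """First month in which on-site is no more expensive than cloud.

    Returns None if that never happens within `horizon` months.
    """
    for month in range(1, horizon + 1):
        if on_site_cost(month, upfront, monthly_upkeep) <= cloud_cost(month, monthly_fee):
            return month
    return None


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    upfront = 60_000.0   # servers, space, physical security
    upkeep = 2_000.0     # staff time and maintenance per month
    fee = 4_500.0        # cloud subscription per month

    for months in (12, 24, 36):
        print(months, on_site_cost(months, upfront, upkeep), cloud_cost(months, fee))
    print("break-even month:", break_even_month(upfront, upkeep, fee))
```

A sketch like this leaves out upgrade purchases, new hires and changes in subscription tier, all of which the paragraphs above identify as real costs, so treat it as a starting point for your own comparison rather than a result.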
The Verdict: All factors considered, cloud-based implementation is cheaper.
3. Practical considerations
Cloud-based Hadoop implementation gives companies access to their data regardless of physical location, provided there’s an internet connection. You can therefore view progress, check reports or work from wherever you are, giving you a level of flexibility unmatched by on-site solutions.
In addition, remote database administration services are only possible with cloud-based systems, meaning that you can’t take advantage of their cost savings unless you’re in the cloud.
Given the complexity and security requirements of internal systems, on-site implementations do not allow for remote access. Since the idea of on-site implementation rests on maintaining a closed system, enabling remote access would be illogical and would compromise the security benefit that on-site systems provide.
The Verdict: Cloud-based solutions win, since they do not limit where you can work from.
Conclusion – the Cloud wins
Considering ease of access, scalability and costs, cloud implementation of Hadoop is the superior option compared to on-site implementation. The latter isn’t a bad option; however, companies going for on-site implementation must be aware of the costs of installation, upgrading and maintenance, which are much higher than for the cloud option.
The benefits of on-site implementation only go as far as security, and even this is contingent on the organization’s physical and logical security systems. Where a company deals in personal or highly classified data, on-site solutions would still be preferable, given that you have full control over access and that the nature of the data would make it a prime target for cyber-criminals.
Jenny Richards has worked on one of the top DBA expert teams, which has the training and experience necessary for any big data job. Visit her blog for more information.