Glyn Heatley, Cervello’s Analytics and Information Management Co-Practice Lead, speaks to Dan Sheehan, a Senior Consultant within the practice, about his interesting side project.

1. What is this thing you have built that looks like a fire hazard in the office?!
“We have built a Hadoop cluster in our office. This particular cluster is made up of multiple Raspberry Pi computers. When it is complete, it will be a 5-node cluster (1 master node and 4 slave nodes).”

2. What made you think of the idea?
“We kept hearing about ‘Hadoop this’ and ‘Hadoop that’. Other consultants are working on projects that leverage Hadoop, which piqued our interest in building a cluster to tinker with. After looking into a number of options, we realized that the cheapest (and most fun) way to build a Hadoop cluster was to use the surprisingly powerful Raspberry Pi 2.”

3. Why is it interesting to you? To others?
“To start with, I am a huge nerd. I like tinkering and building in both software and hardware. As members of the practice, we are encouraged to constantly innovate and learn about new technologies that we can bring to our client projects. Several people in the office had heard enough about Hadoop that they wanted to at least follow along with the hardware and software configuration as we pulled back the curtain on the solution a bit.”

4. What did you do to get it up and running?
“First of all, we are not the first people to do this. Ever since the Raspberry Pi came out, people have been networking them together to build computing clusters of one form or another. More recently, a number of blogs have published step-by-step guides to setting up a Raspberry Pi Hadoop cluster.

The setup consisted of installing the Hadoop distribution on each node and then making a number of configuration changes so that all of the nodes could communicate with each other properly.

The main difference between what we did and what various blogs have shown is that our Hadoop cluster runs on the newer, more powerful Raspberry Pi 2.”
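For readers curious about what those configuration changes typically involve, here is a small Python sketch. It renders the handful of files most Hadoop 2.x guides have you edit on every node: core-site.xml (where HDFS lives), hdfs-site.xml (block replication), mapred-site.xml and yarn-site.xml (run MapReduce on YARN, with the resource manager on the master), and the slaves file that lists the worker nodes. (Hadoop 3 renamed that last file to workers.) This is an illustration under assumptions and is not the configuration we used. The hostnames node1 to node5, port 9000, and the helpers site_xml and cluster_config are all hypothetical. A real Raspberry Pi cluster also needs memory-related tuning, which is omitted here.

```python
"""Hypothetical sketch: render core Hadoop 2.x config files for 1 master + 4 workers."""
from xml.sax.saxutils import escape

MASTER = "node1"                                 # hypothetical master hostname
WORKERS = ["node2", "node3", "node4", "node5"]   # hypothetical worker hostnames
NAMENODE_PORT = 9000                             # assumption: port used by many tutorials


def site_xml(props):
    """Render a dict of Hadoop properties in the *-site.xml layout."""
    lines = ['<?xml version="1.0"?>', "<configuration>"]
    for name, value in props.items():
        lines += [
            "  <property>",
            f"    <name>{escape(name)}</name>",
            f"    <value>{escape(value)}</value>",
            "  </property>",
        ]
    lines.append("</configuration>")
    return "\n".join(lines) + "\n"


def cluster_config(master, workers, replication=3):
    """Return {filename: contents} for a cluster with one master and the given workers."""
    return {
        # Every node must agree on where the HDFS NameNode runs.
        "core-site.xml": site_xml({"fs.defaultFS": f"hdfs://{master}:{NAMENODE_PORT}"}),
        # Replication cannot usefully exceed the number of DataNodes.
        "hdfs-site.xml": site_xml({"dfs.replication": str(min(replication, len(workers)))}),
        # Run MapReduce jobs (such as the word count example) on YARN.
        "mapred-site.xml": site_xml({"mapreduce.framework.name": "yarn"}),
        "yarn-site.xml": site_xml({
            "yarn.resourcemanager.hostname": master,
            "yarn.nodemanager.aux-services": "mapreduce_shuffle",
        }),
        # One worker hostname per line; the start scripts read this list.
        "slaves": "\n".join(workers) + "\n",
    }


if __name__ == "__main__":
    for filename, text in cluster_config(MASTER, WORKERS).items():
        print(f"== {filename} ==")
        print(text)
```

Generating the files from a single list of hostnames keeps all five nodes consistent, which is where hand-edited clusters most often go wrong.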

5. What sample data are you loading into it?
“For testing, we have been using the texts of various books from the Project Gutenberg website. The Hadoop distribution we have installed includes a test routine that performs a word count on a given flat file.

After running a few tests, we have seen that our Hadoop cluster significantly outperforms similar clusters built with the original Raspberry Pi.”
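The word count test follows the MapReduce pattern. Each node maps its share of the input into (word, 1) pairs, the framework groups those pairs by word, and reducers sum them. The single-process Python sketch that follows imitates those phases so the data flow is easy to see. It is not Hadoop code. The round-robin split into four pieces stands in for the four worker nodes (real Hadoop splits input by HDFS block). The combine step mirrors the optional combiner, which pre-aggregates counts on each node before they cross the network. Tokenizing on whitespace (case-sensitive, punctuation kept) is meant to approximate the bundled example. All function names are hypothetical.

```python
"""Single-process sketch of the MapReduce word-count pattern (an illustration, not Hadoop)."""
from collections import Counter, defaultdict


def map_split(lines):
    """Map task: emit (word, 1) for each whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield word, 1


def combine(pairs):
    """Combiner: pre-aggregate counts locally on one (simulated) node."""
    local = Counter()
    for word, n in pairs:
        local[word] += n
    return list(local.items())


def shuffle(partials):
    """Shuffle: group every partial count by word across all nodes."""
    grouped = defaultdict(list)
    for pairs in partials:
        for word, n in pairs:
            grouped[word].append(n)
    return grouped


def reduce_all(grouped):
    """Reduce: sum the partial counts for each word."""
    return {word: sum(ns) for word, ns in grouped.items()}


def word_count(lines, num_workers=4):
    # Simplification: deal lines round-robin into one input split per worker.
    splits = [lines[i::num_workers] for i in range(num_workers)]
    partials = [combine(map_split(split)) for split in splits]
    return reduce_all(shuffle(partials))


if __name__ == "__main__":
    text = ["the quick brown fox", "jumps over the lazy dog", "The end"]
    print(sorted(word_count(text).items()))
```

Because addition is associative and commutative, the totals do not depend on how many workers the input is split across. That property is what lets Hadoop spread the job over more nodes. To try it on a Gutenberg book, pass in the file's lines, for example the result of reading a hypothetical book.txt and calling splitlines().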

6. How are you planning to use it? Part of a project? Part of an architecture?
“For now, I think we will continue to use the Hadoop cluster as a way to learn about the Hadoop environment and educate others, including our clients.

Having it on site allows us to tweak settings and see how they affect performance.

As of today, we do not have any plans to roll this Hadoop cluster into a larger architecture, but we would certainly consider it if the right opportunity arises!”

7. Now that you’ve got it up and running what do you think about it? What do you like, how do you think it can be improved? How hard was it to do?
“I think this exercise has been, and continues to be, a great learning experience. We are still in the process of tweaking settings and trying to optimize the cluster’s performance. I think building it gave the team a unique opportunity to see how a system like Hadoop functions at a physical level, which in turn helps us understand it at a more abstract level.

Overall, this build wasn’t as difficult as I had expected it to be. As with many Raspberry Pi projects, countless other skilled people have done this project and similar projects, and they were kind enough to share their steps and missteps.”

8. What other similar technologies could you compare it to?
“We could compare this project to any distributed computing system. The hardware architecture would be similar no matter what we were installing on each node.

Taking a slightly different angle, this project also opens up the whole world of Raspberry Pi projects to us. The small single-board computer has proven powerful enough for many different applications. During the project, we also took some time to install the version of Windows 10 released for the Raspberry Pi 2 onto one of the Pis to see what it could do.”

Do you have experience building a Hadoop cluster using the Raspberry Pi 2? If so, tell us what you discovered. Leave a comment. We’d love to hear from you.

Authors:

Glyn Heatley
Dan Sheehan

3 comments on “Cervello TechChat: Building a Hadoop Cluster using Raspberry Pi 2!”

    • Thanks! From a real-world performance perspective, a cluster like this is not very practical; it is not a high-performance system. From an academic standpoint, though, the performance was pretty good. As stated in the post, it outperformed similar systems built on the Raspberry Pi 1 when running benchmark tests with data from Project Gutenberg. I can’t find the benchmark results at the moment, but here is the system we compared our results to: https://www.widriksson.com/raspberry-pi-hadoop-cluster/#Fourth_run

  1. Hi there!

    We were surprised to see that our ideas were in the same lane. We took on our Bachelor’s final-year project, named “Data Stream Processing on Hadoop using Raspberry Pi 2 Cluster”, on 2 February 2015. I’m writing this comment just to let you know that we also did a lot of research on this and optimized the configuration to work well on the Raspberry Pi 2.

    Thanks for putting this up.

    Regards,
    Maher Shahmeer