ZooKeeper: Distributed Coordination Service

Apache ZooKeeper (latest version 3.5.0) is an open-source distributed coordination system for maintaining centralized configuration information, providing a naming service, and providing distributed synchronization. It was originally developed at Yahoo and is written in Java. Back in 2006, Google published a paper on “Chubby”; ZooKeeper, not surprisingly, is a close clone of Chubby.

Read More…

Hadoop HDFS High Availability

See: The Glossary

Prior to Hadoop 2.x (that is, in Hadoop 1.x), the NameNode was a single point of failure (SPOF) in an HDFS cluster. Each cluster had a single NameNode, and if that machine or process became unavailable, the cluster as a whole would be unavailable until the NameNode was either restarted or brought up on a separate machine.

This reduced the total availability of the HDFS cluster in two major ways:

Read More…