The NameNode is the most critical component of an HDFS file system. It manages all of the file system metadata (i.e., file owners, file permissions, number of blocks, block locations, file sizes, etc.) and maintains it in main memory. The NameNode is the client's first point of contact: a client asks it for file metadata and then performs the actual file I/O directly with the DataNodes. If something goes wrong with the NameNode, the metadata held in its main memory is lost, and the file system is unusable until that metadata is rebuilt from the namespace image and edit log that the NameNode persists on disk; if those are lost as well, the metadata is gone permanently.
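The read path described above can be sketched as a toy model in Python. This is not the Hadoop client API: `ToyNameNode`, `ToyDataNode`, `read_file`, and the block IDs are hypothetical names invented for illustration. The sketch only shows the division of labor, in which the NameNode answers a metadata lookup and the file bytes come straight from the DataNodes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDataNode:
    """Hypothetical stand-in for a DataNode: stores block contents by block ID."""
    name: str
    blocks: dict = field(default_factory=dict)

    def read_block(self, block_id):
        return self.blocks[block_id]


@dataclass
class ToyNameNode:
    """Hypothetical stand-in for the NameNode: holds metadata only, in memory."""
    # path -> ordered list of (block_id, [names of DataNodes holding a replica])
    block_map: dict = field(default_factory=dict)

    def get_block_locations(self, path):
        return self.block_map[path]


def read_file(path, namenode, datanodes):
    """Client read: one metadata lookup, then direct block reads from DataNodes."""
    data = b""
    for block_id, holders in namenode.get_block_locations(path):
        # Read from the first listed replica; file bytes never pass through the NameNode.
        data += datanodes[holders[0]].read_block(block_id)
    return data


dn1 = ToyDataNode("dn1", {"blk_1": b"hello ", "blk_2": b"world"})
dn2 = ToyDataNode("dn2", {"blk_1": b"hello "})
nn = ToyNameNode({"/demo.txt": [("blk_1", ["dn1", "dn2"]), ("blk_2", ["dn1"])]})
datanodes = {"dn1": dn1, "dn2": dn2}

print(read_file("/demo.txt", nn, datanodes))  # b'hello world'
```

In this model, losing the `ToyNameNode` object leaves every block intact on the DataNodes, but nothing records which blocks belong to which file. That is why the NameNode's metadata is so critical.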
Hadoop 2.0
Hadoop HDFS Federation
In Hadoop 1.x, a cluster has only one NameNode (only one active NameNode is allowed), which maintains a single namespace (a single directory structure) for the entire cluster. Hadoop clusters keep growing into enterprise-wide platforms, and the NameNode stores the entire file system metadata in memory (RAM). As more DataNodes and more files are added, the NameNode's memory eventually reaches its limit and becomes the limiting factor for cluster scaling, because it caps the number of files that can be stored in the cluster. In Hadoop 1.x, the namespace can only be scaled vertically (by adding more RAM) on a single NameNode.
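A back-of-the-envelope estimate shows why memory, rather than disk, becomes the limit. The sketch below assumes a commonly quoted rule of thumb of roughly 150 bytes of NameNode heap per namespace object (file or block) and a 64 MB block size. Both figures are assumptions used for illustration, not measurements, and the function name `estimate_namenode_heap_bytes` is hypothetical.

```python
import math

# Assumption: rule-of-thumb heap cost per file or block object, not a measured value.
BYTES_PER_OBJECT = 150
MIB = 1024 * 1024


def estimate_namenode_heap_bytes(num_files, avg_file_size_bytes, block_size_bytes=64 * MIB):
    """Rough heap estimate: one object per file plus one per block of that file."""
    blocks_per_file = max(1, math.ceil(avg_file_size_bytes / block_size_bytes))
    num_objects = num_files * (1 + blocks_per_file)
    return num_objects * BYTES_PER_OBJECT


# Roughly the same total data (about 100 TiB) stored two different ways.
small = estimate_namenode_heap_bytes(100_000_000, 1 * MIB)   # many 1 MiB files
large = estimate_namenode_heap_bytes(1_000_000, 100 * MIB)   # fewer 100 MiB files

print(f"100M small files: {small / 1e9:.2f} GB of heap")  # 30.00 GB
print(f"1M large files:   {large / 1e9:.2f} GB of heap")  # 0.45 GB
```

Under these assumptions, the heap requirement tracks the number of files and blocks rather than the number of bytes stored, so a cluster with many small files can exhaust a single NameNode's RAM long before the DataNodes run out of disk.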
YARN : NextGen Hadoop Data Processing Framework
In this big data world, storing massive amounts of data and processing it quickly is a big challenge, and Hadoop is designed to meet it. Hadoop is an open-source software framework for storing and processing big data in a distributed fashion on large clusters (thousands of machines) of commodity (low-cost) hardware. Hadoop has two core components, HDFS and MapReduce. HDFS (Hadoop Distributed File System) stores massive data sets across commodity machines in a distributed manner. MapReduce is a distributed data processing framework for working with this data.
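To make the MapReduce model concrete, here is a minimal single-process sketch of a word count in plain Python. It does not use the Hadoop API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical. In a real cluster the map and reduce steps would run in parallel on many machines, with the framework handling the grouping step between them.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data big clusters", "data nodes"])
print(counts)  # {'big': 2, 'data': 2, 'clusters': 1, 'nodes': 1}
```

Because each call to `map_phase` depends only on its own line and each call to `reduce_phase` depends only on its own key, the work can be split across machines without coordination inside a phase.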
Hadoop HDFS High Availability
Prior to Hadoop 2.x (that is, in Hadoop 1.x), the NameNode was a single point of failure (SPOF) in an HDFS cluster. Each cluster had a single NameNode, and if that machine or process became unavailable, the cluster as a whole was unavailable until the NameNode was either restarted or brought up on a separate machine.