HDFS: Hadoop Distributed File System

The Hadoop Distributed File System (HDFS) is the primary data storage system used by Hadoop applications. It employs a NameNode and DataNode architecture to implement a distributed file system that provides high-performance access to data across highly scalable Hadoop clusters.

HDFS is a key part of many Hadoop ecosystem technologies, as it provides a reliable means of managing pools of big data and supporting related big data analytics applications.

How HDFS works
HDFS supports the rapid transfer of data between compute nodes. At its outset, it was closely coupled with MapReduce, a programming framework for data processing.

When HDFS takes in data, it breaks the information down into separate blocks and distributes them to different nodes in a cluster, thus enabling highly efficient parallel processing.
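The splitting step can be illustrated with a minimal Python sketch. It is not HDFS code: the function names, the node names, the tiny 4-byte block size, and the round-robin assignment are all assumptions made for illustration. Real HDFS uses much larger blocks (128 MB by default in recent Hadoop releases) and its own placement policy.

```python
from itertools import cycle


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut data into consecutive blocks of at most block_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def assign_blocks(blocks: list[bytes], nodes: list[str]) -> dict[str, list[int]]:
    """Assign block indexes to nodes in round-robin order (illustrative policy only)."""
    placement: dict[str, list[int]] = {node: [] for node in nodes}
    for index, node in zip(range(len(blocks)), cycle(nodes)):
        placement[node].append(index)
    return placement


# Hypothetical input: a 10-byte "file" and two made-up node names.
blocks = split_into_blocks(b"abcdefghij", block_size=4)
placement = assign_blocks(blocks, ["node-1", "node-2"])
```

Because each node ends up holding a different subset of the blocks, a processing job can work on those blocks at the same time, which is the parallelism described above.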

Moreover, the Hadoop Distributed File System is specially designed to be highly fault-tolerant. The file system replicates, or copies, each piece of data multiple times and distributes the copies to individual nodes, placing at least one copy on a different server rack than the others. As a result, the data on nodes that crash can be found elsewhere within a cluster. This ensures that processing can continue while data is recovered.

HDFS uses a master/slave architecture. In its initial incarnation, each Hadoop cluster consisted of a single NameNode that managed file system operations and supporting DataNodes that managed data storage on individual compute nodes. The HDFS elements combine to support applications with large data sets.
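The division of labor between the two roles can be sketched in Python as a toy model. Real HDFS is written in Java and its components communicate over the network; every class, method, path, and block id below is hypothetical. The sketch assumes the NameNode keeps only metadata (which blocks make up a file and where the copies live) while the DataNodes hold the block contents, and it shows a read succeeding even though one node is down.

```python
class DataNode:
    """Toy DataNode: stores block bytes keyed by block id."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True
        self.blocks: dict[str, bytes] = {}

    def read(self, block_id: str) -> bytes:
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.blocks[block_id]


class NameNode:
    """Toy NameNode: keeps metadata only, never the file contents."""

    def __init__(self) -> None:
        self.file_blocks: dict[str, list[str]] = {}      # path -> ordered block ids
        self.block_locations: dict[str, list[str]] = {}  # block id -> DataNode names


def read_file(path: str, namenode: NameNode, datanodes: dict[str, DataNode]) -> bytes:
    """Look up block locations on the NameNode, then read each block from a live DataNode."""
    data = b""
    for block_id in namenode.file_blocks[path]:
        for node_name in namenode.block_locations[block_id]:
            try:
                data += datanodes[node_name].read(block_id)
                break  # this block is done; move on to the next one
            except ConnectionError:
                continue  # this copy is unreachable; try the next one
        else:
            raise IOError(f"no live replica for {block_id}")
    return data


# Two DataNodes each hold a copy of both blocks of one small file.
dn1, dn2 = DataNode("dn1"), DataNode("dn2")
for dn in (dn1, dn2):
    dn.blocks.update({"blk_0": b"hello ", "blk_1": b"world"})

nn = NameNode()
nn.file_blocks["/demo.txt"] = ["blk_0", "blk_1"]
nn.block_locations = {"blk_0": ["dn1", "dn2"], "blk_1": ["dn1", "dn2"]}

dn1.alive = False  # simulate a crashed node
content = read_file("/demo.txt", nn, {"dn1": dn1, "dn2": dn2})
```

Keeping the metadata on one node makes lookups simple, but it also means that in this single-NameNode layout the whole file system depends on that one node being available.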

This master node "data chunking" architecture draws design elements from the Google File System (GFS), a proprietary file system outlined in Google technical papers, and from IBM's General Parallel File System (GPFS), which boosts I/O by striping blocks of data over multiple disks and writing the blocks in parallel. Although HDFS is not compliant with the Portable Operating System Interface (POSIX) model, it echoes POSIX design style in some respects.

Continue reading this at TechTarget.
