3. What is HDFS?
The Hadoop Distributed
File System (HDFS) is the primary storage system used by Hadoop applications.
HDFS is a distributed file system that provides high-performance access to data
across Hadoop clusters.
HDFS uses a master/slave architecture in which one device (the master) controls
one or more other devices (the slaves). An HDFS cluster consists of a single
NameNode, a master server that manages the file system namespace and regulates
access to files.
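For a rough sense of how a client talks to the NameNode, the sketch below uses the Hadoop Java FileSystem API to list a directory; the NameNode address, port, and path are assumed values for illustration, not settings from any particular cluster.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ListHdfsDirectory {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Hypothetical NameNode address; in a real cluster this comes from core-site.xml.
            conf.set("fs.defaultFS", "hdfs://namenode-host:8020");

            // Namespace operations such as listing a directory are answered by the NameNode.
            try (FileSystem fs = FileSystem.get(conf)) {
                for (FileStatus status : fs.listStatus(new Path("/user/data"))) {
                    System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
                }
            }
        }
    }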
HDFS is typically deployed on low-cost commodity hardware, so server failures
are common. The file system is designed to be highly fault-tolerant, however:
it facilitates the rapid transfer of data between compute nodes and enables
Hadoop systems to continue running if a node fails. That decreases the risk of
catastrophic failure, even when numerous nodes fail.
When we copy data to HDFS, it breaks the data down into multiple pieces and
distributes them to different nodes in the cluster, allowing for parallel
processing. The file system also replicates each piece of data multiple times
and distributes the copies to individual nodes, placing at least one copy on a
different server rack than the others. As a result, the data on nodes that
crash can be found elsewhere within the cluster, which allows processing to
continue while the failure is resolved.
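To see this block splitting and replica placement in practice, the sketch below copies a local file into HDFS and then asks the NameNode where each block's replicas ended up; the file names and NameNode address are assumptions made for the example.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ShowBlockPlacement {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode-host:8020"); // assumed address

            try (FileSystem fs = FileSystem.get(conf)) {
                Path file = new Path("/user/data/big-input.log"); // hypothetical file
                // Copy a local file into HDFS; it is split into blocks as it is written.
                fs.copyFromLocalFile(new Path("/tmp/big-input.log"), file);

                // Ask the NameNode which DataNodes hold each block's replicas.
                FileStatus status = fs.getFileStatus(file);
                BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
                for (BlockLocation block : blocks) {
                    System.out.println("offset " + block.getOffset()
                            + " stored on " + String.join(", ", block.getHosts()));
                }
            }
        }
    }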
HDFS is built to support applications with
large data sets, including individual files that reach into the terabytes. It
uses a master/slave architecture, with each cluster consisting of a single
NameNode that manages file system operations and supporting DataNodes that
manage data storage on individual compute nodes.
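Because the block size and replication factor are what let HDFS handle files this large, the sketch below creates a file with an explicit replication factor and block size through the same Java API; the values shown (3 replicas, 128 MB blocks) are illustrative, not recommendations for any specific cluster.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WriteWithReplication {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode-host:8020"); // assumed address

            try (FileSystem fs = FileSystem.get(conf)) {
                // Create a file with 3 replicas and a 128 MB block size (illustrative values).
                Path out = new Path("/user/data/events.log");
                try (FSDataOutputStream stream = fs.create(
                        out, true, 4096, (short) 3, 128 * 1024 * 1024L)) {
                    stream.writeBytes("example record\n");
                }
            }
        }
    }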