What is Sqoop?
Relational databases are among the most common data sources for Big Data, and Hadoop is a framework used to analyze big data. Sqoop is a tool that imports data from relational databases into Hadoop HDFS and exports data from Hadoop HDFS back to relational databases.
The relational database can be MySQL, PostgreSQL, Oracle, Redshift, or any other RDBMS.
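As a rough sketch of the two directions described above, the script below assembles one import command and one export command and prints them for review. The JDBC URL, user name, table names, and HDFS paths are hypothetical placeholders, not values from this post; the options used (--connect, --username, -P, --table, --target-dir, --export-dir) are standard Sqoop options.

```shell
#!/bin/sh
# All values below are hypothetical placeholders; replace them with your own.
DB_URL="jdbc:mysql://db.example.com/shop"   # JDBC connection string (assumed)
DB_USER="sqoop_user"                        # database user (assumed); -P prompts for the password

# RDBMS -> HDFS: copy the "orders" table into an HDFS directory.
IMPORT_CMD="sqoop import --connect $DB_URL --username $DB_USER -P --table orders --target-dir /user/hadoop/orders"

# HDFS -> RDBMS: load files from an HDFS directory into an existing table.
EXPORT_CMD="sqoop export --connect $DB_URL --username $DB_USER -P --table order_summary --export-dir /user/hadoop/order_summary"

# Print the commands for review instead of running them.
echo "$IMPORT_CMD"
echo "$EXPORT_CMD"
```

Once the placeholders point at a real database and cluster, each printed line can be pasted into a terminal and run as-is.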
Sqoop uses MapReduce
to import and export the data, which provides parallel operation as well as
fault tolerance.
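To make the parallelism a little more concrete, here is a minimal sketch of an import that asks for a specific number of parallel map tasks. It assumes a hypothetical orders table with a numeric order_id column; --num-mappers sets how many map tasks run in parallel, and --split-by names the column Sqoop uses to divide the rows among them. The script only prints the command.

```shell
#!/bin/sh
# Hypothetical placeholders; adjust to your own database and cluster.
DB_URL="jdbc:mysql://db.example.com/shop"
MAPPERS=4                 # number of parallel map tasks to request
SPLIT_COLUMN="order_id"   # column used to partition rows across map tasks (assumed to exist)

# Each map task imports its own slice of the table.
PARALLEL_IMPORT_CMD="sqoop import --connect $DB_URL --username sqoop_user -P --table orders --split-by $SPLIT_COLUMN --num-mappers $MAPPERS --target-dir /user/hadoop/orders"

# Print the command for review instead of running it.
echo "$PARALLEL_IMPORT_CMD"
```

Raising the mapper count can speed up an import, but it also opens more concurrent connections to the source database, so the right value depends on what that database can handle.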
Sqoop is an open-source project of the Apache Software Foundation.
Prerequisites
Before we start with Sqoop, the following prerequisite knowledge is needed to run Sqoop jobs:
· Basic knowledge of the Linux operating system and its commands
· Concepts of relational database management systems
· Concepts of Hadoop and HDFS, including basic commands
Starting with Sqoop
Let’s start with a very basic but important command that lists all the available Sqoop commands:
$ sqoop help
Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information
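The help command also accepts the name of a single command and prints its detailed usage. The sketch below is a small, guarded example: it checks whether sqoop is on the PATH before calling it, and the TOOL and STATUS variables are illustrative names, not part of Sqoop.

```shell
#!/bin/sh
# Show detailed usage for one Sqoop command, if Sqoop is installed.
TOOL="import"   # any command name from the "Available commands" list

if command -v sqoop >/dev/null 2>&1; then
    sqoop help "$TOOL"   # detailed options for the chosen command
    sqoop version        # installed Sqoop version
    STATUS="ran"
else
    echo "sqoop not found on PATH; would run: sqoop help $TOOL"
    STATUS="skipped"
fi
```

Changing TOOL to export or list-tables shows the options for those commands in the same way.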
In the next blog post, we will see how to use each of the above Sqoop commands, one by one.