What is Sqoop?

April 14, 2016

Relational databases are a major source of the structured data that feeds Big Data systems, and Hadoop is a framework used to store and analyze that data. Sqoop is a tool that imports data from relational databases into Hadoop HDFS and exports data from HDFS back to relational databases.

The relational database can be MySQL, PostgreSQL, Oracle, Redshift, or any other RDBMS that provides a JDBC driver.

Sqoop runs its imports and exports as MapReduce jobs, which provides parallel operation as well as fault tolerance.
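To make the import and export directions concrete, here is a minimal dry-run sketch. It only prints the Sqoop commands rather than running them, so it works without a cluster. The JDBC URL, user name, table, split column and HDFS path are all hypothetical placeholders, not values from a real setup. The -m 4 option asks for four parallel map tasks, and --split-by names the column whose value range Sqoop divides among them.

```shell
#!/bin/sh
# Dry-run sketch: prints the Sqoop commands instead of executing them.
# Every value below is a hypothetical placeholder (assumption).
JDBC_URL="jdbc:mysql://db.example.com:3306/sales"   # hypothetical database
DB_USER="etl_user"                                  # hypothetical user
TABLE="orders"                                      # hypothetical table
HDFS_DIR="/user/etl/orders"                         # hypothetical HDFS path

# RDBMS -> HDFS: four map tasks, rows divided by the order_id value range.
# -P makes Sqoop prompt for the password instead of taking it on the command line.
import_cmd() {
  echo sqoop import --connect "$JDBC_URL" --username "$DB_USER" -P \
    --table "$TABLE" --split-by order_id --target-dir "$HDFS_DIR" -m 4
}

# HDFS -> RDBMS: reads the files under HDFS_DIR and writes them to TABLE.
export_cmd() {
  echo sqoop export --connect "$JDBC_URL" --username "$DB_USER" -P \
    --table "$TABLE" --export-dir "$HDFS_DIR" -m 4
}

import_cmd
export_cmd
```

On a machine with Sqoop installed, removing the echo from each function would run the commands for real, provided the placeholders are replaced with actual connection details.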

Sqoop is an open source software product of the Apache Software Foundation.

Prerequisites

Before starting with Sqoop, the following background knowledge is needed to run Sqoop jobs:

· Basic knowledge of the Linux operating system and its commands

· Concepts of Relational database management systems

· Concepts of Hadoop and HDFS, with basic commands

Starting with Sqoop

Let's start with a very basic but important command, which lists all the available Sqoop commands:

$ sqoop help

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information
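As a small follow-up to the command list, the sketch below shows how a few of these commands are typically invoked. It defaults to a dry run that only prints each command; setting RUN=1 on a machine with Sqoop installed would execute them. The run helper, the JDBC URL and the user name are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Hypothetical connection details (assumptions, not from a real setup).
JDBC_URL="jdbc:mysql://db.example.com:3306/sales"
DB_USER="etl_user"

# Hypothetical helper: print the command by default, execute it when RUN=1.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "$@"
  fi
}

# Show the usage and options of a single tool, here "import".
run sqoop help import

# Print the installed Sqoop version.
run sqoop version

# List the tables in the database; -P prompts for the password.
run sqoop list-tables --connect "$JDBC_URL" --username "$DB_USER" -P
```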

In the next post, we will see how to use each of the Sqoop commands above, one by one.
