Wednesday, January 25, 2017

Learn Big Data: Hive



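Hive is a data warehouse tool built on top of Hadoop, originally developed at Facebook. It provides a SQL-like query language called HiveQL (HQL). Hadoop can store any kind of data:
  • Structured data, like database tables
  • Unstructured data, like videos, audio, PDFs and text files
  • Semi-structured data, like XML
Hive queries are written in HiveQL, and Hive translates them into jobs that run on the Hadoop cluster.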
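Differences between SQL and HiveQL

  • In SQL we can insert data row by row, but in HQL data is normally loaded in bulk, a whole file at a time.
  • In SQL we can update any row or column, but not in HQL: the data is stored in HDFS, whose files are meant to be written once and not modified afterwards.
  • In SQL we can delete rows, but not in HQL. (Hive 0.14 and later add INSERT ... VALUES, UPDATE and DELETE, but UPDATE and DELETE work only on tables configured for ACID transactions.)
  • In Hive, every table's data is stored as a directory in HDFS.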
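To see why these limits follow from HDFS, here is a toy Python model. The class ToyTable and the method raise_salary are hypothetical names, not Hive code. A table is a directory of files that are only ever added or replaced, so changing a single row means rewriting the table, which is roughly the effect of HiveQL's INSERT OVERWRITE TABLE ... SELECT.

```python
from typing import List, Tuple

Row = Tuple[int, str, float]  # (id, name, salary), matching the employee table below


class ToyTable:
    """Toy model (hypothetical): a table is a directory of write-once files."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.files: List[Tuple[Row, ...]] = []

    def load(self, rows: List[Row]) -> None:
        # Loading adds a whole new file; existing files are never edited.
        self.files.append(tuple(rows))

    def rows(self) -> List[Row]:
        return [row for f in self.files for row in f]

    def raise_salary(self, emp_id: int, amount: float) -> None:
        # No in-place UPDATE: read every row, build new rows, and replace
        # all files at once (the idea behind INSERT OVERWRITE).
        new_rows = [(i, n, s + amount if i == emp_id else s) for (i, n, s) in self.rows()]
        self.files = [tuple(new_rows)]


t = ToyTable("employee")
t.load([(1, "Asha", 5000.0)])
t.load([(2, "Ravi", 6000.0)])
t.raise_salary(2, 500.0)
print(t.rows(), len(t.files))
```

After the "update", the two loaded files have been replaced by one rewritten file.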

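HQL data types
Like other RDBMSs (Oracle, MySQL, SQL Server), Hive also has its own data types. The common ones are:
Integer types:      TINYINT, SMALLINT, INT, BIGINT
Other primitives:   FLOAT, DOUBLE, STRING
Collection types:   MAP, ARRAY, STRUCT

MAP, ARRAY and STRUCT are called collection data types because a single column value holds several values: an ARRAY holds an ordered list of elements of one type, a MAP holds key-value pairs, and a STRUCT holds named fields that may have different types.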
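In a plain text table, Hive's default delimiters are Ctrl-A (\001) between fields, Ctrl-B (\002) between collection items and STRUCT fields, and Ctrl-C (\003) between a MAP key and its value. The Python sketch below parses one line for a hypothetical table (name STRING, skills ARRAY<STRING>, scores MAP<STRING,INT>, address STRUCT<city:STRING, zip:STRING>); parse_row and the sample values are made up for illustration.

```python
from typing import Dict, List

# Hive's default text-table delimiters: Ctrl-A between fields,
# Ctrl-B between collection items (and STRUCT fields), Ctrl-C between MAP key and value.
FIELD, ITEM, KEY = "\x01", "\x02", "\x03"


def parse_row(line: str) -> Dict[str, object]:
    """Parse one line of the hypothetical table described above."""
    name, skills, scores, address = line.split(FIELD)
    skill_list: List[str] = skills.split(ITEM)      # ARRAY<STRING>
    score_map: Dict[str, int] = {}
    for pair in scores.split(ITEM):                 # MAP<STRING,INT>
        key, value = pair.split(KEY)
        score_map[key] = int(value)
    city, zip_code = address.split(ITEM)            # STRUCT<city, zip>
    return {"name": name, "skills": skill_list, "scores": score_map,
            "address": {"city": city, "zip": zip_code}}


row = parse_row("Asha\x01hive\x02pig\x01math\x0390\x02sql\x0385\x01Pune\x02411001")
print(row["skills"], row["scores"], row["address"]["city"])
# ['hive', 'pig'] {'math': 90, 'sql': 85} Pune
```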
Creating Hive tables:
Hive tables can be created in two ways:
1.       Managed tables or Internal tables
2.       External tables
Managed tables or Internal tables:
user@machine:~$ hive
                hive> create table employee(id int, name string, salary float)
                > row format delimited
                > fields terminated by '\t';


Important points:
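•  A STRING column can hold any text, so any field can be read as STRING.
•  In SQL you must create the schema (table) before you can insert data. Hive applies the schema when the data is read, so you can either create the table first and then load data, or put the data files in HDFS first and create a table over them afterwards.
•  If you end the CREATE TABLE statement with ; right after the column list, Hive falls back to its default field delimiter (Ctrl-A, \001). Loading a tab-separated file into that table gives no error, but every column comes back as NULL instead of the actual data. So always add ROW FORMAT DELIMITED FIELDS TERMINATED BY ... with your file's delimiter.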
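The NULL behaviour in the last point can be modelled in a few lines of Python. This is a simplified sketch of how a line of a text table becomes a row, not Hive's actual reader: split on the declared delimiter, convert each field to its column type, and produce NULL (None here) when a field is missing or does not parse. read_line and COLUMNS are hypothetical names; the columns match the employee table.

```python
from typing import Callable, List, Optional

# Column converters for the employee table: id INT, name STRING, salary FLOAT
COLUMNS: List[Callable[[str], object]] = [int, str, float]


def read_line(line: str, delimiter: str) -> List[Optional[object]]:
    """Simplified model: split, convert, NULL (None) on failure."""
    fields = line.split(delimiter)
    row: List[Optional[object]] = []
    for i, convert in enumerate(COLUMNS):
        if i >= len(fields):
            row.append(None)   # field missing from the line -> NULL
            continue
        try:
            row.append(convert(fields[i]))
        except ValueError:
            row.append(None)   # text does not parse as the column type -> NULL
    return row


line = "1\tAsha\t5000.0"          # one tab-separated line of the data file
print(read_line(line, "\t"))     # fields terminated by '\t'   -> [1, 'Asha', 5000.0]
print(read_line(line, "\x01"))   # default Ctrl-A delimiter    -> [None, None, None]
```

With the wrong delimiter the whole line lands in the id column, fails to parse as INT, and the remaining columns are missing, so everything is NULL without any error.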
Loading data into Hive tables:
Data can be loaded in two ways: from the local file system or from HDFS.
Loading data from the local file system:
hive> load data local inpath '<filepath>' into table <tablename>;
Loading data from HDFS:
hive> load data inpath '<filepath>' into table <tablename>;
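•  With LOCAL, a relative path is resolved against your current working directory on the local machine (for example /home/<user> if you started hive from your home directory), and the file is copied into the table.
•  Without LOCAL, a relative path is resolved against your HDFS home directory, /user/<user>, and the file is moved (not copied) into the table's directory.
•  By default the loaded file is added to the table's existing data; write OVERWRITE before INTO to replace that data instead.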
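The path rules above can be written as a small Python function. It is a simplified sketch: it ignores full URIs such as hdfs://host/path, and resolve_load_path and the user name vimal are hypothetical examples.

```python
import posixpath
from typing import Tuple


def resolve_load_path(path: str, local: bool, user: str, cwd: str) -> Tuple[str, str]:
    """Return (action, absolute path) for the filepath in LOAD DATA (simplified model).

    LOCAL:    relative paths resolve against the client's current working directory,
              and the file is copied into the table.
    no LOCAL: relative paths resolve against /user/<user> in HDFS,
              and the file is moved into the table.
    """
    base = cwd if local else posixpath.join("/user", user)
    full = path if posixpath.isabs(path) else posixpath.join(base, path)
    action = "copy" if local else "move"
    return action, posixpath.normpath(full)


print(resolve_load_path("employee.txt", True, "vimal", "/home/vimal"))
print(resolve_load_path("employee.txt", False, "vimal", "/home/vimal"))
```

Because a non-LOCAL load moves the file, it disappears from its original HDFS path after the load.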
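Two terms come up often here: metadata, which is data about the data (table names, column names and types, and where each table's files live), and the metastore, which is where Hive keeps that metadata (a relational database, separate from the table data in HDFS).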
External Tables:
hive> create external table employeeE(id int, name string, salary float)
                > row format delimited
                > fields terminated by '\t'
                > location '/vimal/newfolder';
Concept:
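•  When we create an internal (managed) table, Hive creates a directory named after the table inside the warehouse directory (/user/hive/warehouse by default). When we create an external table, no directory is created in the warehouse; the table just points to the location we gave, here /vimal/newfolder.
•  For data shared with other users or tools, use an external table rather than an internal one: dropping an external table removes only its metadata and leaves the files in place, while dropping an internal table deletes both the metadata and the data.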
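The difference on DROP TABLE is easiest to see with a toy Python model that keeps metadata (the metastore) and files (the file system) in separate dictionaries. Warehouse, create and drop are hypothetical names, and real Hive does much more bookkeeping; the sketch only shows which part each kind of table loses.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Warehouse:
    """Toy model (hypothetical): metadata and data are stored separately."""
    metastore: Dict[str, dict] = field(default_factory=dict)        # table -> metadata
    filesystem: Dict[str, List[str]] = field(default_factory=dict)  # directory -> files

    def create(self, name: str, location: str, external: bool) -> None:
        self.metastore[name] = {"location": location, "external": external}
        self.filesystem.setdefault(location, [])

    def drop(self, name: str) -> None:
        meta = self.metastore.pop(name)                  # metadata is always removed
        if not meta["external"]:
            self.filesystem.pop(meta["location"], None)  # internal: data deleted too
        # external: the files at meta["location"] are left where they are


w = Warehouse()
w.create("employee", "/user/hive/warehouse/employee", external=False)
w.create("employeeE", "/vimal/newfolder", external=True)
w.filesystem["/user/hive/warehouse/employee"].append("employee")
w.filesystem["/vimal/newfolder"].append("employee")
w.drop("employee")
w.drop("employeeE")
print(w.filesystem)  # only /vimal/newfolder and its file survive
```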

Internal Table:
/user/hive/warehouse/
        employee/          (directory)
                employee   (file)
                employee1  (file)
External Table:
/vimal/newfolder/
        employee           (file)
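The two directory trees above can be reproduced with a small Python sketch. It assumes the default database and the default warehouse directory /user/hive/warehouse (the hive.metastore.warehouse.dir setting); table_location and file_paths are hypothetical helper names. Hive stores table names in lower case, so the sketch lower-cases the name when building the warehouse path.

```python
import posixpath
from typing import List, Optional

WAREHOUSE_DIR = "/user/hive/warehouse"  # default hive.metastore.warehouse.dir


def table_location(name: str, location: Optional[str] = None) -> str:
    """Directory holding a table's files (simplified: default database only).

    With a LOCATION clause (as in our external table), the files live exactly there.
    Without one, Hive uses <warehouse>/<table name>.
    """
    if location is not None:
        return location
    return posixpath.join(WAREHOUSE_DIR, name.lower())


def file_paths(name: str, loaded_files: List[str], location: Optional[str] = None) -> List[str]:
    """Full paths of the files loaded into a table."""
    base = table_location(name, location)
    return [posixpath.join(base, f) for f in loaded_files]


print(file_paths("employee", ["employee", "employee1"]))
print(file_paths("employeeE", ["employee"], "/vimal/newfolder"))
```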
