Apache Pig Tutorial on Apache Pig Running Scripts

In this chapter, we will see how to run Apache Pig scripts in batch mode.

Comments in Pig Script

While writing a script in a file, we can include comments in it as shown below.

Multi-line comments

Multi-line comments begin with '/*' and end with '*/'.

/* these are the multi-line comments 
  in the pig script */

Single-line comments

Single-line comments begin with '--' and run to the end of the line.

-- We can write single-line comments like this.

Executing Pig Script in Batch Mode

To execute Apache Pig statements in batch mode, follow the steps given below.

Step 1

Write all the required Pig Latin statements and commands in a single file and save it with the .pig extension.

Step 2

Execute the Apache Pig script. You can execute the script from the shell (Linux) as shown below.

Local mode:
$ pig -x local sample_script.pig

MapReduce mode:
$ pig -x mapreduce sample_script.pig

You can also execute it from the Grunt shell using the exec command, as shown below.

grunt> exec /sample_script.pig
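The two steps above can be sketched as a short shell session. This is only a minimal sketch under stated assumptions: the scratch directory, the file student_local.txt, and its two-column layout are hypothetical names chosen for illustration, and the pig call is skipped when pig is not on the PATH.

```shell
# Step 1: save the Pig Latin statements in a .pig file.
# The scratch directory and file names are hypothetical.
workdir="${TMPDIR:-/tmp}/pig_batch_demo"
mkdir -p "$workdir"

cat > "$workdir/sample_script.pig" <<'EOF'
/* Load a small comma-separated file
   and print its tuples. */
student = LOAD 'student_local.txt' USING PigStorage(',')
   AS (id:int, firstname:chararray);
-- DUMP triggers the actual execution.
DUMP student;
EOF

# A tiny input file so the script has something to load.
printf '001,rajiv\n002,siddarth\n' > "$workdir/student_local.txt"

# Step 2: run the script in local mode, but only if pig is installed.
if command -v pig >/dev/null 2>&1; then
  (cd "$workdir" && pig -x local sample_script.pig)
else
  echo "pig not found on PATH; skipping execution"
fi
```

The script is run from inside the scratch directory because, in local mode, a relative path in LOAD is resolved against the current working directory.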

Executing a Pig Script from HDFS

We can also execute a Pig script that resides in HDFS. Suppose there is a Pig script named sample_script.pig in the HDFS directory /pig_data/. We can execute it as shown below.

$ pig -x mapreduce hdfs://localhost:9000/pig_data/sample_script.pig
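For the script to be found at that URI, it first has to be copied into HDFS. The following is a hedged sketch of one way to do that with the standard hdfs dfs commands. It assumes a local copy of sample_script.pig in the current directory and a NameNode at localhost:9000, as in the command above. The upload is skipped if the hdfs client or the local file is missing.

```shell
# Local script name and HDFS target directory (taken from the example above).
script="sample_script.pig"
hdfs_dir="/pig_data"
hdfs_uri="hdfs://localhost:9000${hdfs_dir}/${script}"

if command -v hdfs >/dev/null 2>&1 && [ -f "$script" ]; then
  hdfs dfs -mkdir -p "$hdfs_dir"           # create the directory if it does not exist
  hdfs dfs -put -f "$script" "$hdfs_dir/"  # upload, overwriting an older copy
  hdfs dfs -ls "$hdfs_dir"                 # confirm the script is in place
else
  echo "hdfs client or $script not available; skipping upload"
fi

echo "run with: pig -x mapreduce $hdfs_uri"
```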

Example

Assume we have a file student_details.txt in HDFS with the following content.

student_details.txt

001,rajiv,reddy,21,9848022337,hyderabad 
002,siddarth,battacharya,22,9848022338,kolkata
003,rajesh,khanna,22,9848022339,delhi 
004,preethi,agarwal,21,9848022330,pune 
005,trupthi,mohanthy,23,9848022336,bhuwaneshwar 
006,archana,mishra,23,9848022335,chennai 
007,komal,nayak,24,9848022334,trivendram 
008,bharathi,nambiayar,24,9848022333,chennai

We also have a sample script named sample_script.pig in the same HDFS directory. This file contains statements that perform operations and transformations on the student relation, as shown below.

student = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt' USING PigStorage(',')
   AS (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray);

student_order = ORDER student BY age DESC;

student_limit = LIMIT student_order 4;

DUMP student_limit;

The first statement loads the data in the file student_details.txt as a relation named student.
The second statement arranges the tuples of the relation in descending order of age and stores the result as student_order.
The third statement stores the first 4 tuples of student_order as student_limit.
Finally, the fourth statement dumps the content of the relation student_limit.
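As a quick sanity check, the effect of the ORDER and LIMIT statements can be approximated locally with standard Unix tools. This sketch is not how Pig executes the job. It only mirrors the logic on a local copy of the data, and the scratch file name is hypothetical.

```shell
# Recreate the sample data in a local scratch file (hypothetical path).
data="${TMPDIR:-/tmp}/student_details_check.txt"
cat > "$data" <<'EOF'
001,rajiv,reddy,21,9848022337,hyderabad
002,siddarth,battacharya,22,9848022338,kolkata
003,rajesh,khanna,22,9848022339,delhi
004,preethi,agarwal,21,9848022330,pune
005,trupthi,mohanthy,23,9848022336,bhuwaneshwar
006,archana,mishra,23,9848022335,chennai
007,komal,nayak,24,9848022334,trivendram
008,bharathi,nambiayar,24,9848022333,chennai
EOF

# Field 4 is the age: sort numerically in descending order.
# -s makes the sort stable, so students of equal age keep their input order.
top4=$(sort -s -t, -k4,4nr "$data" | head -n 4)
echo "$top4"
```

This selects the students with ids 007, 008, 005 and 006. Note that the stable sort fixes the order among students of the same age, whereas Pig makes no such promise for tuples with equal sort keys.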

Let us now execute sample_script.pig as shown below.

$ ./pig -x mapreduce hdfs://localhost:9000/pig_data/sample_script.pig

Apache Pig executes the script and produces the following output.

(7,komal,nayak,24,9848022334,trivendram)
(8,bharathi,nambiayar,24,9848022333,chennai)
(5,trupthi,mohanthy,23,9848022336,bhuwaneshwar)
(6,archana,mishra,23,9848022335,chennai)
2015-10-19 10:31:27,446 [main] INFO  org.apache.pig.Main - Pig script completed in 12 minutes, 32 seconds and 751 milliseconds (752751 ms)
