in this chapter, we will see how to run apache pig scripts in batch mode.
comments in pig script
while writing a script in a file, we can include comments in it as shown below.
multi-line comments
multi-line comments begin with '/*' and end with '*/'.
/* these are the multi-line comments in the pig script */
single-line comments
single-line comments begin with '--'.
--we can write single line comments like this.
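as a minimal sketch of both comment styles in one place, the shell snippet below writes a small script file and prints it. the file name commented_script.pig is hypothetical, and the load path simply reuses the student_details.txt location used later in this chapter.

```shell
# Write a small Pig script that uses both comment styles.
# The file name commented_script.pig is a hypothetical example.
script_file="commented_script.pig"

cat > "$script_file" <<'EOF'
/* load the student data.
   a multi-line comment can span several lines. */
student = load 'hdfs://localhost:9000/pig_data/student_details.txt' using PigStorage(',');

-- print the relation to the console.
dump student;
EOF

# Show the generated script.
cat "$script_file"
```

the quoted heredoc delimiter ('EOF') keeps the shell from expanding anything inside the pig script.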
executing pig script in batch mode
to execute apache pig statements in batch mode, follow the steps given below.
step 1
write all the required pig latin statements and commands in a single file and save it with the .pig extension.
step 2
execute the apache pig script. you can execute the pig script from the shell (linux) as shown below.
| local mode | mapreduce mode |
|---|---|
| $ pig -x local sample_script.pig | $ pig -x mapreduce sample_script.pig |
you can execute it from the grunt shell as well using the exec command as shown below.
grunt> exec /sample_script.pig
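the two shell invocations in the table above differ only in the value passed to -x, so a small wrapper can make the mode explicit. the sketch below assumes the pig launcher is on the PATH; pig_batch_command and run_pig_batch are hypothetical helper names, not part of apache pig, and the wrapper covers only the two modes shown in this chapter.

```shell
# Build the batch-mode command line for a given execution mode.
# pig_batch_command and run_pig_batch are hypothetical helper names.
pig_batch_command() {
    mode="$1"      # "local" or "mapreduce"
    script="$2"    # path to the .pig file
    case "$mode" in
        local|mapreduce) printf 'pig -x %s %s\n' "$mode" "$script" ;;
        *) echo "mode not covered by this sketch: $mode" >&2; return 1 ;;
    esac
}

# Run the script only if the pig launcher is actually installed.
run_pig_batch() {
    cmd=$(pig_batch_command "$1" "$2") || return 1
    if command -v pig >/dev/null 2>&1; then
        pig -x "$1" "$2"
    else
        echo "pig not found on PATH; would run: $cmd" >&2
        return 127
    fi
}

# Print the command without executing it.
pig_batch_command local sample_script.pig
```

calling run_pig_batch mapreduce sample_script.pig would then launch the script on the cluster, or report that pig is missing.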
executing a pig script from hdfs
we can also execute a pig script that resides in hdfs. suppose there is a pig script named sample_script.pig in the hdfs directory /pig_data/. we can execute it as shown below.
$ pig -x mapreduce hdfs://localhost:9000/pig_data/sample_script.pig
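for the command above to work, the script must already be present in hdfs. a minimal sketch of the upload follows, assuming the hdfs client is on the PATH, a local copy of sample_script.pig exists in the current directory, and the namenode listens on localhost:9000 as in the uri above. the variable names are illustrative.

```shell
# Assumed values, taken from the URI used in this chapter.
namenode="localhost:9000"
hdfs_dir="/pig_data"
script="sample_script.pig"
script_uri="hdfs://${namenode}${hdfs_dir}/${script}"

if command -v hdfs >/dev/null 2>&1 && [ -f "$script" ]; then
    # Create the target directory and copy the local script into it,
    # overwriting any earlier copy.
    hdfs dfs -mkdir -p "$hdfs_dir"
    hdfs dfs -put -f "$script" "$hdfs_dir/"
else
    echo "skipping upload (hdfs client or $script not available)" >&2
fi

# URI to pass to: pig -x mapreduce <uri>
echo "$script_uri"
```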
example
assume we have a file student_details.txt in hdfs with the following content.
student_details.txt
001,rajiv,reddy,21,9848022337,hyderabad
002,siddarth,battacharya,22,9848022338,kolkata
003,rajesh,khanna,22,9848022339,delhi
004,preethi,agarwal,21,9848022330,pune
005,trupthi,mohanthy,23,9848022336,bhuwaneshwar
006,archana,mishra,23,9848022335,chennai
007,komal,nayak,24,9848022334,trivendram
008,bharathi,nambiayar,24,9848022333,chennai
we also have a sample script with the name sample_script.pig, in the same hdfs directory. this file contains statements performing operations and transformations on the student relation, as shown below.
student = load 'hdfs://localhost:9000/pig_data/student_details.txt' using PigStorage(',')
as (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray);
student_order = order student by age desc;
student_limit = limit student_order 4;
dump student_limit;
the first statement of the script will load the data in the file named student_details.txt as a relation named student.
the second statement of the script will arrange the tuples of the relation in descending order of age and store the result as student_order.
the third statement of the script will store the first 4 tuples of student_order as student_limit.
finally, the fourth statement will dump the content of the relation student_limit.
let us now execute the sample_script.pig as shown below.
$ pig -x mapreduce hdfs://localhost:9000/pig_data/sample_script.pig
apache pig executes the script and produces the following output.
(7,komal,nayak,24,9848022334,trivendram)
(8,bharathi,nambiayar,24,9848022333,chennai)
(5,trupthi,mohanthy,23,9848022336,bhuwaneshwar)
(6,archana,mishra,23,9848022335,chennai)
2015-10-19 10:31:27,446 [main] info org.apache.pig.main - pig script completed in 12 minutes, 32 seconds and 751 milliseconds (752751 ms)
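the order and limit steps can be cross-checked without a cluster using standard unix tools. the sketch below is not pig: it writes the sample rows to a local file, assuming one record per line, and approximates "order ... by age desc" followed by "limit 4" with sort and head. pig does not guarantee the relative order of tuples that tie on age, so the order within each age group may differ in a real run.

```shell
# Recreate the sample data locally, one record per line.
cat > student_details.txt <<'EOF'
001,rajiv,reddy,21,9848022337,hyderabad
002,siddarth,battacharya,22,9848022338,kolkata
003,rajesh,khanna,22,9848022339,delhi
004,preethi,agarwal,21,9848022330,pune
005,trupthi,mohanthy,23,9848022336,bhuwaneshwar
006,archana,mishra,23,9848022335,chennai
007,komal,nayak,24,9848022334,trivendram
008,bharathi,nambiayar,24,9848022333,chennai
EOF

# Sort on the 4th comma-separated field (age), numerically and descending,
# then keep the first 4 rows. -s keeps the input order among equal ages.
top4=$(sort -s -t, -k4,4nr student_details.txt | head -n 4)
echo "$top4"
```

the four rows printed are the students with ids 007, 008, 005 and 006, the same records as in the pig output above. pig prints the id without leading zeros because the script declares it as int.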