Apache Pig Tutorial: Explain Operator

The explain operator displays the logical, physical, and MapReduce execution plans of a relation.

Syntax

The syntax of the explain operator is given below.

grunt> explain relation_name;
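
The operator also accepts a few optional flags. The lines below are a minimal sketch based on the flags listed in the Pig Latin reference (-brief, -dot, -out). The directory /tmp/pig_plans is a placeholder chosen only for illustration, and relation_name stands for any relation already defined in the session. The statements can be typed at the grunt prompt.

```pig
-- Print a condensed plan that does not expand nested plans.
explain -brief relation_name;

-- Write the plans to files under a directory instead of the console.
-- '/tmp/pig_plans' is a hypothetical path used only for this sketch.
explain -out /tmp/pig_plans relation_name;

-- Emit the plans in DOT format (readable by Graphviz) into the same directory.
explain -dot -out /tmp/pig_plans relation_name;
```

Writing the plans to files can be convenient because, even for a single load statement, the console output runs to well over a hundred lines.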

Example

Assume we have a file named student_data.txt in HDFS with the following content.

001,rajiv,reddy,9848022337,hyderabad
002,siddarth,battacharya,9848022338,kolkata
003,rajesh,khanna,9848022339,delhi
004,preethi,agarwal,9848022330,pune
005,trupthi,mohanthy,9848022336,bhuwaneshwar
006,archana,mishra,9848022335,chennai

We have read it into a relation named student using the load operator, as shown below.

grunt> student = load 'hdfs://localhost:9000/pig_data/student_data.txt' using PigStorage(',')
   as ( id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray );

Now, let us explain the relation named student using the explain operator, as shown below.

grunt> explain student;

Output

It will produce the following output.

grunt> explain student;

2015-10-05 11:32:43,660 [main] info  org.apache.pig.newplan.logical.optimizer.logicalplanoptimizer -
{rules_enabled=[addforeach, columnmapkeyprune, constantcalculator,
groupbyconstparallelsetter, limitoptimizer, loadtypecastinserter, mergefilter,
mergeforeach, partitionfilteroptimizer, predicatepushdownoptimizer,
pushdownforeachflatten, pushupfilter, splitfilter, streamtypecastinserter]}
#-----------------------------------------------
# new logical plan: 
#-----------------------------------------------
student: (name: lostore schema: id#31:int,firstname#32:chararray,lastname#33:chararray,phone#34:chararray,city#35:chararray)
|
|---student: (name: loforeach schema: id#31:int,firstname#32:chararray,lastname#33:chararray,phone#34:chararray,city#35:chararray)
    |   |
    |   (name: logenerate[false,false,false,false,false] schema: id#31:int,firstname#32:chararray,lastname#33:chararray,phone#34:chararray,city#35:chararray)columnprune:inputuids=[34, 35, 32, 33, 31]columnprune:outputuids=[34, 35, 32, 33, 31]
    |   |   |
    |   |   (name: cast type: int uid: 31)
    |   |   |
    |   |   |---id:(name: project type: bytearray uid: 31 input: 0 column: (*))
    |   |   |
    |   |   (name: cast type: chararray uid: 32)
    |   |   |
    |   |   |---firstname:(name: project type: bytearray uid: 32 input: 1 column: (*))
    |   |   |
    |   |   (name: cast type: chararray uid: 33)
    |   |   |
    |   |   |---lastname:(name: project type: bytearray uid: 33 input: 2 column: (*))
    |   |   |
    |   |   (name: cast type: chararray uid: 34)
    |   |   |
    |   |   |---phone:(name: project type: bytearray uid: 34 input: 3 column: (*))
    |   |   |
    |   |   (name: cast type: chararray uid: 35)
    |   |   |
    |   |   |---city:(name: project type: bytearray uid: 35 input: 4 column: (*))
    |   |
    |   |---(name: loinnerload[0] schema: id#31:bytearray)
    |   |
    |   |---(name: loinnerload[1] schema: firstname#32:bytearray)
    |   |
    |   |---(name: loinnerload[2] schema: lastname#33:bytearray)
    |   |
    |   |---(name: loinnerload[3] schema: phone#34:bytearray)
    |   |
    |   |---(name: loinnerload[4] schema: city#35:bytearray)
    |
    |---student: (name: loload schema: id#31:bytearray,firstname#32:bytearray,lastname#33:bytearray,phone#34:bytearray,city#35:bytearray)requiredfields:null
#-----------------------------------------------
# physical plan:
#-----------------------------------------------
student: store(fakefile:org.apache.pig.builtin.pigstorage) - scope-36
| 
|---student: new for each(false,false,false,false,false)[bag] - scope-35
    |   |
    |   cast[int] - scope-21
    |   |
    |   |---project[bytearray][0] - scope-20
    |   |  
    |   cast[chararray] - scope-24
    |   |
    |   |---project[bytearray][1] - scope-23
    |   | 
    |   cast[chararray] - scope-27
    |   |  
    |   |---project[bytearray][2] - scope-26 
    |   |  
    |   cast[chararray] - scope-30 
    |   |  
    |   |---project[bytearray][3] - scope-29
    |   |
    |   cast[chararray] - scope-33
    |   | 
    |   |---project[bytearray][4] - scope-32
    | 
    |---student: load(hdfs://localhost:9000/pig_data/student_data.txt:pigstorage(',')) - scope-19
2015-10-05 11:32:43,682 [main] info  org.apache.pig.backend.hadoop.executionengine.mapreducelayer.mrcompiler - file concatenation threshold: 100 optimistic? false
2015-10-05 11:32:43,684 [main] info  org.apache.pig.backend.hadoop.executionengine.mapreducelayer.multiqueryoptimizer - mr plan size before optimization: 1
2015-10-05 11:32:43,685 [main] info  org.apache.pig.backend.hadoop.executionengine.mapreducelayer.multiqueryoptimizer - mr plan size after optimization: 1
#--------------------------------------------------
# map reduce plan                                   
#--------------------------------------------------
mapreduce node scope-37
map plan
student: store(fakefile:org.apache.pig.builtin.pigstorage) - scope-36
|
|---student: new for each(false,false,false,false,false)[bag] - scope-35
    |   |
    |   cast[int] - scope-21 
    |   |
    |   |---project[bytearray][0] - scope-20
    |   |
    |   cast[chararray] - scope-24
    |   |
    |   |---project[bytearray][1] - scope-23
    |   |
    |   cast[chararray] - scope-27
    |   | 
    |   |---project[bytearray][2] - scope-26 
    |   | 
    |   cast[chararray] - scope-30 
    |   |  
    |   |---project[bytearray][3] - scope-29 
    |   | 
    |   cast[chararray] - scope-33
    |   | 
    |   |---project[bytearray][4] - scope-32 
    |  
    |---student: load(hdfs://localhost:9000/pig_data/student_data.txt:pigstorage(',')) - scope-19
--------
global sort: false
----------------
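
The MapReduce plan above contains only a map plan, since student is a plain load with no grouping or sorting. As a rough sketch of how the plan changes for a derived relation, the statements below group student by city and then explain the result. The names by_city and city_count are invented here for illustration and do not appear in the original example; the output is not shown because it depends on the Pig version and execution mode.

```pig
-- by_city and city_count are illustrative names, not part of the original example.
by_city = group student by city;
city_count = foreach by_city generate group as city, COUNT(student) as total;

-- Compare this plan with the one for student: in MapReduce mode it should
-- include a reduce plan in addition to the map plan, because of the group.
explain city_count;
```

Comparing the two plans side by side is a simple way to see which operators in a script introduce a reduce phase.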