Apache Pig Tutorial: Split Operator

The SPLIT operator is used to split a relation into two or more relations, based on conditions that you supply.

Syntax

Given below is the syntax of the SPLIT operator.

grunt> SPLIT Relation1_name INTO Relation2_name IF (condition1), Relation3_name IF (condition2);

Example

Assume that we have a file named student_details.txt in the HDFS directory /pig_data/ as shown below.

student_details.txt

001,rajiv,reddy,21,9848022337,hyderabad
002,siddarth,battacharya,22,9848022338,kolkata
003,rajesh,khanna,22,9848022339,delhi 
004,preethi,agarwal,21,9848022330,pune 
005,trupthi,mohanthy,23,9848022336,bhuwaneshwar 
006,archana,mishra,23,9848022335,chennai 
007,komal,nayak,24,9848022334,trivendram 
008,bharathi,nambiayar,24,9848022333,chennai

We have loaded this file into Pig with the relation name student_details as shown below.

student_details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt' USING PigStorage(',')
   AS (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray);

Let us now split the relation into two: one listing the students whose age is less than 23, and the other listing the students whose age is greater than 22 and less than 25.

SPLIT student_details INTO student_details1 IF age<23, student_details2 IF (22<age AND age<25);
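To see what this statement does, it can help to write the same partition as two separate FILTER statements, one per condition. The sketch below assumes the student_details relation loaded above; the relation names filtered1 and filtered2 are hypothetical, chosen only so they do not clash with the names used in the SPLIT statement.

```pig
-- Students whose age is less than 23 (same condition as the first SPLIT branch).
filtered1 = FILTER student_details BY age < 23;

-- Students whose age is greater than 22 and less than 25
-- (same condition as the second SPLIT branch).
filtered2 = FILTER student_details BY (age > 22 AND age < 25);
```

Each condition is evaluated independently for every tuple, so a tuple that satisfies both conditions would appear in both relations, and a tuple that satisfies neither would appear in none. With the sample data, no age satisfies both conditions, so the two relations do not overlap.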

Verification

Verify the relations student_details1 and student_details2 using the DUMP operator as shown below.

grunt> dump student_details1;  

grunt> dump student_details2; 

Output

It will produce the following output, displaying the contents of the relations student_details1 and student_details2 respectively.

grunt> dump student_details1; 
(1,rajiv,reddy,21,9848022337,hyderabad) 
(2,siddarth,battacharya,22,9848022338,kolkata)
(3,rajesh,khanna,22,9848022339,delhi) 
(4,preethi,agarwal,21,9848022330,pune)
  
grunt> dump student_details2; 
(5,trupthi,mohanthy,23,9848022336,bhuwaneshwar) 
(6,archana,mishra,23,9848022335,chennai) 
(7,komal,nayak,24,9848022334,trivendram) 
(8,bharathi,nambiayar,24,9848022333,chennai)
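Every student in the sample file falls into one of the two groups, but SPLIT does not guarantee this in general: a tuple that matches none of the conditions is not written to any relation. As a sketch, assuming Pig 0.10 or later (where, to our knowledge, the OTHERWISE branch was introduced), the remaining tuples can be captured explicitly. The relation names under_23 and the_rest are hypothetical.

```pig
-- Tuples with age < 23 go to under_23;
-- every tuple that fails that test goes to the_rest.
SPLIT student_details INTO under_23 IF age < 23, the_rest OTHERWISE;

DUMP under_23;
DUMP the_rest;
```

With the sample data, where all ages lie between 21 and 24, we would expect these two relations to contain the same groups of students as student_details1 and student_details2.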