Apache Pig Tutorial on Apache Pig Distinct Operator

The DISTINCT operator removes redundant (duplicate) tuples from a relation. Two tuples count as duplicates only when all of their fields match.

Syntax

Given below is the syntax of the DISTINCT operator.

grunt> relation_name2 = DISTINCT relation_name1;
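
Removing duplicates requires a reduce phase, and Pig's documented syntax for DISTINCT also accepts an optional PARALLEL clause that sets the number of reducers for that step. A minimal sketch, assuming relation_name1 has already been defined and using 4 purely as an illustrative reducer count:

```pig
-- relation_name1 is assumed to exist; 4 is an arbitrary, illustrative value
relation_name2 = DISTINCT relation_name1 PARALLEL 4;
```

For small inputs such as the example that follows, the default parallelism is usually sufficient and the clause can be left out.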

Example

Assume that we have a file named student_details.txt in the HDFS directory /pig_data/ as shown below.

student_details.txt

001,rajiv,reddy,9848022337,hyderabad
002,siddarth,battacharya,9848022338,kolkata 
002,siddarth,battacharya,9848022338,kolkata 
003,rajesh,khanna,9848022339,delhi 
003,rajesh,khanna,9848022339,delhi 
004,preethi,agarwal,9848022330,pune 
005,trupthi,mohanthy,9848022336,bhuwaneshwar
006,archana,mishra,9848022335,chennai 
006,archana,mishra,9848022335,chennai

We have loaded this file into Pig with the relation name student_details as shown below.

grunt> student_details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt' USING PigStorage(',')
   AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);

Let us now remove the duplicate tuples from the relation student_details using the DISTINCT operator, and store the result in another relation named distinct_data as shown below.

grunt> distinct_data = DISTINCT student_details;

Verification

Verify the relation distinct_data using the DUMP operator as shown below.

grunt> DUMP distinct_data;

Output

It produces the following output, displaying the contents of the relation distinct_data. Because id is declared as int, the leading zeros in the input file do not appear.

(1,rajiv,reddy,9848022337,hyderabad)
(2,siddarth,battacharya,9848022338,kolkata) 
(3,rajesh,khanna,9848022339,delhi) 
(4,preethi,agarwal,9848022330,pune) 
(5,trupthi,mohanthy,9848022336,bhuwaneshwar)
(6,archana,mishra,9848022335,chennai)
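
DISTINCT compares whole tuples, so it cannot be applied to selected fields directly. A common approach is to project the fields of interest with FOREACH first and then apply DISTINCT to the projected relation. A minimal sketch, assuming student_details has been loaded with the schema shown above; the relation names cities and distinct_cities are introduced here only for illustration:

```pig
-- Keep only the city field of each tuple (student_details is assumed to be loaded as above)
cities = FOREACH student_details GENERATE city;

-- Remove duplicate city values from the projected relation
distinct_cities = DISTINCT cities;

-- Print the result; the order of the tuples is not guaranteed
DUMP distinct_cities;
```

With the sample file, each of the six cities should appear once in the result.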
