Spark SQL Tutorial on Spark Installation

Spark is Hadoop's sub-project. Therefore, it is better to install Spark on a Linux-based system. The following steps show how to install Apache Spark.

Step 1: Verifying Java Installation

Java installation is one of the mandatory requirements for installing Spark. Try the following command to verify the Java version.

$ java -version

If Java is already installed on your system, you will see the following response −

java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b13)
Java HotSpot(TM) Client VM (build 25.0-b02, mixed mode)

In case you do not have Java installed on your system, install Java before proceeding to the next step.
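As a quick sketch of that check, the snippet below tests whether java is on the PATH; the openjdk-8-jdk package name in the comment is an assumption for Debian/Ubuntu systems and will differ on other distributions.

```shell
# Check whether a Java runtime is available before installing Spark.
if command -v java >/dev/null 2>&1; then
  echo "java present"
else
  echo "java missing"
  # On Debian/Ubuntu you could then run, for example (assumption):
  # sudo apt-get install openjdk-8-jdk
fi
```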

Step 2: Verifying Scala Installation

You need the Scala language to implement Spark. So let us verify the Scala installation using the following command.

$ scala -version

If Scala is already installed on your system, you will see the following response −

Scala code runner version 2.11.6 -- Copyright 2002-2013, LAMP/EPFL

In case you do not have Scala installed on your system, then proceed to the next step for Scala installation.

Step 3: Downloading Scala

Download the latest version of Scala by visiting the following link: Download Scala. For this tutorial, we are using the scala-2.11.6 version. After downloading, you will find the Scala tar file in the download folder.

Step 4: Installing Scala

Follow the steps given below for installing Scala.

Extract the Scala tar file

Type the following command for extracting the Scala tar file.

$ tar xvf scala-2.11.6.tgz

Move Scala software files

Use the following commands for moving the Scala software files to the respective directory (/usr/local/scala).

$ su -
Password:
# cd /home/hadoop/Downloads/
# mv scala-2.11.6 /usr/local/scala
# exit

Set PATH for Scala

Use the following command for setting the PATH for Scala.

$ export PATH=$PATH:/usr/local/scala/bin

Verifying Scala Installation

After installation, it is better to verify it. Use the following command for verifying the Scala installation.

$ scala -version

If Scala is installed successfully, you will see the following response −

Scala code runner version 2.11.6 -- Copyright 2002-2013, LAMP/EPFL

Step 5: Downloading Apache Spark

Download the latest version of Spark by visiting the following link: Download Spark. For this tutorial, we are using the spark-1.3.1-bin-hadoop2.6 version. After downloading it, you will find the Spark tar file in the download folder.

Step 6: Installing Spark

Follow the steps given below for installing Spark.

Extracting Spark tar

Use the following command for extracting the Spark tar file.

$ tar xvf spark-1.3.1-bin-hadoop2.6.tgz

Moving Spark software files

Use the following commands for moving the Spark software files to the respective directory (/usr/local/spark).

$ su -
Password:
# cd /home/hadoop/Downloads/
# mv spark-1.3.1-bin-hadoop2.6 /usr/local/spark
# exit

Setting up the environment for Spark

Add the following line to the ~/.bashrc file. This adds the location where the Spark software files are located to the PATH variable.

export PATH=$PATH:/usr/local/spark/bin

Use the following command for sourcing the ~/.bashrc file.

$ source ~/.bashrc
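As a small sanity check (a sketch, assuming Spark was moved to /usr/local/spark as above), you can confirm that the Spark bin directory is now part of the PATH:

```shell
# Confirm that /usr/local/spark/bin appears in the PATH after sourcing ~/.bashrc.
export PATH=$PATH:/usr/local/spark/bin
case ":$PATH:" in
  *:/usr/local/spark/bin:*) echo "spark bin on PATH" ;;
  *)                        echo "spark bin NOT on PATH" ;;
esac
```

Because the export runs inside the snippet, it prints "spark bin on PATH"; run only the case statement in a fresh shell to test your ~/.bashrc change itself.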

Step 7: Verifying the Spark Installation

Write the following command for opening the Spark shell.

$ spark-shell

If Spark is installed successfully, you will see the following output.

Spark assembly has been built with Hive, including Datanucleus jars on classpath
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/06/04 15:25:22 INFO SecurityManager: Changing view acls to: hadoop
15/06/04 15:25:22 INFO SecurityManager: Changing modify acls to: hadoop
15/06/04 15:25:22 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/06/04 15:25:22 INFO HttpServer: Starting HTTP Server
15/06/04 15:25:23 INFO Utils: Successfully started service 'HTTP class server' on port 43292.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.4.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-bit Server VM, Java 1.7.0_71)
Type in expressions to have them evaluated.
Spark context available as sc
scala>
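Once the scala> prompt appears, running a first job makes a good smoke test. The sketch below sums the numbers 1 to 100 through the pre-bound Spark context sc; the non-interactive variant assumes spark-shell is on the PATH.

```shell
# At the scala> prompt, try summing 1 to 100 with the Spark context:
#
#   scala> sc.parallelize(1 to 100).reduce(_ + _)
#   res0: Int = 5050
#
# The same one-liner can also be piped into spark-shell non-interactively,
# if spark-shell is available on the PATH:
if command -v spark-shell >/dev/null 2>&1; then
  echo 'println(sc.parallelize(1 to 100).reduce(_ + _))' | spark-shell
fi
```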