Spark is a sub-project of Hadoop. Therefore, it is better to install Spark on a Linux-based system. The following steps show how to install Apache Spark.
Step 1: Verifying Java Installation
Java installation is one of the mandatory things in installing Spark. Try the following command to verify the Java version.
$ java -version
If Java is already installed on your system, you will see the following response −
java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b13)
Java HotSpot(TM) Client VM (build 25.0-b02, mixed mode)
In case you do not have Java installed on your system, install Java before proceeding to the next step.
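For example, on a Debian-based system you could install OpenJDK through the package manager (the package name below is an assumption; pick the one that matches your distribution and the Java version you want):
$ sudo apt-get update
$ sudo apt-get install openjdk-7-jdk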
Step 2: Verifying Scala Installation
You need the Scala language to implement Spark. So let us verify the Scala installation using the following command.
$ scala -version
If Scala is already installed on your system, you will see the following response −
Scala code runner version 2.11.6 -- Copyright 2002-2013, LAMP/EPFL
In case you don't have Scala installed on your system, proceed to the next step for Scala installation.
Step 3: Downloading Scala
Download the latest version of Scala by visiting the following link: Download Scala. For this tutorial, we are using the scala-2.11.6 version. After downloading, you will find the Scala tar file in the download folder.
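If you prefer working from the terminal, one way to fetch the archive directly is with wget; the URL below follows the usual layout of the Scala archive site and should be verified against the download page:
$ wget https://www.scala-lang.org/files/archive/scala-2.11.6.tgz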
Step 4: Installing Scala
Follow the steps given below for installing Scala.
Extract the Scala Tar File
Type the following command for extracting the Scala tar file.
$ tar xvf scala-2.11.6.tgz
Move Scala Software Files
Use the following commands for moving the Scala software files to the respective directory (/usr/local/scala).
$ su -
Password:
# cd /home/hadoop/Downloads/
# mv scala-2.11.6 /usr/local/scala
# exit
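If your system uses sudo instead of a root password, an equivalent single command would be:
$ sudo mv /home/hadoop/Downloads/scala-2.11.6 /usr/local/scala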
Set PATH for Scala
Use the following command for setting the PATH for Scala.
$ export PATH=$PATH:/usr/local/scala/bin
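Note that this export only affects the current shell session. To make it permanent, you can append the same line to the ~/.bashrc file, just as Step 6 does for Spark:
$ echo 'export PATH=$PATH:/usr/local/scala/bin' >> ~/.bashrc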
Verifying Scala Installation
After installation, it is better to verify it. Use the following command for verifying the Scala installation.
$ scala -version
If Scala is installed successfully, you will see the following response −
Scala code runner version 2.11.6 -- Copyright 2002-2013, LAMP/EPFL
Step 5: Downloading Apache Spark
Download the latest version of Spark by visiting the following link: Download Spark. For this tutorial, we are using the spark-1.3.1-bin-hadoop2.6 version. After downloading it, you will find the Spark tar file in the download folder.
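As with Scala, the archive can also be fetched from the command line. The Apache archive URL below is an assumption based on the usual layout of archive.apache.org, so verify it before use:
$ wget https://archive.apache.org/dist/spark/spark-1.3.1/spark-1.3.1-bin-hadoop2.6.tgz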
Step 6: Installing Spark
Follow the steps given below for installing Spark.
Extracting Spark Tar
Type the following command for extracting the Spark tar file.
$ tar xvf spark-1.3.1-bin-hadoop2.6.tgz
Moving Spark Software Files
Use the following commands for moving the Spark software files to the respective directory (/usr/local/spark).
$ su -
Password:
# cd /home/hadoop/Downloads/
# mv spark-1.3.1-bin-hadoop2.6 /usr/local/spark
# exit
Setting Up the Environment for Spark
Add the following line to the ~/.bashrc file. This adds the location where the Spark software files are located to the PATH variable.
export PATH=$PATH:/usr/local/spark/bin
Use the following command for sourcing the ~/.bashrc file.
$ source ~/.bashrc
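To confirm that the new PATH entry took effect, you can ask the shell where it now finds the spark-shell binary; if the steps above were followed, it should report /usr/local/spark/bin/spark-shell.
$ which spark-shell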
Step 7: Verifying the Spark Installation
Type the following command to open the Spark shell.
$ spark-shell
If Spark is installed successfully, then you will see the following output.
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/06/04 15:25:22 INFO SecurityManager: Changing view acls to: hadoop
15/06/04 15:25:22 INFO SecurityManager: Changing modify acls to: hadoop
15/06/04 15:25:22 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/06/04 15:25:22 INFO HttpServer: Starting HTTP Server
15/06/04 15:25:23 INFO Utils: Successfully started service 'HTTP class server' on port 43292.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.4.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_71)
Type in expressions to have them evaluated.
Spark context available as sc
scala>
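As a quick sanity check, you can evaluate a small Scala expression at the prompt. The sc variable is the SparkContext that the shell creates for you; for example, summing the numbers from 1 to 100 in parallel should return 5050:
scala> sc.parallelize(1 to 100).reduce(_ + _)
res0: Int = 5050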