Spark SQL Tutorial on Spark Installation

Spark is Hadoop's sub-project. Therefore, it is better to install Spark on a Linux-based system. The following steps show how to install Apache Spark.

Step 1: Verifying Java Installation

Java installation is one of the mandatory requirements for installing Spark. Try the following command to verify the Java version.

$ java -version

If Java is already installed on your system, you will see the following response −

java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b13)
Java HotSpot(TM) Client VM (build 25.0-b02, mixed mode)

In case you do not have Java installed on your system, install Java before proceeding to the next step.
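As a quick sketch of that check, the snippet below tests whether java is on the PATH; the openjdk-8-jdk package name in the comment is an assumption for Debian/Ubuntu systems and will differ on other distributions.

```shell
# Check whether a Java runtime is available before installing Spark.
if command -v java >/dev/null 2>&1; then
  echo "java present"
else
  echo "java missing"
  # On Debian/Ubuntu you could then run, for example (assumption):
  # sudo apt-get install openjdk-8-jdk
fi
```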

Step 2: Verifying Scala Installation

You need the Scala language to implement Spark. So let us verify the Scala installation using the following command.

$ scala -version

If Scala is already installed on your system, you will see the following response −

Scala code runner version 2.11.6 -- Copyright 2002-2013, LAMP/EPFL

In case you do not have Scala installed on your system, then proceed to the next step for Scala installation.

Step 3: Downloading Scala

Download the latest version of Scala by visiting the following link: Download Scala. For this tutorial, we are using the scala-2.11.6 version. After downloading, you will find the Scala tar file in the download folder.

Step 4: Installing Scala

Follow the steps given below for installing Scala.

Extract the Scala tar file

Type the following command for extracting the Scala tar file.

$ tar xvf scala-2.11.6.tgz

Move Scala software files

Use the following commands for moving the Scala software files to the respective directory (/usr/local/scala).

$ su -
Password:
# cd /home/hadoop/Downloads/
# mv scala-2.11.6 /usr/local/scala
# exit

Set PATH for Scala

Use the following command for setting the PATH for Scala.

$ export PATH=$PATH:/usr/local/scala/bin

Verifying Scala Installation

After installation, it is better to verify it. Use the following command for verifying the Scala installation.

$ scala -version

If Scala is installed successfully, you will see the following response −

Scala code runner version 2.11.6 -- Copyright 2002-2013, LAMP/EPFL

Step 5: Downloading Apache Spark

Download the latest version of Spark by visiting the following link: Download Spark. For this tutorial, we are using the spark-1.3.1-bin-hadoop2.6 version. After downloading it, you will find the Spark tar file in the download folder.

Step 6: Installing Spark

Follow the steps given below for installing Spark.

Extracting Spark tar

Use the following command for extracting the Spark tar file.

$ tar xvf spark-1.3.1-bin-hadoop2.6.tgz

Moving Spark software files

Use the following commands for moving the Spark software files to the respective directory (/usr/local/spark).

$ su -
Password:
# cd /home/hadoop/Downloads/
# mv spark-1.3.1-bin-hadoop2.6 /usr/local/spark
# exit

Setting up the environment for Spark

Add the following line to the ~/.bashrc file. This adds the location where the Spark software files are located to the PATH variable.

export PATH=$PATH:/usr/local/spark/bin

Use the following command for sourcing the ~/.bashrc file.

$ source ~/.bashrc
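As a small sanity check (a sketch, assuming Spark was moved to /usr/local/spark as above), you can confirm that the Spark bin directory is now part of the PATH:

```shell
# Confirm that /usr/local/spark/bin appears in the PATH after sourcing ~/.bashrc.
export PATH=$PATH:/usr/local/spark/bin
case ":$PATH:" in
  *:/usr/local/spark/bin:*) echo "spark bin on PATH" ;;
  *)                        echo "spark bin NOT on PATH" ;;
esac
```

Because the export runs inside the snippet, it prints "spark bin on PATH"; run only the case statement in a fresh shell to test your ~/.bashrc change itself.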

Step 7: Verifying the Spark Installation

Write the following command for opening the Spark shell.

$ spark-shell

If Spark is installed successfully, you will see the following output.

Spark assembly has been built with Hive, including Datanucleus jars on classpath
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/06/04 15:25:22 INFO SecurityManager: Changing view acls to: hadoop
15/06/04 15:25:22 INFO SecurityManager: Changing modify acls to: hadoop
15/06/04 15:25:22 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/06/04 15:25:22 INFO HttpServer: Starting HTTP Server
15/06/04 15:25:23 INFO Utils: Successfully started service 'HTTP class server' on port 43292.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.4.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-bit Server VM, Java 1.7.0_71)
Type in expressions to have them evaluated.
Spark context available as sc
scala>
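Once the scala> prompt appears, running a first job makes a good smoke test. The sketch below sums the numbers 1 to 100 through the pre-bound Spark context sc; the non-interactive variant assumes spark-shell is on the PATH.

```shell
# At the scala> prompt, try summing 1 to 100 with the Spark context:
#
#   scala> sc.parallelize(1 to 100).reduce(_ + _)
#   res0: Int = 5050
#
# The same one-liner can also be piped into spark-shell non-interactively,
# if spark-shell is available on the PATH:
if command -v spark-shell >/dev/null 2>&1; then
  echo 'println(sc.parallelize(1 to 100).reduce(_ + _))' | spark-shell
fi
```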