Try it Yourself: Install Spark on Red Hat/Centos

In this example, I'm installing Spark on a Red Hat Enterprise Linux 7.1 instance. However, the same installation steps would apply to Centos distributions as well.

As shown in Figure 3.1, download the spark-1.5.2-bin-hadoop2.6.tgz package from your local mirror into your home directory using wget or curl.

If Java 1.7 or higher is not installed, install the Java 1.7 runtime and development environments using the OpenJDK yum packages (alternatively, you could use the Oracle JDK instead):

sudo yum install java-1.7.0-openjdk java-1.7.0-openjdk-devel

Confirm Java was successfully installed:

$ java -version
OpenJDK 64-Bit Server VM (build 24.91-b01, mixed mode)

Extract the Spark package and create SPARK_HOME:

tar -xzf spark-1.5.2-bin-hadoop2.6.tgz
sudo mv spark-1.5.2-bin-hadoop2.6 /opt/spark
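The export commands themselves are not shown in this excerpt; a minimal sketch for the current shell session, assuming the /opt/spark location used above:

# Set SPARK_HOME for the current session (location from the move above)
export SPARK_HOME=/opt/spark
# Put the Spark binaries (pyspark, spark-shell, spark-submit) on the PATH
export PATH=$SPARK_HOME/bin:$PATH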
You need to do this in a user or system profile script if you wish to persist the SPARK_HOME variable beyond the current session.

Open the PySpark shell by running the pyspark command from any directory (as you've added the Spark bin directory to the PATH). If Spark has been successfully installed, you should see the following output (with informational logging messages omitted for brevity):

Welcome to
...
SparkContext available as sc, HiveContext available as sqlContext.

You should see a similar result by running the spark-shell command from any directory.

Run the included Pi Estimator example by executing the following command:

spark-submit --class org.apache.spark.examples.SparkPi --master local $SPARK_HOME/lib/spark-examples*.jar 10

If the installation was successful, you should see something similar to the following result (omitting the informational log messages). Note that this is an estimator program, so the actual result may vary:

Pi is roughly 3.140576
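As a further sanity check (not part of the original steps), you can run a one-line job from the PySpark prompt; sc is the SparkContext the shell creates for you:

# Typed at the pyspark >>> prompt; counts a locally parallelized range
sc.parallelize(range(100)).count()   # should return 100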
Try it Yourself: Install Spark on Ubuntu/Debian Linux

In this example, I'm installing Spark on an Ubuntu 14.04 LTS Linux distribution. As with the Red Hat example, Python 2.7 is already installed with the operating system, so we do not need to install Python.

If Java 1.7 or higher is not installed, install the Java 1.7 runtime and development environments using Ubuntu's APT (Advanced Packaging Tool). Alternatively, you could use the Oracle JDK instead:

sudo apt-get update
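Only the apt-get update step (which refreshes the package index) survives in this excerpt; the install command itself is not shown. A plausible completion using the OpenJDK 7 packages shipped with Ubuntu 14.04, by analogy with the Red Hat example:

# Install the Java 7 runtime and development environments via APT
sudo apt-get install openjdk-7-jre openjdk-7-jdk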
As in the Red Hat example, extract the Spark package, move it into place, and set SPARK_HOME. The SPARK_HOME environment variable could also be set using the .bashrc file or similar user or system profile scripts. You will need to do this if you wish to persist the SPARK_HOME variable beyond the current session. Open the PySpark shell by running the pyspark command from any directory.

Try it Yourself: Install Spark on Mac OS X

In this example, I install Spark on OS X Mavericks (10.9.5). Mavericks includes installed versions of Python (2.7.5) and Java (1.8), so I don't need to install them.

As shown in Figure 3.1, download the spark-1.5.2-bin-hadoop2.6.tgz package from your local mirror into your home directory using curl.
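The mirror URL depends on your location, so the hostname below is a placeholder to substitute from the Apache download page; a sketch of the curl step:

# Download the Spark package into the current directory (mirror hostname is a placeholder)
curl -O http://your-local-mirror.example/apache/spark/spark-1.5.2/spark-1.5.2-bin-hadoop2.6.tgz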
Extract the package and set SPARK_HOME as in the previous examples. The SPARK_HOME environment variable could also be set using the .profile file or similar user or system profile scripts (a sketch follows at the end of this example).

Open the PySpark shell by running the pyspark command in the Terminal from any directory. If Spark has been successfully installed, you should see the following output:

Welcome to
...

You should see a similar result by running the spark-shell command in the Terminal from any directory.
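A sketch of persisting the settings in ~/.profile on OS X; the /opt/spark path is carried over from the Linux examples, so adjust it to wherever you moved Spark:

# Append the exports to ~/.profile so new Terminal sessions pick them up
echo 'export SPARK_HOME=/opt/spark' >> ~/.profile
echo 'export PATH=$SPARK_HOME/bin:$PATH' >> ~/.profile
# Reload the profile in the current session
source ~/.profile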
Try it Yourself: Install Spark on Microsoft Windows

Installing Spark on Windows can be more involved than installing it on Linux or Mac OS X because many of the dependencies (such as Python and Java) need to be addressed first.
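The Windows walkthrough itself is not part of this excerpt, but a minimal sketch of checking those dependencies from a Command Prompt before installing Spark (the C:\spark location is an assumed example):

rem Confirm Java and Python are installed and on the PATH
java -version
python --version
rem Set SPARK_HOME persistently for the current user
setx SPARK_HOME C:\spark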