Install hadoop and spark on macOS

Hadoop version: 2.7.2
Spark version: spark-2.1.0-bin-hadoop2.7
Python version: the 2.7 release bundled with macOS

1. Install Java

$ java -version
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
$ /usr/libexec/java_home
/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home

Add JAVA_HOME to .bash_profile:
if which java > /dev/null; then export JAVA_HOME=$(/usr/libexec/java_home); fi

2. Enable SSH

Enable it under System Preferences -> Sharing by checking "Remote Login".
$ ssh-keygen -t rsa -P ''
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Verify that SSH works:
$ ssh localhost

3. Download Hadoop 2.7.2

$ curl -O http://apache.fayea.com/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz

4. Create the Hadoop directory

$ cd ~/Downloads
$ tar xzvf hadoop-2.7.2.tar.gz
$ mv hadoop-2.7.2 /usr/local/hadoop

5. Edit the Hadoop configuration files

$ cd /usr/local/hadoop/etc/hadoop/

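The post's original listing of the configuration edits is missing here. As a hedged reference only, a typical minimal pseudo-distributed setup (following the Hadoop 2.7 single-node documentation) edits the following four files in this directory; the port and values below are the common defaults, not necessarily what this post used:

```xml
<!-- core-site.xml: point the default filesystem at a local HDFS -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml: single node, so no replication -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

<!-- mapred-site.xml: run MapReduce on YARN -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

<!-- yarn-site.xml: enable the MapReduce shuffle service -->
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```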

6. Start the Hadoop services

Format HDFS
$ cd /usr/local/hadoop
$ bin/hdfs namenode -format

Start HDFS
$ sbin/start-dfs.sh

Start YARN
$ sbin/start-yarn.sh

Check the HDFS root directory
$ bin/hdfs dfs -ls /

Create the MapReduce job directories on HDFS
$ hdfs dfs -mkdir /user
$ hdfs dfs -mkdir /user/leo
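With the job directories created, HDFS can be smoke-tested by copying a file in and reading it back (run from /usr/local/hadoop; /user/leo matches the directory created above):

```
$ bin/hdfs dfs -put etc/hadoop/core-site.xml /user/leo/
$ bin/hdfs dfs -ls /user/leo
$ bin/hdfs dfs -cat /user/leo/core-site.xml
```

If these commands list and print the file, HDFS is accepting writes and reads.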

7. Install Spark

Download spark-2.1.0-bin-hadoop2.7 and extract it (here to /Users/leo/spark-2.1.0-bin-hadoop2.7). Then edit the spark-env.sh file under the Spark conf directory and add the following:

export SPARK_LOCAL_IP=127.0.0.1
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_YARN_HOME=$HADOOP_HOME

8. Configure environment variables

After configuration, the environment variables are:

$ more .bash_profile
if which java > /dev/null; then export JAVA_HOME=$(/usr/libexec/java_home); fi

export SPARK_HOME=/Users/leo/spark-2.1.0-bin-hadoop2.7
export PATH=$SPARK_HOME/bin:$PATH
export PYTHONPATH=$SPARK_HOME/python:$PYTHONPATH
export PATH=$PATH:/usr/local/hadoop/bin
export HADOOP_HOME=/usr/local/hadoop/
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native/:$LD_LIBRARY_PATH

9. Setup complete

Start spark-shell and pyspark; the startup output looks like this:

$ pyspark
Python 2.7.10 (default, Jul 30 2016, 19:40:32)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/05/03 21:22:38 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/05/03 21:22:42 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/

Using Python version 2.7.10 (default, Jul 30 2016 19:40:32)
SparkSession available as 'spark'.
>>>
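Inside the pyspark session, the preconfigured SparkContext sc gives a quick sanity check, e.g. summing a small RDD:

```
>>> sc.parallelize(range(100)).sum()
4950
```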
$ spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/05/03 21:23:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/05/03 21:24:02 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://127.0.0.1:4040
Spark context available as 'sc' (master = local[*], app id = local-1493817838677).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_131)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
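For an end-to-end check of spark-submit, Spark's bundled SparkPi example can be run (the jar path below assumes the spark-2.1.0-bin-hadoop2.7 layout, relative to $SPARK_HOME):

```
$ cd $SPARK_HOME
$ bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master 'local[2]' \
    examples/jars/spark-examples_2.11-2.1.0.jar 10
```

The job output should contain a line like "Pi is roughly 3.14".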

Troubleshooting notes

Errors may appear when starting the Scala shell during setup; they can be inspected at http://localhost:8088. In this installation, a JAVA_HOME problem produced some odd failures, and the logs on the port-8088 page showed the error /bin/bash: /bin/java: No such file or directory. A fix is documented at https://issues.apache.org/jira/browse/HADOOP-8717: edit the file $HADOOP_HOME/libexec/hadoop-config.sh, which originally contains

if [ -x /usr/libexec/java_home ]; then
export JAVA_HOME=($(/usr/libexec/java_home))
else
export JAVA_HOME=(/Library/Java/Home)
fi
After the change it reads:
if [ -x /usr/libexec/java_home ]; then
export JAVA_HOME=$(/usr/libexec/java_home)
else
export JAVA_HOME=/Library/Java/Home
fi

One Response to Install hadoop and spark on macOS

  1. Leo says:

    On CentOS 7, add to etc/hadoop/libexec/hadoop-config.sh:
    export JAVA_HOME=/hadoop/jdk1.8.0_141
