Setting up a Spark cluster with Docker

Step 1: Pull the image from the official Docker registry

docker pull epahomov/docker-spark
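
To confirm the image was pulled successfully, you can list it locally (an optional check, not part of the original steps):

docker images epahomov/docker-spark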

Step 2: Start the master

bash# docker run -d -t -P -i --name spark_master epahomov/docker-spark
bash# docker attach spark_master
# Edit the launch script inside the master container
root@40defc1fa605:/# vi spark-shell.sh
---- 172.17.0.3 is this (master) container's own IP
#!/usr/bin/env bash
export SPARK_LOCAL_IP=`awk 'NR==1 {print $1}' /etc/hosts`
/remove_alias.sh # problems with hostname alias, see https://issues.apache.org/jira/browse/SPARK-6680
cd /usr/local/spark
./bin/spark-shell \
--master spark://172.17.0.3:7077 \
-i 172.17.0.3 \
--properties-file /spark-defaults.conf \
"$@"
-----
# Start the master
root@3746e21dff30:/# ./start-master.sh
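
The address 172.17.0.3 will differ from host to host. One way to look up which IP Docker assigned to the master container is to run the following on the host (outside the container):

docker inspect -f '{{ .NetworkSettings.IPAddress }}' spark_master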

Step 3: Start a worker

docker run -d -t -P -i --name spark_worker1 epahomov/docker-spark
docker attach spark_worker1

Configure the master IP: here 172.17.0.4 is this worker container's own IP and 172.17.0.3 is the master's IP (a lookup command for the worker's address is given after the script below).

root@40defc1fa605:/# cat ./start-worker.sh
#!/usr/bin/env bash
cd /usr/local/spark
export SPARK_LOCAL_IP=`awk 'NR==1 {print $1}' /etc/hosts`
./bin/spark-class org.apache.spark.deploy.worker.Worker \
spark://172.17.0.3:7077 \
--properties-file /spark-defaults.conf \
-i 172.17.0.4 \
"$@"

Start the worker
./start-worker.sh

You can start additional workers the same way.
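
For example, a second worker could be brought up like this (spark_worker2 is just an illustrative name; inside the container, change the -i address in start-worker.sh to that container's own IP while keeping spark://172.17.0.3:7077 as the master URL):

docker run -d -t -P -i --name spark_worker2 epahomov/docker-spark
docker attach spark_worker2
# edit start-worker.sh: set -i to this container's IP (e.g. 172.17.0.5)
./start-worker.sh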

Open master:8080 to check the cluster deployment status.
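
On a Linux host with the default bridge network, the web UI should be reachable directly at the master container's IP; a quick check from the host (assuming 172.17.0.3 as above):

curl http://172.17.0.3:8080

Because the containers were started with -P, you can also ask Docker which host port, if any, port 8080 was published to:

docker port spark_master 8080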

Open spark-shell inside the master container and you can start programming against the cluster.
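
As a minimal smoke test (assuming the spark-shell.sh wrapper from Step 2), you can pipe a one-line Scala job into the shell and check that 5050.0 appears in the output:

root@3746e21dff30:/# echo 'sc.parallelize(1 to 100).sum' | ./spark-shell.sh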