1 Environment and preparation

CPU: 12th Gen Intel(R) Core(TM) i7-12700F
Memory: 32GB 3600MT/s
Operating system: Windows 11
Docker Desktop: 4.68.0
Docker Engine: 29.3.1

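All steps inside the container are run as the root user.

1.1 Configure Docker registry mirrors

Open Settings -> Docker Engine and add the following entry at the top level of the JSON object, alongside the existing keys: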

"registry-mirrors": [
  "https://docker.1panel.live",
  "https://hub.rat.dev",
  "https://docker.m.daocloud.io",
  "https://docker.1ms.run",
  "https://dockerhub.icu"
]

1.2 Pull the image (pulling the ubuntu image may require a proxy)

docker pull ubuntu

1.3 Create the container (the command prints the container ID)

docker run -itd --name ubuntu-test ubuntu

1.4 Enter the container

docker exec -it ubuntu-test /bin/bash

2 Basic system configuration

2.1 Update the package index and install basic tools

apt-get update
apt-get install vim -y

2.2 Replace the package sources

vim /etc/apt/sources.list
------
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ noble main restricted universe multiverse
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ noble-updates main restricted universe multiverse
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ noble-security main restricted universe multiverse
------
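
The same three entries can also be written without opening an editor. The sketch below is one possible way to do that, assuming the noble release used in the listing above. The helper name write_tuna_sources and the scratch path are illustrative; inside the container the target would be /etc/apt/sources.list.

```shell
#!/bin/bash
# Sketch: write the three TUNA mirror entries to a given file.
# write_tuna_sources is a hypothetical helper name, not part of apt.
write_tuna_sources() {
    local target="$1"
    local mirror="http://mirrors.tuna.tsinghua.edu.cn/ubuntu/"
    local components="main restricted universe multiverse"
    local suite
    : > "$target"    # truncate or create the target file
    for suite in noble noble-updates noble-security; do
        echo "deb ${mirror} ${suite} ${components}" >> "$target"
    done
}

# Demo against a scratch file so nothing on the system is modified.
demo_dir=$(mktemp -d)
write_tuna_sources "$demo_dir/sources.list"
cat "$demo_dir/sources.list"
```

Generating the lines in a loop keeps the mirror URL in one place, which helps if a different mirror is needed later.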

2.3 Refresh the cache

apt-get clean
apt-get update

3 SSH setup

3.1 Install SSH and network tools

apt-get install ssh openssh-server iproute2 -y

3.3 Edit the SSH configuration

vim /etc/ssh/sshd_config
------
#LoginGraceTime 2m
#PermitRootLogin prohibit-password
PermitRootLogin yes
PasswordAuthentication yes
PubkeyAuthentication yes
------
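
For repeatable setups, the three options can be set with sed instead of by hand. This is a minimal sketch, not the method used above: it rewrites a commented-out default in place rather than adding a new line below it, and it appends the option if no such line exists. The helper name set_sshd_option and the sample file contents are assumptions for illustration; inside the container the file would be /etc/ssh/sshd_config.

```shell
#!/bin/bash
# Sketch: set one sshd_config option, replacing an existing (possibly
# commented-out) line or appending the option if it is absent.
# set_sshd_option is a hypothetical helper name.
set_sshd_option() {
    local file="$1" key="$2" value="$3"
    if grep -Eq "^#?[[:space:]]*${key}[[:space:]]" "$file"; then
        # Rewrite every matching line, commented or not.
        sed -i.bak -E "s|^#?[[:space:]]*${key}[[:space:]].*|${key} ${value}|" "$file"
    else
        echo "${key} ${value}" >> "$file"
    fi
}

# Demo on a scratch copy with sample lines, not the real sshd_config.
demo_file=$(mktemp)
printf '%s\n' '#LoginGraceTime 2m' \
              '#PermitRootLogin prohibit-password' \
              '#PasswordAuthentication yes' > "$demo_file"
set_sshd_option "$demo_file" PermitRootLogin yes
set_sshd_option "$demo_file" PasswordAuthentication yes
set_sshd_option "$demo_file" PubkeyAuthentication yes
cat "$demo_file"
```

The -i.bak form leaves a backup next to the edited file, so the original settings can be restored if the SSH service fails to start.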

3.4 Set the root password

passwd root

3.5 Generate a key pair

ssh-keygen -t rsa

4 SSH auto-start script

4.1 Startup script

vim /root/start_ssh.sh
------
#!/bin/bash

LOGTIME=$(date "+%Y-%m-%d %H:%M:%S")
echo "[$LOGTIME] startup run..." >> /root/ssh_log/start_ssh.log
service ssh restart >> /root/ssh_log/start_ssh.log
------
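
The script appends to /root/ssh_log/start_ssh.log, so the redirection fails if that directory does not exist when the script first runs. One way to make the script independent of the order of the steps is to create the directory inside the script. The sketch below shows only the logging half under that assumption; LOG_DIR defaults to a scratch directory here so it can be tried anywhere, and the restart line is left as a comment.

```shell
#!/bin/bash
# Sketch: the logging half of start_ssh.sh, with the log directory created
# on demand. LOG_DIR defaults to a scratch directory for this demo; in the
# container it would be /root/ssh_log.
LOG_DIR="${LOG_DIR:-$(mktemp -d)/ssh_log}"
LOG_FILE="$LOG_DIR/start_ssh.log"

mkdir -p "$LOG_DIR"    # no error if the directory already exists

LOGTIME=$(date "+%Y-%m-%d %H:%M:%S")
echo "[$LOGTIME] startup run..." >> "$LOG_FILE"

# In the container, the restart line from the original script follows here:
# service ssh restart >> "$LOG_FILE" 2>&1

cat "$LOG_FILE"
```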

4.2 Make the script executable

chmod +x /root/start_ssh.sh

4.3 Add the script to the shell startup file

vim /root/.bashrc
------
# startup run
if [ -f /root/start_ssh.sh ]; then
    /root/start_ssh.sh
fi
------
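
If this step is scripted rather than done in vim, running it twice would add the hook twice. The sketch below guards against that by checking for the "# startup run" comment line before appending. The helper name add_startup_hook is hypothetical, and the demo writes to a scratch file instead of /root/.bashrc.

```shell
#!/bin/bash
# Sketch: append the startup hook to a bashrc-style file only once.
# add_startup_hook and the use of the comment line as a marker are
# illustrative choices.
add_startup_hook() {
    local rc_file="$1"
    local marker="# startup run"
    # -x matches the whole line, -F treats the marker as a fixed string.
    if grep -qxF "$marker" "$rc_file" 2>/dev/null; then
        return 0    # hook already present, nothing to do
    fi
    cat >> "$rc_file" <<'EOF'
# startup run
if [ -f /root/start_ssh.sh ]; then
    /root/start_ssh.sh
fi
EOF
}

demo_rc=$(mktemp)
add_startup_hook "$demo_rc"
add_startup_hook "$demo_rc"              # second call is a no-op
grep -cxF '# startup run' "$demo_rc"     # prints 1
```

Note that .bashrc is read by every interactive bash, so with this hook the SSH service is restarted each time a new shell is opened in the container.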

4.4 Create the log directory

mkdir -p /root/ssh_log

4.5 Apply the configuration

source /root/.bashrc

Check the service status:

service ssh status

5 Java & Hadoop environment

5.1 Upload the archives (run on the host)

docker ps
docker cp <local-path>\hadoop-2.7.7.tar.gz <container-id>:/root/
docker cp <local-path>\jdk-8u171-linux-x64.tar.gz <container-id>:/root/

5.2 Extract the archives

cd /root
tar -xzvf hadoop-2.7.7.tar.gz
tar -xzvf jdk-8u171-linux-x64.tar.gz

5.3 Configure environment variables

vim /root/.bashrc
------
# JAVA
export JAVA_HOME=/root/jdk1.8.0_171
export PATH=$JAVA_HOME/bin:$PATH

# HADOOP
export HADOOP_HOME=/root/hadoop-2.7.7
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
------

5.4 Apply the configuration

source /root/.bashrc

✔ Verify

java -version

6 Hadoop configuration

cd ~/hadoop-2.7.7/etc/hadoop

6.1 HDFS configuration

6.1.1 Configure hadoop-env.sh

vim hadoop-env.sh
------
export JAVA_HOME=/root/jdk1.8.0_171
------

6.1.2 Configure core-site.xml

vim core-site.xml
------
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop330:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/root/hadoopdata</value>
    </property>
</configuration>
------
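
The host in fs.defaultFS has to match the hostname given to the master container later (hadoop330), so it can be useful to read the value back from the file after editing. The sketch below is one simple way to do that with awk. It assumes each <name> and <value> sits on its own line, as in the listing above, and is not a general XML parser. The helper name get_hadoop_property is hypothetical, and the demo uses a scratch copy of the file.

```shell
#!/bin/bash
# Sketch: print the <value> that follows a given <name> in a Hadoop
# *-site.xml file. Assumes one element per line; not a real XML parser.
# get_hadoop_property is a hypothetical helper name.
get_hadoop_property() {
    local file="$1" name="$2"
    awk -v want="<name>${name}</name>" '
        index($0, want) { found = 1; next }
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$file"
}

# Demo on a scratch copy of the core-site.xml shown above.
demo_xml=$(mktemp)
cat > "$demo_xml" <<'EOF'
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop330:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/root/hadoopdata</value>
    </property>
</configuration>
EOF
get_hadoop_property "$demo_xml" fs.defaultFS    # prints hdfs://hadoop330:9000
```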

6.1.3 Configure hdfs-site.xml

vim hdfs-site.xml
------
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>0.0.0.0:50070</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop330:50090</value>
    </property>
</configuration>
------

6.2 YARN configuration

6.2.1 Configure yarn-env.sh

vim yarn-env.sh
------
export JAVA_HOME=/root/jdk1.8.0_171
------

6.2.2 Configure yarn-site.xml

vim yarn-site.xml
------
<configuration>
  <property>
     <name>yarn.resourcemanager.hostname</name>
     <value>hadoop330</value>
  </property>
  <property>
     <name>yarn.nodemanager.aux-services</name>
     <value>mapreduce_shuffle</value>
  </property>
</configuration>
------

6.2.3 Configure mapred-env.sh

vim mapred-env.sh
------
export JAVA_HOME=/root/jdk1.8.0_171
------

6.2.4 Configure mapred-site.xml

Copy the template:

cp mapred-site.xml.template mapred-site.xml

Edit the file:

vim mapred-site.xml
------
<configuration>
  <property>
     <name>mapreduce.framework.name</name>
     <value>yarn</value>
  </property>
</configuration>
------

6.3 Configure the slaves file

vim slaves
------
worker1
worker2
worker3
------

7 Build the image & deploy the cluster

7.1 Commit the image

Exit the container shell:

exit

Find the container ID and create the image from it:

docker ps -a
docker commit <container-id> hadoop

7.2 docker-compose.yml

Create a hadoop-cluster directory and, inside it, a docker-compose.yml file with the following content:

version: '3'
services:
  master1:
    image: hadoop
    stdin_open: true
    tty: true
    command: bash
    hostname: hadoop330
    ports:
      - "50070:50070"
      - "8088:8088"
      - "9000:9000"
  master2:
    image: hadoop
    stdin_open: true
    tty: true
    command: bash
  worker1:
    image: hadoop
    stdin_open: true
    tty: true
    command: bash
  worker2:
    image: hadoop
    stdin_open: true
    tty: true
    command: bash
  worker3:
    image: hadoop
    stdin_open: true
    tty: true
    command: bash
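
The start scripts connect over SSH to every host listed in the slaves file from section 6.3, and on the Compose network those names resolve through the service names. A quick cross-check between the two files can catch a typo before the cluster is started. The sketch below is one way to do it, assuming service keys are indented by exactly two spaces as in the file above; the helper name missing_services and the sample files (including the deliberately wrong worker4 entry) are made up for the demo.

```shell
#!/bin/bash
# Sketch: list hosts from a slaves file that have no matching service key
# in a compose file. Assumes service keys are indented by two spaces.
# missing_services is a hypothetical helper name.
missing_services() {
    local slaves_file="$1" compose_file="$2" host
    while IFS= read -r host; do
        [ -n "$host" ] || continue    # skip empty lines
        grep -q "^  ${host}:[[:space:]]*$" "$compose_file" || echo "$host"
    done < "$slaves_file"
}

# Demo with scratch files; worker4 is intentionally absent from the
# compose fragment.
demo_slaves=$(mktemp)
demo_compose=$(mktemp)
printf '%s\n' worker1 worker2 worker3 worker4 > "$demo_slaves"
cat > "$demo_compose" <<'EOF'
services:
  worker1:
    image: hadoop
  worker2:
    image: hadoop
  worker3:
    image: hadoop
EOF
missing_services "$demo_slaves" "$demo_compose"    # prints worker4
```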

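7.3 Start the cluster

In the hadoop-cluster directory, hold Shift and right-click, then choose "Open PowerShell window here". The command below creates and starts all containers defined in the compose file; -d runs them in the background.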

docker compose up -d

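8 Cluster initialization

8.1 Enter the master node

Find the container ID:

docker ps -a

The master node is the container whose PORTS column lists 50070, 8088, and 9000.

docker exec -it <container-id> /bin/bash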

8.2 Passwordless SSH

ssh-copy-id worker1
ssh-copy-id worker2
ssh-copy-id worker3
ssh-copy-id master2    # optional; master2 needs no passwordless SSH in this experiment
ssh-copy-id hadoop330
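
The repeated commands can be collapsed into a loop. The sketch below assumes the node list from the commands above, leaving out the optional master2. The DRY_RUN switch and the copy_keys name are additions for illustration: with DRY_RUN=1 the commands are only printed, which allows the list to be checked before any password prompt appears.

```shell
#!/bin/bash
# Sketch: distribute the public key to every node in one loop.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0
# inside the master container to actually run them.
DRY_RUN="${DRY_RUN:-1}"
NODES="worker1 worker2 worker3 hadoop330"

copy_keys() {
    local node
    for node in $NODES; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "ssh-copy-id $node"
        else
            ssh-copy-id "$node"
        fi
    done
}

copy_keys
```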

✔ Verify

ssh worker1

8.3 Format HDFS

hdfs namenode -format

8.4 Start the cluster

start-dfs.sh
start-yarn.sh

9 Final verification (critical)

9.1 Process check

jps

hadoop330 should show:

  • SecondaryNameNode

  • NameNode

  • ResourceManager

worker1/2/3 should each show:

  • DataNode

  • NodeManager
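
The comparison against the lists above can be automated. The sketch below takes jps output as text and prints any expected daemon that is missing; it assumes jps prints one "<pid> <name>" pair per line. The helper name missing_daemons is hypothetical, and the sample output with its PIDs is made up so the check can be tried without a running cluster.

```shell
#!/bin/bash
# Sketch: report which expected daemons are absent from jps output.
# missing_daemons is a hypothetical helper; jps output is passed in as
# text so the check can be tried without a running cluster.
missing_daemons() {
    local jps_output="$1" daemon
    shift
    for daemon in "$@"; do
        # Match the daemon name exactly as the last field of a line, so
        # that NameNode does not match SecondaryNameNode.
        echo "$jps_output" \
            | awk -v d="$daemon" '$NF == d { ok = 1 } END { exit !ok }' \
            || echo "$daemon"
    done
}

# Made-up sample output for a master node; the PIDs are arbitrary.
sample="101 NameNode
245 SecondaryNameNode
512 Jps"
missing_daemons "$sample" NameNode SecondaryNameNode ResourceManager
# prints ResourceManager
```

On the master node the call would be missing_daemons "$(jps)" NameNode SecondaryNameNode ResourceManager, and on a worker missing_daemons "$(jps)" DataNode NodeManager. Empty output means every expected daemon was found.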


9.2 Web UI

  • HDFS: http://<host-IP>:50070

  • YARN: http://<host-IP>:8088