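Environment
The basic setup needs 4 machines (one namenode and three datanodes).
The HA setup needs 8 machines: two namenodes (one as the active NN, the other as the standby NN), three datanodes, and three ZooKeeper machines (these three can be omitted if the ZooKeeper daemons are deployed on other machines). Three journalnodes are also required, but because the journalnode is lightweight, it is deployed on the datanodes here.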
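Configure the following on each of the three ZooKeeper machines:
1 Create the hadoop user
2 Set up passwordless SSH login
3 Change the hostname
4 Install the JDK
5 Download the ZooKeeper package
Download URL:
Download zookeeper-3.4.6 into /opt/ and extract it
6 Edit /etc/profile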
export ZOO_HOME=/opt/zookeeper-3.4.6
export ZOO_LOG_DIR=/opt/zookeeper-3.4.6/logs
Make the change take effect:
source /etc/profile
7 Create the ZooKeeper data directory:
mkdir /opt/zookeeper-3.4.6/data
8 Create the configuration file under $ZOO_HOME/conf:
vi zoo.cfg and add the following:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/opt/zookeeper-3.4.6/data
# the port at which the clients will connect
clientPort=2181
server.1=10.9.214.167:31316:31317
server.2=10.9.214.18:31316:31317
server.3=10.9.214.211:31316:31317
9 In /opt/zookeeper-3.4.6/data/, create a file named myid containing the server's ID: 1 on zookeeper1, 2 on zookeeper2, 3 on zookeeper3. For example:
echo 1 >/opt/zookeeper-3.4.6/data/myid
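Typing the ID by hand on each machine is easy to get wrong, because the value must match the N of that machine's server.N line in zoo.cfg. The following is a minimal sketch of deriving it from the file instead. The function name myid_for_ip is hypothetical, and it assumes the server.N lines use plain IP addresses as in the zoo.cfg above.

```shell
#!/bin/sh
# Hypothetical helper: print the N from the "server.N=<ip>:..." line of a
# zoo.cfg file whose address equals the given IP. Prints nothing if no
# line matches.
myid_for_ip() {
    cfg=$1
    ip=$2
    # Split on '=' and ':' so that $1 is "server.N" and $2 is the address.
    awk -F'[=:]' -v ip="$ip" '
        $1 ~ /^server\.[0-9]+$/ && $2 == ip {
            sub(/^server\./, "", $1)
            print $1
        }
    ' "$cfg"
}

# Usage (run on each ZooKeeper machine with that machine's own IP):
#   myid_for_ip "$ZOO_HOME/conf/zoo.cfg" 10.9.214.18 > /opt/zookeeper-3.4.6/data/myid
```

The address comparison is an exact match, so 10.9.214.1 would not be confused with 10.9.214.18.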
10 Start the ZooKeeper service:
cd $ZOO_HOME
./bin/zkServer.sh start
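11 Verify
To test whether the ZooKeeper ensemble was set up successfully, run the following command from $ZOO_HOME. The port must be the clientPort from zoo.cfg (2181). If the client connects without errors, the ensemble is working:
./bin/zkCli.sh -server localhost:2181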
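To check all three servers without opening an interactive client, one option is ZooKeeper's "ruok" four-letter command, which a healthy server answers with "imok". The sketch below assumes that nc (netcat) is installed and that the four-letter commands are enabled, which should be the case by default in 3.4.6. The function names zk_is_ok and check_ensemble are hypothetical.

```shell
#!/bin/sh
# Ask one ZooKeeper server whether it is running by sending "ruok" to its
# client port. Succeeds only if the server answers "imok".
zk_is_ok() {
    zk_host=$1
    zk_port=${2:-2181}   # clientPort from zoo.cfg
    reply=$(echo ruok | nc -w 2 "$zk_host" "$zk_port" 2>/dev/null)
    [ "$reply" = "imok" ]
}

# Check every given server, print one line per server, and return non-zero
# if at least one of them did not answer.
check_ensemble() {
    failed=0
    for member in "$@"; do
        if zk_is_ok "$member"; then
            echo "$member: ok"
        else
            echo "$member: NOT responding"
            failed=1
        fi
    done
    return "$failed"
}

# Usage: check_ensemble 10.9.214.167 10.9.214.18 10.9.214.211
```

Note that "imok" only says the process is up. To see whether a server is currently leader or follower, ./bin/zkServer.sh status can be run on each machine.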
Of the Hadoop configuration files, only core-site.xml and hdfs-site.xml need to be modified.
Configure core-site.xml
<property>
  <name>hadoop.tmp.dir</name>
  <value>/opt/hadoop-2.6.0/tmp</value>
</property>
<!-- fs.default.name is the deprecated name of fs.defaultFS; for HA it is best removed so that it cannot conflict with the nameservice URI set below -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://10.9.214.151:9000</value>
</property>
<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>10.9.214.151</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://cluster_haohzhang</value>
</property>
<property>
  <name>ha.zookeeper.quorum</name>
  <value>10.9.214.167:2181,10.9.214.18:2181,10.9.214.211:2181</value>
</property>
Configure hdfs-site.xml
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/opt/hadoop-2.6.0/hdfs/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/opt/hadoop-2.6.0/hdfs/data</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
<property>
  <name>dfs.nameservices</name>
  <value>cluster_haohzhang</value>
</property>
<property>
  <name>dfs.ha.namenodes.cluster_haohzhang</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.cluster_haohzhang.nn1</name>
  <value>10.9.214.151:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.cluster_haohzhang.nn2</name>
  <value>10.9.214.15:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.cluster_haohzhang.nn1</name>
  <value>10.9.214.151:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.cluster_haohzhang.nn2</name>
  <value>10.9.214.15:50070</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://10.9.214.158:8485;10.9.214.160:8485;10.9.214.149:8485/cluster_haohzhang</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.cluster_haohzhang</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hadoop/.ssh/id_rsa</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/opt/hadoop-2.6.0/journalnode</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
Operational details
1 First delete the metadata on all namenodes, datanodes and journalnodes
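One way to do this cleanup is to remove and recreate the directories named in hdfs-site.xml. The following is only a sketch: the function name reset_dirs is hypothetical, it assumes all Hadoop daemons on the machine are already stopped, and it permanently deletes whatever is in the directories it is given.

```shell
#!/bin/sh
# Hypothetical helper: empty each given directory by removing it and
# creating it again. Stops at the first failure.
reset_dirs() {
    for dir in "$@"; do
        # Accept only absolute paths other than "/", so that a typo or an
        # empty variable cannot delete the wrong tree.
        case "$dir" in
            /)  echo "refusing to remove /" >&2; return 1 ;;
            /*) ;;
            *)  echo "refusing non-absolute path: '$dir'" >&2; return 1 ;;
        esac
        rm -rf "$dir" && mkdir -p "$dir" || return 1
    done
}

# Usage, with the directories from the hdfs-site.xml above:
#   on each namenode:  reset_dirs /opt/hadoop-2.6.0/hdfs/name
#   on each datanode (which also hosts a journalnode here):
#                      reset_dirs /opt/hadoop-2.6.0/hdfs/data /opt/hadoop-2.6.0/journalnode
```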
2 Start the three journalnode processes (run on each journalnode machine):
hadoop-daemon.sh start journalnode
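Instead of logging in to each machine, the same command can be sent over SSH from one host. This sketch assumes passwordless SSH as the hadoop user to every journalnode, that Hadoop is installed under /opt/hadoop-2.6.0 on all of them, and that JAVA_HOME is available to non-interactive shells (for example through hadoop-env.sh). The function name start_journalnodes and the HADOOP_DIR variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: start the journalnode daemon on every given host over
# SSH. Stops at the first host on which the command fails.
start_journalnodes() {
    # Assumed install location; override by setting HADOOP_DIR beforehand.
    hadoop_dir=${HADOOP_DIR:-/opt/hadoop-2.6.0}
    for jn in "$@"; do
        echo "starting journalnode on $jn"
        ssh "hadoop@$jn" "$hadoop_dir/sbin/hadoop-daemon.sh start journalnode" || return 1
    done
}

# Usage, with the hosts listed in dfs.namenode.shared.edits.dir:
#   start_journalnodes 10.9.214.158 10.9.214.160 10.9.214.149
```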
3 Format the namenode
Run on one of the namenodes:
hdfs namenode -format
This step connects to the journalnodes and formats them as well.
4 Start HDFS from the namenode that was just formatted:
cd $HADOOP_HOME/sbin; ./start-dfs.sh
5 Run on the other namenode:
hdfs namenode -bootstrapStandby
6 Verify manual failover
Run on either namenode:
hdfs haadmin -help
This prints the command usage. Here we use
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
to get the state of the two namenodes. There are two possible states: standby and active.
Switch the state manually:
hdfs haadmin -failover nn1 nn2
If it succeeds, nn2 becomes the active namenode.
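After a failover it is worth confirming that exactly one namenode is active. The following is a minimal sketch built on the getServiceState command used above. The function name show_ha_states is hypothetical, and nn1 and nn2 are the IDs from dfs.ha.namenodes.cluster_haohzhang.

```shell
#!/bin/sh
# Print the HA state of each given namenode ID and succeed only if exactly
# one of them reports "active".
show_ha_states() {
    active=0
    for nn in "$@"; do
        # An empty answer means the namenode could not be queried.
        state=$(hdfs haadmin -getServiceState "$nn" 2>/dev/null)
        echo "$nn: ${state:-unreachable}"
        if [ "$state" = "active" ]; then
            active=$((active + 1))
        fi
    done
    [ "$active" -eq 1 ]
}

# Usage: show_ha_states nn1 nn2 || echo "expected exactly one active namenode"
```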
7 Automatic failover with ZooKeeper
7.1 Initialize ZKFC on one of the namenodes
hdfs zkfc -formatZK
This step connects to ZooKeeper on port 2181 and creates a znode in it.
7.2 Start HDFS from a namenode
cd $HADOOP_HOME/sbin; ./start-dfs.sh
7.3 Verify that all processes started successfully
[hadoop@hadoopmaster-standby sbin]$ jps
12277 NameNode
12871 Jps
12391 DFSZKFailoverController
[hadoop@hadoopslave1 hadoop-2.6.0]$ jps
7698 DataNode
7787 JournalNode
7933 Jps
7.4 Verify automatic failover
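Kill all Hadoop processes on the active namenode (use jps to find their PIDs):
kill -9 <pid>
Then check whether the other namenode has switched from standby to active.
Note: by default the configuration checks the health state every 5 seconds, so the switch is not instantaneous.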
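Because the switch takes a few seconds, checking once right after the kill can give a misleading result. The sketch below polls the surviving namenode until it reports active or a timeout runs out. The function name wait_for_active and the 30-second default are assumptions chosen for illustration.

```shell
#!/bin/sh
# Poll a namenode's HA state once per second until it reports "active" or
# the timeout (in seconds) runs out. Returns 0 on success, 1 on timeout.
wait_for_active() {
    nn=$1
    timeout=${2:-30}
    elapsed=0
    state=
    while [ "$elapsed" -lt "$timeout" ]; do
        state=$(hdfs haadmin -getServiceState "$nn" 2>/dev/null)
        if [ "$state" = "active" ]; then
            echo "$nn became active after about ${elapsed}s"
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "$nn is still '${state:-unreachable}' after ${timeout}s" >&2
    return 1
}

# Usage, after killing the processes on the active namenode (nn1 here):
#   wait_for_active nn2 30
```

The reported time is only approximate, since each query itself takes time on top of the one-second sleep.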