Some Notes About Deploying Hadoop Clusters

1.Download the source code and use Maven to compile and package it. Remember to compile the native libraries on the same OS version as the servers that will run them.

mvn package -Pdist -Pnative -Dtar -DskipTests
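
After the build finishes, the packaged tarball should appear under hadoop-dist/target in the source tree. Here is a minimal sketch for locating it, assuming that default layout; the function name find_dist_tarball and its source-directory argument are made up for illustration and are not part of the Hadoop build.

```shell
# Print the path of the distribution tarball produced by the Maven build.
# Assumes the default layout: <source-dir>/hadoop-dist/target/hadoop-<version>.tar.gz
find_dist_tarball() {
    src_dir=$1
    for f in "$src_dir"/hadoop-dist/target/hadoop-*.tar.gz; do
        # The glob stays literal when nothing matches, so test for a real file.
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    echo "no tarball found under $src_dir/hadoop-dist/target" >&2
    return 1
}
```

Once the tarball is unpacked on a target host, running `hadoop checknative -a` there should report whether the native libraries were actually loaded.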

2.Edit the core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml files.
You should also set the JVM parameters and the log locations in /etc/bashrc. An example follows:

export JAVA_HOME=/usr/local/jdk1.7.0_67
export JRE_HOME=/usr/local/jdk1.7.0_67/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export HADOOP_HOME=/usr/local/hadoop-2.4.0
export HADOOP_LOG_DIR=/data0/hadoop/log/hadoop
export HADOOP_PID_DIR=/data0/hadoop/pid/hadoop
export YARN_LOG_DIR=/data0/hadoop/log/yarn
export YARN_PID_DIR=/data0/hadoop/pid/yarn
export HADOOP_NAMENODE_OPTS=" -Xmx20480m -Xms20480m -Xmn3072m -verbose:gc -Xloggc:/data0/hadoop/gclog/namenode.gc.log -XX:ErrorFile=/data0/hadoop/gclog/hs_err_pid.log -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=85 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCMSCompactAtFullCollection -XX:CMSMaxAbortablePrecleanTime=1000 -XX:+CMSClassUnloadingEnabled -XX:+DisableExplicitGC"
export YARN_RESOURCEMANAGER_OPTS=" -Xmx10240m -Xms10240m -Xmn3072m -verbose:gc -Xloggc:/data0/hadoop/gclog/yarn.gc.log -XX:ErrorFile=/data0/hadoop/gclog/hs_err_pid.log -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCMSCompactAtFullCollection -XX:CMSMaxAbortablePrecleanTime=1000 -XX:+CMSClassUnloadingEnabled -XX:+DisableExplicitGC"
ulimit -u 65535
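
The JVM does not create the directory named in -Xloggc, so the gclog directory has to exist before the first start, and creating the log and PID directories up front also lets you fix their ownership early. A sketch, assuming the /data0/hadoop layout from the example above; the function name make_hadoop_dirs and its base-directory parameter are hypothetical.

```shell
# Create the log, PID and GC-log directories used in the example settings.
# base defaults to /data0/hadoop, the prefix used in the exports above.
make_hadoop_dirs() {
    base=${1:-/data0/hadoop}
    for sub in log/hadoop log/yarn pid/hadoop pid/yarn gclog; do
        mkdir -p "$base/$sub" || return 1
    done
}
```

Run it as a user allowed to write under the base directory, then hand the tree over to the users that start the daemons.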

3.Copy the packaged build to all the servers, including the namenodes, resourcemanagers, nodemanagers and datanodes
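
One way to do the copy is a loop over a host list. The sketch below only prints the scp commands so the list can be reviewed first; the helper name, the hosts file (one hostname per line) and the destination directory are all assumptions, not something these notes prescribe.

```shell
# Print one scp command per host instead of running it.
# hosts_file holds one hostname per line; blank lines and lines starting with # are skipped.
print_copy_commands() {
    tarball=$1
    hosts_file=$2
    dest_dir=$3
    while IFS= read -r host || [ -n "$host" ]; do
        case $host in
            ''|'#'*) continue ;;
        esac
        printf 'scp %s %s:%s/\n' "$tarball" "$host" "$dest_dir"
    done < "$hosts_file"
}
```

Piping the output to `sh` runs the copies once the list looks right.
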
4.Start the journalnodes first (I assume you use qjournal)

start journalnode
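
These notes give only the "start journalnode" part of the command. In a stock Hadoop 2.x tarball the HDFS daemons are usually started with sbin/hadoop-daemon.sh, so the sketch below assumes that script and the HADOOP_HOME from step 2. The helper name hdfs_daemon_cmd is made up for illustration. The same helper covers the namenode and datanode start in step 7.

```shell
# Print the command that starts or stops one HDFS daemon on the local host.
# Assumes sbin/hadoop-daemon.sh from the Hadoop 2.x tarball under $HADOOP_HOME.
hdfs_daemon_cmd() {
    action=$1   # start or stop
    role=$2     # journalnode, namenode or datanode
    case $role in
        journalnode|namenode|datanode) ;;
        *) echo "unknown HDFS role: $role" >&2; return 1 ;;
    esac
    printf '%s/sbin/hadoop-daemon.sh %s %s\n' "${HADOOP_HOME:-/usr/local/hadoop-2.4.0}" "$action" "$role"
}
```

Run the printed line on each journalnode host, for example with `hdfs_daemon_cmd start journalnode | sh`.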

5.Format the namenode for a specific namespace

hdfs namenode -format

If you use federation, make sure every namespace uses the same cluster ID. For example, if the first namespace was formatted with cluster ID abcdefg, format the second namespace with that same ID:

hdfs namenode -format -clusterId abcdefg
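
The cluster ID of an already formatted namenode is recorded in the VERSION file under its metadata directory (current/VERSION below the directory configured as dfs.namenode.name.dir). A small sketch for reading it, so the second namespace can be formatted with the same value; the function name read_cluster_id is hypothetical.

```shell
# Print the clusterID value stored in a namenode VERSION file.
read_cluster_id() {
    version_file=$1
    # VERSION is a Java properties file with one key=value pair per line.
    sed -n 's/^clusterID=//p' "$version_file"
}
```

The result can be passed straight to the format command, for example `hdfs namenode -format -clusterId "$(read_cluster_id <name-dir>/current/VERSION)"`, where <name-dir> stands for your own metadata directory.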

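6.Initialize the standby namenode for the same namespace
Run this on the standby host while the namenode formatted in step 5 is running (start that namenode first if it is not up yet), because bootstrapStandby copies the namespace metadata from it.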

hdfs namenode -bootstrapStandby

7.Start the namenodes and datanodes

start namenode
start datanode
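
After starting the daemons it is worth confirming that they stayed up. A sketch that reads `jps` output and reports which expected daemons are missing; the function name is made up, and the class names passed to it (NameNode, DataNode and so on) are the main-class names that jps normally shows for Hadoop daemons.

```shell
# Read `jps` output on stdin and print every expected daemon class that is not running.
# Each jps line has the form "<pid> <MainClass>".
missing_daemons() {
    running=$(awk '{print $2}')
    for name in "$@"; do
        if ! printf '%s\n' "$running" | grep -qx "$name"; then
            printf '%s\n' "$name"
        fi
    done
}
```

On a host that should run both daemons, `jps | missing_daemons NameNode DataNode` prints nothing when everything is up.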

8.Transition one namenode to active

For example, if the namespace is ns and the namenode to make active is nn1:

hdfs haadmin -ns ns -transitionToActive nn1
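
Both namenodes of an HA pair normally come up in standby state, so one of them has to be made active by hand unless automatic failover is configured. The sketch below guards the transition so it only runs when neither namenode is active yet. The helper name and the second namenode ID nn2 are assumptions; `-getServiceState` is the haadmin subcommand that prints active or standby.

```shell
# Decide from the two reported HA states whether a manual transition is needed.
# Returns 0 (true) only when neither namenode is active.
needs_transition() {
    state1=$1
    state2=$2
    if [ "$state1" = active ] || [ "$state2" = active ]; then
        return 1
    fi
    return 0
}

# Example use on a cluster (nn2 is a hypothetical ID for the second namenode):
#   s1=$(hdfs haadmin -ns ns -getServiceState nn1)
#   s2=$(hdfs haadmin -ns ns -getServiceState nn2)
#   needs_transition "$s1" "$s2" && hdfs haadmin -ns ns -transitionToActive nn1
```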

9.Create directories for the hadoop user and the mapred user (the user that starts the resourcemanager and history server)
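
These notes do not say where the user directories go. A common convention is an HDFS home directory under /user, so the sketch below prints the commands for that layout; the /user prefix, the group named after each user and the helper name are all assumptions.

```shell
# Print the HDFS commands that create a home directory for each given user.
# Assumes home directories live under /user and each user has a group of the same name.
print_home_dir_commands() {
    for user in "$@"; do
        printf 'hdfs dfs -mkdir -p /user/%s\n' "$user"
        printf 'hdfs dfs -chown %s:%s /user/%s\n' "$user" "$user" "$user"
    done
}
```

Review the output of `print_home_dir_commands hadoop mapred`, then run it as the HDFS superuser.
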
10.Create the directories for the history server
For example, if mapred-site.xml sets the history directories to /hadoop/history/tmp and /hadoop/history/done, you have to create them like this:

hdfs dfs -mkdir -p /hadoop/history/tmp
hdfs dfs -mkdir -p /hadoop/history/done
hdfs dfs -chown -R mapred:mapred /hadoop/history
hdfs dfs -chmod -R 1777 /hadoop/history/tmp
hdfs dfs -chmod -R 1777 /hadoop/history/done
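
The mapred-site.xml settings that match these directories are not reproduced in these notes. As a guess at what they could look like, the sketch below prints the two job history properties; mapping tmp to mapreduce.jobhistory.intermediate-done-dir and done to mapreduce.jobhistory.done-dir is an assumption, and the helper name is made up.

```shell
# Print the two history-directory properties for mapred-site.xml.
# Mapping tmp -> intermediate-done-dir and done -> done-dir is an assumption.
print_history_properties() {
    base=${1:-/hadoop/history}
    cat <<EOF
<property>
  <name>mapreduce.jobhistory.intermediate-done-dir</name>
  <value>$base/tmp</value>
</property>
<property>
  <name>mapreduce.jobhistory.done-dir</name>
  <value>$base/done</value>
</property>
EOF
}
```

The printed fragment belongs inside the configuration element of mapred-site.xml.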

11.Start the resourcemanager, the nodemanagers and the MR history server

start resourcemanager
start nodemanager
start historyserver
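
As with the HDFS daemons, only the "start ..." part of each command is given here. In a stock Hadoop 2.x tarball the YARN daemons are usually started with sbin/yarn-daemon.sh and the history server with sbin/mr-jobhistory-daemon.sh; the sketch below assumes those scripts and the HADOOP_HOME from step 2, and the helper name yarn_daemon_cmd is hypothetical.

```shell
# Print the start/stop command for a YARN daemon or the MapReduce history server.
# Assumes the Hadoop 2.x sbin scripts yarn-daemon.sh and mr-jobhistory-daemon.sh.
yarn_daemon_cmd() {
    action=$1   # start or stop
    role=$2     # resourcemanager, nodemanager or historyserver
    home=${HADOOP_HOME:-/usr/local/hadoop-2.4.0}
    case $role in
        resourcemanager|nodemanager)
            printf '%s/sbin/yarn-daemon.sh %s %s\n' "$home" "$action" "$role" ;;
        historyserver)
            printf '%s/sbin/mr-jobhistory-daemon.sh %s %s\n' "$home" "$action" "$role" ;;
        *)
            echo "unknown role: $role" >&2
            return 1 ;;
    esac
}
```

Run the printed commands as the mapred user from step 9, since that is the user these notes use for the resourcemanager and history server.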