Here is what I have been working on recently. There are many articles explaining how to install Hadoop, but few mention NFS and NIS, which make the installation much easier. I have also put some extra effort into letting several users run Hadoop on the same cluster at the same time.

System configuration

OS: RedHat AS4
JDK: 1.6.0
Hadoop: 0.20.1
Hardware:
 sun1: 192.168.1.1  
 sun2: 192.168.1.2
 sun3: 192.168.1.3
Prerequisites:
 NFS share folder: 192.168.1.1:/cshare
 NIS server: 192.168.1.1
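
The steps below assume the NFS share and the NIS domain already work. If they do not, here is a minimal sketch of what that setup might look like on RedHat AS4 (the export options, the NIS domain name "hadoop", and the client config are my own assumptions; adapt them to your environment):

[root@sun1 ~]$vim /etc/exports              (add: /cshare sun2(rw,sync,no_root_squash) sun3(rw,sync,no_root_squash))
[root@sun1 ~]$exportfs -a
[root@sun1 ~]$service nfs restart
[root@sun2 ~]$mkdir -p /cshare
[root@sun2 ~]$mount sun1:/cshare /cshare    (do the same on sun3, and add it to /etc/fstab to survive reboots)

[root@sun1 ~]$vim /etc/sysconfig/network    (add: NISDOMAIN=hadoop)
[root@sun1 ~]$service ypserv start
[root@sun1 ~]$/usr/lib/yp/ypinit -m
[root@sun2 ~]$vim /etc/sysconfig/network    (add: NISDOMAIN=hadoop)
[root@sun2 ~]$vim /etc/yp.conf              (add: domain hadoop server sun1)
[root@sun2 ~]$vim /etc/nsswitch.conf        (append nis to the passwd, shadow and group lines)
[root@sun2 ~]$service ypbind start          (do the same on sun3)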

step 1. modify /etc/hosts on each machine

[root@sun1 ~]$vim /etc/hosts              (do the same on sun2-3)

192.168.1.1 sun1
192.168.1.2 sun2
192.168.1.3 sun3
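
A quick way to confirm name resolution works before going further (just a sanity check, not part of the original recipe):

[root@sun1 ~]$ping -c 1 sun2
[root@sun1 ~]$ping -c 1 sun3     (repeat the checks from sun2 and sun3 as well)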

step 2. create user1 with NIS

[root@sun1 ~]$mkdir /cshare/home
[root@sun1 ~]$chmod 1777 /cshare/home
[root@sun1 ~]$mkdir /cshare/home/user1
[root@sun1 ~]$useradd -p 123456 -d /cshare/home/user1 user1   (useradd -p expects an encrypted password; you may prefer to run passwd user1 afterwards)
[root@sun1 ~]$cd /var/yp
[root@sun1 yp]$make
[root@sun1 yp]$chown -R user1:user1 /cshare/home/user1
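
To confirm the account is really served by NIS and visible on the other nodes (again only a sanity check):

[root@sun1 ~]$ypcat passwd | grep user1     (the NIS passwd map should contain user1)
[root@sun2 ~]$id user1                      (the clients should resolve the user too)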

step 3. create SSH trusted connection

[user1@sun1 ~]$ssh-keygen -t rsa  (press Enter at every prompt)
[user1@sun1 ~]$cd .ssh
[user1@sun1 .ssh]$cp id_rsa.pub authorized_keys
[user1@sun1 .ssh]$ssh sun2   (type 'yes' to accept the host key, then exit)
[user1@sun1 .ssh]$ssh sun3   (type 'yes' to accept the host key, then exit)
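
Because user1's home directory lives on the NFS share, the same key pair and authorized_keys file are visible on every node, so one key is enough for the whole cluster. sshd is picky about permissions, so it is worth tightening them and checking that passwordless login really works:

[user1@sun1 .ssh]$chmod 700 ~/.ssh
[user1@sun1 .ssh]$chmod 600 ~/.ssh/authorized_keys
[user1@sun1 .ssh]$ssh sun2 hostname   (should print sun2 without asking for a password)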

step 4. Install Hadoop for user1

step 4.1 create the folders for the hadoop logs and tmp data (these are local directories, so create them on every node, not only sun1)

[root@sun1 ~]$mkdir -p /hadoop/user1
[root@sun1 ~]$chmod 1777 /hadoop
[root@sun1 ~]$chown user1:user1 /hadoop/user1     (do the same on sun2-3)
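
If root can already ssh to sun2-3, the same three commands can be pushed to every node in one shot (just a convenience; otherwise repeat them manually on each machine):

[root@sun1 ~]$for h in sun1 sun2 sun3; do ssh $h "mkdir -p /hadoop/user1; chmod 1777 /hadoop; chown user1:user1 /hadoop/user1"; done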

step 4.2 config Hadoop

[user1@sun1 ~]$tar zxvf hadoop-0.20.1.tar.gz
[user1@sun1 ~]$cd hadoop-0.20.1/conf
[user1@sun1 conf]$vim hadoop-env.sh

# The java implementation to use.  Required.

export JAVA_HOME=/usr/java/jdk_1.6.0

# Where log files are stored.  $HADOOP_HOME/logs by default.

export HADOOP_LOG_DIR=/hadoop/user1

[user1@sun1 conf]$vim core-site.xml     (the properties below go inside the <configuration> element; the same applies to the other *-site.xml files)

<property>
  <name>hadoop.tmp.dir</name>
  <value>/hadoop/user1</value>
  <description>A base for other temporary directories.</description>
</property>
 
<property>
  <name>fs.default.name</name>
  <value>hdfs://sun1:54310</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>

[user1@sun1 conf]$vim mapred-site.xml

<property>
  <name>mapred.job.tracker</name>
  <value>sun1:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>

[user1@sun1 conf]$vim hdfs-site.xml

<property>
  <name>dfs.replication</name>
  <value>3</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>
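
If you have not done so already, also put the node names into conf/masters and conf/slaves, otherwise start-all.sh only starts daemons on the local machine. In Hadoop 0.20 conf/masters names the host that runs the SecondaryNameNode and conf/slaves names the hosts that run DataNodes and TaskTrackers (using all three machines as slaves is my choice here; leave sun1 out of slaves if you do not want it to store data):

[user1@sun1 conf]$vim masters
sun1
[user1@sun1 conf]$vim slaves
sun1
sun2
sun3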

step 4.3 start hadoop

[user1@sun1 conf]$cd ../bin
[user1@sun1 bin]$./hadoop namenode -format   (before this, make sure /hadoop/user1 is empty on sun1-3)
[user1@sun1 bin]$./start-all.sh
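
If the daemons came up correctly, jps (shipped with the JDK) should show them. The exact list depends on the masters/slaves files, but roughly:

[user1@sun1 bin]$jps     (expect NameNode, SecondaryNameNode and JobTracker, plus DataNode and TaskTracker if sun1 is also listed as a slave)
[user1@sun2 ~]$jps       (expect DataNode and TaskTracker)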

step 4.4 run grep

[user1@sun1 bin]$./hadoop dfsadmin -report
[user1@sun1 bin]$./hadoop dfs -mkdir grepin
[user1@sun1 bin]$vim /tmp/test.txt
test
test
test
[user1@sun1 bin]$./hadoop dfs -put /tmp/test.txt grepin
[user1@sun1 bin]$./hadoop jar ../hadoop-0.20.1-examples.jar grep grepin grepout test
[user1@sun1 bin]$./hadoop dfs -cat grepout/part-00000
3 test
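
One small caveat before re-running the job: Hadoop refuses to start a job whose output directory already exists, so remove it first:

[user1@sun1 bin]$./hadoop dfs -rmr grepout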

step 5. install hadoop for user2

step 5.1 create user2 the same way as in steps 2, 3 and 4.1

step 5.2 config hadoop for user2
(Pay attention to the port configuration: user2 must use a different port number for every port that user1's Hadoop already uses. The addresses that belong to the master daemons should use sun1, while the ones that belong to the slave daemons should use 0.0.0.0. For the web UI and datanode ports below I simply take the default port and add one. Since this cluster does not use HTTPS, there is no need to configure dfs.datanode.https.address and dfs.https.address in hdfs-site.xml.)

[root@sun1 ~]$cp -r /cshare/home/user1/hadoop-0.20.1 /cshare/home/user2
[root@sun1 ~]$chown -R user2:user2 /cshare/home/user2
[user2@sun1 ~]$cd hadoop-0.20.1/conf
[user2@sun1 conf]$vim hadoop-env.sh

# The java implementation to use.  Required.

export JAVA_HOME=/usr/java/jdk_1.6.0

# Where log files are stored.  $HADOOP_HOME/logs by default.

export HADOOP_LOG_DIR=/hadoop/user2

[user2@sun1 conf]$vim core-site.xml

<property>
  <name>hadoop.tmp.dir</name>
  <value>/hadoop/user2</value>
  <description>A base for other temporary directories.</description>
</property>
 
<property>
  <name>fs.default.name</name>
  <value>hdfs://sun1:54312</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>

[user2@sun1 conf]$vim mapred-site.xml

<property>
  <name>mapred.job.tracker</name>
  <value>sun1:54313</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>

<property>
  <name>mapred.job.tracker.http.address</name>
  <value>sun1:50031</value>
  <description>
    The job tracker http server address and port the server will listen on.
    If the port is 0 then the server will start on a free port.
  </description>
</property>

<property>
  <name>mapred.task.tracker.http.address</name>
  <value>0.0.0.0:50061</value>
  <description>
    The task tracker http server address and port.
    If the port is 0 then the server will start on a free port.
  </description>
</property>

[user2@sun1 conf]$vim hdfs-site.xml

<property>
  <name>dfs.replication</name>
  <value>3</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>

<property>
  <name>dfs.secondary.http.address</name>
  <value>sun1:50091</value>
  <description>
    The secondary namenode http server address and port.
    If the port is 0 then the server will start on a free port.
  </description>
</property>

<property>
  <name>dfs.datanode.address</name>
  <value>0.0.0.0:50011</value>
  <description>
    The address where the datanode server will listen to.
    If the port is 0 then the server will start on a free port.
  </description>
</property>

<property>
  <name>dfs.datanode.http.address</name>
  <value>0.0.0.0:50076</value>
  <description>
    The datanode http server address and port.
    If the port is 0 then the server will start on a free port.
  </description>
</property>

<property>
  <name>dfs.datanode.ipc.address</name>
  <value>0.0.0.0:50021</value>
  <description>
    The datanode ipc server address and port.
    If the port is 0 then the server will start on a free port.
  </description>
</property>

<property>
  <name>dfs.http.address</name>
  <value>sun1:50071</value>
  <description>
    The address and the base port where the dfs namenode web ui will listen on.
    If the port is 0 then the server will start on a free port.
  </description>
</property>

step 5.3 start hadoop and run the example as in steps 4.3 and 4.4
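
At this point the two users' clusters should run side by side without interfering with each other. A quick way to confirm both sets of daemons are alive on a node (just a sanity check):

[root@sun1 ~]$ps -ef | grep java | egrep 'user1|user2'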

Well done!

Notice:
1. Permissions: make sure the user can access every folder Hadoop needs.
2. The SSH trusted connection must be set up before starting the cluster.
3. For user2, the port configuration is vital; otherwise there will be port conflicts and address-binding failures.
4. Before formatting the namenode, make sure /hadoop/user1 is empty on sun1-3, or old data may cause conflicts.
5. After any change to a user account (for example, passwd user1), go to /var/yp and run make so the change is pushed to the NIS maps.
