This post summarizes how to build a virtual cluster with Xen, and records what I have done over the last half month. It will show you how to build the cluster step by step.
I followed almost the same steps as "ClusterMonkey – Building A Virtual Cluster with Xen", but with many modifications. The most important difference is that I have no internet access, so this is an updated guide to building a virtual cluster offline.

My background: a brand-new Linux user.

Structure:
A Xen installation, the creation of three virtual machines (one acting as the master and two as slaves), and shared storage through NFS.

Step 1. Install the Linux OS
I tried Fedora 8, Fedora 11, Ubuntu 9.04, and CentOS 5.3, and finally chose CentOS 5.3 (Final). For various reasons I deleted all the existing partitions and gave CentOS the whole 320 GB disk.

Here is the host's network configuration:
ip  192.168.1.1
netmask  255.255.255.0
gateway  192.168.1.254

Step 2. Install Xen
You may wonder why I tried so many distributions. At first I chose to install Xen from a source tarball, but I could not make it work because of version problems: sometimes the OS lacked packages that Xen needs (remember: I have no internet access), and sometimes Xen could not recognize my hard disk even though the installation itself succeeded. At last I got a useful piece of advice from a senior colleague: CentOS already ships Xen (later I found that Fedora does too).
So the installation is much easier. Get the xen and kernel-xen packages plus their dependencies (such as Virtualization-en-US, libvirt, virt-manager, python-virtinst, etc.) from the DVD. For example:

[root@daisy ~]# mkdir xen-install
[root@daisy ~]# cp /media/disk/CentOS/xen-3.0.3-80.el5.i386.rpm ./xen-install

Then install Xen.

[root@daisy ~]# cd xen-install
[root@daisy xen-install]# rpm -ivh *.rpm

If you are missing some packages, copy them from the DVD as well. Once the installation succeeds, reboot into the Xen kernel. You can then check whether the installation is OK with the xm command.
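Alternatively, you can let yum resolve the dependencies from the DVD instead of feeding rpm by hand. This is only a minimal sketch, assuming the DVD is mounted at /media/disk (the CentOS 5 media carries its own repodata):

[root@daisy ~]# cat /etc/yum.repos.d/dvd.repo
[dvd]
name=CentOS DVD
baseurl=file:///media/disk/
enabled=1
gpgcheck=0
[root@daisy ~]# yum --disablerepo='*' --enablerepo=dvd install xen kernel-xen virt-manager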

Modify the environment so that xm and the other admin tools (which live in /sbin and /usr/sbin) are on your PATH; append these lines to /etc/bashrc:

[root@daisy ~]# cat /etc/bashrc
PATH=$PATH:/usr/sbin
PATH=$PATH:/sbin

[root@daisy ~]# xm list
Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0     3280     4 r-----    926.5

OK, you have made it.

Step 3. Install guest OSes in Xen (as another user: frank)

Step 3.1. Set up an FTP server for the guest OS installation source.

[frank@daisy ~]# rpm -ivh vsftpd-2.0.5-12.el5.rpm
[frank@daisy ~]# vsftpd

Upload the CentOS 5.3 installation files to the FTP tree:
[frank@daisy ~]# cp -rf /media/disk/* /var/ftp/pub/centos5.3
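Note that the cp above assumes /var/ftp/pub/centos5.3 already exists; if not, create it first. You may also want vsftpd to come back after a reboot (assuming the stock vsftpd init script):

[frank@daisy ~]# mkdir -p /var/ftp/pub/centos5.3
[frank@daisy ~]# chkconfig vsftpd on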

Step 3.2. Create a 10 GB image for the master installation

[frank@daisy ~]# mkdir -p /vcluster/master
[frank@daisy ~]# dd if=/dev/zero of=/vcluster/master/master.img bs=4M count=2560
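(2560 blocks of 4 MB give the 10 GB image.) Writing 10 GB of zeros takes a while; if you are impatient, a sparse file also works. A sketch, not required:

[frank@daisy ~]# dd if=/dev/zero of=/vcluster/master/master.img bs=4M seek=2560 count=0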

Step 3.3. Install the master on Xen

[frank@daisy ~]# virt-install
What is the name of your virtual machine? master
How much RAM should be allocated (in megabyte)? 256
What would you like to use as the disk (path)? /vcluster/master/master.img
Would you like to enable graphics support? (yes or no) no
What is the install location? ftp://192.168.1.1/pub/centos5.3

Then you will see the familiar installer interface. You may not be comfortable without graphics support at first, but you will soon find it is the same thing. Make sure you turn off the firewall and SELinux.
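If you would rather skip the interactive prompts, virt-install accepts the same answers as command-line options. A sketch, assuming the python-virtinst version shipped with CentOS 5.3:

[frank@daisy ~]# virt-install --paravirt --name master --ram 256 \
      --file /vcluster/master/master.img --nographics \
      --location ftp://192.168.1.1/pub/centos5.3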

Step 3.4. Log in to the master and configure its network

[frank@daisy ~]# xm create master
[frank@daisy ~]# xm console master

The following instructions are run inside the master node.
Modify /etc/sysconfig/network-scripts/ifcfg-eth0 to set
ip   192.168.1.2
netmask  255.255.255.0
gateway  192.168.1.254

Use "Virtual system manager" to create a private network eth1, and modify master’s network
ip   192.168.0.2
netmask  255.255.255.0
gateway  192.168.0.254
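For reference, the resulting file might look like this. This is only a sketch; any HWADDR line depends on your setup, and the default gateway normally lives in /etc/sysconfig/network:

[root@master ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
BOOTPROTO=static
IPADDR=192.168.0.2
NETMASK=255.255.255.0
ONBOOT=yes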

Step 3.5. install slaves

You can follow the same procedure you used for the master.

[frank@daisy ~]# mkdir -p /vcluster/slave1
[frank@daisy ~]# dd if=/dev/zero of=/vcluster/slave1/slave1.img bs=4M count=2560
[frank@daisy ~]# virt-install
[frank@daisy ~]# xm create slave1
[frank@daisy ~]# xm console slave1

Use "Virtual system manager" to create a private network eth1, and modify salve1’s network, with ip 192.168.0.3
Do the same thing to slave2 with ip 192.168.0.4

Ok, you will see 3 virtual machine running in xen

[root@daisy ~]# xm list
Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0     3280     4 r-----    926.5
master                                     1      255     1 -b----    290.4
slave1                                     2      255     1 -b----     36.4
slave2                                     3      255     1 -b----     38.5

Step 4. NFS Configuration

Step 4.1. Master NFS Server Configuration

You first need nfs-utils, so fetch it from the FTP server:

[root@master ~]# ftp 192.168.1.1
name: anonymous
ftp> cd pub/centos5.3/CentOS
ftp> mget
(remote files) nfs-utils-1.0.9-40.el5.i386.rpm
ftp> quit

Then you have nfs-utils. In the following sections I won't repeat how I fetched each RPM package, because I always did it this way.

[root@master ~]# rpm -ivh nfs-utils-1.0.9-40.el5.i386.rpm
[root@master ~]# mkdir /cshare
[root@master ~]# chmod 777 /cshare
[root@master ~]# cat /etc/exports
/cshare         192.168.0.0/255.255.255.0(rw,sync)
[root@master ~]# cat /etc/hosts.deny
portmap:ALL
lockd:ALL
mountd:ALL
rquotad:ALL
statd:ALL
[root@master ~]# cat /etc/hosts.allow
portmap: 192.168.0.0/255.255.255.0
lockd: 192.168.0.0/255.255.255.0
mountd: 192.168.0.0/255.255.255.0
rquotad: 192.168.0.0/255.255.255.0
statd: 192.168.0.0/255.255.255.0
[root@master ~]# cat /etc/fstab
none                    /proc/fs/nfsd           nfsd    defaults 0 0
[root@master ~]# cat /etc/hosts
#127.0.0.1             localhost
192.168.0.2            master
192.168.0.3            slave1
192.168.0.4            slave2
[root@master ~]# chkconfig --level 345 nfs on
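To start serving right away instead of waiting for a reboot, something like the following should work on CentOS 5:

[root@master ~]# service portmap start
[root@master ~]# service nfs start
[root@master ~]# exportfs -ra
[root@master ~]# showmount -e localhost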

Step 4.2. Slave NFS Configuration

[root@slave1 ~]# rpm -ivh portmap-4.0-65.2.2.1
[root@slave1 ~]# mkdir /cshare
[root@slave1 ~]# cat /etc/hosts
#127.0.0.1             localhost
192.168.0.2            master
192.168.0.3            slave1
192.168.0.4            slave2
[root@slave1 ~]# cat /etc/fstab
192.168.0.2:/cshare    /cshare                 nfs     rw,hard,intr 0 0
[root@slave1 ~]# mount -t nfs 192.168.0.2:/cshare /cshare

Do the same thing on slave2.
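So that the share comes back automatically after a reboot, you may also want to enable the relevant init scripts (an assumption about the stock CentOS 5 services; netfs mounts the network filesystems in /etc/fstab at boot):

[root@slave1 ~]# chkconfig portmap on
[root@slave1 ~]# chkconfig netfs on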

Step 4.3. Check NFS
[root@slave1 ~]# df -h
[root@slave1 ~]# ping master

At this point you have finished the basic build of the virtual cluster. What follows is installing the cluster software. I downloaded C3, MPICH2, Torque, and Maui elsewhere and carried them over.

Step 5. C3 Installation

For master

[root@master ~]# rpm -ivh rsync-2.6.8-3.1.i386.rpm rsh-0.17-38.el5.i386.rpm rsh-server-0.17-38.el5.i386.rpm xinetd-2.3.14-10.el5.i386.rpm
[root@master ~]# vi /etc/hosts.equiv
master
slave1
slave2
[root@master ~]# cp rsync-2.6.8-3.1.i386.rpm rsh-0.17-38.el5.i386.rpm rsh-server-0.17-38.el5.i386.rpm xinetd-2.3.14-10.el5.i386.rpm /cshare

For slaves

[root@slave1 cshare]# cp rsync-2.6.8-3.1.i386.rpm rsh-0.17-38.el5.i386.rpm rsh-server-0.17-38.el5.i386.rpm xinetd-2.3.14-10.el5.i386.rpm ~
[root@slave1 cshare]# cd ~
[root@slave1 ~]# rpm -ivh rsync-2.6.8-3.1.i386.rpm rsh-0.17-38.el5.i386.rpm rsh-server-0.17-38.el5.i386.rpm xinetd-2.3.14-10.el5.i386.rpm

In /etc/xinetd.d/rsh, change the line disable = yes to disable = no.
In /etc/securetty, add a line containing rsh.
In /etc/pam.d/rsh, change the line auth required pam_rhosts_auth.so by appending hosts_equiv_rootok at the end.
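If you prefer to make these three edits from the shell, here is a sketch (double-check the files afterwards, since the exact spacing in the stock configs may differ):

[root@slave1 ~]# sed -i 's/disable.*= yes/disable = no/' /etc/xinetd.d/rsh
[root@slave1 ~]# echo rsh >> /etc/securetty
[root@slave1 ~]# sed -i 's/pam_rhosts_auth.so$/pam_rhosts_auth.so hosts_equiv_rootok/' /etc/pam.d/rsh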

[root@slave1 ~]# vi /etc/hosts.equiv
master
slave1
slave2

Then (remember: only on the slaves) we start xinetd:
 
[root@slave1 ~]# service xinetd start

Do the same thing on slave2.

For master

[root@master ~]# tar -zxf c3-4.0.1.tar.gz
[root@master ~]# cd c3-4.0.1
[root@master c3-4.0.1]# ./Install-c3
[root@master c3-4.0.1]# ln -s /opt/c3-4/c[^0-9]* /usr/local/bin/
[root@master c3-4.0.1]# vi /etc/c3.conf

cluster vcluster {
  master:192.168.0.2
  slave[1-2]
}

[root@master ~]# cat /etc/profile
 
export C3_RSH=rsh

[root@master ~]# cexec uname -a

************************* master *************************
--------- slave1---------
Linux slave1 2.6.18-128.el5xen #1 SMP Wed Jan 21 11:55:02 EST 2009 i686 i686 i386 GNU/Linux
--------- slave2---------
Linux slave2 2.6.18-128.el5xen #1 SMP Wed Jan 21 11:55:02 EST 2009 i686 i686 i386 GNU/Linux

Step 6. Install MPI

Step 6.1. Create trusted SSH connections

For master

[root@master ~]# ssh-keygen -t rsa (press Enter at every prompt)
[root@master ~]# cd .ssh
[root@master .ssh]# cp id_rsa.pub authorized_keys
[root@master .ssh]# cd ..
[root@master ~]# ssh master      (type ‘yes’)

For slaves

[root@slave1 ~]# ssh-keygen -t rsa
[root@slave1 ~]# scp 192.168.0.2:/root/.ssh/* /root/.ssh
[root@slave1 ~]# scp 192.168.0.2:/etc/hosts /etc/hosts
[root@master ~]# ssh slave1  (type 'yes')

Do the same thing on slave2.
Then check that all three machines trust each other:

[root@master ~]# ssh slave1  (type 'yes' the first time; no password should be asked)
[root@master ~]# ssh slave2
[root@slave1 ~]# ssh master
[root@slave1 ~]# ssh slave2
[root@slave2 ~]# ssh master
[root@slave2 ~]# ssh slave1

Step 6.2. Install MPICH2

[root@master ~]# tar -zxvf mpich2-1.1.tar.gz
[root@master ~]# mkdir /usr/mpich2
[root@master ~]# cd mpich2-1.1
[root@master mpich2-1.1]# ./configure --prefix=/usr/mpich2
[root@master mpich2-1.1]# make
[root@master mpich2-1.1]# make install
[root@master mpich2-1.1]# cat /etc/bashrc

PATH=$PATH:/usr/mpich2/bin

[root@master mpich2-1.1]# logout

[root@master ~]# touch /etc/mpd.conf
[root@master ~]# chmod 600 /etc/mpd.conf
[root@master ~]# vi /etc/mpd.conf

MPD_SECRETWORD=mr45-j9z

[root@master ~]# vi mpd.hosts

master
slave1
slave2
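Every node needs the same /etc/mpd.conf (with the same MPD_SECRETWORD). Since C3 is already working, one quick way is to push it out with the cpush tool installed in Step 5; a sketch:

[root@master ~]# cpush /etc/mpd.conf /etc/mpd.conf
[root@master ~]# cexec chmod 600 /etc/mpd.conf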

Step 6.3. Check the installation

[root@master ~]# mpd &
[root@master ~]# mpdtrace
[root@master ~]# mpdallexit

Do the same thing on slave1 and slave2.

Step 6.4. Run MPI

[root@master ~]# vi hello.c

#include "mpi.h"
#include <stdio.h>
#include <math.h>
void main(argc,argv)
int argc;
char *argv[];
{
 int myid, numprocs;
 int namelen;
 char processor_name[MPI_MAX_PROCESSOR_NAME];
 
 MPI_Init(&argc,&argv);
 MPI_Comm_rank(MPI_COMM_WORLD,&myid);
 MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
 MPI_Get_processor_name(processor_name,&namelen);
 fprintf(stderr,"Hello World! Process %d of %d on %s\n", myid, numprocs, processor_name);
 MPI_Finalize();
}

[root@master ~]# mpicc -o hello hello.c
[root@master ~]# cpush ./hello
[root@master ~]# mpdboot -n 3 -f mpd.hosts
[root@master ~]# mpirun -n 3 ./hello
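If everything is in place, the output should look something like this (the process-to-host mapping and line order may vary):

Hello World! Process 0 of 3 on master
Hello World! Process 1 of 3 on slave1
Hello World! Process 2 of 3 on slave2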

Step 7. Install Torque

Step 7.1 Install Torque on master

[root@master ~]# tar zxvf torque-2.3.7.tar.gz
[root@master ~]# cd torque-2.3.7
[root@master torque-2.3.7]# ./configure --prefix=/opt/torque-2.3.7
[root@master torque-2.3.7]# make
[root@master torque-2.3.7]# make install
[root@master torque-2.3.7]# cat /etc/bashrc

PATH=$PATH:/opt/torque-2.3.7/bin
PATH=$PATH:/opt/torque-2.3.7/sbin

[root@master ~]# logout

[root@master ~]# cd torque-2.3.7
[root@master torque-2.3.7]# ./torque.setup root
[root@master torque-2.3.7]# vi /var/spool/torque/server_priv/nodes

master
slave1
slave2

[root@master torque-2.3.7]# chmod 1777 /var/spool/torque/spool
[root@master torque-2.3.7]# chmod 1777 /var/spool/torque/undelivered

[root@master torque-2.3.7]# pbs_server -t create
[root@master torque-2.3.7]# qmgr -c "set server scheduling=true"
[root@master torque-2.3.7]# qmgr -c "create queue dque queue_type=execution"
[root@master torque-2.3.7]# qmgr -c "set queue dque started=true"
[root@master torque-2.3.7]# qmgr -c "set queue dque enabled=true"
[root@master torque-2.3.7]# qmgr -c "set queue dque resources_default.nodes=1"
[root@master torque-2.3.7]# qmgr -c "set queue dque resources_default.walltime=3600"
[root@master torque-2.3.7]# qmgr -c "set server default_queue=dque"

Check status

shutdown server
[root@master torque-2.3.7]# qterm -t quick

start server
[root@master torque-2.3.7]# pbs_server

verify all queues are properly configured
[root@master torque-2.3.7]# qstat -q

view additional server configuration
[root@master torque-2.3.7]# qmgr -c 'p s'

start the execution daemon on the master, then verify all nodes are correctly reporting
[root@master torque-2.3.7]# pbs_mom
[root@master torque-2.3.7]# pbsnodes -a

Step 7.2. Install Torque on slaves

[root@master torque-2.3.7]# make packages
[root@master torque-2.3.7]# cp torque-package-clients-linux-i686.sh torque-package-devel-linux-i686.sh torque-package-doc-linux-i686.sh torque-package-mom-linux-i686.sh torque-package-server-linux-i686.sh /cshare

[root@slave1 ~]# cd /cshare
[root@slave1 cshare]# cp torque-package-clients-linux-i686.sh torque-package-devel-linux-i686.sh torque-package-doc-linux-i686.sh torque-package-mom-linux-i686.sh torque-package-server-linux-i686.sh ~
[root@slave1 cshare]# cd ~
[root@slave1 ~]# ./torque-package-clients-linux-i686.sh --install

Run each of the package .sh files with --install in the same way.

[root@slave1 ~]# vi /var/spool/torque/server_name

master

[root@slave1 ~]# vi /var/spool/torque/mom_priv/config

$pbsserver master
$logevent 255
$usecp master:/cshare /cshare

[root@slave1 ~]# cat /etc/bashrc

PATH=$PATH:/opt/torque-2.3.7/bin
PATH=$PATH:/opt/torque-2.3.7/sbin

[root@slave1 ~]# logout
[root@slave1 ~]# pbs_mom

Do the same thing on slave2.

Step 8. Install Maui on the master (not on the slaves)

[root@master ~]# tar zxvf maui-3.2.6p21.tar.gz
[root@master ~]# cd maui-3.2.6p21
[root@master maui-3.2.6p21]# ./configure --prefix=/opt/maui-3.2.6p21 --with-pbs=/opt/torque-2.3.7
[root@master maui-3.2.6p21]# make
[root@master maui-3.2.6p21]# make install

[root@master ~]# vi /opt/maui-3.2.6p21/maui.cfg

SERVERHOST master
# primary admin must be first in list
ADMIN1 root
# Resource Manager Definition
RMCFG[master] TYPE=PBS
RMTYPE[0] PBS

[root@master ~]# cat /etc/bashrc

PATH=$PATH:/opt/maui-3.2.6p21/bin
PATH=$PATH:/opt/maui-3.2.6p21/sbin

[root@master ~]# logout
[root@master ~]# maui

Notice: do not start pbs_sched anywhere; Maui now does the scheduling.
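To confirm Maui is talking to pbs_server, showq (installed with Maui) should list your three nodes and an empty queue:

[root@master ~]# showq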

[root@master ~]# logout

[frank@master ~]# vi hello.pbs

#!/bin/sh
#PBS -N hello
#PBS -o hello.log
#PBS -e hello.err
#PBS -q dque
#PBS -l nodes=3
cd ~/tmp
echo Time is `date`
echo Directory is $PWD
echo This job runs on the following nodes:
cat $PBS_NODEFILE
NPROCS=`wc -l < $PBS_NODEFILE`
echo This job has allocated $NPROCS nodes
mpiexec -machinefile $PBS_NODEFILE -np $NPROCS ./hello

[frank@master ~]# qsub hello.pbs

Then you can check the status

[frank@master ~]# qstat -q
[frank@master ~]# qstat -a

OK, well done!

The above are my steps for building a virtual cluster with Xen offline. I wrote this post in a rush, so please tell me if you find any problems.

Reference

ClusterMonkey – Building A Virtual Cluster with Xen
The simplest Xen installation and configuration on CentOS 5: http://www.diybl.com/course/6_system/linux/Linuxjs/200899/141086.html
MPICH2 cluster installation manual for Linux (using trusted SSH connections)
Installation, configuration, and use of the TORQUE resource manager and the Maui job scheduler

