HOWTO Torque/Maui - grid scheduler and resource manager
Introduction
More and more Linux clusters are being built. Building a cluster takes a lot of software, and the most important pieces are the scheduler and the resource manager.
Many commercial schedulers come with an embedded resource manager. However, most open source schedulers don't, so you need to install a separate resource manager.
This HOWTO shows you how to install and configure Torque, a resource manager, and Maui, a cluster scheduler with many advanced features. Both are open source.
Hardware setup & naming convention
I will set up a 4-node cluster, using 4 machines with different hardware setups. I have a mix of AMD Sempron and Intel Pentium III processors, and the amount of installed RAM ranges from 128MB to 1024MB. The only thing the nodes have in common is Gentoo.
I will use a P-III 750MHz/128MB laptop both to submit jobs and as a compute node, a P-III 933MHz/512MB RAM machine as the master/head node, and the remaining two machines as compute nodes.
| Hostname | Role | Hardware |
|---|---|---|
| main | head node/master node | P-III 933MHz/512MB RAM |
| kitty | job submission/compute node | P-III 750MHz/128MB RAM Laptop |
| valinor | compute node | |
| caladan | compute node | |
It is mandatory that your /etc/hosts file is set up correctly, with the short hostname listed first on each line, followed by the FQDN. Alternatively, you can set up a DNS server to resolve names on your network.
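A quick way to check that ordering is to look for entries whose first name contains a dot, i.e. an FQDN listed before the short hostname. The sketch below writes a sample hosts file in the required order and checks it with awk; the IP addresses and the cluster.local domain are hypothetical placeholders, and on a real node you would point the check at /etc/hosts instead.

```shell
#!/bin/sh
# Sketch: verify that each hosts entry lists the short hostname first,
# followed by the FQDN. IPs and "cluster.local" are hypothetical values.

# check_hosts_order FILE
# Prints every offending line and returns 1 if an entry's first name
# contains a dot. Comments, blank lines and localhost entries are skipped.
check_hosts_order() {
    awk '
        /^[ \t]*#/ || NF < 2 { next }                 # comments and blanks
        $2 ~ /^localhost/    { next }                 # loopback entries
        index($2, ".") > 0   { print "FQDN listed first: " $0; bad = 1 }
        END { exit bad }
    ' "$1"
}

hosts_file=$(mktemp)
cat > "$hosts_file" <<'EOF'
# IP address     short name  FQDN
192.168.0.1      main        main.cluster.local
192.168.0.2      kitty       kitty.cluster.local
192.168.0.3      valinor     valinor.cluster.local
192.168.0.4      caladan     caladan.cluster.local
EOF

if check_hosts_order "$hosts_file"; then
    echo "hosts order OK"
fi
```

On a node, `check_hosts_order /etc/hosts` prints nothing and succeeds when every entry follows the short-name-first layout.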
I will also use the following variables for directory paths:
- $TORQUECFG=/home/PBS_spool
- $MAUIDIR=/home/maui
Requirements
All clusters require cluster management software; I use xCAT. If you don't already have cluster management software installed, I suggest you follow my xCAT HOWTO (coming soon).
We're going to install Torque and Maui. The latest Maui release is in the Portage tree, but the Torque ebuild is for an old version, so we will need to download and compile the latest version of Torque ourselves. Don't emerge anything yet!
Maui is free, but you must register on their website at http://www.clusterresources.com/product/maui/index.php to download it. Once you've got the Maui package, copy it to /usr/portage/distfiles.
Your nodes can obtain their IP addresses either through DHCP or you can give them static addresses; however, name resolution must work, so make sure that your /etc/hosts or DNS servers are properly set up.
Installation
Networking and name resolution must work properly before you start the installation. Let's make sure that all our nodes are set up correctly: each node should be able to ping all the other nodes.
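Before pinging, it helps to confirm that every hostname resolves at all. The sketch below uses glibc's getent, which looks names up through /etc/hosts and DNS according to /etc/nsswitch.conf; the host list is taken from the table above, and the ping loop in the comment is only one possible follow-up.

```shell
#!/bin/sh
# Sketch: check that cluster hostnames resolve on this node.
# Run it on every node; getent follows /etc/nsswitch.conf.

# check_resolution HOST...
# Prints one line per host; returns 1 if any name fails to resolve.
check_resolution() {
    rc=0
    for host in "$@"; do
        if addr=$(getent hosts "$host"); then
            echo "ok:   $host -> ${addr%% *}"   # keep only the IP field
        else
            echo "FAIL: $host does not resolve"
            rc=1
        fi
    done
    return $rc
}

# Usage on each node:
#   check_resolution main kitty valinor caladan
# Then test reachability, for example:
#   for h in main kitty valinor caladan; do
#       ping -c 1 "$h" >/dev/null || echo "no reply from $h"
#   done
```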
Torque
Build
First, untar the Torque package and cd into its directory. Run ./configure --help to see all the options. By default, Torque builds with GUI support and sets the default server to the hostname of the machine on which you compile it. I will change the default spool directory by setting --with-server-home to /home/PBS_spool, enable the server, monitor and clients, and use scp as the file transfer tool.
From now on, $TORQUECFG refers to /home/PBS_spool.
Code: Torque installation
```
# export TORQUECFG=/home/PBS_spool
# tar xvzf torque-2.0.0p8.tar.gz
# cd torque-2.0.0p8
# ./configure --enable-server --enable-monitor \
    --enable-clients --with-server-home=$TORQUECFG \
    --with-scp
# make
# make install
```
Check that /usr/local/bin and /usr/local/sbin are in your $PATH. If they are not, add them (in /etc/profile, for example), then update your environment and source your profile so the new variables take effect:
Code: Updating the environment
```
# env-update && source /etc/profile
```
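To see which of the two directories are already on your $PATH, a small check along these lines will do. It is a sketch: in_path is a hypothetical helper name, and its optional second argument exists only so the function can be tried against an arbitrary PATH-style string.

```shell
#!/bin/sh
# Sketch: report whether the default "make install" directories are on PATH.

# in_path DIR [PATHSTRING]
# Succeeds if DIR is one of the colon-separated entries of PATHSTRING
# (defaults to $PATH). Matches whole entries only, not prefixes.
in_path() {
    case ":${2-$PATH}:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

for dir in /usr/local/bin /usr/local/sbin; do
    if in_path "$dir"; then
        echo "$dir is in PATH"
    else
        echo "$dir is missing from PATH"
    fi
done
```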
Configuration
First we need to create the initial server database, which holds all the configuration, and then use qmgr to set the pbs_server parameters. To do that, start pbs_server with -t create, then run qmgr, which gives us a prompt for setting parameters:
Code: Configuring pbs_server
```
# pbs_server -t create
# qmgr
Qmgr: set server operators = root@main
Qmgr: set server operators += pbsuser@main
Qmgr: create queue batch
Qmgr: set queue batch queue_type = Execution
Qmgr: set queue batch started = True
Qmgr: set queue batch enabled = True
Qmgr: set server default_queue = batch
Qmgr: set server resources_default.nodes = 1
Qmgr: set server scheduling = True
Qmgr: quit
```
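The same settings can also be kept in a file and fed to qmgr on standard input, which makes the setup repeatable if you ever rebuild the server database. In this sketch the file location (/tmp/torque-server.qmgr) and the HEADNODE variable are hypothetical choices; HEADNODE defaults to main, the head node from the table above.

```shell
#!/bin/sh
# Sketch: store the pbs_server settings in a replayable qmgr script.
# QMGR_FILE and HEADNODE are illustrative defaults, override as needed.
HEADNODE=${HEADNODE:-main}
QMGR_FILE=${QMGR_FILE:-/tmp/torque-server.qmgr}

# Unquoted EOF so that $HEADNODE is expanded into the file.
cat > "$QMGR_FILE" <<EOF
set server operators = root@$HEADNODE
set server operators += pbsuser@$HEADNODE
create queue batch
set queue batch queue_type = Execution
set queue batch started = True
set queue batch enabled = True
set server default_queue = batch
set server resources_default.nodes = 1
set server scheduling = True
EOF

# On the head node, after "pbs_server -t create", apply it with:
#   qmgr < "$QMGR_FILE"
# and review the resulting configuration with:
#   qmgr -c 'print server'
```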
We also need to tell pbs_server which compute nodes it must contact:
File: $TORQUECFG/server_priv/nodes
```
valinor
caladan
kitty
```
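With more nodes, generating this file from a list is less error-prone than typing it. The sketch below writes one host per line; it adds np=1 (the number of job slots on the node), which matches Torque's default of one slot when np is omitted, so the result behaves like the file above. write_nodes_file is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: generate the pbs_server nodes file from a list of hostnames.
# np=1 is an assumption; raise it on nodes with more CPUs.

# write_nodes_file FILE HOST...
write_nodes_file() {
    out=$1
    shift
    : > "$out"                          # create or truncate the file
    for host in "$@"; do
        printf '%s np=1\n' "$host" >> "$out"
    done
}

# Usage on the head node:
#   write_nodes_file "$TORQUECFG/server_priv/nodes" valinor caladan kitty
```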
Now let's build the packages for the other nodes, copy them to the compute nodes and install them:
Code: Build and copy packages
```
# cd /tmp/torque-2.0.0p8
# make packages
# pscp torque-package-mom-linux-i686.sh compute:/tmp/
# pscp torque-package-clients-linux-i686.sh compute:/tmp/
# psh compute /tmp/torque-package-clients-linux-i686.sh --install
# psh compute /tmp/torque-package-mom-linux-i686.sh --install
```
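If you don't use xCAT's pscp/psh, a plain scp/ssh loop does the same job. The sketch below only prints the commands (a dry run) so you can review them first; it assumes root can ssh to each compute node and that /tmp is writable there. distribution_commands is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: xCAT-free alternative to the pscp/psh commands above.
# Prints the commands; pipe the output to sh to run them.

MOM_PKG=torque-package-mom-linux-i686.sh
CLIENTS_PKG=torque-package-clients-linux-i686.sh

# distribution_commands HOST...: copy and install commands for each node.
distribution_commands() {
    for node in "$@"; do
        for pkg in "$CLIENTS_PKG" "$MOM_PKG"; do
            echo "scp $pkg root@$node:/tmp/"
            echo "ssh root@$node /tmp/$pkg --install"
        done
    done
}

distribution_commands kitty valinor caladan
```

Once the output looks right, run it from the build directory with `distribution_commands kitty valinor caladan | sh`.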
It is also a good idea to check that each node knows which host is the master:
Code:
```
# cat $TORQUECFG/server_name
main
```
The last step is to make sure we can stage data between the nodes. For that, the nodes must be able to ssh to each other without being prompted for a password; see:
http://www-128.ibm.com/developerworks/library/l-keyc.html
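A minimal sketch of the key setup with standard OpenSSH tools: it creates a passphrase-less RSA key if none exists yet, and the commented loop then installs the public key on each node with ssh-copy-id. The key path and node list are assumptions, ensure_key is a hypothetical helper name, and this has to be done for every user who submits jobs, not only root.

```shell
#!/bin/sh
# Sketch: create a passphrase-less key so nodes can scp job files
# without prompting.

# ensure_key FILE: generate an RSA key pair at FILE unless one exists.
ensure_key() {
    dir=$(dirname "$1")
    if [ ! -d "$dir" ]; then
        mkdir -p "$dir" && chmod 700 "$dir"   # ssh expects a private dir
    fi
    if [ ! -f "$1" ]; then
        ssh-keygen -q -t rsa -N "" -f "$1"    # -N "": empty passphrase
    fi
}

# Usage, as each job-submitting user:
#   ensure_key "$HOME/.ssh/id_rsa"
#   for h in main kitty valinor caladan; do
#       ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "$h"
#   done
```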
Choose one compute node, edit its pbs_mom configuration file, then copy the file to all other nodes:
File: $TORQUECFG/mom_priv/config
```
arch x86
opsys Gentoo
$logevent 255
```
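Writing the file with a here-document and pushing it out in a loop keeps every node identical. In this sketch write_mom_config is a hypothetical helper, OTHER_NODES assumes you edited the file on kitty, and the copy uses plain scp (xCAT's pscp would work as well).

```shell
#!/bin/sh
# Sketch: write mom_priv/config once, then copy it to the other nodes.
TORQUECFG=${TORQUECFG:-/home/PBS_spool}
OTHER_NODES="valinor caladan"     # assumption: the file was edited on kitty

# write_mom_config FILE: write the pbs_mom settings shown above.
# The quoted 'EOF' keeps "$logevent" literal instead of expanding it.
write_mom_config() {
    cat > "$1" <<'EOF'
arch x86
opsys Gentoo
$logevent 255
EOF
}

# Usage on the chosen node, as root:
#   write_mom_config "$TORQUECFG/mom_priv/config"
#   for h in $OTHER_NODES; do
#       scp "$TORQUECFG/mom_priv/config" "root@$h:$TORQUECFG/mom_priv/config"
#   done
```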
Start pbs_mom on all compute nodes, then stop the server with qterm -t quick and restart it with pbs_server. Wait a few moments, then check node availability with pbsnodes -a. You should see all the nodes listed with their features.
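With more than a handful of nodes, it helps to filter the pbsnodes -a output for anything not in the free state. The sketch below assumes the usual layout of that output (a node name starting in column 1, followed by indented `attribute = value` lines such as `state = free`); check it against what your Torque version prints. Busy nodes (for example job-exclusive) are reported too, since they are not free.

```shell
#!/bin/sh
# Sketch: list nodes whose state is not "free" in `pbsnodes -a` output.
# Assumes unindented node names and indented "attr = value" lines.

# nodes_not_free: reads pbsnodes -a output on stdin, prints "node: state".
nodes_not_free() {
    awk '
        /^[^ \t]/ { node = $1; next }          # unindented line = node name
        $1 == "state" && $2 == "=" && $3 != "free" {
            print node ": " substr($0, index($0, "=") + 2)
        }
    '
}

# Usage on the head node:
#   pbsnodes -a | nodes_not_free
```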
There you go! But your scheduler/resource manager installation is not done yet! :) If you submit a job and check the queue with qstat, you will see that its status stays at Q (queued). This is because we don't have a scheduler yet, so now we need to set up Maui to interact with Torque and schedule our jobs.
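To see this for yourself, submit a trivial test job as an ordinary user (pbsuser in this HOWTO). The job script below is a hypothetical example: its #PBS lines name the job, request one node and five minutes of walltime, and merge stdout and stderr into one output file; write_test_job is likewise a made-up helper name.

```shell
#!/bin/sh
# Sketch: write a minimal PBS job script and show how to submit it.

# write_test_job FILE: one node, five minutes, reports where it ran.
write_test_job() {
    cat > "$1" <<'EOF'
#!/bin/sh
#PBS -N hello
#PBS -l nodes=1,walltime=00:05:00
#PBS -j oe
echo "Job $PBS_JOBID ran on $(hostname)"
sleep 30
EOF
}

# Usage, as an unprivileged user on the submission host (kitty):
#   write_test_job hello.pbs
#   qsub hello.pbs
#   qstat        # the job stays in state Q until a scheduler runs
```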
Maui
Build
With the Maui tarball in hand (downloaded as described under Requirements), let's start the build process:
Code: Compiling Maui
```
# export MAUIDIR=/home/maui
# tar xvzf maui-3.2.6p13.tar.gz
# cd maui-3.2.6p13
# ./configure --with-pbs=$TORQUECFG --with-spooldir=$MAUIDIR
# make
# make install
```
If everything compiled OK, we need to add the Maui administrator account ($MAUIADMIN, which is root in this example) as a Torque manager. We will also set the system-wide default node count to one and the default walltime to 5 minutes:
Code: Add $MAUIADMIN as a manager
```
# qmgr
Qmgr: set server managers += root@main
Qmgr: set server resources_default.nodect = 1
Qmgr: set server resources_default.walltime = 00:05:00
Qmgr: quit
```
Done! Maui is now integrated with Torque! All you have to do is start Maui (/usr/local/maui/sbin/maui) and check that it can display the queue (/usr/local/maui/bin/showq).
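Once Maui is running, queued jobs should move from Q to R (running). The helper below tallies jobs per state from qstat's default output. It is a sketch that assumes two header lines and the state (S) column second to last, just before the queue name, as Torque 2.x prints it; job_states is a hypothetical name.

```shell
#!/bin/sh
# Sketch: count jobs per state from default `qstat` output on stdin.
# Assumes 2 header lines and the state column just before the queue.

job_states() {
    awk 'NR > 2 && NF >= 2 { count[$(NF-1)]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# Usage:
#   /usr/local/maui/sbin/maui
#   /usr/local/maui/bin/showq
#   qstat | job_states
```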
Configuration
No further configuration is needed, because the configure script already took care of it.
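It still does not hurt to confirm what configure generated. Maui 3.2 normally writes maui.cfg into its home directory ($MAUIDIR here), and the entries that tie it to Torque are typically SERVERHOST, ADMIN1 and an `RMCFG[...] TYPE=PBS` line. The sketch below only checks that those keys are present; the exact file location and contents depend on your Maui version, and check_maui_cfg is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: check that maui.cfg contains the settings used for Torque.

# check_maui_cfg FILE: prints ok/missing per key; returns 1 if any is missing.
check_maui_cfg() {
    rc=0
    for key in SERVERHOST ADMIN1; do
        if grep -q "^[[:space:]]*$key[[:space:]]" "$1"; then
            echo "ok:      $key"
        else
            echo "missing: $key"; rc=1
        fi
    done
    if grep -q '^[[:space:]]*RMCFG\[[^]]*].*TYPE=PBS' "$1"; then
        echo "ok:      RMCFG[...] TYPE=PBS"
    else
        echo "missing: RMCFG[...] TYPE=PBS"; rc=1
    fi
    return $rc
}

# Usage on the head node:
#   check_maui_cfg "$MAUIDIR/maui.cfg"
```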
Conclusion
Maui integrates easily with Torque, and together they make a perfect scheduler/resource manager for your Gentoo cluster! Too bad the Portage tree doesn't have the latest releases of both: there's only Maui, and if you enable its Torque integration (pbs), it tries to emerge Torque, which is an old release.
Now that I have this installed, I will see if I can create an ebuild and submit it to the Gentoo developers.
Update: Torque 2.2.1 has been available via Portage since 11/24/07 (sys-cluster/torque).
Troubleshooting
- If you submit a job and it keeps showing as blocked or idle, make sure you ran set queue batch started = True and set queue batch enabled = True at the qmgr prompt.
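A quick way to check both settings is to print the queue definition and look at the two attributes. The sketch assumes qmgr's print output uses lines of the form `set queue batch enabled = True`; queue_ready is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: verify that a queue is both enabled and started.

# queue_ready QUEUE: reads `qmgr -c "print queue QUEUE"` output on stdin;
# fails (and explains why) unless enabled and started are both True.
queue_ready() {
    awk -v q="$1" '
        $1 == "set" && $2 == "queue" && $3 == q && $5 == "=" {
            attr[$4] = $6
        }
        END {
            ok = (attr["enabled"] == "True" && attr["started"] == "True")
            if (!ok)
                print "queue " q ": enabled=" attr["enabled"] ", started=" attr["started"]
            exit !ok
        }'
}

# Usage on the head node:
#   qmgr -c 'print queue batch' | queue_ready batch
```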
Related Links
- http://www.clusterresources.com/wiki/doku.php?id=torque:torque_wiki
- http://www.clusterresources.com/wiki/doku.php?id=torque:appendix:c_mom_configuration
- http://www.clusterresources.com/products/maui/docs/mauistart.shtml
- http://www.clusterresources.com/products/maui/docs/pbsintegration.shtml
Article created by: Paragao
