Thursday, 11 June 2015

Setting up a computational cluster (HPC), part 1

So by the power elimination I got put in charge of administration/setup of the local cluster system for the theoretical/computational chemistry department. The current system was completely out-dated, and made it impossible to apt-get update/upgrade, so with the addition of additional 60+ nodes from another cluster it was up to me to save the system! Which practically means I had to set it up from scratch. And a lot of googling. So much googling.

So here is what I did.

First thing first, I wanted it easily maintainable and scalable. There is no way I wanted to install software manually on all the nodes, which means all installation and setup needs to be done automatically from the masternode (frontend).

This was done via PXE/TFTP booting, and installing of a netboot Debian image (with a few extra packages). After the Debian installation, package management and configuration of the nodes is done via Puppet.

To speed things up, the whole installation is done via a local apt-get mirror on the master node. This also insures that all the packages are exactly the same version.

What you need of physical hardware:

  • a frontend computer ( (probably with two ethernet ports)
  • Nx nodes (192.168.10.x)
  • switch(s)
  • Ethernet cables

The frontend:

  • hosts a apt-mirror
  • hosts all the user accounts (NIS/ypbind)
  • hosts home folder (for NFS)
  • running {DHCP, TFTP, DNS} (via DNSMASQ) and has PXE image
  • running Puppetmaster
  • running apache
  • running slurm

the nodes:

  • uses the frontend for apt-get server
  • uses frontend NIS for all user accounts
  • network mounted home folder (NFS)
  • running puppet agent
  • running slurm deamon

Setup of the master

Setup apt-mirror

We want all the nodes to have the same packages installed, also on the frontend, for consistency. The way this is implemented is to have local copy of the apt-get server. You will need apache for http requests.

apt-get install apt-mirror
mkdir /srv/apt # basepath
vi /etc/apt/mirror.list # edit and set basepath in

Remember to add debian-installer to the repository list, or the netboot (later on) will have trouble installing debian. Your mirror list should look something like this;


set base_path    /srv/apt
set nthreads     20
set _tilde       0

deb jessie main main/debian-installer
deb-src jessie main

After configuration, apt-mirror and create a symbolic link in your apache webfolder. Apt-mirror will take a few hours to download (approximate 70-90gb)

cd /var/www
sudo ln -s /srv/apt/mirror/ debian # create symbolic link to the mirror

Now we edit our source list to point and our own mirror instead of the internet


deb jessie main

so we know that we are using same packages as the nodes. Now to update our system we need too;

on the frontend


on the nodes



The first thing to setup is the DHCP server on the frontend, and because we want to run a DNS server as well, the easiest service to setup is dnsmasq, instead of ics-dhcp etc.

apt-get install dnsmasq

after installation we configure the server with /etc/dnsmasq.conf.

We want to serve DHCP on eth0 interface with TFTP/PXE boot in range 192.168.10.x. For all nodes the mac addresses are then registered


# tftp boot
pxe-service=x86PC, "Install Debian", pxelinux
# log
# nodes

We server internet on eth1 and local dhcp on eth0, so we setup a static ip on eth0


auto eth0
iface eth0 inet static    

Notice the dhcp-host line where I couple a mac-address to a hostname. The same hostname is then added to the /etc/hosts, for example;   node-hostname

Setup PXE booting image

download netboot/netboot.tar.gz for the version of Debian you are using, and setup the PXE boot;

mkdir /srv/tftp
cd /srv/tftp
tar -xf netboot.tar.gz
vi pxelinux.cfg/default

edit and setup PXE to use a preseed configuration. If you are unsure what to put in your preseed script, you can always manually install debian and check the debconf-get-selections --installer > preseed.cfg output after the installation, or look at this guide


default install
label install
    menu label ^Install
    menu default
    kernel debian-installer/amd64/linux
    append initrd=debian-installer/amd64/initrd.gz auto=true priority=critical url=http://192.168.

The preseed cfg is placed in the apache http folder so it can be loaded over the net. Remember to setup the mirror settings to use the local mirror on the frontend.


Setup NIS and NFS

Next is setup of user management and network shared folders (home and opt).

apt-get install nfs-common nfs-kernel-server

Set the mount drives



and run

nfs-kernel-server restart
showmount -e

And now for NIS

apt-get install nis

give it a NIS domain (remember it, mine was "sunray-nis")

Setup the master to be the server, by editing the file /etc/defaults/nis making sure that you have the following lines:


Once this is done you need to control which machines are allowed to access the NIS server. Do this by editing the file /etc/ypserv.securenets as in the following example:

# Restrict to 192.168.1.x

Run the configuration for NIS

/usr/lib/yp/ypinit -m

and restart nis

service nis restart

Next "setting up puppet/nodes"

No comments:

Post a Comment