Running experiments on a cluster of computers


Warning! Some of this page is specific to the clusters at Oxford University.

Running NetLogo on clusters

Nice description of the problems and links to scripts: http://lukas.ahrenberg.se/archives/731

An open source project to run on clusters: https://code.google.com/p/clusterlogo/

Detailed message about how to use BehaviorSpace on a cluster: http://groups.yahoo.com/group/netlogo-users/message/11210

Documentation of advanced command line uses of BehaviorSpace: http://ccl.northwestern.edu/netlogo/docs/behaviorspace.html#advanced

New project that might be useful: http://www.openmole.org/getting-started/

Getting started in Oxford

All researchers at the University of Oxford can register for a free account.

On Microsoft Windows, install PuTTY and an X11 server. I use XLaunch to connect to one of the Oxford clusters, then run emacs -font fixed & to get a better interface than a bare command line. Within emacs I run a shell as well as Dired. Note that you apparently need to be on the Oxford network to do this (either physically or via VPN).

To install NetLogo (version 5.0.4) I used

wget http://ccl.northwestern.edu/netlogo/5.0.4/netlogo-5.0.4.tar.gz

tar -xzf netlogo-5.0.4.tar.gz

To set up experiments

I created experiments using BehaviorSpace accessible from NetLogo's tool menu.

Then I read the documentation of how to construct job scripts and submit them.

I then wrote a script to run the same experiment on different nodes of the cluster. The script is for the Sal cluster in Oxford.

#!/bin/bash

# set the number of nodes and processes per node
#PBS -l nodes=8:ppn=1

# set max wallclock time
#PBS -l walltime=1:00:00

# set name of job
#PBS -N spanish-flu1

# mail alert at (b)eginning, (e)nd and (a)bort of execution
#PBS -m bea

# send mail to the following address
#PBS -M kenneth.kahn@it.ox.ac.uk

# use submission environment
#PBS -V

# start job from the directory it was submitted
cd $PBS_O_WORKDIR

# define MPI host details
. enable_hal_mpi.sh

# run through the mpirun launcher
# the model is spanish-flu-cluster.nlogo
# the BehaviorSpace experiment is Etaples-infection-odds-and-encounter-rate-v5
# the output is table format in the file named test1-N.csv where N is the job array index
# run BehaviorSpace with 8 threads per process
# (no blank lines may follow a trailing backslash, or the command is cut short)
mpirun $MPI_HOSTS /home/ouit-modelling4all/kkahn/netlogo-5.0.4/netlogo-headless.sh \
--model /home/ouit-modelling4all/kkahn/models/spanish-flu-cluster.nlogo \
--experiment Etaples-infection-odds-and-encounter-rate-v5 \
--table /home/ouit-modelling4all/kkahn/models/test1-$PBS_ARRAYID.csv \
--threads 8

Update:

Since the first post on how to run NetLogo on clusters, there have been a few interesting changes on the Oxford clusters. Most notably, the new system (arcus-b) uses the SLURM scheduler. This is a very common scheduler and is also used by the NOTUR system (among others). So this update includes a script to run experiments with the SLURM scheduler.

To run an experiment, the setup is still the same as in the previous post. However, SLURM uses a different format for the job script. Notice a few differences: #PBS is now #SBATCH, and the arguments differ in many ways.

Also, we have to load NetLogo as a module and define new MPI host details. These can be seen below.

#!/bin/bash

# set the number of nodes to 16
#SBATCH --nodes=16

# set number of processes (tasks) to 1 per node
#SBATCH --ntasks-per-node=1

# set max wall time to 1 hour
#SBATCH --time=01:00:00

# set the name of the job
#SBATCH --job-name=my_job_id

# mail alerts at beginning and end
#SBATCH --mail-type=BEGIN,END

# send mail to the following address
#SBATCH --mail-user=my.name@dept.ox.ac.uk

# start job from the directory it was submitted
cd $SLURM_SUBMIT_DIR

# load NetLogo
module load netlogo

# define MPI host details
. enable_arcus-b_mpi.sh

# run in headless
# the model is CreateReligion.nlogo
# the BehaviorSpace experiment is test_arcus-b
# the output is table format in the file named test-N.csv where N is the job array index
# (SLURM_ARRAY_TASK_ID is only set when the job is submitted with --array)
# run BehaviorSpace with 16 threads
bash netlogo-headless.sh \
--model /home/project/user/NL_Model/model.nlogo \
--experiment test_arcus-b \
--table /home/project/user/NL_Model/test-$SLURM_ARRAY_TASK_ID.csv \
--threads 16
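Both job scripts name their table file after the index the scheduler gives each copy of the job. The following is a minimal sketch of that naming, assuming Torque/PBS sets PBS_ARRAYID for jobs submitted with qsub -t and SLURM sets SLURM_ARRAY_TASK_ID for jobs submitted with sbatch --array. The function name table_path and the fallback index 0 (used when the job is not part of an array) are illustrative choices, not part of either scheduler.

```shell
# Hypothetical helper: build the BehaviorSpace table file name for this task.
# Uses the SLURM array index if set, else the PBS one, else 0 (non-array job).
table_path() {
    dir="$1"; prefix="$2"
    id="${SLURM_ARRAY_TASK_ID:-${PBS_ARRAYID:-0}}"
    printf '%s/%s-%s.csv\n' "$dir" "$prefix" "$id"
}
```

In a job script this could be used as --table "$(table_path /home/project/user/NL_Model test)", so that a job submitted without an array still writes to a well-defined file instead of one with an empty index.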

To copy files between a cluster and my PC

In a command prompt I used the following to copy the file spanish-flu-cluster.nlogo from the current directory to the cluster named Sal:

\bin\pscp spanish-flu-cluster.nlogo kkahn@sal.osc.ox.ac.uk:models

On Windows you can also use WinSCP to copy files.

Later, to copy all the files in the models folder on the Sal cluster to the current directory on my PC:

\bin\pscp kkahn@sal.osc.ox.ac.uk:models/* .

To submit jobs to the cluster

To run 4 copies of the same experiment I use (where spanish-flu-test1.sh is the name of the file containing the script):

qsub -t 1-4 spanish-flu-test1.sh

Update: With the SLURM scheduler the command is (lower case, since commands are case-sensitive on Linux):

sbatch script.sh
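The qsub -t 1-4 form runs the four copies as a job array, and the SLURM counterpart of that option is sbatch --array=1-4. As a rough sketch, the helper below prints the submit command for either scheduler so it can be checked before being run. The name array_submit_cmd and the scheduler labels pbs and slurm are made up for this example.

```shell
# Hypothetical helper: print the command that submits N copies of a job script
# as a job array. It only prints the command; it does not submit anything.
array_submit_cmd() {
    scheduler="$1"; copies="$2"; script="$3"
    case "$scheduler" in
        pbs)   printf 'qsub -t 1-%s %s\n' "$copies" "$script" ;;
        slurm) printf 'sbatch --array=1-%s %s\n' "$copies" "$script" ;;
        *)     echo "unknown scheduler: $scheduler" >&2; return 1 ;;
    esac
}

array_submit_cmd slurm 4 script.sh   # prints: sbatch --array=1-4 script.sh
```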

To check it is in the queue and see what else is queued use:

qstat

Update: With the SLURM scheduler the command is (replace user with your user name):

squeue -u user

To combine the results of multiple runs

You can use this Java application to combine the results from multiple runs of the same experiment. It takes two arguments: a folder that contains only CSV files created by NetLogo's BehaviorSpace, and the file name of the desired combined file. E.g.

java -jar MergeCSV.jar results/experiment1 results/allResults.csv

You'll need to quote the file paths if they contain spaces or other special characters.
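If Java is not available on the cluster, a rough shell stand-in is possible. This sketch is not how MergeCSV.jar works; it simply assumes that every BehaviorSpace table file starts with the same block of header lines (six lines of metadata plus one line of column names, so 7 by default) and that everything after that is data. Check that assumption against your own files. The function name merge_tables is invented for this example, and run numbers will repeat across the merged files.

```shell
# Hypothetical stand-in for a CSV merge: keep the header block of the first
# file, then append only the data rows of every CSV file in the folder.
# The default of 7 header lines is an assumption; verify it on your files.
# Write the output outside the input folder so it is never merged into itself.
merge_tables() {
    dir="$1"; out="$2"; header_lines="${3:-7}"
    first=1
    for f in "$dir"/*.csv; do
        [ -e "$f" ] || continue            # folder had no CSV files
        if [ "$first" -eq 1 ]; then
            head -n "$header_lines" "$f" > "$out"
            first=0
        fi
        # data rows start on the line after the header block
        tail -n "+$((header_lines + 1))" "$f" >> "$out"
    done
}
```

For example, merge_tables results/experiment1 results/allResults.csv mirrors the arguments of the Java call above.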

To split a large NetLogo BehaviorSpace experiment into pieces

Sometimes one has a very large experiment that doesn't involve many copies of the same run (perhaps because the model is not very stochastic). In that case one can split the experiment into pieces.

Install split_nlogo_experiment following the instructions in https://github.com/ahrenberg/split_nlogo_experiment/wiki

Then run

python Scripts\split_nlogo_experiment c:\applets\spanish-flu-cluster.nlogo Etaples-infection-odds-and-encounter-rate-v5

Update concerning the use of split_nlogo_experiment...

First, if you write the template file on a Windows machine, you have to convert its line endings. It isn't readily apparent to everyone, but running dos2unix <scripttemplate>.sh will convert the line breaks. This is a subtle issue, and another way around it is to write your template in a Linux environment.
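Files written on Windows end each line with a carriage return plus a newline, and the stray carriage return is what breaks the script on the cluster. Where dos2unix is not installed, something along these lines can detect and remove it using only tr and grep. The function names has_crlf and strip_crlf are hypothetical.

```shell
# Hypothetical helpers for when dos2unix is not installed.
# has_crlf succeeds if the file contains any carriage-return character.
has_crlf() {
    grep -q "$(printf '\r')" "$1"
}

# strip_crlf removes all carriage returns, writing back into the same file
# (via cat rather than mv, so the file keeps its permissions).
strip_crlf() {
    tmp="$1.tmp.$$"
    tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1" && rm -f "$tmp"
}
```

A typical use would be: if has_crlf template.sh; then strip_crlf template.sh; fi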

Second, don't split the files on your local machine. It is too complicated and can be time-consuming. Split the files on the cluster; it makes them easier to manage later on.

Lastly, you may not have the privileges to install the split_nlogo_experiment script on the cluster (this is the case for research students, for example). In this case, you can still run the script with the necessary arguments. For example, I ran the script below to split an experiment into separate files and save the output XML files (used to set up the experiments) and the script files (used to submit them).

Here is an example...

python /home/project/user/.local/lib/python2.7/site-packages/split_nlogo_experiment-0.3-py2.7.egg-info \
--output_dir /home/project/user/NL_Models/Full_Sweep \
--output_prefix PREFIX_ \
--create_script /home/project/user/NL_Models/Full_Sweep/template.sh \
/home/project/user/NL_Models/Full_Sweep/my_model.nlogo my_experiment

This will split the experiment my_experiment from the NetLogo file my_model.nlogo into experiments stored in /home/project/user/NL_Models/Full_Sweep and scripts based on /home/project/user/NL_Models/Full_Sweep/template.sh. All files will have the output prefix PREFIX_.
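Each generated script then has to be submitted separately. Here is a sketch of a loop that does so, assuming the generated scripts sit in the output directory, start with the chosen prefix and end in .sh (check the actual file names that split_nlogo_experiment produced). The function name submit_all and the SUBMIT variable are invented for this example. By default it only prints the sbatch commands as a dry run.

```shell
# Hypothetical helper: submit every generated job script in a directory.
# Assumes the scripts are named <prefix>*.sh; check the actual file names.
# SUBMIT defaults to "echo sbatch" (dry run); set SUBMIT=sbatch to submit.
submit_all() {
    dir="$1"; prefix="$2"
    for script in "$dir"/"$prefix"*.sh; do
        [ -e "$script" ] || continue       # no matching scripts found
        ${SUBMIT:-echo sbatch} "$script"
    done
}
```

For example, submit_all /home/project/user/NL_Models/Full_Sweep PREFIX_ lists the commands, and SUBMIT=sbatch submit_all /home/project/user/NL_Models/Full_Sweep PREFIX_ runs them. Because the template is named template.sh it does not match the prefix and is skipped.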
