This page has moved to: http://hpcc.ucr.edu/manuals_linux-cluster_intro.html
IIGB Linux Cluster
Introduction
[ Cluster Status Announcements ]
This manual provides an introduction to the usage of IIGB's Linux cluster, Biocluster.
All servers and compute resources of the IIGB bioinformatics facility are available to
researchers from all departments and colleges at UC Riverside for a minimal
recharge fee (see rates).
To request an account, please contact Rakesh Kaundal (rkaundal@ucr.edu).
The latest hardware/facility description for grant applications is available here: Facility Description [pdf].
Biocluster Overview

Storage
- Four enterprise class HPC storage systems
- Approximately 2 PB (2048 TB) of network storage
- GPFS and NFS
- Automatic snapshots and archival backups
Network
- Ethernet:
- 1 Gb/s switch x 5
- 1 Gb/s switch with 10 Gb/s uplink
- 10 Gb/s switch for campus-wide Science DMZ
- redundant, load balanced, robust mesh topology
- Interconnect
Head Nodes
All users should access the cluster via ssh through biocluster.ucr.edu; this address will automatically balance traffic to one of the available head nodes.
- Penguin
- Resources: 8 cores, 64 GB memory
- Primary function: submitting jobs to the queuing system (Torque/Maui)
- Secondary function: development; code editing and running small (under 50 % CPU and under 30 % RAM) sample jobs
- Pigeon
- Resources: 16 cores, 128 GB memory
- Primary function: submitting jobs to the queuing system (Torque/Maui)
- Secondary function: development; code editing and running small (under 50 % CPU and under 30 % RAM) sample jobs
- Pelican
- Resources: 32 cores, 64 GB memory
- Primary function: submitting jobs to the queuing system (Torque/Maui)
- Secondary function: development; code editing and running small (under 50 % CPU and under 30 % RAM) sample jobs
- Owl
- Resources: 16 cores, 64 GB memory
- Primary function: testing; running test sets of jobs
- Secondary function: submitting jobs to the queuing system (Torque/Maui)
- Globus
- Resources: 32 cores, 32 GB memory
- Primary function: submitting jobs to the queuing system (Slurm)
- Secondary function: development; code editing and running small (under 50 % CPU and under 30 % RAM) sample jobs
Worker Nodes
- Batch
- c01-c48: each with 64 AMD cores and 512 GB memory
- Highmem
- h01-h06: each with 32 Intel cores and 1024 GB memory
- GPU
- gpu01-gpu02: each with 32 (HT) Intel Haswell cores, 2 x NVIDIA Tesla K80 GPUs (~10000 CUDA cores), and 128 GB memory
- Intel
- i01-i12: each with 32 Intel Broadwell cores and 512 GB memory
Current status of Biocluster nodes
Getting Started
The initial login brings users onto a Biocluster head node (e.g. pigeon, penguin, owl). From there, users can submit jobs via qsub to the compute nodes or log into owl to perform memory intensive tasks.
Since all machines mount a centralized file system, users will always see the same home directory on all systems. Therefore, there is no need to copy files from one machine to another.
Login from Mac or Linux
Open the terminal and type
ssh -X username@biocluster.ucr.edu
Login from Windows
Please refer to the login instructions of our Linux Basics manual.
Change Password
- Log in via SSH using the Terminal on Mac/Linux or PuTTY on Windows
- Once you have logged in type the following command:
passwd
- Enter the old password (the random characters that you were given as your initial password)
- Enter your new password
The password minimum requirements are:
- At least 8 characters long
- Must have at least 3 of the following:
- Lowercase character
- Uppercase character
- Number
- Punctuation character
Modules
All software used on Biocluster is managed through a simple module system.
You must explicitly load and unload each package as needed.
More advanced users may want to load modules within their .bashrc, .bash_profile, or .profile files.
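For example, to have frequently used modules loaded automatically at login, you could append lines like the following to your ~/.bashrc (the module names here are only illustrations; load whichever you need):
module load samtools
module load tophat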
Available Modules
To list all available software modules, execute the following:
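module avail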
This should output something like:
------------------------- /usr/local/Modules/versions --------------------------
3.2.9
--------------------- /usr/local/Modules/3.2.9/modulefiles ---------------------
BEDTools/2.15.0(default) modules
PeakSeq/1.1(default) python/3.2.2
SOAP2/2.21(default) samtools/0.1.18(default)
bowtie2/2.0.0-beta5(default) stajichlab
cufflinks/1.3.0(default) subread/1.1.3(default)
matrix2png/1.2.1(default) tophat/1.4.1(default)
maui/3.3.1(default) trans-ABySS/1.2.0(default)
module-info
Using Modules
To load a module, run:
module load <software name>[/<version>]
To load the default version of the tophat module, run:
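module load tophat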
If a specific version of tophat is needed, 1.4.1 for example, run:
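module load tophat/1.4.1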
Show Loaded Modules
To show what modules you have loaded at any time, you can run:
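module list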
Depending on what modules you have loaded, it will produce something
like this:
Currently Loaded Modulefiles:
1) maui/3.3.1 2) tophat/1.4.1 3) PeakSeq/1.1
Unloading Software
Sometimes you no longer want a piece of software in your path. To do this, you unload the module by running:
module unload <software name>
Additional Features
There are additional features and operations that can be done with the
module command. Please run the following to get more information:
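module help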
Quotas
CPU
Currently, the maximum number of CPU cores a user can use simultaneously on Biocluster is 256 CPU cores when the load on the cluster is below 30%, and 128 CPU cores when the load is above 30%. If a user submits jobs for more than 256/128 CPU cores, the additional requests will be queued until resources within the user's CPU quota become available. Upon request, a user's upper CPU quota can be extended temporarily, but only if sufficient CPU resources are available.
To avoid monopolisation of the cluster by a small number of users, the high-load CPU quota of 128 cores is dynamically readjusted by an algorithm that considers the number of CPU hours accumulated by each user over a period of 2 weeks, along with the current overall CPU usage on the cluster. If the CPU hour average over the 2-week window exceeds an allowable amount, the default CPU quota for such a heavy user is reduced to 64 CPU cores; if it exceeds the allowable amount two-fold, it is reduced to 32 CPU cores. Once the average usage of a heavy user drops back below those limits, the upper CPU limit is raised accordingly.
Note: when the overall CPU load on the cluster is below 70%, the dynamically readjusted CPU quotas are not applied. At those low-load times every user has the same CPU quota: 256 CPU cores at <30% load and 128 CPU cores at 30-70% load.
* All users and PIs are requested to adhere to the facility rules and regulations mentioned above. In case of any violation or misuse of the allowable CPU quota, the facility reserves the right to cancel the subscription at any time.
Data Storage
A standard user account has a storage quota of 20GB. Much more storage space, in the range of many TBs, can be made available in a user account's bigdata directory. The amount of storage space available in bigdata depends on a user group's annual subscription. The pricing for extending the storage space in the bigdata directory is available here.
Memory
From the Biocluster head node users can submit jobs to the batch queue or the highmem queue. The nodes associated with the batch queue are mainly for CPU intensive tasks, while the nodes of the highmem queue are dedicated to memory intensive tasks. The batch nodes allow a 1 GB RAM minimum limit on jobs, and the highmem nodes allow 16 GB-512 GB RAM jobs.
What's Next?
You should now know the following:
- Basic organization of Biocluster
- How to login to Biocluster
- How to use the Module system to gain access to Biocluster software
- CPU, storage, and memory limitations (quotas and hardware limits)
Now you can start using Biocluster.
The recommended way to run your jobs (scripts, pipelines, experiments, etc...) is to submit them to the queuing system by using qsub .
Biocluster uses Torque/Maui software as a PBS, Portable Batch System, queuing system.
Please do not run ANY computationally intensive tasks on any Biocluster head node that starts with the letter "P" (i.e. penguin, pigeon, pelican). If this policy is violated, your jobs will be killed to limit the negative impact on others.
The head nodes are a shared resource and should be accessible by all users. Negatively impacting performance would affect all users on the system and will not be tolerated.
However, you may run memory-intensive jobs on Owl.
Login to Owl like so:
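ssh owl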
Managing Jobs
Submitting and managing jobs is at the heart of using the cluster. A 'job' refers to the script, pipeline or experiment that you run on the nodes in the cluster.
Queues/Partitions
In the past we used queues under the old Torque system; under Slurm these logically grouped nodes are now referred to as partitions. There are several different partitions available for cluster users to send jobs to:
- batch
- Nodes: c01-c48
- Cores: AMD, 256 per user
- RAM: 1 GB default
- Time (walltime): 168 hours (7 days) default
- highmem
- Nodes: h01-h04
- Cores: Intel, 32 per user
- RAM: 16 GB min and 1024 GB max
- Time (walltime): 48 hours (2 days) default
- gpu
- Nodes: gpu01-gpu02
- Cores: Intel, 16 per user
- RAM: 128 GB default
- Time (walltime): 100 hours default
- intel
- Default partition
- Nodes: i01-i12
- Cores: Intel, 64 per user
- RAM: 1 GB default
- Time (walltime): 168 hours (7 days) default
- Group Partition
- This partition is unique to the group; if your lab has purchased nodes, then you will have a priority partition with the same name as your group (e.g. girkelab).
In order to submit a job to different partitions add the optional '-p' parameter with the name of the partition you want to use:
sbatch -p batch SBATCH_SCRIPT.sh
sbatch -p highmem SBATCH_SCRIPT.sh
sbatch -p gpu SBATCH_SCRIPT.sh
sbatch -p intel SBATCH_SCRIPT.sh
sbatch -p mygroup SBATCH_SCRIPT.sh
Slurm
Currently all of the above partitions are available under Slurm; however, Slurm jobs can only be submitted from Globus. Therefore, after logging into Biocluster via ssh, ssh again into globus.
username@pigeon:~$ ssh -XY globus
Submitting Jobs
There are two basic ways to submit jobs: non-interactive and interactive. Slurm will automatically start within the directory where you submitted the job from, so keep that in mind when you use relative file paths.
Non-interactive submission of an SBATCH script:
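sbatch SBATCH_SCRIPT.sh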
Here is an example of an SBATCH script:
#!/bin/bash -l
#SBATCH --nodes=1
#SBATCH --ntasks=10
#SBATCH --mem-per-cpu=1G
#SBATCH --time=1-00:15:00 # 1 day and 15 minutes
#SBATCH --output=my.stdout
#SBATCH --mail-user=useremail@address.com
#SBATCH --mail-type=ALL
#SBATCH --job-name="just_a_test"
#SBATCH -p intel # This is the default partition; you can use any of the following: intel, batch, highmem, gpu
# Print current date
date
# Load samtools
module load samtools
# Change directory to where you submitted the job from, so that relative paths resolve properly
cd $SLURM_SUBMIT_DIR
# Concatenate BAMs
samtools cat -h header.sam -o out.bam in1.bam in2.bam
# Print name of node
hostname
The above job will request 1 node, 10 tasks (assuming 1 CPU core per task), 10 GB of memory (1 GB per task), for 1 day and 15 minutes. All STDOUT will be redirected to a file called "my.stdout", and an email will be sent to the user when the status of the job changes.
Interactive submission:
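A minimal interactive session (a basic sketch; adjust options as needed) can be started with:
srun --pty bash -l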
If you do not specify a partition then the intel partition is used by default. Here is a more complete example:
srun --x11 --mem=1gb --cpus-per-task 1 --ntasks 1 --time 10:00:00 --pty bash -l
The above example enables X11 forwarding and requests 1 GB of memory and 1 core for 10 hours within an interactive session.
Monitoring Jobs
To check on your job states, run the following:
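squeue -u $USER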
To list all the details of a specific job, run the following:
scontrol show job <JOBID>
Advanced Jobs
There is a third way of submitting jobs by using steps.
Single Step submission:
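srun <command>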
Under a single step job your command will hang until appropriate resources are found and when the step command is finished the results will be sent back on STDOUT. This may take some time depending on the job load of the cluster.
Multi Step submission:
salloc -N 4 bash -l
srun <command>
...
srun <command>
exit
Under a multi step job the salloc command will request resources and then your parent shell will be running on the head node. This means that all commands will be executed on the head node unless preceded by the srun command. You will also need to exit this shell in order to terminate your job.
GPU Jobs
A single GPU job will no longer reserve an entire node. Each node has 4 GPUs, so you need to request how many GPUs you would like to use.
Non-Interactive:
srun -p gpu --mem=100g --time=1:00:00 SBATCH_SCRIPT.sh
Interactive
srun -p gpu --gres=gpu:4 --mem=100g --time=1:00:00 --pty bash -l
Of course you should adjust the time argument according to your job requirements. Once your job starts, your code must reference the environment variable CUDA_VISIBLE_DEVICES, which will indicate which GPUs have been assigned to your job. Most CUDA-enabled software, like MegaHIT, will check this environment variable and automatically limit itself accordingly. For example, when reserving 4 GPUs for a NAMD2 job:
echo $CUDA_VISIBLE_DEVICES
0,1,2,3
namd2 +idlepoll +devices $CUDA_VISIBLE_DEVICES MD1.namd
Each user is limited to a maximum of 4 GPUs on the gpu partition. Please be respectful of others and keep in mind that the GPU nodes are a limited shared resource. Since the CUDA libraries will only run on GPU hardware, development and compiling of code must be done within a job session on a GPU node, as in the examples mentioned above.
Torque/Maui
Submitting Jobs
The command used to submit jobs is qsub. There are two basic ways to submit jobs:
The first is to pipe the command to qsub via STDIN.
For example, let's say you have a set of files in your home directory and want to run blast against them. You could run a command similar to the one below to have it run on a node.
echo blastall -p blastp -i myseq.fasta -d AE004437.faa -o blastp.out -e 1e-6 -v 10 -b 10 | qsub
When using the cluster it quickly becomes useful to be able to run multiple commands as part of a single job. To solve this we write scripts. In this case, the way it works is that we invoke the script as the last argument to qsub.
A script is just a set of commands that we want to make happen once the job runs. Below is an example script that does the same thing that we do with Exercise 5 in the Linux Basics Manual.
#!/bin/bash
#PBS -M email@address.com
# Define email address for job notifications
#PBS -m abe
# Send email notification if job is (a) aborted, (b) begins, or (e) ends
# Create a directory for us to do work in.
# We are using a special variable that is set by the cluster when a job runs.
mkdir $PBS_JOBID
# Change to that new directory
cd $PBS_JOBID
# Copy the proteome of Halobacterium spec.
cp /srv/projects/db/ex/AE004437.faa .
# Do some basic analysis
# The echo command prints info to our output file
echo "How many predicted proteins are there?"
grep '^>' AE004437.faa --count
echo "How many proteins contain the pattern \"WxHxxH\" or \"WxHxxHH\"?"
egrep 'W.H..H{1,2}' AE004437.faa
# Start preparing to do a blast run
# Use awk to grab a number of proteins and then put them in a file.
echo "Generating a set of IDs"
awk --posix -v RS='>' '/W.H..(H){1,2}/ { print ">" $0;}' AE004437.faa | grep '^>' | awk --posix -v FS='|' '{print $4;}' > my_IDs
# Make the proteome blastable
echo "Making a blastable database"
formatdb -i AE004437.faa -p T -o
# Make blastable IDs
echo "Making a set of blastable IDs"
fastacmd -d AE004437.faa -i my_IDs > myseq.fasta
# Run blast
echo "Running blast"
blastall -p blastp -i myseq.fasta -d AE004437.faa -o blastp.out -e 1e-6 -v 10 -b 10
So if this script was called blast_AE004437.sh, we could run the following to make all of those steps happen.
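qsub blast_AE004437.sh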
Tracking Jobs
Now that we have a job in the queue, how do you know if it is running? For that, there is a command called qstat. Running qstat will provide you with the current state of all the jobs running or queued to run on the cluster. The following is an example of that output:
Job id Name User Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
467655.torque-server ...MTcauLSS_5.sh xzhang 2562047: R batch
467660.torque-server ...TnormLSS_5.sh xzhang 5124113: R batch
467663.torque-server ARSUMTt3LSS_5.sh xzhang 5124113: R batch
474003.torque-server Aedes2 bradc 1095:30: R batch
7475989.torque-server Culex2 bradc 928:40:0 R batch
478663.torque-server STDIN snohzadeh 00:36:28 R batch
626327.torque-server STDIN wangya 11:16:38 R batch
645318.torque-server STDIN yfu 477:49:3 R batch
645353.torque-server STDIN yfu 464:31:4 R batch
655060.torque-server newphyml.sh nponts 364:57:5 R batch
655077.torque-server newphyml.sh nponts 401:32:2 R batch
655182.torque-server newphyml.sh nponts 396:35:2 R batch
655385.torque-server newphyml.sh nponts 337:29:4 R batch
655469.torque-server newphyml.sh nponts 146:23:5 R batch
655493.torque-server newphyml.sh nponts 335:05:0 R batch
655571.torque-server newphyml.sh nponts 358:33:5 R batch
655742.torque-server newphyml.sh nponts 314:08:5 R batch
655754.torque-server newphyml.sh nponts 299:45:2 R batch
655814.torque-server newphyml.sh nponts 109:59:4 R batch
655951.torque-server newphyml.sh nponts 268:58:3 R batch
655962.torque-server newphyml.sh nponts 325:04:2 R batch
656054.torque-server newphyml.sh nponts 277:43:3 R batch
656055.torque-server newphyml.sh nponts 327:37:2 R batch
656195.torque-server newphyml.sh nponts 270:07:2 R batch
656309.torque-server newphyml.sh nponts 261:18:4 R batch
656339.torque-server newphyml.sh nponts 306:47:0 R batch
656340.torque-server newphyml.sh nponts 275:05:4 R batch
656486.torque-server newphyml.sh nponts 259:59:3 R batch
659489.torque-server STDIN zwu 00:25:01 R batch
672645.torque-server STDIN snohzadeh 00:00:00 R batch
674351.torque-server STDIN yfu 165:40:5 R batch
674819.torque-server Seqrank_CL_08.sh xzhang 115:43:3 R batch
674940.torque-server submit_script.sh nsausman 683:14:1 R batch
675260.torque-server ...eatModeler.sh robb 233:01:5 R batch
675266[].torque-server sa.sh jban 0 R batch
675275[].torque-server LeucoMakerMrctr hishak 0 R batch
675853[].torque-server sa.sh jban 0 R batch
677089.torque-server LFPcorun.sh jychen 57:31:33 R batch
679437.torque-server Chr8.mergeBam.sh robb 0 Q batch
679438.torque-server Chr9.mergeBam.sh robb 0 Q batch
679439.torque-server Chr1.cat_fq.sh robb 0 Q batch
679440.torque-server Chr10.cat_fq.sh robb 0 Q batch
679441.torque-server Chr11.cat_fq.sh robb 0 Q batch
679442.torque-server Chr12.cat_fq.sh robb 0 Q batch
... CONTINUED ...
An R in the S column means a job is running and a Q means that the job is queued, waiting to run. Jobs get queued for a number of reasons; the most common are:
- A job scheduling run has not yet completed. Scheduling runs take place approximately every 15 seconds.
- The queue is at ~75% capacity and the job is requesting a significant amount of walltime.
- The queue is at 100% capacity and the job has no place it can be started.
- The job is requesting specific resources, such as 8 processors, and there is no place the system is able to fit it.
- The user submitting the job has reached a resource maximum for that queue and cannot start any more jobs running until other jobs have finished.
There are additional flags that can be passed to qstat to get more information about the state of the cluster, including the -u flag that will only display the status of jobs for a particular user.
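For example, to list only your own jobs:
qstat -u $USER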
Once a job has finished, it will no longer show up in this listing.
Job Results
By default, results from the jobs come out two different ways.
- The system sends STDOUT and STDERR to files called
<job_name>.o<job_number> and <job_name>.e<job_number> .
- Any output created by your script, like the
blastp.out in the example above.
For example if you ran the example from above and got a job number of 679746, you would end up with a file called blast_AE004437.sh.o679746 and a file called blast_AE004437.sh.e679746 in the directory where you ran qsub . Additionally, because our script creates a directory using the PBS_JOBID variable, you would have a directory in your home directory called 679746.torque01 .
Deleting Jobs
Sometimes you need to delete a job. You may need to do this if you accidentally submitted something that will run longer than you want, or perhaps you submitted the wrong script. To delete a job, you use the qdel command. If you wanted to delete job number 679440, you would run:
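qdel 679440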
Please be aware that you can only delete jobs that you own.
Delete all jobs of one user:
qselect -u $USER | xargs qdel
Delete all jobs running by one user:
qselect -u $USER -s R | xargs qdel
Delete all queued jobs by one user:
qselect -u $USER -s Q | xargs qdel
Advanced Usage
There are a number of additional things you can do with qsub to take better advantage of the cluster.
To view qsub options please visit the online manual, or run the following:
man qsub
Requesting Additional Resources
Frequently, there is a need to use more than one processor or to specify some amount of memory. The qsub command has a -l flag that allows you to do just that.
Example: Requesting A Single Node with 8 Processors
Let's assume that the script we used above was multi-threaded and spins up 8 different processes to do work. If you wanted to ask for the processors required to do that, you would run the following:
qsub -l nodes=1:ppn=8 blast_AE004437.sh
This tells the system that your job needs 8 processors and it allocates them to you.
Example: Requesting 16GB of RAM for a Job
Using the same script as above, let's instead assume that this is just a monolithic process, but we know that it will need about 16GB of RAM. Below is an example of how that is done:
qsub -l mem=16gb blast_AE004437.sh
Example: Requesting 2 Weeks of Walltime for a Job
Using the same script as above, let's instead assume that it is going to run for close to 2 weeks. There are 7 days in a week and 24 hours in a day, so 2 weeks in hours is 2 * 7 * 24 = 336 hours. Below is an example of requesting that a job can run for 336 hours.
qsub -l walltime=336:00:00 blast_AE004437.sh
Example: Requesting Specific Node(s)
The following requests 8 CPU cores and 16GB of RAM on a high memory node for 220 hours:
qsub -q highmem -l nodes=1:ppn=8,mem=16gb,walltime=220:00:00 assembly.sh
Interactive Jobs
Sometimes, when testing, it is useful to run commands interactively instead of with a script. To do this you would run:
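qsub -I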
Just like scripts though, you may need additional resources. To solve this, specify resources, just like you would above:
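qsub -I -l nodes=1:ppn=8,mem=16gb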
GPU Jobs
GPU jobs are no longer supported under Torque/Maui. Please refer to the GPU Jobs section under Slurm.
Testing the Queueing System
The following submits 4 jobs to the queue each running 20 seconds. Note: the '-o' and '-e' arguments are assigned '/dev/null', meaning the standard output/error stream files will be omitted.
for i in {1..4}; do echo "sleep 20" | qsub -q batch -l nodes=1:ppn=1,mem=4gb,walltime=02:00:00 -o /dev/null -e /dev/null; done;
Array Jobs
Many tasks in bioinformatics need to be parallelized to be efficient. One of the ways we address this is using array jobs. An array job executes the same script a number of times depending on what arguments are passed. To specify that an array should be used, you use the -t flag. For example, if you wanted a ten element array, you would pass -t 1-10 to qsub. You can also specify arbitrary numbers in the array. Assume for a second that array elements 3 and 5-7 failed for some unknown reason in your last run; you can specify -t 3,5-7 and run just those array elements.
Below is an example that does the same thing as the basic example above, except that it spreads the workload out across seven different processes. This technique is particularly useful when dealing with much larger datasets.
Prepare Dataset
The script below creates a working directory and builds out the usable dataset. The job is passed to qsub with no arguments.
#!/bin/bash
# Create a directory for us to do work in.
# We are using a special variable that is set by the cluster when a job runs.
mkdir blast_AE004437
# Change to that new directory
cd blast_AE004437
# Copy the proteome of Halobacterium spec.
cp /srv/projects/db/ex/AE004437.faa .
# Do some basic analysis
# The echo command prints info to our output file
echo "How many predicted proteins are there?"
grep '^>' AE004437.faa --count
echo "How many proteins contain the pattern \"WxHxxH\" or \"WxHxxHH\"?"
egrep 'W.H..H{1,2}' AE004437.faa
# Start preparing to do a blast run
# Use awk to grab a number of proteins and then put them in a file.
echo "Generating a set of IDs"
awk --posix -v RS='>' '/W.H..(H){1,2}/ { print ">" $0;}' AE004437.faa | grep '^>' | awk --posix -v FS='|' '{print $4;}' > my_IDs
# Make the proteome blastable
echo "Making a blastable database"
formatdb -i AE004437.faa -p T -o
Analyze the Dataset
The script below will do the actual analysis. Assuming the name is blast_AE004437-multi.sh , the command to submit it would be qsub -t 1-7 blast_AE004437-multi.sh .
#!/bin/bash
# Specify the number of array runs. This means we are going to specify -t 1-7
# when calling qsub.
NUM=7
# Change to that new directory
cd blast_AE004437
# Do some math based on the number of runs we are going to do to figure out how
# many lines, and which lines should be in this run.
LINES=`cat my_IDs | wc -l`
MULTIPLIER=$(( $LINES / $NUM ))
SUB=$(( $MULTIPLIER - 1 ))
END=$(( $PBS_ARRAYID * $MULTIPLIER ))
START=$(( $END - $SUB ))
# Grab the IDs that are going to be part of each blast run
awk "NR==$START,NR==$END" my_IDs > $PBS_ARRAYID.IDs
# Make blastable IDs
echo "Making a set of blastable IDs"
fastacmd -d AE004437.faa -i $PBS_ARRAYID.IDs > $PBS_ARRAYID.fasta
# Run blast
echo "Running blast"
blastall -p blastp -i $PBS_ARRAYID.fasta -d AE004437.faa -o $PBS_ARRAYID.blastp.out -e 1e-6 -v 10 -b 10
Specifying Queues
Queues provide access to additional resources or allow use of resources in different ways. To take advantage of the queues, you will need to specify the -q option with the queue name on the command line.
For example, if you would like to run a job that consumes 16GB of memory, you should submit this job to the highmem queue:
qsub -q highmem myJob.sh
Troubleshooting
If a job has not started, or is in a queued state for a long period of time, users should try the following.
Check which nodes have available resources (CPU, memory, walltime, etc..):
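For example, using Maui's node diagnostics:
mdiag -n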
Check how many processors are immediately available per walltime window on the batch queue:
showbf -f batch
Check earliest start and completion times (should not be infinity):
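showstart <JOBID>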
Check status of job and display reason for failure (if applicable):
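checkjob -v <JOBID>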
Data Storage
Biocluster users are able to check on their home and bigdata storage usage from the Biocluster Dashboard.
Home Directories
Home directories are where you start each session on Biocluster and where your jobs start when running on the cluster. This is usually where you place the scripts and various things you are working on. This space is very limited. Please remember that the home storage quota per user account is 20 GB.
Path | /rhome/<username>
User Availability | All Users
Node Availability | All Nodes
Quota Responsibility | User
Big Data
Big data is an area where large amounts of storage can be made available to users. A lab purchases big data space separately from access to the cluster. This space is then made available to the lab via a shared directory and individual directories for each user.
Lab Shared Space
This directory can be accessed by the lab as a whole.
Path | /bigdata/<labname>/shared
User Availability | Labs that have purchased space
Node Availability | All Nodes
Quota Responsibility | Lab
Individual User Space
This directory can be accessed by specific lab members.
Path | /bigdata/<labname>/<username>
User Availability | Labs that have purchased space
Node Availability | All Nodes
Quota Responsibility | Lab
Non-Persistent Space
Frequently, there is a need to output a significant amount of intermediate data during a job, to access a dataset from a faster medium than bigdata or the home directories, or to write out lock files. These tasks are well suited to non-persistent spaces. Below are the filesystems available on Biocluster.
RAM Space
This type of space takes away from physical memory but allows extremely fast access to the files located on it. When submitting a job you will need to factor in the space your job is using in RAM as well. For example, if you have a dataset that is 1G in size and use this space, it will take at least 1G of RAM.
Path | /dev/shm
User Availability | All Users
Node Availability | All Nodes
Quota Responsibility | N/A
Temporary Space
This is a standard space available on all Linux systems. Please be aware that it is limited to the amount of free disk space on the node you are running on.
Path | /tmp
User Availability | All Users
Node Availability | All Nodes
Quota Responsibility | N/A
SSD Backed Space
This space is much faster than the standard temporary space, but slower than using RAM based storage.
Path | /scratch
User Availability | All Users
Node Availability | All Nodes
Quota Responsibility | N/A
Usage and Quotas
To quickly check your usage and quota limits:
check_quota home
check_quota bigdata
To get the usage of your current directory, run the following command:
du -sh .
To calculate the sizes of each separate subdirectory, run:
du -shc *
This may take some time to complete, so please be patient. For more information on your home directory, please see the Orientation section in the Linux Basics manual.
Sharing data with other users
It is often useful to share data and results with other users on the cluster, and we encourage collaboration. The easiest way to share a file is to place it in a location that both users can access; the second user can then simply copy it to a location of their choice. However, this requires that the file permissions permit the second user to read the file.
Basic file permissions on Linux and other Unix-like systems are composed of three groups: owner, group, and other. These represent the permissions for the user who owns the file, all members of the group that owns the file, and everyone else, respectively. Each group has 3 permissions: read, write, and execute, represented as r, w, and x. For example, the following file is owned by the user 'jhayes' (with read, write, and execute), owned by the group 'bioinfo' (with read and execute), and everyone else cannot access it.
jhayes@pigeon:~$ ls -l myFile
-rwxr-x--- 1 jhayes bioinfo 1.6K Nov 19 12:32 myFile
If you wanted to share this file with someone outside the 'bioinfo' group, read permission must be added to the file for 'other'.
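For example:
chmod o+r myFile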
Set Default Permissions
In Linux, it is possible to set the default file permissions for new files. This is useful if you are collaborating on a project, or frequently share files and do not want to be constantly adjusting permissions. The command responsible for this is called 'umask'. You should first check what your default permissions currently are by running 'umask -S'.
jhayes@pigeon:~$ umask -S
To set your default permissions, simply run umask with the correct options. Please note that this does not change permissions on any existing files, only new files created after you update the default permissions. For instance, if you wanted full control for yourself, read and execute access for your group, and no access for anyone else, you would run:
jhayes@pigeon:~$ umask u=rwx,g=rx,o=
It is also important to note that these settings only affect your current session. If you log out and log back in, these settings will be reset. To make your changes permanent, you need to add them to your '.bashrc' file, which is a hidden file in your home directory (if you do not have a '.bashrc' file, you will need to create an empty file called '.bashrc' in your home directory). Adding umask to your .bashrc file is as simple as adding your umask command (such as 'umask u=rwx,g=rx,o=r') to the end of the file. Then simply log out and back in for the changes to take effect. You can double check that the settings have taken effect by running 'umask -S'.
Further Reading
Copying large folders to and from Biocluster
Rsync can:
- Copy (transfer) folders between different storage hardware
- Perform transfers over the network via SSH
- Compare large data sets (-n, --dry-run option)
- Resume interrupted transfers
To perform over-the-network transfers, it is always recommended that you run the rsync command from your local machine (laptop or workstation).
On your computer open the Terminal and run:
rsync -ai FOLDER_A/ biocluster.ucr.edu:FOLDER_A/
Or:
rsync -ai biocluster.ucr.edu:FOLDER_B/ FOLDER_B/
Rsync will use SSH and will ask you for your biocluster password as SSH or SCP does.
If your connection breaks, rsync can pick up where it left off; simply run the same command again.
- Rsync does not exist on Windows. Only Mac and Linux support rsync natively.
- Always put the / after both folder names, e.g. FOLDER_B/. Failing to do so will nest the folders every time you try to resume: without the trailing /, you will end up with a second FOLDER_B inside FOLDER_B (FOLDER_B/FOLDER_B/).
- Rsync does not move but only copies.
man rsync
Copying large folders on Biocluster between Directories
Rsync does not move but only copies. You will need to delete the original once you confirm that everything has been transferred.
This is the rare case where you would run rsync on Biocluster and not on your computer (laptop or workstation). The format in this case is:
rsync -ai FOLDER_A/ X/FOLDER_A/
where X is a different folder (e.g. a Bigdata folder)
- Once the rsync command is done, run it again. The second run will be short and is just a check. If there was no output, nothing changed and it is safe to delete the original location.
Specifically, running rsync a second time ensures that everything has been transferred correctly. The -i (--itemize-changes) option asks rsync to report (output) all the changes that occur on the filesystem during the sync. No output = no changes = the folder has been transferred safely.
- All the bullets in the above section (Copying large folders to and from Biocluster) apply to this section
Copying large folders between Biocluster and other servers
This is a very rare case where you would run rsync on Biocluster and not on your computer (laptop or workstation). The format in this case is:
rsync -ai FOLDER_A/ server2.xyz.edu:FOLDER_A/
where server2.xyz.edu is a different server that accepts SSH connections.
- All the bullets in the above sections (Copying large folders to and from Biocluster) apply to this section
Automatic Backups
Biocluster has backups; however, it is still advantageous for users to periodically make copies of their critical data to a separate storage device.
Please remember, Biocluster is a production system for research computations with a very expensive high-performance storage infrastructure. It is not a data archiving system.
Home backups are on a daily schedule and kept for one month.
Bigdata backups are on a weekly schedule and kept for one month.
Home and bigdata backups are located under the following respective directories:
/rhome/.snapshots/
/bigdata/.snapshots/
The individual snapshot directories have names with numerical values in epoch time format. The higher the value the more recent the snapshot. To view the exact time of when each snapshot was taken execute the following commands:
mmlssnapshot home
mmlssnapshot bigdata
Databases
Loading Databases
NCBI, PFAM, and Uniprot do not need to be downloaded by users; they are installed as modules on Biocluster.
module load db-ncbi
module load db-pfam
module load db-uniprot
Specific database release numbers can be identified by the version label on the module:
module avail db-ncbi
----------------- /usr/local/Modules/3.2.9/modulefiles -----------------
db-ncbi/20140623(default)
In order to use a loaded database, users can simply reference the corresponding environment variable (NCBI_DB, UNIPROT_DB, PFAM_DB, etc.), which holds the proper path, in their executables.
Examples:
You should avoid using this old, deprecated method; it may not work in the near future (old BLAST):
blastall -p blastp -i proteins.fasta -d $NCBI_DB/nr -o blastp.out
You can use this method if you require the old version of BLAST (old BLAST with legacy support):
BLASTBIN=`which legacy_blast.pl | xargs dirname`
legacy_blast.pl blastall -p blastp -i proteins.fasta -d $NCBI_DB/nr -o blast.out --path $BLASTBIN
This is the preferred/recommended method (BLAST+):
blastp -query proteins.fasta -db $NCBI_DB/nr -out proteins_blastp.txt
Usually, we store the most recent release and 2-3 previous releases of each database. This way, time-consuming projects can use the same database version throughout their lifetime without always updating to the latest releases.
Requests for additional databases should be sent to support@biocluster.ucr.edu
Parallelization Software
MPI Introduction
MPI stands for the Message Passing Interface. MPI is a standardized API typically used for parallel and/or distributed computing. Biocluster has a custom compiled version of OpenMPI that allows users to run MPI jobs across multiple nodes. These types of jobs can take advantage of hundreds of CPU cores simultaneously, thus improving compute time.
Many implementations of MPI exist: Open MPI, FT-MPI, LA-MPI, LAM/MPI, PACX-MPI, Adaptive MPI, MPICH, MVAPICH
If you need to compile an MPI application then please email support@biocluster.ucr.edu for assistance.
NAMD Example
Here is how to run NAMD2 as an MPI job on Biocluster:
- Log-in to Biocluster
- Create PBS script
#!/bin/bash -l
#PBS -N c3d_cr2_md
#PBS -q batch
#PBS -l nodes=32:ppn=1
#PBS -l mem=16gb
#PBS -l walltime=01:00:00
# Load needed modules
# You could also load frequently used modules from within your ~/.bashrc
# module load torque   # Should already be loaded
# module load openmpi  # Should already be loaded
module load namd
# Switch to the working directory
cd $PBS_O_WORKDIR
# Run job utilizing all requested processors
# Please visit the namd site for usage details: http://www.ks.uiuc.edu/Research/namd/
mpirun --mca btl ^tcp namd2 run.conf &> run_bio.log
- Submit PBS script to PBS queuing system
qsub run_bio.sh
Monitoring the Load History
Several utilities are available for obtaining information about the history of the cluster load by users, labs and individual nodes.
- Maui can display CPU utilization history. When the following command is executed on a Biocluster head node, it returns an overview of the CPU hour history by users and labs (groups):
mdiag -f
- Cluster Load History (TBA)
There are future plans to develop more utilities under the Biocluster Dashboard. They include the following:
- CPU hours used by each Lab
- CPU hours used by each Compute Node
- CPU hours used by each User
Communicating with Other Users
Biocluster is a shared resource. Communicating with other users can help to schedule large computations.
Looking-Up Specific Users
A convenient overview of all users and their lab affiliations can be retrieved with the following command:
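user_details.sh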
You can search for specific users by running:
MATCH1='rkaundal'  # Searches by real name, username, email address, and PI name
MATCH2='jhayes'
user_details.sh | grep -P "$MATCH1|$MATCH2"
Listing Users with Active Jobs on the Cluster
To get a list of usernames:
qstat | awk '{print $3}' | sort | uniq | grep "^[^-N]"
To get the list of real names:
grep <(user_details.sh | awk '{print $2,$3,$4}') -f <(qstat | awk '{print $3}' | \
sort | uniq | grep "^[^-N]") | awk '{print $1,$2}'
To get the list of emails:
grep <(user_details.sh | awk '{print $4,$5}') -f <(qstat | awk '{print $3}' | \
sort | uniq | grep "^[^-N]") | awk '{print $2}'
Sharing Data on the Web
Simply create a symbolic link or move the files into your html directory when you want to share them.
For example, log into Biocluster and run the following:
# Make new web project directory
mkdir www-project
# Create a default test file
echo '<h1>Hello!</h1>' > ./www-project/index.html
# Create shortcut/link for new web project in html directory
ln -s `pwd`/www-project ~/.html/
Password Protecting Web Content
First create a password file and then create a new user:
touch ~/.html/.htpasswd
htpasswd ~/.html/.htpasswd newwebuser
This will prompt you to enter a password for the new user 'newwebuser'.
Create a new directory, or go to an existing directory, that you want to password protect:
mkdir ~/.html/locked_dir
cd ~/.html/locked_dir
You can choose a different directory name.
Then do the following:
echo "AuthName 'Please login'
AuthType Basic
AuthUserFile /rhome/$USER/.html/.htpasswd
require user newwebuser" > .htaccess
Be sure to replace 'newwebuser' with the web user name you created in the .htpasswd file.
List of Installed Software
Systems Software
Software from CentOS repository
Biocluster Software (debian)
Software from module system
Version lookup:
module avail <software_name>
For example:
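module avail samtools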
Biocluster Software (modules)
R libraries
Version Lookup (in R):
packageVersion("any-package")
packageVersion("GOstats")
The output shows the installed version of the package.
Here is a list of all the available R libraries:
Python Packages
List all installed Python packages and the corresponding versions:
python-modules