The Shark cluster has a default memory allocation of 4 Gigabyte per slot. This ensures that the execution nodes do not run out of memory, start swapping, and eventually halt. If your job uses more than the default 4 Gigabyte of memory, it will be killed immediately by the Open Grid Scheduler.
Set the amount of memory that my job uses
You can assign more memory to your job using the h_vmem option, either on the qsub command line when you submit your job or in your qsub script.
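For example, the request can also be placed in the script itself using a `#$` directive line (a sketch; the script body is illustrative):

```shell
#!/bin/bash
# Request 8 Gigabyte of memory per slot from inside the job script
#$ -l h_vmem=8G

# your actual commands follow here
echo "job running with h_vmem=8G per slot"
```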
My job uses less than 4 Gigabyte of memory (default):
My job uses the parallel environment with 8 cores and needs a total of 64 Gigabyte of memory.
That means you need 64G / 8 slots = 8G per slot.
Thus you will submit your job like this:
qsub -pe BWA 8 -l h_vmem=8G test_script.sh
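The per-slot arithmetic above can be checked quickly in the shell (values taken from the example; nothing cluster-specific is assumed):

```shell
# Memory per slot = total memory / number of slots
TOTAL_GB=64
SLOTS=8
PER_SLOT=$((TOTAL_GB / SLOTS))
echo "${PER_SLOT}G per slot"   # prints "8G per slot"
```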
My job uses less than 1 Gigabyte of memory.
It is good practice to tell the Open Grid Scheduler that you do not need the default 4 Gigabyte, leaving more resources for other users. Example:
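A minimal sketch, reusing the script name from the example above:

```shell
# Request only 1 Gigabyte per slot instead of the 4 Gigabyte default
qsub -l h_vmem=1G test_script.sh
```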
Where (you can use either uppercase or lowercase letters):
||K/k || =Kilobyte ||
||M/m || =Megabyte ||
||G/g || =Gigabyte ||
Exceeding the memory limit
If your job gets killed, you will often see in your job output a line similar to the one below:
/usr/local/R/R-2.15.1/lib64/R/bin/exec/R: error while loading shared libraries: libreadline.so.6: failed to map segment from shared object: Cannot allocate memory
In addition, your job will have the exit status 127. To retrieve the exit status of your job using qacct, please see the next section.
Segfaults and core dumps for some jobs
If you run programs that are quite large, depend on a lot of system libraries (dynamic libraries), and segfault, try setting -l h_stack to a higher value. This error mostly occurs with multithreaded applications working on large datasets (e.g. samtools, sambamba).
qsub -l h_stack=32M test_script.sh
By default, this limit is very low (about 8 MB maximum, and 2 MB per thread); you can look it up with: ulimit -s
Figure out how much memory you need
You can find out the amount of memory used for any finished job with the following command:
qacct -j <job id>
Look for the "maxvmem" field: this is the maximum amount of memory your job used. Keep in mind that this is the total for the complete parallel job; to get the memory per slot, simply divide by the number of slots.
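As a sketch, you can do the per-slot division yourself from the accounting output (the maxvmem line and slot count below are made-up sample values, not real qacct output):

```shell
# Sample "maxvmem" line as it might appear in `qacct -j <job id>` output
line="maxvmem      12.000G"
SLOTS=8

# Strip the unit from the value and divide by the number of slots
maxvmem_gb=$(echo "$line" | awk '{gsub(/G/, "", $2); print $2}')
per_slot=$(awk -v t="$maxvmem_gb" -v s="$SLOTS" 'BEGIN { printf "%.1f", t/s }')
echo "${per_slot}G per slot"   # prints "1.5G per slot"
```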
If you are using Java, keep the following in mind. The maximum amount of memory the Java virtual machine (JVM) is allowed to consume can be set using the -Xmx flag. Be aware that the actual amount of memory consumed by the JVM (and hence your job) may exceed the limit set by -Xmx because of JVM overhead. For example, limiting the heap size to 4 gigabytes (-Xmx4g) may still mean that your job consumes an additional few hundred megabytes, and therefore may result in your job being killed. The amount of memory consumed by JVM overhead varies per program. To be on the safe side, it is generally a good idea to set the maximum heap size a few hundred megabytes per slot below the requested memory (i.e. set the heap size to 3.7GB of memory times the number of slots).
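As a sketch, a single-slot Java job with the default 4 Gigabyte could cap the JVM heap a few hundred megabytes lower (the jar and input names are hypothetical):

```shell
#!/bin/bash
#$ -l h_vmem=4G
# Cap the heap at 3700 MB, leaving ~300 MB headroom for JVM overhead
java -Xmx3700m -jar my_tool.jar input.bam
```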
Keep in mind that over-reserving memory causes unnecessary queueing for other users. Please reserve only what your job actually needs. Also read the guidelines.