Basic Info

Guidelines

  • Always run your computations on the cluster nodes and not the frontend/login node (fe1).
  • Use symbolic links instead of duplicating files.
  • Always delete temporary data that you no longer need.
  • When installing/compiling software, remember to delete unpacked source distributions. These usually add a huge number of files, which affects the performance of the storage disks.
  • Always prefer "fewer larger files" over "many smaller files".
  • Never put more than 10,000 files in a single folder. As an alternative, use prefix subdirectories (00/, 01/, 02/, aa/, ab/, ...) as sketched below.
  • Only use project folders for their specific purpose. If another project is starting, remember to request a new project folder.
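
As a minimal sketch of the prefix-subdirectory idea (the file names and the two-character prefix length are just assumptions), a flat folder can be split up like this:

#!/bin/bash
# Move every file in the current folder into a subdirectory named after
# the first two characters of its file name (e.g. ab123.txt -> ab/ab123.txt).
for f in *; do
    [ -f "$f" ] || continue   # skip anything that is not a regular file
    prefix=${f:0:2}           # first two characters of the file name
    mkdir -p "$prefix"
    mv "$f" "$prefix/"
done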

How to change your password

Use the command:

[user@fe1]$ yppasswd

 

It will ask for the current password, the new password and the new password once more.
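
A typical session looks roughly like this (the exact wording of the prompts depends on the yppasswd version in use):

[user@fe1]$ yppasswd
Changing NIS account information for user.
Please enter old password:
Please enter new password:
Please retype new password:
The NIS password has been changed.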

Folder structure and access restrictions

Home and project folders:

/home/{USER}

EONstor, Panasas or other network-attached storage. The home folder of a less active user may be moved to slower storage, allowing more recent projects to run on faster storage. Only the user has access to their own files.

A user cannot change access rights to allow others to read their files; files must therefore be shared through project folders.
/project/{PROJECT}

EONstor, Panasas or other network-attached storage. The project folder of a less active project may be moved to slower storage, allowing more active projects to run on faster storage. Only users assigned to a specific project are able to read and write files in the project folder.

Users creating new files must also explicitly allow the other project members access to the files, for example as sketched below.
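
A minimal sketch of how this could be done, assuming the project has a Unix group named after it (check the actual group with 'ls -l'); "mydata" is just a placeholder folder:

[user@fe1]$ chgrp -R {PROJECT} /project/{PROJECT}/mydata
[user@fe1]$ chmod -R g+rwX /project/{PROJECT}/mydata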

Faststorage (extremely fast):

/faststorage/home/{USER}

/faststorage/project/{PROJECT}

A distributed filesystem for I/O-intensive workloads.

Data files in active use should always be located on /faststorage.

Scratch folder on a node:

/scratch/$PBS_JOBID

The local hard drive of each node.

/scratch/$PBS_JOBID will point to this folder. Jobs must use this space for temporary storage during execution. The folder is local to every node and only accessible by the job owner.

The /scratch/$PBS_JOBID folder and /tmp are automatically cleaned when a job finishes.
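
A minimal job-script sketch of this pattern (the project name, file names and analysis command are placeholders; the scheduler sets $PBS_JOBID for the job):

#!/bin/bash
# Copy the input to the node-local scratch folder, work there, and copy the results back.
SCRATCH=/scratch/$PBS_JOBID
cp /project/MyProject/input.dat $SCRATCH/
cd $SCRATCH
my-analysis input.dat > result.dat   # placeholder for the real program
cp result.dat /project/MyProject/
# $SCRATCH and /tmp are cleaned automatically when the job finishes.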

Check the available storage at a specific path

In Unix, the 'df' command shows the available disk space. 'df -h' shows all mounted drives with the disk usage in human-readable numbers.
To see the disk usage for a specific path:

[user@fe1]$ df ./Genomes
Filesystem           1K-blocks      Used Available Use% Mounted on
gfs3:/home3/project/Genomes
                     52295460864 35363096384 16932364480  68% /project/Genomes

[user@fe1]$ df -h ./Genomes
Filesystem            Size  Used Avail Use% Mounted on
gfs3:/home3/project/Genomes
                       49T   33T   16T  68% /project/Genomes

 

If there is not enough space for new data, then contact the system administrators to either have the path moved or to create a new project folder.

Incoming and outgoing access

The cluster must be accessed through SSH to the frontend node: login.genome.au.dk

We recommend the following applications for SSH:

It is possible to tunnel other transport protocols (such as HTTP, X and so on) through SSH. Examples on these pages will sometimes show how to create an SSH tunnel using OpenSSH.

The login host login.genome.au.dk is only accessible from IPs within the AU network or IPs that appear on a whitelist (a list of accepted IP addresses).
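
From within the AU network or from a whitelisted IP, logging in is a plain SSH connection (replace user with your own account name):

[local machine]$ ssh user@login.genome.au.dk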

Direct outgoing connections from the cluster are blocked. To access FTP or HTTP sites, an FTP/HTTP proxy is available. To configure wget and other tools to use the proxy, export these environment variables:

export http_proxy="http://in:3128" && export ftp_proxy="http://in:3128"

For security reasons, only basic HTTP (port 80) and FTP (port 21) are available, and it is not possible to upload/post data through the proxy.
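
With the proxy variables exported, tools such as wget then work as usual; the URL below is only a placeholder:

[user@fe1]$ wget http://example.org/some-file.tar.gz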

Accessing a desktop on GenomeDK

VNC is installed on the frontend node together with a full X environment. On the compute nodes only the X libraries are installed, allowing graphical applications to run and send their display to a VNC-hosted X environment running on the frontend node.

To initialise a new desktop, run the command below. The first time, you will be asked to enter a password for connecting to VNC. Choose a good one!

[user@fe1]$ vncserver

New 'fe1:2 (user)' desktop is fe1:2

Starting applications specified in /home/user/.vnc/xstartup
Log file is /home/user/.vnc/fe1:2.log

 

Desktops will keep running until you kill them using:

 

[user@fe1 ~]$ vncserver -kill fe1:2
Killing Xvnc process ID 11910

 

A list of running desktops can be retrieved using the '-list' parameter:

 

[user@fe1]$ vncserver -list
TigerVNC server sessions:

X DISPLAY #  PROCESS ID
:2    12496

 

For connecting to the VNC server from your local Linux/Windows machine, we recommend TigerVNC: http://sourceforge.net/apps/mediawiki/tigervnc/index.php?title=Welcome_to_TigerVNC (in Ubuntu: vncviewer).

Mac users should use "Chicken of the VNC": http://sourceforge.net/projects/cotvnc/. For this example we use:

Host: fe1
Display: 2
Tunnel over SSH: check
SSH host: login.genome.au.dk
 

 

From a Linux machine with TigerVNC installed, the same connection can be made in a single command:

[local machine]$ vncviewer -via login.genome.au.dk fe1:2
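
If your VNC client cannot create the tunnel itself (for example on Windows), a tunnel can be set up manually with OpenSSH; display :2 corresponds to port 5902, so afterwards point the viewer at localhost:2:

[local machine]$ ssh -L 5902:fe1:5902 user@login.genome.au.dk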

X-forwarding

If you don't like VNC, we also support X forwarding. Simply use ssh -X when connecting to one of the frontends. X forwarding is automatically enabled for interactive jobs.
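
For example, a minimal way to log in to the frontend with X forwarding enabled:

[local machine]$ ssh -X user@login.genome.au.dk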


 

[user@fe1 ~]$ srun --mem=8g -c 4 --pty bash
srun: job 3595532 queued and waiting for resources
srun: job 3595532 has been allocated resources
[user@s02n43 ~]$ source /com/extra/RStudio/LATEST/load.sh
[user@s02n43 ~]$ rstudio

 

 

This only works if you have a local X server running. If you are a Linux user, you probably already have one. For Windows, Xming should work, and for Mac you can use XQuartz.

Using installed software

The list of pre-installed software is long. The following commands can be used to list the currently available software:

[user@fe1]$ sw-list
/com/extra/allpathslg/42760/load.sh
/com/extra/annovar/2012-07-12/load.sh
/com/extra/ATLAS/3.9.84/load.sh
/com/extra/bedtools/2.16.2/load.sh
...

 

[user@fe1]$ sw-whats-new
...
2012-08-20+13:44:13.9760788800  /com/extra/python/2.7/load.sh
2012-08-20+14:56:38.7150167540  /com/extra/picard/1.74/load.sh
2012-08-20+15:02:03.6090163810  /com/extra/testing/load.sh
2012-08-20+15:50:13.4830177420  /com/extra/allpathslg/42760/load.sh
2012-08-24+13:56:53.6850942710  /com/extra/igraph/0.6/load.sh

 

 

For every software package, files are installed in the folder /com/extra/name/version/. If a software package depends on other software packages, those are copied into the folder as well. To use any of the available software, it must first be loaded using the "source" command. Example: load tophat/2.0.4

 

[user@fe1]$ source /com/extra/tophat/2.0.4/load.sh

 

This command updates the PATH so that /com/extra/tophat/2.0.4/bin is searched for executable files. To see which programs tophat provides, just list the files in the bin folder:

 

[user@fe1]$ ls /com/extra/tophat/2.0.4/bin
bam2fastx  bed_to_juncs   contig_to_chr_coords  gtf_juncs     juncs_db             map2gtf     sam_juncs      sra_to_solid  tophat2             tophat_reports
bam_merge  closure_juncs  fix_map_ordering      gtf_to_fasta  long_spanning_reads  prep_reads  segment_juncs  tophat        tophat-fusion-post
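
As a quick sanity check after sourcing the load script, the tools should now be found on the PATH (the output shown is what one would expect, not captured from the system):

[user@fe1]$ which tophat2
/com/extra/tophat/2.0.4/bin/tophat2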

Mounting GenomeDK folders on your local Mac

Go to https://osxfuse.github.io/ and download and install the latest versions of FUSE for macOS and SSHFS. Unfortunately there is no nice GUI for SSHFS, so we have to configure the mount command with a long list of options.

Get your local uid and gid, using the id command (example):

 

user@local:~$ id
uid=1000(runef) gid=1000(runef) groups=1000(runef),4(adm),20(dialout),24(cdrom),46(plugdev),116(lpadmin),118(admin),124(sambashare),127(libvirtd)

 

Put the following into a shell script:

 

#!/bin/bash
# The extracted UID
uid=1000
# The extracted GID
gid=1000
# GenomeDK user account
uname=runef
mkdir -p /Volumes/GenomeDK   # create the mount point if it does not already exist
sshfs ${uname}@login.genome.au.dk:/home/${uname} /Volumes/GenomeDK -o idmap=none -o uid=${uid},gid=${gid} -o allow_other -o umask=077 -o follow_symlinks

 

To run the script and mount the GenomeDK folder:

 

user@local:~$ sudo sh ./create-genomedk-mount.sh

 

To unmount the GenomeDK folder:

 

user@local:~$  sudo umount /Volumes/GenomeDK

Mounting GenomeDK folders on your local linux

First, you must install sshfs. On Ubuntu: "sudo apt-get install sshfs"; otherwise download it here: http://fuse.sourceforge.net/sshfs.html

Create a local mount point:

sudo mkdir /mnt/genomedk

 

Next, get your local uid and gid, using the id command (example):

 

runef@runef-birc:~$ id
uid=1000(runef) gid=1000(runef) groups=1000(runef),4(adm),20(dialout),24(cdrom),46(plugdev),116(lpadmin),118(admin),124(sambashare),127(libvirtd)

 

Extract the uid and gid for the mount command:

 

{user} = GenomeDK user account
{uid} = The extracted uid 
{gid} = The extracted gid

sudo sshfs {user}@login.genome.au.dk:/home/{user} /mnt/genomedk -o idmap=none -o uid={uid},gid={gid} \
  -o allow_other -o umask=077 -o follow_symlinks

 

To unmount the GenomeDK folders:

 

sudo umount /mnt/genomedk

Set up SSH to allow password-less login to cluster nodes

This is done in two steps:

Create an SSH keypair:

[user@fe1] $ ssh-keygen -t dsa

 

Accept all suggestions and specify an empty "passphrase" when prompted.

Authorize all nodes:

 

cd ~/.ssh ; cat id_dsa.pub >> authorized_keys2
(this looks like magic; just copy the line above and paste it into a terminal window!)
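
To verify that it works, an SSH connection back to the frontend should no longer ask for a password (assuming fe1 resolves as a host name on the cluster; 'hostname' is only used as a harmless test command):

[user@fe1] $ ssh fe1 hostname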

 

That is all.

Upload or download data using rsync

We recommend using rsync to transfer large files. This has the benefit that it is possible to resume transfers or update data folders by copying only new or changed files. rsync is available on both Mac and Linux by default. The following examples show different usages of rsync.

Uploading data:

[localhost] $ rsync -e ssh -avz /location/data user@login.genome.au.dk:data

 

Updating data and also deleting data that is no longer present in the source directory:

 

[localhost] $ rsync -e ssh -avz --delete /location/data user@login.genome.au.dk:data

 

Downloading data:

 

[localhost] $ rsync -e ssh -avz user@login.genome.au.dk:data /location/data