# Run GIGI Split and Merge Scripts 


### Runs GIGI with Multiple Threads by Splitting the Input and Merging the Output.  

#### Requirements
Bash 4.3 or newer is needed to use the -q option (wait -n was added in Bash 4.3).  Everything else works with older Bash versions, though we have not determined how much older.  
To compile the binaries yourself you need a C++ compiler, and CMake if you want to use the included easy compilation script.  
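
To check whether your Bash is new enough for -q, you can compare its version against 4.3. The following is a minimal sketch; `bash_at_least` is a hypothetical helper name and is not part of GIGI-Quick.

```bash
# Hypothetical helper (not part of GIGI-Quick): succeeds if a major/minor
# version is at least the required one. Without the optional 3rd and 4th
# arguments it checks the running shell via BASH_VERSINFO.
bash_at_least() {
    local need_major=$1 need_minor=$2
    local have_major=${3:-${BASH_VERSINFO[0]}}
    local have_minor=${4:-${BASH_VERSINFO[1]}}
    (( have_major > need_major )) ||
        (( have_major == need_major && have_minor >= need_minor ))
}

if bash_at_least 4 3; then
    echo "wait -n is available, -q can be used"
else
    echo "Bash is older than 4.3, avoid -q"
fi
```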

#### Getting GIGI-Quick
##### With Git
Run the following command to clone the repository with git (git is a version control program started by Linus Torvalds, https://git-scm.com/downloads):

```git clone https://cse-git.qcri.org/Imputation/GIGI-Quick.git```  

##### With a Browser
Go to this URL: https://cse-git.qcri.org/Imputation/GIGI-Quick/tree/master

Click on the icon with the download arrow above the column "Last Update" on the right-hand side.  
There are several download options with different compression formats.  If you get run_GIGI this way, you will need to decompress it before proceeding.

#### Installation
Once you have the files, most users won't need to do anything else to use GIGI. There are executables compiled on Ubuntu 64-bit Linux for 64-bit and 32-bit (via multilib) x86 systems. 
GIGI-Quick will automatically choose which of these to run. We recommend using these unless your system has a different architecture (e.g. PowerPC, ARM). When GIGI-Quick runs, it checks for 
locally compiled versions of the binaries in the following locations and uses them if they are present: ./GIGI/GIGI, ./MERGE/gigimerge, ./SPLIT/gigisplit. 
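
To see whether locally compiled binaries are present in the three locations listed above, something like the following can be run from the GIGI-Quick folder. This is only a sketch with a hypothetical helper name (`list_local_binaries`); it does not reproduce how GIGI-Quick itself selects a binary.

```bash
# Sketch: report which of the documented local binary locations hold an
# executable file, relative to a GIGI-Quick folder (default: current folder).
list_local_binaries() {
    local root=${1:-.}
    local rel
    for rel in GIGI/GIGI MERGE/gigimerge SPLIT/gigisplit; do
        if [ -x "$root/$rel" ] && [ ! -d "$root/$rel" ]; then
            echo "found: ./$rel"
        else
            echo "missing: ./$rel"
        fi
    done
}

list_local_binaries .
```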

##### Compiling

We use CMake to create makefiles for the architecture being compiled on. To use this method you need a reasonably recent CMake installed. This approach should be compiler and 
architecture agnostic. To do this, you need only run the included make.sh script:  

``` ./make.sh ```

This should create the makefile and then compile all three binaries. It will write a log file to ./make.log. If the CMake method is not working on your system, you can compile directly 
with your compiler. Here is an example with g++ from the GNU Compiler Collection (GCC):  

```cd ./SPLIT/```

```g++ -O2 GIGISplit.cpp -o gigisplit```  

```cd ../MERGE/```  

```g++ -O2 GIGIMerge.cpp -o gigimerge```  

```cd ../GIGI/src/GIGI_v1.06.1```  

```g++ -O2 GIGI.cpp -o ../../GIGI```  


#### Extra Integration

The folder structure of GIGI-Quick should be kept intact: GIGI-Quick depends on relative paths to locate the included scripts and executables other than run_GIGI. 

##### As an Unprivileged User

If you like, you can now add GIGI-Quick to your PATH (the examples assume that you have). To do this, add the following to your .bashrc (located in your home folder):

```export PATH=${PATH}:/path/to/folder/where/you/put/run_GIGI```

Then source your .bashrc to apply the changes right away:

```source ~/.bashrc```
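
Because .bashrc may be sourced more than once, the plain export above can add the same folder to PATH repeatedly. A minimal sketch of a guard against that is shown here; `add_to_path` is a hypothetical helper name, and the folder is the same placeholder as above.

```bash
# Hypothetical helper for ~/.bashrc: append a folder to PATH only if it
# is not already listed, so re-sourcing the file does not duplicate it.
add_to_path() {
    local dir=$1
    case ":${PATH}:" in
        *":${dir}:"*) ;;                 # already present, do nothing
        *) PATH="${PATH}:${dir}" ;;
    esac
}

add_to_path "/path/to/folder/where/you/put/run_GIGI"
export PATH
```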

##### As a Root/Sudo User

To add run_GIGI to the path system-wide for all users you can create a symlink in /usr/bin pointing to the run_GIGI script: 

```ln -s /path/to/run_GIGI/script /usr/bin/run_GIGI```

#### Usage

Note: The parameter file is the same as you would normally use for GIGI, but if you are using the long format, then pass the "-l" option.  
      The examples shown below use the file "param-v1_06.txt" because it is included in the repository, so they can be run by simply copying and pasting the example line.  

run_GIGI parameter_file -o [OUTPUT FOLDER] -n [RUN NAME] -t [THREADS] -m [MEMORY IN MB] [-l] [-v] -q [THREADS] -r [START] [END] [-V] [-h]

-o [OUTPUT FOLDER] : This is the path to use for the outputs from the run_GIGI scripts, including temporary files.  
-n [RUN NAME]      : This is a path relative to the [OUTPUT FOLDER] to use to keep the outputs from more than one run of run_GIGI separated.  
-t [THREADS]       : The number of threads to use for run_GIGI, and also the number of chunks to split the input into.  
-m [MEMORY IN MB]  : The amount of RAM that run_GIGI will restrict its use to, not yet implemented  
-l                 : Specifies that the input is in the long format.  
-V                 : Verbose mode. Output from run_GIGI is now much quieter by default; with -V you can see much more of what it is doing and what variables are set to at various stages.  
-v                 : Display the version of GIGI-Quick and exit.  
-h                 : Display this help text.  
-r [START] [END]   : Run on only a selected region, starting at start and ending at end, this region will be selected before any further splitting.  
-q [THREADS]       : Run in queued mode. This mode runs up to THREADS instances of GIGI at a time and attempts to keep the total amount of memory in use below 
                     [MEMORY IN MB], using an estimate of the amount of memory GIGI may need.  If -m [MEMORY IN MB] is not given, it uses the amount of available memory 
                     as shown by 'free'.  Older kernels do not report this, so there we use an estimate that is no longer accurate for modern systems (amount free + amount of buff/cache), see
                     https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=34e431b0ae398fc54ea69ff85ec700722c9da773  Also, -t is ignored when -q is given.  
-e [MEMORY IN MB]  : Manual estimate of how much memory GIGI will need for queued mode in case the calculated estimate is too inaccurate
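
The fallback described for -q (available memory where the kernel reports it, otherwise free plus buffers and cache) can be illustrated as follows. This is a sketch, not run_GIGI's actual code: it reads /proc/meminfo directly instead of parsing the output of free, and `available_mb` is a hypothetical helper name.

```bash
# Sketch (not run_GIGI's actual code): estimate available memory in MB.
# Uses MemAvailable when the kernel reports it, otherwise falls back to
# MemFree + Buffers + Cached, the rough estimate described above.
available_mb() {
    local meminfo=${1:-/proc/meminfo}
    awk '
        $1 == "MemAvailable:" { avail = $2; have_avail = 1 }
        $1 == "MemFree:"      { free = $2 }
        $1 == "Buffers:"      { buffers = $2 }
        $1 == "Cached:"       { cached = $2 }
        END {
            kb = have_avail ? avail : (free + buffers + cached)
            printf "%d\n", kb / 1024     # /proc/meminfo values are in kB
        }
    ' "$meminfo"
}

# Usage on Linux: available_mb
```

A value obtained this way could be compared with what you intend to pass to -m or -e.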

Examples: 
```bash
  ./run_GIGI ./INPUTS/Sample_Input/param-v1_06.txt    #Output in the current folder with no run name identifying subfolder, threads and memory determined automatically
  ./run_GIGI ./INPUTS/Sample_Input/param-v1_06.txt -o ./OUTPUTS -n test_run    #Output in ./OUTPUTS/test_run
  ./run_GIGI ./INPUTS/Sample_Input/param-v1_06.txt -o ./OUTPUTS -n test_run -V #Output in ./OUTPUTS/test_run, verbose mode (print more detailed information)
  ./run_GIGI ./INPUTS/Sample_Input/param-v1_06.txt -o ./OUTPUTS -n test_run -l    #Output in ./OUTPUTS/test_run for a parameter file in the long format, do not cut and paste this one because the included param-v1_06.txt is not in the long format  
  ./run_GIGI ./INPUTS/Sample_Input/param-v1_06.txt -t 2    #Limit to only 2 threads (and hence two chunks)
  ./run_GIGI ./INPUTS/Sample_Input/param-v1_06.txt -m 1000    #Limit memory use to 1 GB, please read the section on memory and cgroups
  ./run_GIGI ./INPUTS/Sample_Input/param-v1_06.txt -lmt 1000 2   #Limit memory use to 1 GB, please read the section on memory and cgroups, and threads to 2 with input in the long format, do not cut and paste this one because the included param-v1_06.txt is NOT in the long format  
  ./run_GIGI INPUTS/Sample_Input/param-v1_06.txt -o RUN_FOLDER/ -n test_run -m 20 -q 3 -V -r 3 70 #Output in ./RUN_FOLDER/test_run, limit memory to 20 MB, use the queued mode with up to 3 threads at a time, and run on only the region from 3 to 70, note: the memory estimated as needed in queued mode does not account for the restricted region
```
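
Queued mode (-q) is described above as running up to THREADS instances at a time, and the requirements mention that it relies on `wait -n`. The general pattern might look roughly like the sketch below. It is a generic illustration, not run_GIGI's scheduler: `run_queued` is a hypothetical name, the commands are placeholders, and no memory accounting is done.

```bash
# Generic sketch of a job queue built on `wait -n` (Bash 4.3+).
# Reads one command per line from stdin and keeps at most max_jobs running.
# This is not run_GIGI's scheduler and does no memory accounting.
run_queued() {
    local max_jobs=$1
    local running=0
    local cmd
    while IFS= read -r cmd; do
        if [ "$running" -ge "$max_jobs" ]; then
            wait -n || true              # block until any one job finishes
            running=$((running - 1))
        fi
        bash -c "$cmd" </dev/null &      # start the next job in the background
        running=$((running + 1))
    done
    wait                                 # wait for the jobs still running
}

printf '%s\n' 'echo chunk 1' 'echo chunk 2' 'echo chunk 3' | run_queued 2
```

The order in which the jobs print is not fixed, since they run concurrently.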

If there is a problem that makes GIGI stop before completion, then the output files are left as they are in order to allow users to rerun only failed portions as needed.  
If you are unsure where the failure occurred, then the safest approach is to remove the intermediate files before rerunning (e.g. rm -R [OUTPUT FOLDER]/[RUN NAME]). As always, use rm with caution.  
For example, if the 2nd example failed, I would run "rm -R ./OUTPUTS/test_run" before rerunning.  
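
Since rm -R is unforgiving, a small wrapper that refuses empty arguments can make the cleanup step safer. This is a minimal sketch; `clean_run` is a hypothetical helper name and is not shipped with GIGI-Quick.

```bash
# Hypothetical helper: remove one run's folder before rerunning.
# Refuses empty arguments so that an unset variable cannot turn the
# target into "/" or into the whole output folder.
clean_run() {
    local out_dir=${1:-} run_name=${2:-}
    if [ -z "$out_dir" ] || [ -z "$run_name" ]; then
        echo "usage: clean_run OUTPUT_FOLDER RUN_NAME" >&2
        return 1
    fi
    local target="${out_dir%/}/${run_name}"
    if [ -d "$target" ]; then
        rm -R -- "$target"
    fi
}

# Usage: clean_run ./OUTPUTS test_run
```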

The -n option is largely redundant, as it is equivalent to using the -o option with a longer path giving the subfolder, e.g.  

```./run_GIGI ./INPUTS/Sample_Input/param-v1_06.txt -o ./OUTPUTS -n test_run```  

is equivalent to:  

```./run_GIGI ./INPUTS/Sample_Input/param-v1_06.txt -o ./OUTPUTS/test_run```  

The inclusion of -n is mostly a semantic convenience.  

##### Logs

With the addition of -V and the cleanup of output, you may notice that even with -V you no longer see the output of split, gigi, and merge.  These are now written to their own individual log files in the output directory/run subdirectory.  

e.g. ./run_GIGI ./INPUTS/Sample_Input/param-v1_06.txt -o ./OUTPUTS -n test_run will have logs in ./OUTPUTS/test_run/LOGS
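
To look at the end of each log after a run, something like the following can be used. It is a sketch with a hypothetical helper name (`show_logs`); the names of the individual log files are not assumed, so every file in the LOGS folder is shown.

```bash
# Sketch: print the last lines of every log file of a run.
# The LOGS folder is documented above; the file names inside it are not
# assumed here, so everything in the folder is shown.
show_logs() {
    local log_dir=${1:-./OUTPUTS/test_run/LOGS}
    local lines=${2:-20}
    local f
    for f in "$log_dir"/*; do
        [ -f "$f" ] || continue
        echo "==> $f <=="
        tail -n "$lines" "$f"
    done
}

# Usage: show_logs ./OUTPUTS/test_run/LOGS 20
```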

#### Memory and cgroups

We handle memory restrictions using cgroups.  After looking at a number of different memory limiting mechanisms, we saw this as the best solution, but it has some caveats.  One is that root/sudo access is required to create the initial cgroup.  If you are on a shared machine and intend to use cgroups, we encourage you to discuss this with your system administrator.  For most shared clusters, we encourage you to use the built-in memory limiting mechanisms of your submission system (e.g. qsub, SLURM, Torque) instead of limiting memory through run_GIGI. Most of these themselves make use of cgroups (e.g. SLURM https://slurm.schedmd.com/cgroups.html and HTCondor http://help.uis.cam.ac.uk/supporting-research/research-support/camgrid/camgrid/technical3/cgroups).    
If you are using this on your own system where you have root/sudo access, then you will need to make sure that your cgroups are set up and that your distribution's equivalent of the libcgroup package, which provides the cgcreate and cgexec commands, is installed.  
If you have a very old (e.g. maybe 7+ years old) kernel, then you may need to install a newer kernel that has cgroups (they are technically part of the Linux kernel).  
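
A quick way to confirm that cgcreate and cgexec are installed is to look for them on your PATH. The snippet below is a minimal sketch; `missing_tools` is a hypothetical helper name.

```bash
# Sketch: print the name of every given command that is not on the PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools cgcreate cgexec)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line
    echo "Not found, install your distribution's libcgroup tools:" $missing
fi
```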


Here is a list of common distributions and links to help/documentation on cgroups:

Red Hat: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/ch-Using_Control_Groups.html

Arch: https://wiki.archlinux.org/index.php/cgroups you may note that libcgroup is an AUR package, to install such packages: https://wiki.archlinux.org/index.php/Arch_User_Repository

Debian/Ubuntu: https://www.devinhoward.ca/technology/2015/feb/implementing-cgroups-ubuntu-or-debian

Fedora: https://docs.fedoraproject.org/en-US/Fedora/17/html/Resource_Management_Guide/ch-Using_Control_Groups.html

Fedora: https://docs.fedoraproject.org/en-US/Fedora/15/html/Resource_Management_Guide/sec-Creating_Cgroups.html

OpenSuSE: https://www.suse.com/documentation/opensuse114/book_tuning/data/sec_tuning_cgroups_usage.html


Once you have a functional cgcreate command to create cgroups, you can make them permanent (unfortunately in different syntax) by editing /etc/cgconfig.conf on Linux distributions using systemd (most of them).  

Ubuntu: https://askubuntu.com/questions/836469/install-cgconfig-in-ubuntu-16-04

For other distributions, this is already covered in many of the links above.

If your distro isn't covered, it is still worth looking at the above guides. Most things will be similar in your distro, though they may not be exactly the same (e.g. package names or the package manager could be different).  


Here is some distribution agnostic information on cgroups: http://man7.org/linux/man-pages/man7/cgroups.7.html

https://www.kernel.org/doc/Documentation/cgroup-v1/

cgroups will eventually be replaced with cgroups2, but most of its controllers are not yet functional: https://www.kernel.org/doc/Documentation/cgroup-v2.txt

Technically you can create the cgroup(s) we need with mount and mkdir commands, but our code depends on cgcreate and cgexec. You could of course write your own cgcreate and cgexec scripts and add them to your path instead of using the programs in cgroup-tools, but we wouldn't recommend that route.  


##### Example
Essentially, the goal here is to get a user writable cgroup setup that run_GIGI (running as your user) can make use of to create its own subcgroup. 

On Ubuntu in Bash you can do this as follows: 

First we install cgroup-tools to get cgcreate, cgexec, and related commands:
```bash
sudo apt-get install cgroup-tools
```
Then we create a cgroup that your user has access to:
```bash
sudo cgcreate -a ${USER} -g memory,cpu:user_cgroup
```
We can see that it was created by checking the contents of /sys/fs/cgroup/memory and/or /sys/fs/cgroup/cpu.

Both should now contain a folder user_cgroup whose contents your user has write permission to:

```bash
ls -la /sys/fs/cgroup/memory/user_cgroup

ls -la /sys/fs/cgroup/cpu/user_cgroup
```

When run as your user normally with -m, run_GIGI will make its own subcgroup of this cgroup (do not run run_GIGI with sudo).  
These are not persistent cgroups (that is, they will disappear on reboot).  
To make persistent ones, please see the distribution documentation above. For most distributions this involves editing the configuration file /etc/cgconfig.conf.
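
To make the idea of a memory-limited subcgroup concrete, the sketch below prints (but does not run) the libcgroup commands one could use by hand for an arbitrary program under cgroups v1. It is an illustration under stated assumptions, not necessarily what run_GIGI does internally: the subcgroup name `gigi_demo`, the program name `my_program`, the helper names, and the conversion of 1 MB to 1024 * 1024 bytes are all assumptions.

```bash
# Hypothetical helper: convert megabytes to bytes
# (assuming 1 MB = 1024 * 1024 bytes).
mb_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Print, without running them, the commands for a memory-limited
# subcgroup of a user-writable parent cgroup (cgroups v1, libcgroup tools).
print_limited_run() {
    local parent=$1 child=$2 limit_mb=$3
    shift 3
    echo "cgcreate -g memory:${parent}/${child}"
    echo "cgset -r memory.limit_in_bytes=$(mb_to_bytes "$limit_mb") ${parent}/${child}"
    echo "cgexec -g memory:${parent}/${child} $*"
}

print_limited_run user_cgroup gigi_demo 1000 my_program
```

Printing the commands first makes it easy to review them before running anything against your cgroup hierarchy.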


##### Future
We may soon also add the ability to control swap usage through the cgroups for run_GIGI (https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-memory.html). Some distributions need a kernel parameter set at boot to allow this (Debian, Ubuntu, Arch, ??).  
See the issue and solution here: http://matthewkwilliams.com/index.php/2016/03/17/docker-cgroups-memory-constraints-and-java-cautionary-tale/
The same method in a shorter read here: https://unix.stackexchange.com/questions/147158/how-to-enable-swap-accounting-for-memory-cgroup-in-archlinux
If you have a different bootloader, adding that same option to your boot command should work but you'll need to consult the documentation for your bootloader to see how to do this.  
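
After rebooting, you can check whether the option actually reached the kernel by inspecting its command line. The sketch below assumes the option in question is `swapaccount=1`, as used in the guides linked above; please confirm this for your distribution. `swap_accounting_requested` is a hypothetical helper name.

```bash
# Sketch: check whether swap accounting was requested on the kernel
# command line. Assumes the option is swapaccount=1, as used in the
# guides linked above; confirm this for your distribution.
swap_accounting_requested() {
    local cmdline=${1:-/proc/cmdline}
    grep -qw 'swapaccount=1' "$cmdline"
}

# Usage on Linux:
#   swap_accounting_requested && echo "swapaccount=1 is set"
```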


##### Caveat and Solution with GRUB/Other Bootloader
Be careful when editing this boot line; mistakes may cause your machine to fail to boot Linux.  This will not harm your data, but you may need to manually fix or reinstall your bootloader.  
A useful resource for that situation: https://help.ubuntu.com/community/Grub2/Troubleshooting#Editing_the_GRUB_2_Menu_During_Boot
You could also edit the line during boot, as described there, to test it without making it permanent, thereby avoiding any GRUB issue more serious than a single failed boot.  
If you have messed up your GRUB and can't figure out how to get it back, the most reliable method I have used to restore GRUB is to reinstall it via chroot: https://help.ubuntu.com/community/Grub2/Installing#via_ChRoot


Going forward, part of this may become easier for Ubuntu users.  From 14.04 onwards there should be a user-writable cgroup by default.  It is created by systemd automatically, and I'm not sure how consistent the location is: https://help.ubuntu.com/lts/serverguide/cgroups-delegation.html

I think the best way to make use of this may be through cgmanager; we will explore this possibility.