
Commit db91a2a5 authored by Khalid Kunji

Initial commit, split, run, and merge GIGI for using multiple threads through data parallelism

OLD/*
RUN_FOLDER/*
input pedigree size 189
input pedigree record names 3 integer 2
input pedigree record father mother
******
203_62 0 0 1 0
203_112 0 0 2 0
203_126 0 0 2 0
203_66 0 0 1 0
203_39 0 0 1 0
203_105 0 0 2 0
203_48 0 0 1 0
203_35 0 0 1 0
203_103 0 0 2 0
203_123 0 0 2 0
203_45 0 0 1 0
203_38 0 0 1 0
203_32 0 0 1 0
203_118 0 0 2 0
203_73 203_38 203_105 1 0
203_116 0 0 2 0
203_63 0 0 1 0
203_100 0 0 2 0
203_120 0 0 2 0
203_107 0 0 2 0
203_113 0 0 2 0
203_110 0 0 2 0
203_55 0 0 1 0
203_19 0 0 2 0
203_137 203_62 203_120 2 0
203_49 0 0 1 0
203_68 0 0 1 0
203_60 0 0 1 0
203_65 0 0 1 0
203_142 203_39 203_110 2 0
203_114 0 0 2 0
203_51 0 0 1 0
203_18 0 0 2 0
203_50 0 0 1 0
203_67 0 0 1 0
203_47 0 0 1 0
203_81 203_39 203_110 1 0
203_8 0 0 1 0
203_54 0 0 1 0
203_11 0 0 1 0
203_98 0 0 2 0
203_106 0 0 2 0
203_53 0 0 1 0
203_57 0 0 1 0
203_43 0 0 1 0
203_29 0 0 1 0
203_145 203_73 203_142 2 0
203_40 0 0 1 0
203_121 0 0 2 0
203_46 0 0 1 0
203_58 0 0 1 0
203_138 203_62 203_120 2 0
203_10 0 0 1 0
203_41 0 0 1 0
203_56 0 0 1 0
203_109 0 0 2 0
203_34 0 0 1 0
203_64 0 0 1 0
203_52 0 0 1 0
203_143 203_39 203_110 2 0
203_30 0 0 1 0
203_9 0 0 1 0
203_156 203_45 203_106 2 0
203_101 0 0 2 0
203_75 203_58 203_123 1 0
203_140 203_54 203_107 2 0
203_33 0 0 1 0
203_59 0 0 1 0
203_117 0 0 2 0
203_136 203_59 203_100 2 0
203_157 203_45 203_106 2 0
203_76 203_51 203_116 1 0
203_77 203_51 203_116 1 0
203_37 0 0 1 0
203_79 203_49 203_113 1 0
203_74 203_75 203_136 1 0
203_78 203_59 203_100 1 0
203_144 203_33 203_137 2 0
203_87 203_32 203_98 1 0
203_134 203_51 203_116 2 0
203_124 0 0 2 0
203_125 0 0 2 0
203_99 0 0 2 0
203_42 0 0 1 0
203_159 203_47 203_121 2 0
203_141 203_49 203_113 2 0
203_3 203_10 203_144 2 0
203_31 0 0 1 0
203_153 203_46 203_125 2 0
203_36 0 0 1 0
203_108 0 0 2 0
203_155 203_32 203_98 2 0
203_158 203_47 203_121 2 0
203_104 0 0 2 0
203_122 0 0 2 0
203_85 203_31 203_99 1 0
203_111 0 0 2 0
203_165 203_87 203_156 2 0
203_84 203_31 203_99 1 0
203_17 203_41 203_111 1 0
203_154 203_31 203_99 2 0
203_150 203_35 203_124 2 0
203_168 203_68 203_145 1 0
203_161 203_34 203_109 2 0
203_16 203_43 203_122 1 0
203_163 203_43 203_122 2 0
203_7 0 0 1 0
203_188 203_67 203_165 2 0
203_147 203_17 203_134 2 0
203_22 203_84 203_103 2 0
203_176 203_9 203_147 2 0
203_15 203_81 203_158 1 0
203_160 203_30 203_155 2 0
203_91 203_78 203_108 1 0
203_128 203_79 203_154 2 0
203_133 203_52 203_123 2 0
203_26 203_37 203_153 2 0
203_130 203_54 203_117 2 0
203_86 203_42 203_159 1 0
203_119 0 0 2 0
203_61 0 0 1 0
203_149 203_53 203_130 2 0
203_96 203_34 203_109 1 0
203_162 203_43 203_122 2 0
203_23 203_17 203_134 2 0
203_179 203_65 203_128 2 0
203_132 203_66 203_163 2 0
203_44 0 0 1 0
203_148 203_53 203_130 2 0
203_164 203_41 203_111 2 0
203_80 203_49 203_113 1 0
203_90 203_47 203_121 1 0
203_69 203_61 203_101 1 0
203_14 203_80 203_119 1 0
203_72 203_57 203_150 1 0
203_4 203_86 203_26 2 0
203_175 203_9 203_147 2 0
203_13 203_44 203_157 1 0
203_171 203_57 203_150 2 0
203_82 203_77 203_133 1 0
203_187 203_64 203_132 2 0
203_25 203_85 203_141 2 0
203_129 203_40 203_164 2 0
203_89 203_47 203_121 1 0
203_20 203_16 203_140 2 0
203_102 0 0 2 0
203_92 203_96 203_138 1 0
203_6 203_74 203_20 2 0
203_189 203_56 203_148 2 0
203_97 203_69 203_143 1 0
203_71 203_40 203_164 1 0
203_93 203_29 203_102 1 0
203_24 203_35 203_124 2 0
203_172 203_13 203_24 2 0
203_177 203_97 203_22 2 0
203_2 203_11 203_160 2 0
203_178 203_92 203_19 2 0
203_186 203_14 203_23 2 0
203_115 0 0 2 0
203_70 203_90 203_112 1 0
203_131 203_71 203_161 2 0
203_21 203_93 203_105 2 0
203_152 203_46 203_125 2 0
203_166 203_70 203_18 1 0
203_95 203_55 203_152 1 0
203_167 203_48 203_131 1 0
203_181 203_70 203_18 2 0
203_151 203_50 203_115 2 0
203_94 203_29 203_102 1 0
203_12 203_13 203_24 1 0
203_27 203_74 203_20 2 0
203_5 203_82 203_129 2 0
203_185 203_36 203_21 2 0
203_135 203_76 203_151 2 0
203_28 203_94 203_104 2 0
203_88 203_60 203_162 1 0
203_127 203_61 203_101 2 0
203_174 203_89 203_118 2 0
203_173 203_15 203_28 2 0
203_83 203_50 203_115 1 0
203_170 203_88 203_126 2 0
203_169 203_7 203_135 2 0
203_180 203_12 203_25 2 0
203_139 203_63 203_127 2 0
203_183 203_91 203_139 2 0
203_182 203_72 203_27 2 0
203_184 203_95 203_149 2 0
203_146 203_83 203_114 2 0
203_1 203_8 203_146 2 0
PED.oped
chr22.mi
1000
sparse.map22_ERF203
dense.map22
dense.geno
dense.afreq22
2 0.8 0.9
0.106117
0.935098
1.374910
2.223488
2.751720
3.327371
4.012398
4.698955
5.595782
6.582772
7.285583
7.772545
8.428097
8.957383
10.096150
10.689914
11.521358
12.427468
13.009781
14.031986
14.883067
15.295975
16.250897
17.228832
19.600915
20.012782
20.727488
21.301413
22.092832
23.094871
23.692355
24.237406
24.752107
25.525363
26.215353
26.878636
27.571972
28.228297
29.021260
29.761794
30.516414
31.040493
31.780455
32.316632
32.895796
33.821134
34.607880
35.544922
36.352621
36.872023
37.382162
37.840234
38.389429
38.898678
39.439202
40.347187
41.232283
41.817546
42.381777
43.258947
43.842500
44.616595
45.189721
45.740716
46.199931
46.788105
47.614780
48.264423
48.797115
49.693612
50.238740
50.791114
51.357941
51.905098
52.653642
53.508378
54.164907
55.030176
55.580688
56.265171
57.185881
57.785495
58.651843
59.199892
59.728724
60.338443
60.939911
61.503165
62.106427
62.508852
63.227182
63.840581
64.572788
65.358582
65.957498
66.516165
67.214207
68.137511
68.549775
69.429172
69.994877
70.872549
71.565874
72.164028
72.746686
73.369733
74.002051
74.753092
75.589590
76.185654
76.828762
77.329599
78.130380
79.338008
79.966668
Author: Charles Y K Cheung
New changes in v1.06.1:
Because version 3.2 of MORGAN's gl_auto writes its output Inheritance Vectors (IV) file in a new format, we have changed GIGI so that it is compatible with this format.
The default behavior of GIGI v1.06 is to use this new Inheritance Vectors file format.
In addition, the new format no longer requires us to give GIGI the meiosis indexes that previously had to be taken from the
console output of gl_auto, so GIGI can now read the MORGAN pedigree file directly. For convenience, users can therefore
supply either the pedigree file or the pedigree meiosis file parsed from gl_auto's console output.
That said, we recognize the importance of backward compatibility. If you intend to use output from gl_auto versions before 3.2
together with the pedigree meiosis file, you may continue to do so: GIGI checks which version of the IV file you are using.
Note: users must supply the pedigree meiosis file instead of the pedigree file when the old IV format is used.
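Since v1.06.1 lets GIGI read the MORGAN pedigree file directly, it may help to see what the meiosis bookkeeping amounts to. The C++ sketch below is not GIGI's parser. It assumes the record layout of the example pedigree in this commit: three name fields, where a father and mother of "0" mark a founder, followed by two integers. It counts founders and meioses, two per non-founder. The struct and function names are hypothetical.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical summary type; not part of GIGI's code.
struct PedigreeSummary {
    int individuals = 0;
    int founders = 0;  // both parents recorded as "0"
    int meioses = 0;   // two meioses per non-founder
};

// Sketch: reads MORGAN-style records "name father mother int int" that follow
// the "******" separator line. Assumes every non-founder lists both parents,
// as in the example pedigree in this commit.
PedigreeSummary summarizePedigree(std::istream& in) {
    PedigreeSummary s;
    std::string line;
    bool inRecords = false;
    while (std::getline(in, line)) {
        if (!inRecords) {
            // Everything before the separator is header ("input pedigree ...").
            if (line.rfind("******", 0) == 0) inRecords = true;
            continue;
        }
        std::istringstream fields(line);
        std::string name, father, mother;
        if (!(fields >> name >> father >> mother)) continue;  // skip blank or short lines
        ++s.individuals;
        if (father == "0" && mother == "0") {
            ++s.founders;
        } else {
            s.meioses += 2;  // one meiosis from each parent
        }
    }
    return s;
}
```

For a three-record excerpt (203_62, 203_120 and their child 203_137), the function reports 3 individuals, 2 founders and 2 meioses.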
New change in v1.05:
In version 1.05, I fixed a bug in the handling of inbred pedigrees. If the IV
implies that an inbred individual inherited the same FGL (founder genome label) on both chromosomes, and the
observed genotype for this individual is heterozygous, then this IV
must be inconsistent with the observed data.
Thanks to Dr. Jae-Hoon Sul for identifying this bug!
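The v1.05 condition reduces to a short test. This minimal sketch uses hypothetical names and assumes that alleles and FGLs are stored as integers. It covers only the single-individual case described above. A full consistency check must also compare alleles across individuals who share an FGL.

```cpp
// Hypothetical check; names are illustrative, not GIGI's own.
// fglPaternal / fglMaternal: founder genome labels the IV assigns to the
// individual's two alleles. allele1 / allele2: the observed genotype.
bool ivConsistentWithGenotype(int fglPaternal, int fglMaternal,
                              int allele1, int allele2) {
    // If both alleles descend from the same founder genome copy (possible
    // only through inbreeding), they must be identical. A heterozygous
    // observation therefore rules this IV out.
    if (fglPaternal == fglMaternal && allele1 != allele2) {
        return false;
    }
    return true;
}
```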
New changes in v1.04:
Important new function:
- GIGI can now read dense markers in long format (rows are markers and columns are individuals, similar to BEAGLE's genotype file format except that there is no "I" column). See the documentation for the specification.
This change allows GIGI to handle very dense files in a memory-efficient manner.
See example/param_longFormat.txt and "dense.genotypes.t".
- I converted the original example dense marker file from the old format (rows are individuals) to the long format using the script convertGenotypesfromWideToLongFormat.R in the utilities directory.
- To tell GIGI that the dense genotype file is in long format, use the -long flag (see the documentation).
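The wide-to-long conversion itself is done by the R script mentioned above. The sketch below illustrates only the layout change. It assumes integer allele codes with two adjacent columns per marker in the wide layout, and it moves individual i's alleles for marker m into columns 2i and 2i+1 of marker row m. The function name is hypothetical. Because the sketch holds the whole matrix in memory, it shows the reshaping and not GIGI's memory-efficient reading.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Integer allele codes; the layout below is an assumption for illustration.
using AlleleMatrix = std::vector<std::vector<int>>;

// Hypothetical helper: wide layout (one row per individual, two adjacent
// allele columns per marker) -> long layout (one row per marker, two
// adjacent allele columns per individual).
AlleleMatrix wideToLong(const AlleleMatrix& wide) {
    if (wide.empty()) return {};
    const std::size_t nInd = wide.size();
    const std::size_t nCols = wide[0].size();
    if (nCols % 2 != 0) {
        throw std::invalid_argument("expected two allele columns per marker");
    }
    const std::size_t nMarkers = nCols / 2;
    AlleleMatrix longFmt(nMarkers, std::vector<int>(2 * nInd));
    for (std::size_t i = 0; i < nInd; ++i) {
        if (wide[i].size() != nCols) {
            throw std::invalid_argument("rows have different column counts");
        }
        for (std::size_t m = 0; m < nMarkers; ++m) {
            longFmt[m][2 * i] = wide[i][2 * m];          // first allele
            longFmt[m][2 * i + 1] = wide[i][2 * m + 1];  // second allele
        }
    }
    return longFmt;
}
```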
New changes in v1.03:
Bug fixes:
- The maximum pedigree size was limited to 160; it has been raised to 5000.
- In the check that the provided allele frequencies of each marker sum to 1, if(sumAF==1) is replaced by if( (sumAF-1) > 0.0000001)
- If a line in the allele frequency file has only one allelic type (a monomorphic marker), a dummy allelic type with frequency 0 is added to prevent the program from breaking.
- In main(), close the input streams before deallocating some of the variables to ensure that output files get written first.
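The two allele-frequency fixes can be sketched as follows. The names are hypothetical and this is not GIGI's code. As written in the note above, (sumAF-1) > 0.0000001 flags only sums that exceed 1. The sketch compares the absolute difference instead, so a shortfall is caught too. This commit does not show whether GIGI's source takes the absolute value.

```cpp
#include <cmath>
#include <vector>

// Sketch of the v1.03 allele-frequency checks; function names are hypothetical.

// True when the frequencies of one marker sum to 1 within tol.
// std::fabs catches sums that fall short of 1 as well as sums that exceed it.
bool frequenciesSumToOne(const std::vector<double>& freqs, double tol = 1e-7) {
    double sum = 0.0;
    for (double f : freqs) sum += f;
    return std::fabs(sum - 1.0) <= tol;
}

// Pads a monomorphic marker (one allelic type) with a dummy allele of
// frequency 0, so downstream code can assume at least two allelic types.
void padMonomorphic(std::vector<double>& freqs) {
    if (freqs.size() == 1) freqs.push_back(0.0);
}
```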
Other changes:
- The call method in the example file "param.txt" is now set to confidence-based calling (t1=0.8, t2=0.9) instead of the most likely genotype. See the manuscript.
Rationale: this change reminds users that calls based on the most likely genotype may not be accurate.
For example, if a parent carries a rare allele and we cannot determine which chromosome was transmitted, GIGI correctly assigns a 50% chance that the child carries the rare allele.
The most-likely-genotype method nevertheless makes a call for every genotype, despite potentially high uncertainty in the genotype configuration.
Because calls made using the most likely genotype can be misleading, we changed the default call method to confidence-based calling.
Analyses that account for the uncertainty in the imputed results may be more appropriate,
e.g. using the imputed probabilities directly or a summary of them such as dosage.
- A dosage file is generated if all markers to be imputed are di-allelic.
Here, dosage is defined as the expected proportion of 1 alleles in a genotype:
dosage of a genotype = 1*P(genotype is 1/1) + 0.5*P(genotype is 1/2)
- A binary GIGI file is included in the main uncompressed directory.
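The dosage formula above translates directly into code. This sketch assumes the three genotype probabilities at a di-allelic marker are available as doubles. The struct is hypothetical, and GIGI's dosage file layout may differ. A 2/2 genotype contributes 0, so p22 is carried only for completeness.

```cpp
// Posterior probabilities of the three genotypes at a di-allelic marker
// (hypothetical struct; GIGI's output layout may differ).
struct GenotypeProbs {
    double p11;  // P(genotype is 1/1)
    double p12;  // P(genotype is 1/2)
    double p22;  // P(genotype is 2/2), contributes 0 to the dosage
};

// Dosage as defined in the v1.03 notes: expected proportion of "1" alleles.
double dosage(const GenotypeProbs& g) {
    return 1.0 * g.p11 + 0.5 * g.p12;
}
```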
New changes in v1.02:
- Warn the user when the Inheritance Vector file is empty,
e.g. in trios.
Rationale: because recombination cannot be inferred in trios, gl_auto generates an empty inheritance output.
This is normal and is correctly stated in the pedigree meiosis file. GIGI will still run, but
it will impute based only on the pedigree structure and minor allele frequencies.
Hence, linkage disequilibrium-based methods can potentially be more powerful than GIGI for trios.
- Include the Perl script extractPedMeiosis.pl in the program to extract the pedigree meiosis file from gl_auto's output.
- Expand the FAQ section in the documentation file.
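This is a minimal sketch of the v1.02 empty-file warning. It assumes the check only needs to know whether the IV file contains any bytes. The function name and message wording are hypothetical.

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Hypothetical helper: returns true (and prints a warning) when the file
// exists but is empty, e.g. the empty IV output gl_auto produces for a trio.
// Returns false if the file has content or cannot be opened.
bool warnIfEmpty(const std::string& path) {
    std::ifstream in(path);
    if (!in) {
        std::cerr << "Error: cannot open " << path << '\n';
        return false;
    }
    if (in.peek() == std::ifstream::traits_type::eof()) {
        std::cerr << "Warning: " << path << " is empty; imputation will use only "
                  << "the pedigree structure and allele frequencies.\n";
        return true;
    }
    return false;
}
```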
New changes in v1.01:
- make new example files
- improve the user interface
:in main()
:summarize relevant information about each input file after reading
:print progress
- convert to a new format of parameter file
:fewer lines
:in the code: add readImputeParameterFile_GIGI_v1_01()
- implement some error checking routines on input files
- add license
- modify the documentation file
- bug fixes:
:call method #1 now works again
:fix callThreshold_multiAllelic()
:the bug was in the if/else statement for method==2; the intended structure is if (method 1), else if (method 2), else ...
- add various flags - see documentation file
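The v1.01 call-method fix concerns branch structure. The sketch below uses hypothetical return labels to show the intended if / else if / else chain, in which exactly one branch runs for each method. The mapping is an inference from the example param.txt ("2 0.8 0.9") and the v1.03 note: method 2 appears to be confidence-based calling with thresholds t1 and t2, and method 1 the most likely genotype. One plausible form of the original bug is an if (method 1) followed by a separate if (method 2) ... else, which would also send a method-1 call into the final else. That is a guess, since the buggy code is not shown.

```cpp
#include <stdexcept>
#include <string>

// Illustrative only: labels stand in for GIGI's calling routines, and the
// method-to-label mapping is an assumption.
std::string selectCallMethod(int method) {
    if (method == 1) {
        return "most likely genotype";
    } else if (method == 2) {
        return "confidence-based (t1, t2)";
    } else {
        // Reached only when neither method 1 nor method 2 was requested.
        throw std::invalid_argument("unknown call method");
    }
}
```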
Code changes:
readDenseMarkers_byComponent(): check that the number of columns is correct
readMarkerPos_v2(): ensure positions are in ascending order
readAF() includes new changes: ensure each row sums to 1; deallocate the variable at the end; shorten the function because it duplicated what is done in readAllelicTypeCount()
readAllelicTypeCount(): deallocate the variable at the end
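Two of the input checks listed above can be sketched in a few lines. The names are hypothetical. The notes do not say whether GIGI treats equal adjacent positions as an error, and std::is_sorted accepts ties.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// readMarkerPos_v2()-style check (hypothetical name): positions must be in
// ascending order. std::is_sorted allows equal neighbours; use a strict
// comparison instead if duplicate positions should be rejected.
bool positionsAscending(const std::vector<double>& pos) {
    return std::is_sorted(pos.begin(), pos.end());
}

// readDenseMarkers_byComponent()-style check (hypothetical name): a line must
// contain the expected number of whitespace-separated columns.
bool hasExpectedColumns(const std::string& line, std::size_t expected) {
    std::istringstream fields(line);
    std::string token;
    std::size_t count = 0;
    while (fields >> token) ++count;
    return count == expected;
}
```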
GNU LESSER GENERAL PUBLIC LICENSE
Version 3, 29 June 2007
Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
This version of the GNU Lesser General Public License incorporates
the terms and conditions of version 3 of the GNU General Public
License, supplemented by the additional permissions listed below.
0. Additional Definitions.
As used herein, "this License" refers to version 3 of the GNU Lesser
General Public License, and the "GNU GPL" refers to version 3 of the GNU
General Public License.
"The Library" refers to a covered work governed by this License,
other than an Application or a Combined Work as defined below.
An "Application" is any work that makes use of an interface provided
by the Library, but which is not otherwise based on the Library.
Defining a subclass of a class defined by the Library is deemed a mode
of using an interface provided by the Library.
A "Combined Work" is a work produced by combining or linking an
Application with the Library. The particular version of the Library
with which the Combined Work was made is also called the "Linked
Version".
The "Minimal Corresponding Source" for a Combined Work means the
Corresponding Source for the Combined Work, excluding any source code
for portions of the Combined Work that, considered in isolation, are
based on the Application, and not on the Linked Version.
The "Corresponding Application Code" for a Combined Work means the
object code and/or source code for the Application, including any data
and utility programs needed for reproducing the Combined Work from the
Application, but excluding the System Libraries of the Combined Work.
1. Exception to Section 3 of the GNU GPL.
You may convey a covered work under sections 3 and 4 of this License
without being bound by section 3 of the GNU GPL.
2. Conveying Modified Versions.
If you modify a copy of the Library, and, in your modifications, a
facility refers to a function or data to be supplied by an Application
that uses the facility (other than as an argument passed when the
facility is invoked), then you may convey a copy of the modified
version:
a) under this License, provided that you make a good faith effort to
ensure that, in the event an Application does not supply the
function or data, the facility still operates, and performs
whatever part of its purpose remains meaningful, or
b) under the GNU GPL, with none of the additional permissions of
this License applicable to that copy.
3. Object Code Incorporating Material from Library Header Files.
The object code form of an Application may incorporate material from
a header file that is part of the Library. You may convey such object
code under terms of your choice, provided that, if the incorporated
material is not limited to numerical parameters, data structure
layouts and accessors, or small macros, inline functions and templates
(ten or fewer lines in length), you do both of the following:
a) Give prominent notice with each copy of the object code that the
Library is used in it and that the Library and its use are
covered by this License.
b) Accompany the object code with a copy of the GNU GPL and this license
document.
4. Combined Works.
You may convey a Combined Work under terms of your choice that,
taken together, effectively do not restrict modification of the
portions of the Library contained in the Combined Work and reverse
engineering for debugging such modifications, if you also do each of
the following:
a) Give prominent notice with each copy of the Combined Work that
the Library is used in it and that the Library and its use are
covered by this License.
b) Accompany the Combined Work with a copy of the GNU GPL and this license
document.
c) For a Combined Work that displays copyright notices during
execution, include the copyright notice for the Library among
these notices, as well as a reference directing the user to the
copies of the GNU GPL and this license document.
d) Do one of the following:
0) Convey the Minimal Corresponding Source under the terms of this
License, and the Corresponding Application Code in a form
suitable for, and under terms that permit, the user to
recombine or relink the Application with a modified version of
the Linked Version to produce a modified Combined Work, in the
manner specified by section 6 of the GNU GPL for conveying
Corresponding Source.
1) Use a suitable shared library mechanism for linking with the
Library. A suitable mechanism is one that (a) uses at run time
a copy of the Library already present on the user's computer
system, and (b) will operate properly with a modified version
of the Library that is interface-compatible with the Linked
Version.
e) Provide Installation Information, but only if you would otherwise
be required to provide such information under section 6 of the
GNU GPL, and only to the extent that such information is
necessary to install and execute a modified version of the
Combined Work produced by recombining or relinking the
Application with a modified version of the Linked Version. (If
you use option 4d0, the Installation Information must accompany
the Minimal Corresponding Source and Corresponding Application
Code. If you use option 4d1, you must provide the Installation
Information in the manner specified by section 6 of the GNU GPL
for conveying Corresponding Source.)
5. Combined Libraries.
You may place library facilities that are a work based on the
Library side by side in a single library together with other library
facilities that are not Applications and are not covered by this
License, and convey such a combined library under terms of your
choice, if you do both of the following:
a) Accompany the combined library with a copy of the same work based
on the Library, uncombined with any other library facilities,
conveyed under the terms of this License.
b) Give prominent notice with the combined library that part of it
is a work based on the Library, and explaining where to find the
accompanying uncombined form of the same work.
6. Revised Versions of the GNU Lesser General Public License.
The Free Software Foundation may publish revised and/or new versions
of the GNU Lesser General Public License from time to time. Such new
versions will be similar in spirit to the present version, but may
differ in detail to address new problems or concerns.
Each version is given a distinguishing version number. If the
Library as you received it specifies that a certain numbered version
of the GNU Lesser General Public License "or any later version"
applies to it, you have the option of following the terms and
conditions either of that published version or of any later version
published by the Free Software Foundation. If the Library as you
received it does not specify a version number of the GNU Lesser
General Public License, you may choose any version of the GNU Lesser
General Public License ever published by the Free Software Foundation.
If the Library as you received it specifies that a proxy can decide
whether future versions of the GNU Lesser General Public License shall
apply, that proxy's public statement of acceptance of any version is
permanent authorization for you to choose that version for the
Library.
GIGI: GIGI.cpp
	g++ GIGI.cpp -o GIGI
GIGI_v1: GIGI
all: example testWagner
	@ echo "Use 'make test' to test output and speed of MersenneTwister.h"
	@ echo "Or run the example program with './example'"
example: example.cpp MersenneTwister.h
	g++ -Wall -ansi -o example example.cpp
testWagner: testWagner.cpp MersenneTwister.h
	g++ -O3 -Wall -ansi -o testWagner testWagner.cpp
testOrig: testOrig.c mt19937ar.c
	gcc -O3 -o testOrig testOrig.c
testCokus: testCokus.c mt19937ar-cok.c
	gcc -O3 -o testCokus testCokus.c
testHinsch: testHinsch.cpp mtrand.h mtrand.cc
	g++ -O3 -o testHinsch testHinsch.cpp mtrand.cc
testStd: testStd.c
	gcc -O3 -o testStd testStd.c
test: testWagner testOrig testCokus testHinsch testStd
	@ echo "Testing output and speed of random number generators, please be patient..."
	./testWagner > testWagner.out
	./testOrig > testOrig.out
	./testCokus > testCokus.out
	./testHinsch > testHinsch.out
	./testStd > testStd.out
	./testResults.sh
	@ rm -f tmp*
clean:
	@ rm -f test*.out bug.out
	@ rm -f example testWagner testOrig testCokus testHinsch testStd
	@ rm -f tmp*
	@ rm -f state.data
	@ rm -f core
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta name="Author" content="Richard Joseph Wagner">
<meta name="Description" content="C++ class implementing the Mersenne Twister random number generator">
<meta name="Keywords" content="Mersenne,Twister,MT,random,number,generator,RNG,pseudorandom,PRNG,C++,class,MTRand">
<link href="main.css" rel="stylesheet" type="text/css">
<style type="text/css">
body { background: url("AntBlueMaize.jpg"); }
tr { text-indent: 2em }
</style>
<title>Mersenne Twister Random Number Generator</title>
</head>
<body>
<h1>Mersenne Twister Random Number Generator</h1>
The Mersenne Twister is an algorithm for generating random numbers.
It was designed to address flaws found in various other generators.
Its period, 2<sup>19937</sup>-1, and its order of equidistribution, 623 dimensions, are far greater than theirs.
The generator is also fast; it avoids multiplication and division, and it benefits from caches and pipelines.
See the
<a href="http://www.math.keio.ac.jp/~matumoto/emt.html">inventors' page</a>
for more details.
<p>I have implemented the Mersenne Twister in a C++ class that is fast, convenient, portable, and free.
Take a look at the
<a href="MersenneTwister.h">class</a>
or download the complete package in
<a href="Mersenne-1.0.zip">zip</a>
or
<a href="Mersenne-1.0.tar.gz">tarball</a>
format.
<p>Features:
<ul>
<li>Simple creation of generator with <code>MTRand r;</code>
<li>Convenient access with <code>double a = r();</code>
<li>Generation of integers or floating-point numbers
<li>Easy seeding options
<ul>
<li>Automatically from <code>/dev/urandom</code> or <code>time()</code> and <code>clock()</code>
<li>Single integer
<li>Arrays of any length (to access full 19937-bit range)
</ul>
<li>Ability to save and restore state
<li>Thorough example program
<li>Validation and performance tests
<li>Open source code under BSD license
</ul>
<p>On my system, a Pentium III running Linux at 500 MHz, the performance test gives the following results for generation of random integers:
<table>
<tr><td>MersenneTwister.h</td><td>28.4 million per second</td></tr>
<tr><td>Inventors' C version</td><td>14.3 million per second</td></tr>
<tr><td>Cokus's optimized C version</td><td>16.6 million per second</td></tr>
<tr><td>Standard rand()</td><td>6.8 million per second</td></tr>
</table>
<p>The latest version, v1.0, incorporates several changes released by the Mersenne Twister inventors on 26 January 2002. The seeding algorithm was revised to correct a minor problem in which the highest bit of the seed was not well represented in the generator state. The ability to start with large seeds was extended to seed arrays of arbitrary length. Access was added for 53-bit real numbers in [0,1), matching the accuracy of IEEE doubles. Also, the software license was changed from the GNU Lesser General Public License to a BSD license, making commercial use of this software more convenient.
<p>The v1.0 release includes some other improvements as well. By popular demand, access was added for real numbers from normal (Gaussian) distributions. Safeguards were added to prevent out-of-range number generation on 64-bit machines. Finally, new optimizations yield 25% faster generation overall and 100% faster generation for integers in [0,n].
<!-- counter included only in online version -->
<p><table align=center><tr>
<td><a href="http://www-personal.engin.umich.edu/~wagnerr/index.html">
<img class="nav" src="ArrowHome.gif" alt="^ home" height=16 width=16 border=0>
</a></td>
<td><span class="center"><address>
Rick Wagner (
<a href="mailto:rjwagner@writeme.com">rjwagner@writeme.com</a>
) 15 May 03
</address></span></td>
</table>
</body>
</html>
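The page's MTRand r; double a = r(); usage depends on MersenneTwister.h, which is not reproduced here. As a self-contained alternative, the sketch below uses std::mt19937 from the C++11 <random> header. This is the standard library's implementation of the same MT19937 algorithm, used here to illustrate three features from the list above: integer generation, array seeding, and saving and restoring state. std::seed_seq is not the same seeding routine as MTRand's array seeding, so outputs for a given key need not match MTRand's.

```cpp
#include <cstdint>
#include <random>
#include <sstream>
#include <vector>

// Uses std::mt19937 (C++11 <random>) as a stand-in for MTRand; it implements
// the same MT19937 algorithm but is not MersenneTwister.h.

// Integer generation: the 10000th output of a default-seeded engine.
// The C++ standard requires this value to be 4123659995.
std::uint32_t tenThousandthOutput() {
    std::mt19937 gen;   // default seed 5489
    gen.discard(9999);  // skip the first 9999 outputs
    return static_cast<std::uint32_t>(gen());
}

// Array seeding: std::seed_seq spreads a key of any length over the state.
// This is analogous to, but not the same routine as, MTRand's array seeding.
std::mt19937 makeFromKey(const std::vector<std::uint32_t>& key) {
    std::seed_seq seq(key.begin(), key.end());
    return std::mt19937(seq);
}

// Save and restore state: write the engine to a stream, read it into a second
// engine, and check that both continue with the same sequence.
bool stateRoundTrips(std::uint32_t seed) {
    std::mt19937 original(seed);
    original.discard(100);
    std::stringstream saved;
    saved << original;  // save state as text
    std::mt19937 restored;
    saved >> restored;  // restore state
    for (int i = 0; i < 5; ++i) {
        if (original() != restored()) return false;
    }
    return true;
}
```

The value 4123659995 is the conformance check that the C++ standard specifies for std::mt19937, which makes it a convenient sanity test for any MT19937 port.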
README for Mersenne Twister distribution
Richard J. Wagner v1.0 15 May 2003
Instructions
------------
The only necessary file for using this Mersenne Twister random number
generator is "MersenneTwister.h". The class name is MTRand.
Usage examples are in "example.cpp". Linux or Unix users can type "make"