1.) Make sure permissions are correct (i.e., the user running the script has access to the $project_folder directory)
2.) Make a $project_folder and place the input fastq.gz files in it, or symlinks to them
3.) Run ./Run_Pipeline
1.) Make sure Oracle Java 1.7 is installed. Directions for Ubuntu 14.04 are here: http://askubuntu.com/questions/521145/how-to-install-oracle-java-on-ubuntu-14-04
This is needed for MuTect (Stage 11).
2.) Make sure bwa and samtools are installed and in your path (or change their locations in the Stage 1 file to where they are). On Ubuntu 14.04 they may be
installed with apt-get:
sudo apt-get install bwa
sudo apt-get install samtools
3.) Install the other tools referred to in the Stage 1 file, and replace the paths there with the paths to your copies of those tools and to supporting
files such as reference genomes.
4.) Make sure permissions are correct (i.e., the user running the script has access to the $project_folder directory)
5.) Make a $project_folder and place the input fastq.gz files in it, or symlinks to them
6.) Run ./Run_Pipeline
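Steps 2, 4 and 5 can be checked before launching with a small pre-flight script. The following is only a sketch: the function names missing_tools and link_fastqs are hypothetical and not part of the pipeline, and it assumes the inputs end in .fastq.gz and sit in a single source directory.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; these names are illustrative, not part of the pipeline.

# Print each named tool that is not on PATH; return non-zero if any is missing.
missing_tools() {
    local status=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Create the project folder if needed and symlink every *.fastq.gz
# from a source directory into it, then check access for the current user.
link_fastqs() {
    local src_dir="$1" project_folder="$2" abs_src fq
    abs_src="$(cd "$src_dir" && pwd)" || return 1
    mkdir -p "$project_folder" || return 1
    for fq in "$abs_src"/*.fastq.gz; do
        [ -e "$fq" ] || continue   # the glob matched nothing
        ln -sf "$fq" "$project_folder/"
    done
    # The user running the pipeline needs read, write and execute access.
    [ -r "$project_folder" ] && [ -w "$project_folder" ] && [ -x "$project_folder" ]
}
```

For example, `missing_tools bwa samtools java` lists whichever of the three cannot be found, and `link_fastqs /data/raw "$project_folder"` (with /data/raw standing in for wherever the raw reads live) fills the project folder with symlinks.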
Installing VEP:
Full API Installation: http://asia.ensembl.org/info/docs/api/api_installation.html
Some additional Perl modules may need to be installed for BioPerl; install anything it complains about missing using CPAN.
If you use screen, note that the Perl environment variables set as described do not persist into the screen session without a little more configuration.
Create a file in your home directory called ".screenrc" with the contents "shell -$SHELL" (without the quotes). The leading dash makes screen start your
shell as a login shell, so it loads your shell environment when it starts.
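The screen configuration can also be written from the command line. This is a minimal sketch, assuming the per-user config file is ~/.screenrc; the function name enable_screen_login_shell is hypothetical.

```shell
# Hypothetical helper: add the login-shell setting to a screen config file once.
# Defaults to ~/.screenrc when no path is given.
enable_screen_login_shell() {
    local rc="${1:-$HOME/.screenrc}"
    # Single quotes keep $SHELL literal so that screen expands it at startup.
    local line='shell -$SHELL'
    touch "$rc" || return 1
    # Append only if the exact line is not already present.
    grep -qxF -e "$line" "$rc" || printf '%s\n' "$line" >> "$rc"
}
```

Running `enable_screen_login_shell` with no arguments edits ~/.screenrc; running it again leaves the file unchanged. Only screen sessions started afterwards pick up the setting.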
In our case we don't want to have to query the online database every time we run VEP, so we need to download some database/cache files as described here:
#GOAL: Get a gene list for use when calculating DepthOfCoverage from GATK
#The gene list is described here: http://gatkforums.broadinstitute.org/gatk/discussion/40/performing-sequence-coverage-analysis
#They describe how you can get a genelist here: http://gatkforums.broadinstitute.org/gatk/discussion/1329/where-can-i-get-a-gene-list-in-refseq-format
#When the gene list is used for DepthOfCoverage specifically, it does not need the special format described there (-[arg]:REFSEQ /path/to/refSeq); use it as shown on the first page linked
#Make sure to leave the format as "all fields from selected table"
#Unfortunately the processing isn't done yet. They say: "To run with the GATK, contigs other than the standard 1-22,X,Y,MT must be removed, and the file sorted in karyotypic order."
#That is what this R script does. GATK used to provide a Perl script for this but it has not been maintained and is in a broken state.