###
###Detection of orthology workshop – 24 Feb 2015
###
###Sebastien Renaut

#A variety of clustering algorithms have been applied to determine Clusters of Orthologous Groups (COGs).
#Proteinortho is a tool for detecting orthologous genes across different species.
#To do so, it compares the similarity of the given gene sequences and clusters them to find significant groups. The algorithm was designed to handle large-scale data and can be applied to hundreds of species at once.
#More info here: https://www.bioinf.uni-leipzig.de/Software/proteinortho/manual.html
#Lechner, Marcus, et al. “Proteinortho: Detection of (Co-) orthologs in large-scale analysis.” BMC Bioinformatics 12.1 (2011): 124.
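#One simple idea behind this kind of similarity-based clustering is the reciprocal best hit: two genes are paired when each one is the top-scoring match of the other. The toy sketch below illustrates that idea only. It is not Proteinortho's actual algorithm, which extends this heuristic (see the paper above). All gene names and scores are made up for illustration.

```shell
# Toy reciprocal-best-hit sketch. All gene names and scores are invented.
# Columns: query, subject, similarity score (higher is better).
hits=$(mktemp)
printf '%s\t%s\t%s\n' \
    A1 B1 200 \
    A1 B2 50 \
    A2 B2 180 \
    A3 B2 120 \
    B1 A1 195 \
    B2 A2 170 \
    B2 A3 100 \
    B3 A3 90 > "$hits"

rbh_pairs=$(awk -F'\t' '
    # Remember the highest-scoring subject seen for each query.
    !($1 in best) || $3 + 0 > score[$1] { best[$1] = $2; score[$1] = $3 + 0 }
    END {
        for (q in best) {
            s = best[q]
            # Keep mutual best hits; "q < s" reports each pair only once.
            if ((s in best) && best[s] == q && q < s) print q "\t" s
        }
    }' "$hits" | sort)
echo "$rbh_pairs"
rm -f "$hits"
```

#In this toy table B3 points to A3, but A3 prefers B2 and B2 prefers A2, so A3 and B3 end up unpaired.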
###
###1. Make sure you are familiar with the terminal and a few basic Unix commands
###Tips: a dollar sign indicates a command typed in the terminal. A number sign marks a comment. Do NOT fear error messages.
#What do these commands do?
#top, cd, man, ls, cp, less, mv, which, pwd, nano, grep, mkdir
$ ls
$ man ls
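#A short practice session with several of the commands listed above. This is only a sketch: it works in a throwaway directory created with mktemp, and the directory and file names (workshop, names.txt, backup.txt) are made up.

```shell
# Work in a throwaway directory so nothing important can be overwritten.
sandbox=$(mktemp -d)
cd "$sandbox"
pwd                                        # print the current directory
mkdir workshop                             # create a new directory
printf 'gene1\ngene2\nprotein1\n' > workshop/names.txt
cp workshop/names.txt workshop/copy.txt    # copy a file
mv workshop/copy.txt workshop/backup.txt   # rename (move) the copy
ls workshop                                # list the directory contents
grep -c 'gene' workshop/names.txt          # count the lines containing "gene"
```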

###
###2. Make sure prerequisites are installed (blast, perl, python, make)
###
#2.1 You should be running a recent version of Mac OS X (>10.9) or Linux
#2.2 Install Make if you don’t have it yet. (On a Mac, Make is part of Xcode, https://itunes.apple.com/ca/app/id497799835?mt=12)
#2.3 Verify that Perl is installed.
$ perl -v
$ which perl
#2.4 Python should be installed too
$ python -V
$ which python
#2.5 Verify that BLAST is installed. If not, get it from: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.30/ncbi-blast-2.2.30+-src.tar.gz
#Note that there is a downloadable disk image for BLAST (.dmg), but it may not work. Alternatively, download the latest tar.gz and compile it for your OS.
#2.6 Untar the archive (tar -zxvf ncbi-blast-2.2.30+-src.tar.gz) and move the extracted directory to your applications directory
#2.7 Compile
$ ./configure
$ make
$ sudo make install
#2.8 Check that all BLAST executables are installed in /usr/local/bin/
$ which blastx
$ ls /usr/local/bin/*blast*
#2.9 Familiarize yourself with BLAST options
$ blastx -h
$ blastn -h
$ tblastx -h
$ tblastn -h
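#The checks in this section can also be run in one go. A minimal sketch, assuming a POSIX shell: check_tool is a helper name made up here, and blastp is added to the list on the assumption that protein sequences will be compared.

```shell
# Report whether a program can be found on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "MISSING: $1"
    fi
}

# Prerequisites named in this section, plus blastp (an assumption, see above).
for tool in perl python make blastp blastn blastx tblastn tblastx; do
    check_tool "$tool"
done
```

#Any line starting with MISSING points to a tool that still needs to be installed or added to the PATH.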

###
###3. Installing Proteinortho
###
#3.1 Download https://www.bioinf.uni-leipzig.de/Software/proteinortho/proteinortho_v5.11.tar.gz
#3.2 Untar it and move it to your applications directory
#3.3 Compile (Note that on a Mac you may need to remove the precompiled binaries first, but probably not).
#$ rm proteinortho5_clean_edges proteinortho5_clustering
$ make
$ sudo make install

#3.4 Familiarize yourself with the options, then run the two test analyses
$ proteinortho5.pl -h
$ mkdir results
$ proteinortho5.pl -project=results/test_CE -cpus=2 test/C.faa test/E.faa
$ proteinortho5.pl -project=results/test_CELM -cpus=2 test/C.faa test/E.faa test/L.faa test/M.faa
#What do the outputs look like?
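#A hint for exploring the results. This sketch assumes that the version used here writes a tab-separated table (for example results/test_CE.proteinortho) whose first three columns are the number of species, the number of genes and the algebraic connectivity of each group, followed by one column per input file. The table below is a made-up stand-in so the example runs without Proteinortho; check the layout against your own output and the manual before relying on it.

```shell
# Made-up stand-in for a Proteinortho result table (all values are invented).
table=$(mktemp)
printf '# Species\tGenes\tAlg.-Conn.\tC.faa\tE.faa\n' >  "$table"
printf '2\t2\t1\tC_1\tE_1\n'                          >> "$table"
printf '2\t3\t0.5\tC_2,C_3\tE_2\n'                    >> "$table"
printf '1\t2\t1\t*\tE_4,E_5\n'                        >> "$table"

# Groups with exactly one gene per species: species count equals gene count.
one_to_one=$(awk -F'\t' '!/^#/ && $1 == $2 { n++ } END { print n + 0 }' "$table")
# Groups present in both input files (2 species in this toy example).
in_both=$(awk -F'\t' '!/^#/ && $1 == 2 { n++ } END { print n + 0 }' "$table")

echo "one-to-one groups: $one_to_one"
echo "groups present in both species: $in_both"
rm -f "$table"
```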

###
###4. Can we build a more complex dataset? E.g., protein kinases (enzymes that modify other proteins by chemically adding phosphate groups to them) from 4 model species: humans, chimps, pigs, and mice.
###
#4.1 From http://www.uniprot.org/:
#download human (Homo sapiens) protein kinases
#download chimp (Pan troglodytes) protein kinases
#download pig (Sus scrofa) protein kinases
#download mouse (Mus musculus) protein kinases
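#After downloading, it is worth checking how many sequences each file contains. A minimal sketch: every FASTA record starts with a '>' header line, so counting those lines counts the sequences. The file and its two records are made up so the example runs on its own; use your downloaded files instead.

```shell
# Made-up two-record FASTA file standing in for a UniProt download.
fasta=$(mktemp)
cat > "$fasta" <<'EOF'
>sp|X00001|DEMO1_HUMAN Hypothetical kinase 1 OS=Homo sapiens
MKTAYIAKQR
QISFVKSHFS
>sp|X00002|DEMO2_HUMAN Hypothetical kinase 2 OS=Homo sapiens
MSDNGELEDK
EOF

# Count the header lines, i.e. the number of sequences in the file.
n_seqs=$(grep -c '^>' "$fasta")
echo "$n_seqs sequences"
rm -f "$fasta"
```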
#4.2 Create a directory for the data
$ mkdir prk_data
#4.3 Move the downloaded FASTA files to prk_data
#4.4 Start a big orthology detection analysis!
$ nohup proteinortho5.pl -project=prk_data/prk -cpus=2 prk_data/*fasta >proteinortho.log&
#4.6 What is actually running while proteinortho5.pl is at work? What is nohup?
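#A hint for the nohup question: nohup lets a command keep running after the terminal is closed, and the trailing & puts it in the background. The sketch below uses a short stand-in command instead of proteinortho5.pl so that it finishes in about a second; the log file is a temporary file created just for the demonstration.

```shell
log=$(mktemp)
# Stand-in for a long analysis: print, wait one second, print again.
nohup sh -c 'echo started; sleep 1; echo finished' > "$log" 2>&1 &
job_pid=$!                      # process ID of the background job
echo "running in the background as PID $job_pid"
# While a real job runs, "tail -f" on its log or "top" shows what it is doing.
wait "$job_pid"                 # block until the background job ends
job_log=$(cat "$log")
echo "$job_log"
rm -f "$log"
```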
#4.7 You can explore the result file and redo the analysis with different options
#4.8 How do you efficiently pull sequences from a FASTA file? There are many options. You can use this script: https://raw.githubusercontent.com/rec3141/rec-genome-tools/master/bin/fastagrep.pl
$ nano
#4.9 Paste the content of the script into nano, then save it as fastagrep.pl and exit with the keystrokes below (^x means Ctrl-X).
^x
Y
fastagrep.pl
#4.10 Make it executable
$ chmod +x fastagrep.pl
#4.11 You can move it to the bin directory if you want
$ sudo mv fastagrep.pl /usr/local/bin/.
$ fastagrep.pl 'P0C0K6' prk_data/*Pan* >ortholog1
$ fastagrep.pl 'P08581' prk_data/*homo* >>ortholog1
$ fastagrep.pl 'Q2QLE0' prk_data/*Sus* >>ortholog1
$ fastagrep.pl 'P16056' prk_data/*mus* >>ortholog1
#4.12 You now have a file called ortholog1 with 4 orthologous genes from the 4 species!
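#To check the result, count the header lines in ortholog1 (grep -c '^>' ortholog1 should report 4 if every accession matched exactly one record). If fastagrep.pl is not available, awk is one of the many other options for pulling a record by its accession. A minimal sketch, assuming UniProt-style headers of the form >db|ACCESSION|NAME; extract_record is a helper name made up here, and the records and X0000N accessions are invented.

```shell
# Made-up FASTA file with three records.
fasta=$(mktemp)
cat > "$fasta" <<'EOF'
>sp|X00001|DEMO1_HUMAN Hypothetical kinase 1
MKTAYIAKQR
QISFVKSHFS
>sp|X00002|DEMO2_HUMAN Hypothetical kinase 2
MSDNGELEDK
>sp|X00003|DEMO3_HUMAN Hypothetical kinase 3
MPLLVQTRKA
EOF

# Print the header and sequence lines of the record with a given accession.
# The awk variable "keep" is switched on or off at every header line.
extract_record() {
    awk -v id="$1" '/^>/ { keep = (index($0, "|" id "|") > 0) } keep' "$2"
}

picked=$(mktemp)
extract_record X00001 "$fasta" >  "$picked"
extract_record X00003 "$fasta" >> "$picked"
n_picked=$(grep -c '^>' "$picked")
picked_text=$(cat "$picked")
echo "$n_picked records extracted"
rm -f "$fasta" "$picked"
```

#Matching the accession between the two | characters avoids picking up records whose description merely mentions the same string.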