User Manual for Life w/o dUTPase workflow
*****************************************

***About***

- The program files in this directory (http://pitgroup.org/static/life_wo_dutpase/) were developed by Csaba Kerepesi.
- These files are free software: you can redistribute and/or modify them.
- If you have any questions about the workflow, please feel free to contact me: kerepesi@caesar.elte.hu, Csaba Kerepesi

***Prerequisites***

- Perl 5.14.2 or later
- stand-alone UNIX blast: ncbi-blast-2.2.30+ or later 
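
A quick way to check these prerequisites before starting might look like the sketch below. The helper names version_ge and check_tool are hypothetical and not part of the distributed scripts, and the version comparison assumes GNU sort with the -V option.

```shell
#!/bin/sh
# Sketch only: these helpers are illustrative and not part of the workflow files.

# Succeeds if version $1 is at least version $2 (assumes GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Reports whether a command is on PATH, without aborting the script.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

check_tool perl
check_tool makeblastdb
```

For example, version_ge 2.2.31 2.2.30 succeeds, while version_ge 5.10.1 5.14.2 fails.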

***Workflow for finding bacterial genomes that do not contain dUTPase***

1. The bacterial and archaeal genome sequences were downloaded from the NCBI FTP site: ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/all.fna.tar.gz
2. With the makeblastdb program, databases were generated from the genome sequences.
3. A list of the files of all genomes was generated: GenAllGenomesFileNames.sh
4. DNA sequences corresponding to plasmids were filtered out: perl allgenomes_wo-plasmids.pl > AllGenomeFileNames-wo-plasmids.txt
5. A directory was created, and run-blast.pl was run in it against the file of the dUTPase sequences (dUTPase-tri-di1-di2-arch.fasta); a list file was then created from the results: ls *.fna > list
6. Same as step 5, but against the file of the UNG sequence (UNG.fasta)
7. Same as step 5, but against the file of the UGI-SAUGI-P56 sequences (UGI-SAUGI-P56.fasta)
8. perl find-nohits.pl > live_wo_di1-di2-tri-arch_dUTPase_E001-unsorted.csv (after the appropriate file names have been written into find-nohits.pl)
9. perl sort_result.pl live_wo_di1-di2-tri-arch_E1e-2.csv > live_wo_di1-di2-tri-arch_E1e-2-sorted.csv
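
The idea behind the plasmid filtering, BLAST and no-hit steps above can be illustrated with the shell sketch below. It is not the code of allgenomes_wo-plasmids.pl, run-blast.pl or find-nohits.pl. It rests on several assumptions: that a plasmid record carries the word "plasmid" in its FASTA header, that the protein queries are searched with tblastn, that the E-value cutoff is 1e-2 (as the result file names suggest), and that each genome gets its own tabular result file with a hypothetical .tsv suffix. The function names are hypothetical as well.

```shell
#!/bin/sh
# Sketch only: an illustration of the idea, not the distributed Perl scripts.

# Assumption: a plasmid record has "plasmid" in its first FASTA header line.
is_plasmid() {
    head -n 1 "$1" | grep -qi 'plasmid'
}

# Prints the .fna files in directory $1 that are not plasmids.
list_non_plasmids() {
    for f in "$1"/*.fna; do
        [ -e "$f" ] || continue
        is_plasmid "$f" || echo "$f"
    done
}

# Hypothetical BLAST step; defined here but not called.
# $1 = protein query FASTA, $2 = genome .fna file, $3 = result file.
# Assumptions: tblastn as the search program and 1e-2 as the E-value cutoff.
run_blast() {
    makeblastdb -in "$2" -dbtype nucl -out "$2" >/dev/null &&
    tblastn -query "$1" -db "$2" -evalue 1e-2 -outfmt 6 -out "$3"
}

# Prints the genomes whose result file in directory $1 is empty,
# i.e. the genomes with no hit (assumes one .tsv result file per genome).
list_no_hits() {
    for r in "$1"/*.tsv; do
        [ -e "$r" ] || continue
        [ -s "$r" ] || basename "$r" .tsv
    done
}
```

Under these assumptions, list_non_plasmids corresponds roughly to step 4, and list_no_hits to the purpose of step 8: a genome is reported only when its search produced no alignment at all.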


***Workflow for generating taxonomic distribution figures using MEGAN5***

1. The file that maps the GI values to the Taxonomy IDs was downloaded from the NCBI FTP site: ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz
2. perl Annot-w-TAXID.pl generates the file NC-GI-TAXID-wo-plasmid.csv
3. perl gen-megan.pl
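
The annotation in step 2 is essentially a lookup of GI values in the downloaded mapping file, which has two tab-separated columns (GI, then Taxonomy ID). The sketch below shows that lookup in a simplified form. It is not the code of Annot-w-TAXID.pl: the function name annotate_gi is hypothetical, and the two-column "gi,taxid" output is an assumption that may differ from the actual layout of NC-GI-TAXID-wo-plasmid.csv.

```shell
#!/bin/sh
# Sketch only: a simplified GI-to-TaxID lookup, not Annot-w-TAXID.pl itself.

# Prints "gi,taxid" for every GI listed (one per line) in file $1,
# looked up in the two-column, tab-separated mapping file $2.
# Output follows the order of the mapping file.
annotate_gi() {
    awk -F '\t' 'FILENAME == ARGV[1] { want[$1] = 1; next }
                 ($1 in want)        { print $1 "," $2 }' "$1" "$2"
}
```

The mapping file would first be decompressed, for example with gunzip -c gi_taxid_nucl.dmp.gz > gi_taxid_nucl.dmp, and then passed as the second argument.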
