Constructing bacterial pan-genomes

Pan-genome construction notebook

31/10/2012

The 65-genome alignment took 9 hours for the nucmer alignments, and the MugsyWGA step is still running (~8 hours elapsed). So far about 4 Mb of pan-genome is available, so I figure it's less than 50% of the way through.

On to the visualisation on some test data:

Example input to R:

Block Sample MeanDepth PercentCoveredByReads
Block1 Sample1 40 99
Block2 Sample2 30 98
Block3 Sample3 40 99
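
Before handing this to R, it can be handy to check that the long-format table pivots cleanly into a block-by-sample matrix, which is the shape a heatmap would need. A minimal Python sketch, assuming whitespace-delimited input with the header shown above; the function name `pivot` is made up for illustration:

```python
def pivot(table_text, column="MeanDepth"):
    """Turn the long-format table into {block: {sample: value}}.

    Assumes whitespace-delimited rows with a header line whose first two
    columns are Block and Sample, as in the example input above.
    """
    rows = [line.split() for line in table_text.strip().splitlines()]
    header, records = rows[0], rows[1:]
    value_index = header.index(column)
    matrix = {}
    for record in records:
        block, sample = record[0], record[1]
        matrix.setdefault(block, {})[sample] = float(record[value_index])
    return matrix


example = """Block Sample MeanDepth PercentCoveredByReads
Block1 Sample1 40 99
Block2 Sample2 30 98
Block3 Sample3 40 99
"""

depth_matrix = pivot(example)
coverage_matrix = pivot(example, column="PercentCoveredByReads")
```

Block/sample pairs that are missing from the table are simply absent from the nested dict, so whatever plots this would have to decide how to show them.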

Just try it and see how it works. Exclude gap characters when calculating the % covered by reads.
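
One way the gap exclusion could work, sketched in Python: count only non-gap characters in a block's aligned sequence and use that as the denominator. This assumes gaps are written as "-" and that the number of covered positions is already known; both function names are hypothetical.

```python
def ungapped_length(aligned_seq, gap_chars="-"):
    """Number of characters in an aligned sequence that are not gaps."""
    return sum(1 for base in aligned_seq if base not in gap_chars)


def percent_covered(covered_positions, aligned_seq):
    """Percent of non-gap positions that have at least one read."""
    length = ungapped_length(aligned_seq)
    if length == 0:
        return 0.0  # an all-gap row has nothing that could be covered
    return 100.0 * covered_positions / length
```

For example, `percent_covered(4, "AC--GT-A")` uses 5 as the denominator rather than 8.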

Options for depth-counting:

Try piping samtools depth into a Python script first.

Script: depth.py
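
A rough sketch of what the depth-counting step could look like (not the actual depth.py). It assumes the standard three-column, tab-separated output of `samtools depth` (reference name, position, depth), that each pan-genome block is its own reference sequence, and that the ungapped block lengths are supplied separately. The lengths are needed because `samtools depth` skips zero-depth positions by default. The function name `summarise_depth` is made up for illustration.

```python
from collections import defaultdict


def summarise_depth(lines, block_lengths):
    """Summarise `samtools depth` output per block.

    lines         -- iterable of "name<TAB>position<TAB>depth" strings
    block_lengths -- dict of block name to ungapped length (assumed input)

    Returns {block: (mean_depth, percent_covered)}. Mean depth is taken
    over the whole block length, so uncovered positions count as zero.
    """
    total_depth = defaultdict(int)
    covered = defaultdict(int)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        block, _position, depth = line.split("\t")[:3]
        depth = int(depth)
        total_depth[block] += depth
        if depth > 0:
            covered[block] += 1

    summary = {}
    for block, length in block_lengths.items():
        if length == 0:
            continue  # skip empty blocks rather than divide by zero
        summary[block] = (
            total_depth[block] / length,
            100.0 * covered[block] / length,
        )
    return summary
```

To use it on a pipe, pass `sys.stdin` as `lines`, for example from `samtools depth sample.bam | python your_script.py`. Blocks with no reads at all still appear in the result, with a mean depth and coverage of zero.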

Mugsy finally finished running on the 65-genome set. Final pan-genome size: 25 Mb in 113k blocks. (Is this reasonable?)

Next: run all the metagenomics samples against the test pan-genome, pass the results through the depth script, and get the visualisation working.

Issues: