Generating MLST profiles from short-read data

Several options are now available for calling MLST profiles from whole-genome sequencing data.

Figure: Result from the DTU MLST web server

DTU MLST Server

The web server at the Center for Genomic Epidemiology at the Technical University of Denmark (DTU) is probably the easiest option, with the advantage that it accepts both raw read files and assemblies. It worked well when I tried it, but it was quite slow to return results, and uploading large read datasets takes some time, particularly if you are analysing many samples. It also does not list all of the MLST databases (I wanted to use C. albicans).

BIGSdb

BIGSdb is powerful and flexible web server software that can be installed on your local PC or server. It can call MLST profiles from assembled genome data, and it lets you set up your own typing schemes based on other epidemiologically informative marker genes. However, non-bioinformaticians may find it a little tricky to set up.

Update: There is also a hosted version of BIGSdb that lets you paste your de novo assembly into the sequence query form and get profiles out. It is available for a subset of the MLST databases (more available on request to Keith Jolley).

SRST

SRST comes from Kat Holt's group in Melbourne. It runs on your local machine and is notable because it calls profiles directly from short-read data, without prior de novo assembly, and gives a confidence score for each assignment. As it has some dependencies (BWA, samtools, BLAST) and runs as a Python script, it is probably best run on a Linux machine or a Mac.

I found it worked quite well on the Illumina data I tried, but there are a few tips for getting it running that are worth documenting for other users.

Roll your own (suggested by Anthony Underwood, HPA)

Of course, what many people do is first perform a de novo assembly, perhaps with Velvet, and then BLAST the contigs against the MLST allele database. You can then inspect the results manually, or write a little script to collect the results into a profile. If you have one you'd like to share, please post the link in the comments below. Here's my Python script for what it's worth ...
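For anyone starting from scratch, the "collect the results into a profile" step might look roughly like the sketch below. It is only an illustration of the general idea, not the script referred to above, and it rests on several assumptions: BLAST+ was run with the contigs as queries and the alleles as subjects, using `-outfmt "6 qseqid sseqid pident length slen"`; allele names follow a `locus_number` pattern such as `adk_12`; and an allele is only called when a single allele matches at 100% identity over its full length. The function names, the example hits and the locus list are all made up for the example.

```python
def parse_hits(lines):
    """Yield (locus, allele, pident, aln_len, slen) from tab-separated BLAST lines.

    Assumes the columns are: qseqid sseqid pident length slen.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        _qseqid, sseqid, pident, length, slen = line.split("\t")[:5]
        # Assumed naming convention: "adk_12" -> locus "adk", allele "12".
        locus, _, allele = sseqid.rpartition("_")
        yield locus, allele, float(pident), int(length), int(slen)


def call_profile(lines, loci):
    """Return {locus: allele or None}, counting only exact full-length matches."""
    exact = {locus: set() for locus in loci}
    for locus, allele, pident, aln_len, slen in parse_hits(lines):
        if locus in exact and pident == 100.0 and aln_len == slen:
            exact[locus].add(allele)
    # Call an allele only when exactly one allele matched perfectly;
    # no match or an ambiguous match is left as None for manual inspection.
    return {
        locus: next(iter(alleles)) if len(alleles) == 1 else None
        for locus, alleles in exact.items()
    }


# Made-up hits for illustration (contig, allele, % identity, aln length, allele length).
example_hits = [
    "contig_3\tadk_12\t100.000\t536\t536",
    "contig_3\tadk_7\t99.627\t536\t536",   # near match, ignored
    "contig_9\tfumC_4\t100.000\t469\t469",
    "contig_1\tgyrB_2\t100.000\t300\t460",  # partial match, ignored
]
example_loci = ["adk", "fumC", "gyrB"]

profile = call_profile(example_hits, example_loci)
print(profile)
```

Loci that come back as `None` are the ones worth checking by hand: they may be novel alleles, or alleles split across contigs by the assembly. The resulting allele numbers can then be looked up in the scheme's profile table to get a sequence type.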