G-Day Arrives!

I was anticipating today like a kid waiting for Santa Claus. What could provide such excitement? Nothing less than a brand-new 454 DNA sequencer from Roche, of course! This 'high-throughput' sequencer ('next-generation' now applies to a new set of instruments looming on the horizon) is the key to a huge range of potential genetic studies, from whole bacterial genomes, to ultra-deep sequencing of viruses, to transcription profiling and whole human genomes. The instrument has been funded by Advantage West Midlands, our regional development agency, with a focus on the emerging discipline of "translational medicine".

The installation went smoothly, thanks very much to Mark and Miles from Roche, and the instrument looks to have passed all its initial tests. It will do a further test sequencing run with a control sequence overnight to confirm it is working perfectly.

The sequencer comes in two parts: the instrument itself, which is housed in our Genomics Lab, and a self-contained compute cluster to process the signal data and perform data analysis. The compute cluster is a nice piece of kit, a standard rack-width unit which houses a head node, 4 compute nodes and a RAID array of 3 TB capacity, plus some network switches and a UPS. All you connect up is the power (it needs a 16A plug) and the network connection, and the rest is 'hands-off'. The sequencer itself sends partially processed image data to the compute cluster via SSH and remotely executes the analysis pipeline. This is a complex signal processing step which is highly CPU intensive. The several terabytes of image data are eventually reduced to manageable files around a gigabyte in size. Both the sequencer rig and the cluster run variants of Linux (Red Hat and Fedora).

Users then do further downstream analysis, such as assembly and mapping, using the supplied Roche software, Newbler. This all works over a VNC connection, so the cluster can be installed in a remote location (in this case, our purpose-built server room). Processed data will then be moved to a storage area network for longer-term storage and archiving.

After staff training, the 454 will be offered to local users whilst we refine our knowledge of the sequencing workflow. Then it will be available to anyone who wants to use it.

Consumables currently cost around £5000 per run, which generates between 400 and 500 megabases of data with an average read length of 400 bases. Consequently it is still not cheap to get started in genomics, but the expectation is that this will become a standard request in future grant proposals. For those wishing to get started on limited funds, we will be looking at DNA barcoding (not yet available in kit form for 454 Titanium, hopefully ready in October/November) as a way of getting people started for smaller sums of money, by running many strains at the same time (with consequently lower depth of coverage). We plan to have a simple website available soon which will explain how samples should be prepared for the main sequencing types, and let users track them in our job queue.
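To put rough numbers on the above, here is a back-of-the-envelope sketch in Python. The run cost, yield and read length are the figures quoted in this post. The 10 barcoded samples and the 5 megabase genome size are made-up values purely for illustration, and the even split of reads across barcodes is an assumption that real runs will not match exactly.

```python
# Back-of-the-envelope run economics. Figures marked "from the post" are
# quoted above; everything marked "hypothetical" is illustration only.

RUN_COST_GBP = 5000        # consumables per run (from the post)
YIELD_MB = (400, 500)      # megabases per run, low and high (from the post)
MEAN_READ_LENGTH = 400     # bases per read (from the post)

N_SAMPLES = 10             # hypothetical: barcoded strains sharing one run
GENOME_SIZE_MB = 5         # hypothetical: a bacterial genome of about 5 Mb


def cost_per_megabase(run_cost, yield_mb):
    """Consumables cost per megabase of sequence."""
    return run_cost / yield_mb


def reads_per_run(yield_mb, read_length):
    """Approximate number of reads, given total yield and mean read length."""
    return yield_mb * 1_000_000 / read_length


def coverage_per_sample(yield_mb, n_samples, genome_size_mb):
    """Mean depth of coverage per sample, assuming an even split of the run."""
    return yield_mb / n_samples / genome_size_mb


if __name__ == "__main__":
    for yield_mb in YIELD_MB:
        print(f"Yield {yield_mb} Mb:")
        print(f"  cost per Mb:       £{cost_per_megabase(RUN_COST_GBP, yield_mb):.2f}")
        print(f"  reads per run:     {reads_per_run(yield_mb, MEAN_READ_LENGTH):,.0f}")
        print(f"  depth per sample:  "
              f"{coverage_per_sample(yield_mb, N_SAMPLES, GENOME_SIZE_MB):.1f}x")
    print(f"Cost per barcoded sample: £{RUN_COST_GBP / N_SAMPLES:.2f}")
```

With these illustrative inputs a run works out at £10 to £12.50 per megabase and roughly 1 to 1.25 million reads. Splitting it ten ways would bring the consumables cost down to £500 per strain, at the price of only 8x to 10x coverage each, which is the trade-off barcoding makes.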

Exciting times!