A triumph for Old World genomics

This week's Nature sees publication of a metagenomics survey of the human gut microbiome that is astonishing in scale. Nearly 600 gigabase pairs of raw data from stool samples from 124 Europeans were processed to give over three million protein-coding genes! This is around 200 times more sequence data than the most recent comparable study!

Also remarkable is the provenance of the study. If one had been taking bets on who would be the first to make the front cover of Nature with a study like this, the US-led Human Microbiome Project would have been most people’s clear favourite! Instead, this study is the fruit of a collaboration between two ends of the Old World, the European MetaHIT consortium and the world's largest genomics institute, China's BGI. There is life in the Old World yet!

But this is far from the last word on the human gut microbiome. 600 gigabase pairs sounds like a lot, but that much sequence could be generated in just three runs of the new Illumina HiSeq instrument! It makes you wonder whether future studies will need quite so many authors!

Also questionable is the use of a short-read sequencing approach for this problem. The scale of the assembly achieved in this study is stupendous. But, even so, instead of complete genomes or even gene clusters, we just end up with lots of genes, often unlinked to each other or to any organism. And the lack of any assembled 16S rRNA genes makes it impossible to compare this study with the large body of existing literature on phylogenetic profiling of the gut. Only when we have complete assembled genomes for all the inhabitants of the bowel can we consider the gut microbiome properly sequenced. In fact, it is interesting to speculate how long it will be before today's obsession with short-read technologies, and the analytical problems they bring, will be swept away by the tide of progress towards technologies that combine high throughput with long read lengths.

Still, I am being a bit churlish here. Three million protein sequences cover a huge swathe of protein space and represent a huge resource for future studies. I, for one, look forward to discovering how many flagellar genes or type III secretion genes are hiding in the human gut microbiome! And if a news piece in this week's Nature is anything to go by, we ain't seen nothing yet from the BGI, which now has 128 of the Illumina HiSeq instruments on order, putting the Old World on track to surpass the New World's sequencing capacity by a considerable margin! The Chinese people have stood up!