VIB NGS Meeting 2015

Whilst following the #PAGXXIII tweets I noticed that livetweets - whilst offering intriguing nuggets of information - are really hard to interpret without context. At best this is annoying, at worst it can result in countless clarifications and circular discussions.

I suggested a potentially different way of livetweeting: use a Google Doc or Etherpad to scribble notes, and then tweet links to it with salient quotes.

Anyway, I thought I would give it a try for the conference I am attending and speaking at - the VIB NGS 2015 conference in Leuven, Belgium.

First off, it is trivially easy to set up an Etherpad (this one is hosted by the folks at mozilla.org) and embed it in this permalink. That way, when the conference is over I will simply cut-and-paste the text and turn it into a permanent record, but for now my updates are live and subject to change. And if you are at the conference, please feel free to add your own notes or clarifications; your text will come up in a different colour.

REVOLUTIONIZING NEXT-GENERATION SEQUENCING: TOOLS AND TECHNOLOGIES
VIB, Leuven, Belgium 15-16 January 2015
http://www.vibconferences.be/event/revolutionizing-next-generation-sequencing-tools-and-technologies

David Jaffe, Broad Institute, Personal Genome Assembly 

Traditional resequencing can miss loci unique to an individual. Define a “personal genome assembly” for $10k/genome. Process: 0.5 microgram DNA input, TruSeq PCR-free, one rapid-run flow cell on a HiSeq 2500. Output: 60x coverage. Variants detected with DISCOVAR (Nature Genetics). DISCOVAR contig N50 ~126kb (several times better than traditional de novo assemblers). Validation dataset produced using 100 Fosmids (4Mb) sequenced to finished quality (by Illumina and PacBio). Thread the haploid sequence through the assembly graph; if a path is found then the assembly is good. Errors are rare: ~50kb between them, usually from long homopolymers.
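
As an aside for anyone unfamiliar with the N50 figure quoted above, here is a minimal sketch (in Python, with made-up contig lengths, nothing to do with DISCOVAR's actual output) of how contig N50 is computed:

```python
def n50(contig_lengths):
    """N50: the contig length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2.0
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0

# Toy assembly of 600kb total; half (300kb) is first reached after
# adding the 150kb contig, so N50 = 150,000.
print(n50([200_000, 150_000, 100_000, 80_000, 50_000, 20_000]))
```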

Demonstrated visualising assembly graphs to find variation, e.g. nebulin - a 10kb x 3 copy repeat with evidence of variation between paralogous copies; can't call any SNPs there with traditional alignment methods. Looking at graph complexity within tumours. A powerful method for detecting mutations that would previously have been missed, although haplotyping is still difficult. Plan to use this method to look at various experiments: trio studies, cancer, differences between tissue types, differences between cells and cell lines, and ideally a world reference population graph.
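
To make the "variation as graph structure" idea concrete, here is a minimal, hypothetical sketch (my own illustration, not DISCOVAR itself): a variant shows up in an assembly graph as a "bubble", i.e. two alternative paths between the same flanking sequences, and enumerating the paths recovers the two alleles.

```python
# Toy graph: node names and sequences are invented for illustration.
# A bubble between flankL and flankR encodes two alleles of a locus.
graph = {
    "flankL": ["alleleA", "alleleB"],
    "alleleA": ["flankR"],
    "alleleB": ["flankR"],
    "flankR": [],
}
seq = {"flankL": "ACGT", "alleleA": "TTAG", "alleleB": "TCAG", "flankR": "GGCA"}

def paths(graph, node, target, prefix=()):
    """Yield every node path from `node` to `target` (graph assumed acyclic)."""
    prefix = prefix + (node,)
    if node == target:
        yield prefix
        return
    for nxt in graph[node]:
        yield from paths(graph, nxt, target, prefix)

# Each path through the bubble spells out one allele.
for p in paths(graph, "flankL", "flankR"):
    print("".join(seq[n] for n in p))  # ACGTTTAGGGCA and ACGTTCAGGGCA
```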

Thinking about future methods to improve contiguity: mate-pairs, genome maps (BioNano Genomics: read positions of 7-mer nick sites across several-hundred-kb molecules). Aiming for $20k reference-quality human genomes with HiSeq data plus one of these new datasets.

A few thoughts: Why not just use PacBio or Nanopore for long reads? Could DISCOVAR be used for bacterial genomes or metagenomes? Is the graph visualiser easy to play with? The visualisations shown are nice. It’s curious to talk about $10k and $20k human genomes from the Broad when we are fed a diet of the $1000 genome from the HiSeq X Ten, but it's an apples-and-oranges comparison.

Further reading:

Comprehensive variation discovery in single human genomes
http://www.nature.com/ng/journal/v46/n12/full/ng.3121.html

DISCOVAR online demo
http://broad.io/disco-demo

DISCOVAR blog
http://www.broadinstitute.org/software/discovar/blog/

Max Van Min, Cergentis

Targeted Locus Amplification – only one primer pair (2x20bp) is required to enrich a region. Uses physical proximity as the basis for selection: the DNA is cross-linked, so sequences within a locus are in close physical proximity compared with the rest of the genome. Inverse primers generate 2kb amplicons (not sure why 2kb, because >2kb can be picked out). Can use paired-end sequencing for haplotyping (or long-read sequencing, as it is compatible with any NGS workflow). Demonstrated the ability to pick up inter-chromosomal gene fusions, e.g. in cancers. One primer per direction (?). Can multiplex lots of sequences; the more you sequence, the more coverage of the genome you get. Coverage is ~50-60kb per primer pair. Total time for the protocol is about 2 days. Whole-cell product available to buy; DNA & FFPE protocols in testing. Input requirement is 10 micrograms; not sure how much you get out for sequencing.

A few thoughts: This seems potentially really useful for a number of microbial applications, e.g. pulling down plasmids or other regions of interest from a very mixed sample. Judicious use of multiplexed primers might allow for nearly whole-genome recovery.
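
A quick back-of-envelope on that thought, using the ~50-60kb coverage per primer pair quoted above (the genome size and the no-overlap assumption are mine, for illustration only):

```python
# Rough, illustrative arithmetic only: how many TLA primer pairs might be
# needed to tile a small genome, assuming ~50-60kb recovered per pair
# (figure from the talk) and no overlap between targeted regions.
genome_size_bp = 5_000_000             # assumed bacterial genome size
coverage_per_primer_pair_bp = 55_000   # midpoint of the quoted 50-60kb

primer_pairs_needed = -(-genome_size_bp // coverage_per_primer_pair_bp)  # ceiling division
print(f"~{primer_pairs_needed} primer pairs")  # ~91 pairs, i.e. on the order of 100
```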

Further reading:

Cergentis website
http://www.cergentis.com/tla-technology

Targeted sequencing by proximity ligation for comprehensive variant detection and local haplotyping
http://www.nature.com/nbt/journal/v32/n10/full/nbt.2959.html


Evan Eichler, University of Washington

The human genome has many duplications that are absent from the reference sequence. Duplications are often unique to individuals. Segmental duplications are often missing or misassembled, particularly in assemblies from short-read assemblers (e.g. YH). Structural variation is largely measured indirectly with short reads (e.g. by read-depth analysis, read-pair analysis, split-read analysis).

Enter long-read sequencing on PacBio. Generated 45x sequence coverage of CHM1 (a hydatidiform mole) - a haploid genome. P5/C3 chemistry, ~15% error. Did read-based detection of structural variants (Chaisson 2014, Nature). Closed 40 gaps and made 50 extensions, adding 1.1Mbp. Resulted in 20 additional exons in 12 gene models. 92% of insertions and 60% of deletions are novel cf. the 1000 Genomes Project.

About 0.4% of human euchromatin still can’t be assembled with PacBio shotgun WGS. Used 32 BACs for PacBio sequencing and assembly, which added 416kbp of missing reference sequence and also eliminated 856kb of sequence. Falcon and MHAP assembly N50 is ~5 Mbp. Cost is $50-80,000 for a PacBio human reference genome.

Further reading:

Shotgun sequence assembly and recent segmental duplications within the human genome
http://www.nature.com/nature/journal/v431/n7011/abs/nature03062.html

Genome structural variation discovery and genotyping
http://www.nature.com/nrg/journal/v12/n5/full/nrg2958.html

Resolving the complexity of the human genome using single-molecule sequencing
http://www.nature.com/nature/journal/vaop/ncurrent/full/nature13907.html
http://eichlerlab.gs.washington.edu/publications/chm1-structural-variation/


Paul Schaffer, Roche Diagnostics

I was curious to see whether Roche would finally announce any products after a recent buying spree of interesting genomics companies, coupled with a large marketing agreement with Pacific Biosciences for clinical diagnostic assays. Roche now owns Bina, ABvitro, Stratos, Genia, KAPA, Foundation Medicine etc. Sadly, in a bid to “underpromise and overdeliver”, Paul offered “no details of timescales, specifications or features for new platforms”. A leading question from me about the possibility of a benchtop PacBio did not reveal any new information. I do wonder about the marketing strategy of giving such talks with no actual technical information in them, especially given Roche’s still-raw humiliation over their handling of the once-great 454 platform. Also notable for Americanisms such as “extremely laser focused” and “from soup to nuts” (http://en.wikipedia.org/wiki/Soup_to_nuts).


Mark Akeson, UC Santa Cruz Genomics Institute

Mark Akeson also identifies as a MinION ‘fanboy’! Good on him. Starts with an introduction to nanopore sequencing. David Deamer and Dan Branton are credited with the initial idea. Deamer used to challenge physicists who claimed nanopore sequencing wouldn’t work; they said “Impossible? No! It’s just too hard!”. Three problems: 1) You have to get DNA through a 2nm hole. 2) You need 5-angstrom control of the nucleotides. 3) It requires an exquisitely sensitive sensor.

It works because the ion flux is enormous - something like 10^20 particles per mm² per second, like a lightning bolt. This effectively amplifies the signal to 10^6-10^7 ions per nucleotide (not sure I follow exactly here).
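
To unpack the amplification point (with illustrative numbers of my own, not figures from the talk): a pore current of tens of picoamps combined with enzyme-controlled translocation of tens of bases per second does indeed land in the 10^6-10^7 ions-per-nucleotide range.

```python
# Back-of-envelope check of the "10^6-10^7 ions per nucleotide" figure.
# The current and translocation rate below are assumed illustrative values,
# not numbers given in the talk.
ELEMENTARY_CHARGE = 1.602e-19    # coulombs per monovalent ion

pore_current_amps = 50e-12       # assume ~50 pA ionic current through the pore
bases_per_second = 30            # assume ~30 bases/s enzyme-controlled ratcheting

ions_per_second = pore_current_amps / ELEMENTARY_CHARGE
ions_per_base = ions_per_second / bases_per_second
print(f"~{ions_per_base:.1e} ions per nucleotide")  # ~1e7, in the quoted range
```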

Several problems to solve:
1) Capture and translocate DNA: Kasianowicz & Deamer proved that DNA would translocate through the pore, by assessing movement between two compartments.
2) Need the right size of pore so that only a few bases are in contact with it, otherwise the signal cannot be deconvoluted: Bayley figured out a way with alpha-haemolysin, and Gundlach with MspA.
3) Rate control: initially used the Klenow fragment but it was still too fast; then Akeson found that Phi29 polymerase keeps a ‘dwell time’ of 12.5 sec.

Partnership between the Akeson and Gundlach labs to test Phi29 DNA polymerase & MspA. Sequenced the CAT motif in the ‘CAT’ experiment.

Characterization of individual polynucleotide molecules using a membrane channel
http://www.pnas.org/content/93/24/13770.full

Enhanced translocation of single DNA molecules through α-hemolysin nanopores by manipulation of internal charge
http://www.pnas.org/content/105/50/19720

Nanopore DNA sequencing with MspA
http://www.pnas.org/content/107/37/16060.full

Automated Forward and Reverse Ratcheting of DNA in a Nanopore at Five Angstrom Precision
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3408072/

Now onto Oxford Nanopore history. An AGBT 2012 quote: “Without data, how do we know this is not cold fusion?” - that gem from Jonathan Rothberg (where is he now?). The MinION weighs 90 grams. It functions as a multiplexed axopatch – each amplifier can assay one of 4 wells at a time (so there are actually 2000 pores per MinION but only 500 addressed simultaneously).

Another challenge: typical bilayers rupture at low voltage and are notoriously hard to reproduce. He related this to Intel’s problem manufacturing chips in the early 70s: on some days they just couldn’t make chips that worked, and the cause was eventually tracked down to chemicals from crop dusters dusting the apricot harvest in Silicon Valley.

Solved problem of membrane instability via milling complex ‘wells’ using photolithography and a triblock polymer to form film. Resilient to shipping via FedEx.

Another achievement: making 2D reads. A hairpin plus the motor protein permits reading both strands, giving high-quality reads.

Shows a single 48kb read with 87% identity and 90% coverage of the phage reference genome – reads this long are rare, more usually around 10kb.

http://figshare.com/articles/UCSC_Full_Length_Lambda_2D_Read/1209636 (Thanks Lex!)

Accuracy figures in the literature can be confusing because the chemistry changes frequently. Akeson’s results: 66% accuracy with R6, 70% with R7, 85% with R7.3. Now focusing on R7.3 - they get 184-450Mb of bases per run, of which 17-55Mb are full 2D reads (i.e. ~10% high-quality 2D reads).

Showed that different aligners perform differently with respect to indels and mismatches (e.g. BLASR – high indel rate; LAST – high mismatch rate). Devised an EM algorithm to tune alignment parameters and unify the results. 99% of 2D reads map to the reference.

Multimers of a particular nucleotide (homopolymers) are hard to sequence. The confusion matrix shows that some bases are more commonly missed than others: A <-> T confusions are uncommon, G <-> C more common.
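
For anyone unfamiliar with the term, a base-call confusion matrix is just a tally of (reference base, called base) pairs taken from alignments. A minimal sketch with toy data (not the talk's data):

```python
from collections import Counter

def base_confusion(aligned_pairs):
    """Tally aligned (reference_base, called_base) pairs.
    '-' marks a gap: (ref, '-') is a deletion, ('-', call) an insertion."""
    return Counter(aligned_pairs)

# Toy aligned pairs for illustration only.
pairs = [("A", "A"), ("C", "C"), ("G", "C"), ("T", "T"),
         ("G", "-"), ("C", "G"), ("A", "A"), ("T", "A")]
for (ref, called), count in sorted(base_confusion(pairs).items()):
    print(f"ref {ref} -> called {called}: {count}")
```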

Very long reads (36-42kb) covered a previously unassemblable, highly repetitive gap in an X-chromosome region. Shorter reads (10kb) suggest 8 CT47 gene copies.

Now planning on modelling modified bases. 5 known base modifications, sometimes differ by just 2 hydrogen atoms. Evidence that discrimination between modified and unmodified bases is possible.

Error rates for nanopore discrimination among cytosine, methylcytosine, and hydroxymethylcytosine along individual DNA strands
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3839712/

Working on protein sequencing – a 700-amino-acid protein (S2-GT) through a nanopore. Pull the protein through and you see features, though not single amino acids.

Unfoldase-mediated protein translocation through an α-hemolysin nanopore
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3772521/

Q: What limits read length? Entropy: at those lengths DNA is a coiled ball, so it is hard to capture into the pore.
Q: Is there a lower limit on read length? They can do 150bp.
Q: What do we do with the non-2D reads? You can still use them, but work is ongoing to increase the relative yield of 2D reads so you don’t need to.