Advert: WG2 hackathon: Extracting strain-level variation from shotgun metagenome data
06 Oct 2014
Location: Isaac Newton Institute, Cambridge, November 7th-11th 2014
Organisers: Dr Christopher Quince - University of Warwick (c.quince@warwick.ac.uk) and Dr Nick Loman - University of Birmingham (n.j.loman@bham.ac.uk)
Special Guest: Dr Jared Simpson - Ontario Institute for Cancer Research, co-author of the ABySS and SGA assemblers.
Objectives: The objective of the workshop is to build on the success of the earlier COST ES1103-funded hackathon in Lisbon, which developed the CONCOCT algorithm for contig clustering (http://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.3103.html). CONCOCT uses co-occurrence information to cluster contigs into genome bins in an unsupervised fashion. In this follow-up hackathon we will explore three further avenues of research:
- Extension to strain-level variation: CONCOCT is very successful at extracting species-level bins, but it does not fully resolve individual strains. We will develop algorithms to extract strains, possibly by incorporating co-occurrence information directly into assemblers or through machine-learning-based decomposition techniques.
- Long reads: We will explore the role of long-read data in strain-aware metagenome assembly by analysing data produced by new technologies such as Oxford Nanopore, Illumina Synthetic Long Reads (Moleculo) and Pacific Biosciences.
- Integration into a metagenomics analysis pipeline: CONCOCT could form the first step in an unsupervised contig-based metagenome analysis pipeline. We will integrate CONCOCT into a complete pipeline for phylogeny, annotation, metabolic reconstruction and visualisation of metagenome bins.
This is a joint programme with the MRC-funded Cloud Infrastructure for Microbial Bioinformatics (CLIMB) project. Computational resources from the CLIMB computing infrastructure will be available to participants, including access to very-high-memory virtual machines (1.5-3 TB).
Participants: This will be a computationally intensive workshop. We are therefore seeking only participants with bioinformatics software development expertise and/or knowledge of statistics and machine learning. We will principally be coding in Python, JavaScript and C or C++. Later in the year (Birmingham, December 10-12th) a training workshop in metagenome analysis will be run; this will be suitable for biologists.
To participate, please e-mail Chris Quince (c.quince@warwick.ac.uk) before the 13th of October. From those who express an interest, around eight COST-funded individuals will be selected.
Please note that a training element in these areas is being organised in December; details will be circulated in due course.