We have been using next-generation sequencing (NGS) data since 2008. Barley has been one of the main crops to benefit from this, and our efforts in this area have focused mainly on variant analysis and transcriptomics.
We have been actively involved in the discovery of new sequence variants since we first began working with NGS. The germplasm collections we have mined for this purpose include barley cultivars, landraces and wild barley (H. spontaneum). One of the main outputs of this work was a 9k SNP genotyping chip based on the Illumina iSelect technology (http://bioinf.hutton.ac.uk/iselect/app/). This platform has been used around the world to genotype barley cultivars and other lines. Development of a new genotyping platform is currently under way at the institute, and this will feature substantially greater numbers of variants. The new platform is one of the outputs of the BBSRC UK barley genome sequencing project.
Transcriptome data was particularly useful for gene annotation in the International Barley Sequencing Consortium’s 2012 publication of the barley draft genome. As part of the ongoing effort to produce a second draft sequence of the barley genome, we have recently worked with collaborators at FLI in Jena and MIPS in Munich to generate a new set of barley transcripts using the PacBio IsoSeq technology (http://www.pacb.com/blog/intro-to-iso-seq-method-full-leng/). This technology is capable of generating sequences several kilobases in length and is thus ideally suited to sequencing full-length transcripts. This dataset should provide a valuable addition to the existing set of barley resources. The combination of longer but more error-prone reads with shorter, more accurate reads is likely to dominate how we work for the next few years, as the two approaches complement each other well.
A constant feature of working with new sequencing technologies is the volume of data. This is increasing exponentially and will likely continue to do so for the foreseeable future. In practical terms, this requires constant expansion of our data processing and data storage facilities. Most of the work we do involves heavy use of our high-performance compute cluster, where large numbers of jobs can be processed in parallel. In particular, we have dealt with several very large exome capture datasets which have produced raw data and secondary analysis outputs on the terabyte scale.
The new sequencing data has also motivated tool development at the institute. Frustrated by the lack of suitable assembly viewers in the early days of NGS, members of the Hutton’s ICS group (Iain Milne and Gordon Stephen) set out to develop Tablet, a Java-based desktop application that allows users to view read mappings of effectively any size. Tablet has been the most successful of our software packages to date and has over 30,000 users worldwide.
For further information on this project please contact Micha Bayer (firstname.lastname@example.org) from The James Hutton Institute.