An automated pipeline for construction of Reference Transcript Datasets (RTD) to enable rapid and accurate gene expression analysis in plant species.

RNA sequencing (RNA-seq) is the current technology for measuring gene and transcript expression: by sequencing millions of transcripts, it allows RNA levels to be measured on a genome-wide scale. However, transcript annotations in many plant species are poor, and the problem is exacerbated by thousands of novel but misassembled transcripts from poorly controlled RNA-seq transcript assemblies.

A high-quality RTD for a plant or crop species of interest allows RNA-seq data to be analysed rapidly and accurately with state-of-the-art alignment-free programs (Salmon; Kallisto), giving improved, robust analysis of differential gene expression and differential alternative splicing with reduced false discovery rates. Accurate gene and transcript expression analysis of RNA-seq data, for example of biotic/abiotic stress responses, developmental pathways, genotype-phenotype relationships or mutants, can identify key genes of interest to biotechnologists and plant breeders.

The research team are developing a fully automated pipeline (RTDBox) that can be used by scientists with basic bioinformatics skills or by bioinformaticians with little experience in transcriptomics. The pipeline is also being designed to allow incremental improvement of the RTD through the automatic incorporation of new RNA-seq data (Illumina, PacBio, Nanopore). Within the pipeline, the team are developing a transcript evaluation suite that will provide metrics to help biologists identify and remove mis-constructed transcripts produced by assembly programs, and to understand the quality and completeness of the RTD generated. The team's experience and expertise are being brought together to make user-friendly software that lets plant scientists measure gene expression more accurately, thereby improving the exploration of biological processes across the globe.

Project Leader: Dr Runxuan Zhang, James Hutton Institute

Funding: BBSRC Biological and Bioinformatics Resources BB/S020160/1