Resource description
Background. Transcriptomes are among the first sources of high-throughput genomic data to have benefited from the introduction of next-generation sequencing. However, interpreting large numbers of short reads and quantitatively estimating gene expression remain challenging, particularly in the absence of a reference genome. As sequencing technology becomes more accessible, transcriptome sequencing will extend to many organisms for which no genome sequence is available.
Results. Here we propose a computational workflow for reconstructing expressed gene transcripts, annotating them functionally and quantitatively estimating transcript abundance. The workflow does not require a reference genome sequence and can tolerate low coverage. Instead of mapping reads to a reference genome or clustering reads in a completely unsupervised way, we assemble the unknown transcriptome using the nearest homologs from a public database as seeds. The workflow includes pre-existing free software a