Assembling vertebrate genomes from short reads

Next generation sequencing (NGS) has heralded a new era in genomics: sequence data is now readily available at a cost five orders of magnitude lower than when the first human genome sequence was released. One limitation of the new sequencing technologies, however, is that their data are harder to assemble into a single contiguous sequence representing an entire chromosome, owing to the short lengths of the reads and the relatively high error rates in identifying each base.

In a new article published in the March issue of Genome Biology, researchers from The Genome Center at the Washington University School of Medicine wondered whether the shorter contiguous sequences that could be assembled with NGS reads could still be used for meaningful analysis of higher eukaryote genomes. To answer this question, they used two NGS technologies (Illumina and 454) to sequence the same sample of chicken DNA previously sequenced by the traditional Sanger method.

As they describe in their Genome Biology article, they found that although Sanger sequencing, as expected, produced far longer contiguous sequences than Illumina or 454, the NGS reads were able to produce assemblies with a gene coverage as high as 93%. Further, the accuracy of the assembled sequence was equal to that obtained with the Sanger method. Between them, the Illumina and 454 assemblies also included over 30 million base pairs absent from the Sanger assembly.

This chicken case study is timely in the light of the recent PNAS article on ALLPATHS-LG, a software tool demonstrated by its developers to generate NGS read assemblies of unprecedented quality. Together, these articles may prompt the genomics field to re-examine the maxim that NGS is not well suited to large genome assembly.
