Putting together a genome. Part I.
Putting together a genome is like assembling a puzzle. Genomes are typically sequenced in short fragments, often only about 250 letters (or bases) of sequence data per fragment. We then use computer algorithms to put millions of these fragments together into a genome sequence. For well-known organisms like humans, fruit flies or laboratory mice, we have a reference sequence to guide the genome assembly. It's a bit like a puzzle where the picture on the box serves as a guide for arranging the pieces in the right order.
However, for organisms that have no reference sequence, we have to piece together the genome from scratch, or de novo (Latin for 'anew'). This is a much more challenging task. It is akin to assembling a puzzle without the picture on the box as a guide.
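For readers who like to see the idea in code, here is a minimal sketch of de novo assembly on a toy scale. It assumes error-free fragments and uses a simple greedy strategy (always merge the two fragments with the longest overlap). The function names and the three example fragments are made up for illustration; real assemblers use far more sophisticated methods and have to cope with sequencing errors and repeats.

```python
def overlap(a, b, min_len=3):
    """Return the length of the longest suffix of a that matches a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more; leave the pieces separate
        # Join the two reads, keeping the shared stretch only once.
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three hypothetical short fragments that overlap end to end.
fragments = ["ATGGCGT", "GCGTACC", "TACCTGA"]
contigs = greedy_assemble(fragments)
print(contigs)  # ['ATGGCGTACCTGA']
```

With no "picture on the box", the only clue for joining two fragments is that the end of one looks like the start of another, which is why repeated sequences cause so much trouble.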
One way to help overcome this challenge is to use two sorts of sequencing reads: the relatively short fragments described above, and much longer fragments. Sequence data from the longer fragments can serve as a scaffold, a frame to which the shorter reads are matched. This is especially useful for assembling regions of the genome that contain repetitive sequences.
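As a rough illustration of the scaffold idea, the sketch below places two short assembled pieces in the right order by finding where each one sits on a single long read that spans a repetitive stretch. Everything here is hypothetical: the sequences are invented, and exact string matching is a simplification, since real long reads contain errors and are matched with alignment tools instead.

```python
def order_contigs(long_read, contigs):
    """Order short contigs by where they are found on a long read."""
    placed = []
    for contig in contigs:
        pos = long_read.find(contig)  # -1 means the contig is not on this read
        if pos != -1:
            placed.append((pos, contig))
    placed.sort()
    return [contig for _, contig in placed]


# A hypothetical long read: unique start, a repetitive middle, unique end.
long_read = "ATGGCGT" + "ACACACACAC" + "TTGACCGA"

# Two short contigs that could not be joined because of the repeat between them.
contigs = ["TTGACCGA", "ATGGCGT"]

ordered = order_contigs(long_read, contigs)
print(ordered)  # ['ATGGCGT', 'TTGACCGA']
```

The long read tells us which piece comes first and how much repetitive sequence lies between them, information the short fragments alone cannot provide.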
The first step in our project is creating a reference genome for the Narcissus Flycatcher. Once we have a reliable reference, we can use it to assemble genomes from other individuals, just as is done for well-known model organisms in biology, like mice and fruit flies. This step is more than halfway complete, as we have good short-read sequence data for one Narcissus Flycatcher sample, collected from an individual of the migratory subspecies in Japan.

MinION sequencer from Oxford Nanopore Technologies. Tiny but powerful.
MinION sequencing is an ideal part of this approach, as it can provide very long sequencing reads of thousands or even tens of thousands of DNA bases. MinION sequencing has already been used to sequence the genome of the European Eel, whose genome is similar in size to that of the Narcissus Flycatcher. The MinION device uses nanopore sequencing, which biochemically feeds DNA molecules through a pore embedded in a membrane. There is an electrical potential across the membrane, and an open pore completes the circuit. If the pore is blocked by a molecule of DNA, the current changes. Different bases in a DNA molecule being rapidly fed through the pore alter the current in slight but detectable ways, and machine learning algorithms are used to translate these changes in current into base reads.
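To make the idea of turning current into bases concrete, here is a deliberately oversimplified sketch. It assumes each base produces its own characteristic current level and picks the closest match for every measurement. The current values are invented for illustration and are not real MinION data; in practice the signal depends on several neighbouring bases at once, which is why trained machine learning models are needed.

```python
# Hypothetical mean current level (arbitrary units) for each base.
# These numbers are invented; they are not measured nanopore values.
LEVELS = {"A": 80.0, "C": 95.0, "G": 110.0, "T": 125.0}


def call_base(current):
    """Pick the base whose assumed level is closest to the measured current."""
    return min(LEVELS, key=lambda base: abs(LEVELS[base] - current))


def call_read(currents):
    """Translate a series of current measurements into a string of bases."""
    return "".join(call_base(c) for c in currents)


# A made-up, slightly noisy series of current measurements.
signal = [81.2, 108.5, 126.9, 93.7, 79.4]
read = call_read(signal)
print(read)  # AGTCA
```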
Click HERE for link to nanopore sequencing video.
We have now completed the short-read genome sequencing part of this process. Compared to MinION sequencing, this is a much more laborious and expensive process, done on the Illumina HiSeq sequencing platform. We are now conducting quality control on the short-read Illumina data, and so far things look good, but more on that in the next lab note.
Support from you will help us move on to the next steps. Thanks for your interest in our project. Keep following for more updates.