How to sequence a genome
Fireflies are our favorite part of summer, and their magical ability to produce light is used around the world in scientific research. But in this age where the "$1000 genome" is commonly cited in news articles, why has Team Firefly decided to sequence the firefly genome now? How will we do it? And what does it take to sequence a new genome in the "genomic" age of biology?

Let's say there's a critter, maybe a beetle, bacterium, or plant, that does something interesting, and you want to sequence its genome to catalog all the genes that might be responsible for this trait. Let's list what it takes to go from organism -> genome, thereby "digitizing" an organism and letting you carry its full set of genes around on your laptop.
Step 1: Genome size
Genome sizes vary massively between organisms. Microorganisms, such as bacteria and fungi, generally have smaller genomes, whereas multicellular organisms, such as humans or plants, generally have larger ones. For example, the human genome has ~3 billion base pairs (3 gigabase pairs, written 3 Gbp or 3 x 10^9 bp) of A, T, C, or G that make up who you are. But protein-coding sequence takes up only about 2% of the human genome! The other 98% is DNA whose function is less clearly defined, oftentimes repetitive DNA and remnants of dead viruses that have become integrated into the genome. In fact, multicellular organisms have comparable numbers of genes, but the size of the genome that holds that set of genes can vary by hundreds of times.

Genome sizes in different types of organisms: https://en.wikipedia.org/wiki/Genome_size
The genome size of your critter of choice matters, because the amount of sequencing (and therefore the cost) is roughly proportional to the genome size. This huge variance in genome size means that if you want to sequence a genome, you had better check its size first, as it could be 5-50x larger than you expect.
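To make that proportionality concrete, here is a rough back-of-the-envelope sketch. It assumes the usual rule of thumb that the bases you need to sequence equal the genome size times the desired coverage (the average number of times each position is read). The 50x coverage target and the function name are illustrative assumptions, not figures from our project.

```python
def bases_needed(genome_size_bp, coverage):
    """Total bases to sequence so each position is read `coverage` times on average."""
    return genome_size_bp * coverage


HUMAN_GENOME_BP = 3_000_000_000  # ~3 Gbp, the human genome size quoted above
ASSUMED_COVERAGE = 50            # hypothetical target, chosen only for illustration

total = bases_needed(HUMAN_GENOME_BP, ASSUMED_COVERAGE)
print(f"{total / 1e9:.0f} Gbp of sequencing needed")  # 150 Gbp of sequencing needed

# A genome 10x larger than expected needs 10x the sequencing at the same coverage.
surprise = bases_needed(10 * HUMAN_GENOME_BP, ASSUMED_COVERAGE)
print(f"{surprise / 1e9:.0f} Gbp of sequencing needed")  # 1500 Gbp of sequencing needed
```

The point is only the scaling: misjudge the genome size by some factor and your sequencing budget is off by the same factor.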
So what is the genome size of P. pyralis?
**Figure / raw data for genome size measurement**
Step 2: DNA extraction
Before you can sequence a genome, you first need to extract the genome from the critter. Since genomes are made of DNA

Isolating DNA from single specimens of adult male P. pyralis
Step 3: Sequencing
Once you've got that pure DNA in a test tube, it's time to put it on a DNA sequencer. Technically speaking, there is a bit more to it than that, as the DNA has to be fragmented into smaller pieces and prepared into a form known as a "sequencing library", but we'll skip over that.
Right now there are two dominant players in the DNA sequencing field:
1) Illumina (http://www.illumina.com) - for cheap short reads (~50 bp - ~300 bp long)
2) PacBio (http://www.pacb.com) - for more expensive, but long, reads (~10 kb+)
Cheaper is better, right? Not exactly. Though cheap short reads let you do a lot, it is difficult to assemble a new high-quality genome from them. For sequencing new genomes, the standard in the field is now PacBio's long-read sequencing technology.
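One commonly cited reason short reads struggle is repetitive DNA: if a repeat is longer than the reads, several different genomes can produce exactly the same reads. The toy sketch below illustrates this with made-up sequences. The 10 bp repeat, the 5 bp flanking segments, and the read lengths of 8 and 15 are all scaled-down assumptions chosen for illustration, and the reads are idealized as error-free and compared as sets of distinct sequences.

```python
def all_reads(genome, read_len):
    """Set of every distinct error-free read of length read_len (an idealization)."""
    return {genome[i:i + read_len] for i in range(len(genome) - read_len + 1)}


# Made-up toy sequences: one repeat that occurs three times, and four unique segments.
REPEAT = "ACGTTGACCA"  # 10 bp
A, B, C, D = "TTGCA", "GGATC", "CCTAG", "AATGC"

# Two different genomes: the unique segments B and C swap places between the repeats.
genome_1 = A + REPEAT + B + REPEAT + C + REPEAT + D
genome_2 = A + REPEAT + C + REPEAT + B + REPEAT + D

# Reads shorter than the repeat never connect B to C across it,
# so both genomes yield the same set of reads.
short_same = all_reads(genome_1, 8) == all_reads(genome_2, 8)

# Reads longer than the repeat span it together with its flanks,
# so the two arrangements become distinguishable.
long_same = all_reads(genome_1, 15) == all_reads(genome_2, 15)

print(short_same)  # True
print(long_same)   # False
```

In real genomes the repeats can run to thousands of base pairs rather than ten, which is why reads of ~10 kb+ help where reads of a few hundred bp cannot.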
Step 4: Assembly
Step 5: Annotation