DNA submitted for sequencing!
Dear firefly genome fans,
I have good news! High-molecular-weight DNA has been successfully extracted from fireflies and has been submitted for sequencing on the Pacific Biosciences RSII instrument of the Broad Institute in Cambridge, MA. If this DNA passes quality control (update: it did!), the PacBio sequencing for the firefly genome may be complete by the end of September.
Once we get the data back, the fun of assembling the scattered PacBio reads into a contiguous genome, annotating the genes, and mining the firefly genome for scientific discovery will begin! Thank you again for your contribution, and for joining us on this journey of discovery!
The actual PacBio RSII sequencing instrument on which the firefly genome DNA samples will be sequenced, at the Broad Institute in Cambridge, MA. It is open in this photo because maintenance was being done! (Not shown: the PacBio service engineer working behind the instrument.)
Inside the PacBio RSII DNA sequencing instrument. Unbelievable technology to determine the DNA sequences of single molecules. I'm told that the whole bottom half of the instrument is the complex optics needed to image the single molecules in the SMRT nanowells.
But wait, you might ask, the crowdfunding project finished in June, it is now… significantly past June. What was going on throughout this summer to get to this point? And why is the DNA being sequenced on the PacBio RSII instrument rather than the PacBio Sequel instrument which was initially planned? Read on to find out the answers!
In the DNA sequencing industry, one constant seems to be that every year the cost of sequencing drops significantly, maybe even by half or more! The PacBio Sequel instrument (released this Spring, but becoming widely available only this Fall) is an example of this trend: its per-nucleotide (per A, T, C or G) sequencing cost is much lower than that of the last-generation RSII instrument. So why aren't we using the newer Sequel instrument? Cheaper is better, right? Normally I would agree, but there is a catch.
The Sequel instrument currently requires about 10 times the input DNA of the RSII. If you're working with a lab-rearable organism, such as bacteria, fruit flies, or plants, more DNA isn't that much of a problem. Simply grow more critters, and scale up the DNA extraction! In our case, we estimated that this would require ~200 individuals of our target Photinus pyralis fireflies, and when you're working with a wild-caught population like Photinus pyralis, this number gets a little worrying. Actually, catching 200 fireflies isn't an insurmountable barrier. A dedicated person swinging an insect net on a good night can catch ~40 fireflies or so, and if you're careful to catch only those fireflies with the characteristic flash pattern of Photinus pyralis (flashes at dusk and for ~30 minutes after, a single rising "J" flash, about 0.5 s long), you should have mostly P. pyralis.
But what is the experimental worry of working with 200 fireflies for DNA extraction? First off, about 50% of the fireflies you catch typically have red parasitic mite infections, and if the mites are not removed before DNA extraction, they will show up in the data. We're trying to sequence the firefly genome, not the firefly parasitic mite genome!
A second difficulty is species identification. In a given field there might be tens of species of firefly, and while our target species Photinus pyralis has a quite distinctive flash pattern, and is physically pretty large as fireflies go, once fireflies are collected and stop their natural flashes it can be quite hard to distinguish different species of firefly. And when you catch 200 firefly specimens, the chance that you get a non-Photinus pyralis in the mix gets to be pretty high. Once sequenced, even a couple of non-P. pyralis in the batch could really screw up the genome assembly process. Luckily there are a few tricks to check the species of a firefly beyond its flash pattern:
1) Dissection of the male genitalia and careful observation under the microscope (this is the “gold standard” of insect species identification). Green’s Revision of the Nearctic species of Photinus (Lampyridae: Coleoptera) (1956), is the standard key for when you absolutely positively need to know the species of a firefly.
2) PCR amplification and Sanger sequencing of a small segment of DNA of known sequence to ID the species. The mitochondrial gene COII is a standard gene used for this purpose.
3) Certain morphological traits, namely a margin of ventral unpigmented tissue on the abdomen anterior to the lantern segments, which all Photinus pyralis possess (but so do some other Photinus sp. fireflies!)

The unpigmented ventral section posterior to the lantern segments of Photinus pyralis. If your firefly shows the “J” flash and has this pigmentation (annotated by the red arrow), it is a P. pyralis for sure!
So, to verify the species identity of all 200 fireflies by male genitalia dissection (option 1), which might take 5 - 10 minutes per specimen, would take a daunting ~17 to ~33 hours total! Not to mention this is a fairly specialized skill. Luckily team members Sara Lewis and Sarah Sander are experts in this, but even so, it would have taken a long time!
To screen each firefly by sequencing (option 2), through PCR (polymerase chain reaction, which lets you amplify small segments of DNA) and targeted "Sanger" sequencing, might cost ~$5 per specimen, so it would cost ~$1,000 total to verify the species identity of all 200 specimens! This also requires the DNA to be extracted individually from each firefly, which means handling hundreds of tubes and solutions without mixing them up.
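For anyone who wants to check the back-of-the-envelope numbers for options 1 and 2, here is a minimal sketch in Python. It only uses the rough figures quoted above (200 specimens, 5 to 10 minutes per dissection, ~$5 per Sanger-sequenced specimen); the function names are just illustrative.

```python
# Rough screening estimates for 200 fireflies, using the figures quoted in the text.
N_SPECIMENS = 200


def dissection_hours(n_specimens, minutes_per_specimen):
    """Total hands-on time, in hours, to dissect every specimen (option 1)."""
    return n_specimens * minutes_per_specimen / 60


def sanger_cost_usd(n_specimens, cost_per_specimen_usd):
    """Total cost, in dollars, to PCR + Sanger sequence every specimen (option 2)."""
    return n_specimens * cost_per_specimen_usd


low_hours = dissection_hours(N_SPECIMENS, 5)
high_hours = dissection_hours(N_SPECIMENS, 10)
total_cost = sanger_cost_usd(N_SPECIMENS, 5)

print(f"Option 1 (dissection): {low_hours:.1f} to {high_hours:.1f} hours")
print(f"Option 2 (PCR + Sanger): ${total_cost:,}")
```

The same two functions with 20 specimens instead of 200 give a feel for how much the burden shrinks with a smaller batch.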
Finally, using option 3 (check the ventral pigment margin and call it a day) does leave the potential for closely related Photinus species to sneak into the mix, but if a limited number of fireflies is used, the chances are pretty small that a non-Photinus pyralis would get in there.
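To see why the batch size matters so much here, consider a simple sketch. Suppose each firefly that passes the quick checks has some small, independent chance p of actually being a different species. Then the chance that at least one impostor ends up in a batch of n fireflies is 1 - (1 - p)^n. The 1% misidentification rate below is purely a hypothetical value picked for illustration; it was not measured, and the independence assumption is a simplification.

```python
# Chance that a pooled batch contains at least one misidentified specimen.
# ASSUMPTION: each specimen is misidentified independently with probability p.
# The value p = 0.01 is hypothetical, chosen only to illustrate the trend.


def prob_any_contaminant(n_specimens, p_misidentified):
    """P(at least one non-target specimen) = 1 - (1 - p)^n."""
    return 1 - (1 - p_misidentified) ** n_specimens


HYPOTHETICAL_P = 0.01

for n in (20, 200):
    risk = prob_any_contaminant(n, HYPOTHETICAL_P)
    print(f"{n} fireflies: {risk:.1%} chance of at least one non-target specimen")
```

Under that assumed 1% rate, a batch of 200 is far more likely than not to contain an impostor, while a batch of 20 usually would not. The real rate is unknown, so only the trend should be taken from this.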
Lastly, another issue is that for assembly of the genome, you want as little genetic variation in the input DNA as possible. Genome assembly is essentially a big jigsaw puzzle, even for one genome, and mixing in multiple similar puzzles (genomes from different individuals of the same species) makes the assembly a lot harder. What are the levels of genetic variation one might deal with? Roughly it goes: haploid (1 copy of the genome: gametes of multicellular organisms, such as eggs and sperm, and microorganisms such as fungi and bacteria), clonal diploid (2 copies of the genome, but the same at every position between the 2 copies: inbred multicellular laboratory model organisms such as mice and fruit flies), normal diploid (2 copies of the genome, with the difference between the two copies ranging from 0.1% up to ~3%: people, dogs, fireflies, many plants), and polyploid (>2 copies of the genome, which is what some plants have; here be the dragons of genome assembly). By using 200 fireflies, we certainly wouldn't be minimizing the heterozygosity!
In short, I think you catch my drift. Extracting DNA from 200 Photinus pyralis for the PacBio Sequel, while possible, has some downsides: it would be pretty easy to contaminate the sample with non-P. pyralis species, and a more heterozygous population of DNA would be going into the sequencing process. Either would make the $10,000 of sequencing funded by your contributions far less valuable.
So with that in mind, the decision was made to instead go with the RSII instrument, which gives the largest amount of sequencing data per quantity of DNA extracted. With that, a plan of attack was laid out.
1) This past summer, collect as many P. pyralis fireflies as possible (gotta catch ’em all!), on the basis of flash patterns. Collections were made from a single site in New Jersey, in a single year (namely, this year) to reduce heterozygosity.
2) Select the 20 largest fireflies, with clear ventral pigmentation.
3) Dissect the mites off, and wash off any externally bound microorganisms with a 100% ethanol wash.
4) Extract high-molecular-weight DNA and submit it for sequencing.
I'm happy to say that as of this lab note, the plan has been fully executed. Stay tuned for another lab note which describes this protocol and has some pictures of the process (update: here it is https://experiment.com/u/FfpCl...)!