Lab Note #11
Nov 13, 2015

Today is an exciting day indeed! The samples you helped process have now been sequenced! It took a little longer than I hoped to get them ready and on the machine (oh science), but the day has finally come!

As a reminder, we had 15 samples of viruses from 3 populations and 5 time points that we were preparing for Illumina sequencing. Unfortunately, I was not able to get one sample processed, so we are down to 14, which is still a great sample size for this study. These 14 samples all went on the sequencer last week, and the data is downloading as I write this. We obtained an incredible 177 million reads of sequencing data! Each read is 300 nucleotides long, and the genomes range from 2,000 to 5,000 nucleotides long, so we have a ton of potential genomes for me to sort through!
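For anyone who likes to see the numbers, here is a rough back-of-the-envelope sketch in Python. It takes the figures above at face value (177 million reads, every read exactly 300 nucleotides, genomes of 2,000 to 5,000 nucleotides) and simply divides the total sequence by genome length. The function names are my own for illustration, and the result is an upper bound on "genome-lengths of data", not a count of the genomes actually in the samples, since many reads will cover the same genome over and over.

```python
# Rough arithmetic only: assumes every read is exactly 300 nucleotides long.
READS = 177_000_000
READ_LENGTH = 300
GENOME_MIN = 2_000   # shortest expected genome, in nucleotides
GENOME_MAX = 5_000   # longest expected genome, in nucleotides


def total_bases(reads, read_length):
    """Total number of nucleotides sequenced across all reads."""
    return reads * read_length


def genome_equivalents(bases, genome_length):
    """How many genomes of this length the data could cover exactly once."""
    return bases / genome_length


bases = total_bases(READS, READ_LENGTH)
print(f"Total nucleotides sequenced: {bases:,}")
print(f"Genome-lengths at {GENOME_MAX:,} nt: {genome_equivalents(bases, GENOME_MAX):,.0f}")
print(f"Genome-lengths at {GENOME_MIN:,} nt: {genome_equivalents(bases, GENOME_MIN):,.0f}")
```

In other words, there are tens of billions of nucleotides to sort through, which is why the next steps are left to a computer.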

Right now, the reads are not grouped with the other reads from the same virus. Imagine buying a bunch of LEGO kits and, before building any of them, mixing all the pieces together in one giant box. To build the intended kit designs (i.e., the virus genomes), you need to find all the right pieces in this massive jumble. That should give you a good idea of what the data looks like currently.

The first step is called assembly. This is where a computer program (and not me, thankfully!) sorts through all of these pieces and puts them together the way it thinks the original genomes were arranged. I will then review what the program did and clean up the data by removing any genomes that look as though they were assembled incorrectly, along with some other clean-up steps. At this point I will be able to start comparing the resulting cleaned-up genomes with known genomes from various databases.
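To give a flavor of what assembly means, here is a toy sketch in Python. It is not the program I will be using, and real assemblers are far more sophisticated (they cope with sequencing errors, repeated regions, and both DNA strands). This version just keeps merging the two pieces with the longest matching overlap until nothing overlaps anymore. The function names, the tiny made-up "genome", and the minimum overlap of 3 are all assumptions for illustration.

```python
def overlap(a, b, min_len=3):
    """Length of the longest end of `a` that matches the start of `b`.

    Returns 0 if the longest match is shorter than `min_len`.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Toy greedy assembly: repeatedly merge the best-overlapping pair."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        # Join the pair, writing the shared overlap only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Three made-up overlapping "reads" cut from a made-up 15-nucleotide sequence.
toy_reads = ["ACGTTAGC", "ATGGCGTA", "GCGTACGT"]
print(greedy_assemble(toy_reads))
```

With only three short pieces this is easy. With millions of reads from many different viruses mixed together, the program can join pieces that do not actually belong together, which is why the clean-up step afterwards matters.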

As you can see, this is going to be a long process, but I will keep you updated as I make progress. The first assembly I run will use just one time point from one population. That will let me identify the genomes in this one sample and move forward from there. Once I get this piece of the puzzle sorted out, I will send you all another update with how many genomes that sample contains and how I plan to proceed.

