Bowen Jiang

Mar 07, 2018

The (preliminary) results are in! First impressions of new sequence data

Hi everyone,

So, I've spent the last few days doing a surface skim of my newest batch of sequence data. It's not a technical term; for me, this simply entails going through each file one by one and giving it maybe a 10-second evaluation on whether or not it's worthwhile to use in my analysis. The real point of this, so far, is just to get a general idea of how effective each barcode marker is in terms of universality (what percentage of the total attempted reactions yielded some passable data), as well as what species of algae I might have and which barcodes might be misidentifying them.

There's no real other easy way to get into it besides...well, just jumping straight into it.

So, overall, I sent in samples to be sequenced in 183 distinct sequencing reactions, of which around 135 returned passable data. (I say around 135 because what I consider "passable" data is somewhat subjective. The most important qualifier is whether or not the sequence can be successfully aligned with and identified as algal DNA, but then the length and strength of the alignment as well as the cleanliness and uniqueness of the sequence also factor in, and I have to make the call as to whether or not that sequence is worth continuing to work with.) This gives an overall success rate of around 73-74%, which is comparable to the previous large round of sequencing I reported on last year. Now, in terms of the percentage of successful reactions...while it doesn't seem like I've made much improvement, consider that the majority of the reactions I ran this time around were the ones that failed the first time, and many of them did end up yielding some pretty nice-looking data in this round. Some reactions repeatedly failed (both this time and last), and still others were short or frankly too messy for me to bother editing by hand - but at their most basic level, they still functioned to identify the alga.

25 samples I sent for sequencing were of the UPA 23S rRNA region, and of these, 17 were able to be sequenced successfully and could be used to identify algae. However, the sequences for JIAC14 and JIAC20 were very short and weak (didn't align with much confidence), so really 15 of the UPA sequences are good. 15 out of 25 doesn't sound that good, but it's far better than the original 3 successful reactions out of 34 attempted from my first round of sequencing. I'd mainly attribute the newer success to the fact that the algal DNA templates used for this reaction were prepared from antibiotic media cultures, which greatly cut down on previously rampant bacterial contamination (the UPA region is found in both chloroplast DNA and the bacterial genome). Some nice highlights were JIAC7, confirmed as Geminella sp., and JIAC27, which was sequenced for the first time in this round and turned out to be Chlorella, like I had guessed in a previous lab note.

tufA is another gene found in both prokaryotes and eukaryotes, and as I learned from my first round of sequencing, the primers I am using are not specific to the eukaryotic chloroplast elongation factor Tu as I had assumed from reading their source paper. In that first round it did fare slightly better than UPA, however (I believe 7 or 8 successful sequences from an attempted 33 reactions), and it also proved to have a higher success rate in this new round of sequencing, with 23 of 26 sequences being of predicted amplicon length and high quality, as well as being of the correct (or a plausible) species. I am particularly happy that JIAC28 sequenced well (identified as a species of Stigeoclonium, like Fila1) and that I got my first identification of JIAC30 - as a strain of Kirchneriella, not quite what I had guessed but very closely related to the Selenastrum I initially assumed it to be.

Although the ITS region has been widely cited for its potential for barcoding of algae, I really haven't had much success with it in my current project. This round of sequencing yielded 13 good sequences from an attempted 22 - an additional 2 sequences (from the filamentous isolates Fila1 and OedoSF, a stringy mat-forming alga I collected at Golden Gate Park in San Francisco) yielded good algal matches but were about 400 base pairs too short for incorporation into a full-scale analysis. It is a eukaryote-specific marker, so bacterial contamination certainly wouldn't have played a role in its malfunctions...I suspect the underwhelming performance of ITS comes down to either minimal fungal contamination, loss of DNA during the purification step, or (most likely) bad amplification. It just hasn't amplified very reliably throughout my experiments, and I have yet to figure out why.

The rbcL marker is my pride and joy, because I designed it myself, and I've generally been very impressed with its consistent universality and specificity. It amplifies well from representatives of both Chlorophyta and Streptophyta, building off of "precursor" primers mentioned in previous papers that worked well for identification within specific taxa. It's also quite informative - the way I am running rbcL, I have one forward and one reverse run pieced together to yield a fragment of around 1350-1450 bp in length spanning nearly the whole gene. I ran 41 reactions, counting forward and reverse runs separately, and of these, close to three dozen I'd call "good". A few of them are a bit on the short side, so I would just worry about successfully being able to piece them together. And a few others were just surprising to me - for example, both the forward and reverse reactions for Fila2 and OedoSF yielded strong matches with Spirogyra, but I know Fila2 and OedoSF are NOT Spirogyra....not only do all of the other DNA sequence data contradict this finding, but Spirogyra are about the only algae I can identify visually with any ease, and Fila2 and OedoSF don't have its characteristic spiraled chloroplasts. Also, the reverse reaction of JIAC26 turned up as Chlorella, which again conflicts with observational and other molecular evidence I've collected so far. One thing to note, the final pieced-together sequence from the longer markers (rbcL, 18S and 26S) may have slightly different matches than the individual parts separately, so exactly what these turn out to be, I'm not quite sure.
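
For anyone wondering what "piecing together" a forward and a reverse run means in practice, here is a minimal Python sketch. It is not the editing workflow used for these data, just an illustration of the idea: flip the reverse read onto the forward strand, find where the two reads overlap, and join them. The function names, the exact-match overlap rule, the `min_overlap` parameter, and the toy sequences are all assumptions made up for this example. Real reads contain base-calling errors, so an exact-match rule like this one would often fail on them.

```python
from typing import Optional

# Complement table for unambiguous bases plus N (hypothetical helper, not from any library)
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def merge_reads(forward: str, reverse: str, min_overlap: int = 5) -> Optional[str]:
    """Join a forward read and a reverse read into one fragment.

    The reverse read is reverse-complemented, then the longest exact match
    between the end of the forward read and the start of the flipped reverse
    read is used as the overlap. Returns None if no overlap of at least
    min_overlap bases is found.
    """
    fwd = forward.upper()
    rev_rc = reverse_complement(reverse)
    longest_possible = min(len(fwd), len(rev_rc))
    for k in range(longest_possible, min_overlap - 1, -1):
        if fwd[-k:] == rev_rc[:k]:
            return fwd + rev_rc[k:]
    return None


# Toy example: a made-up 27 bp "gene" read from both ends with an 8 bp overlap
toy_gene = "ATGGCTTACCGTAGCTAGGATCCAAGT"
forward_read = toy_gene[:18]
reverse_read = reverse_complement(toy_gene[10:])
merged = merge_reads(forward_read, reverse_read)
print(merged == toy_gene)
```

This also shows why the short reads are a worry: if the forward and reverse runs don't reach far enough to overlap, `merge_reads` has nothing to join on and returns `None`.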

Like the ITS region, the 18S rRNA has been suggested and used as a barcode marker for algae, although it is longer and less variable, making analyses more complex and less specific in smaller taxonomic divisions. Also like ITS, I haven't really been able to find much success with 18S, the main reason being nonspecific amplification and incorrectly-sized bands in my PCRs (which signals contamination of some kind or incorrect amplification conditions; considering how many times I've tried to change the latter, I'm suspecting the former). Of 32 reactions I sent to be sequenced in the forward and reverse directions, 16 yielded long, high quality sequences. Generally, the reverse direction performed better than the forward direction in terms of sequence alignment length, which I would guess is due to better primer binding. The most interesting thing about this barcode region was that JIACs 7, 13, 13i and 14 were all strangely identified as amoebae of some kind (including Naegleria species, in the same genus as the infamous brain-eating amoeba). I never noticed any contamination in my cultures with these amoebae (and even if they were there, the algae still outnumber them enough such that amoeboid sequences would have been "drowned out"), so I'm definitely intrigued....if anything, there seems to be an association between the symbiotic Chlorella and this misidentification, but I really don't have enough information to go off of that.

Finally...the 26S rRNA region was not one I expected to have much success with, but I have been very pleased with how it's worked throughout this project. I'm not even sure exactly why I picked it at first (I believe I got most if not all of the primers from an old paper published in 2001), seeing as it's not commonly used for algal identification or phylogenetic studies, but it has demonstrated quite impressive universality. The main problems with this marker are nonspecific amplification during PCR and the fact that it is long - regularly over 2000 base pairs for the 5' end I work with - which may make it unfeasible for high-throughput analysis. In addition, it doesn't have a particularly strong reference library at the moment, which sometimes leads to mismatches between the algal species I have and DNA sequences from different families. Out of 37 reactions I attempted in both the forward and reverse directions, 27 are good - that number is closer to 30 when I factor in sequences that could be used to identify the alga, but might not be long enough for piecing together the whole sequence.
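
To put the marker-by-marker numbers side by side, here is a small Python sketch that tabulates universality (good sequences divided by attempted reactions) from the counts quoted in the paragraphs above. The variable and function names are made up for illustration. rbcL is left out of the rate calculation because its count of good reactions is only given as "close to three dozen", and the overall figure uses the approximate "around 135" from the summary, so treat that one as a rough estimate.

```python
# Attempted reactions per marker, as quoted in the text above
attempted = {"UPA": 25, "tufA": 26, "ITS": 22, "rbcL": 41, "18S": 32, "26S": 37}

# Good sequences per marker (stricter counts: 15 for UPA, 27 for 26S).
# rbcL is omitted because only an approximate count is given.
good = {"UPA": 15, "tufA": 23, "ITS": 13, "18S": 16, "26S": 27}


def success_rate(good_count: int, attempted_count: int) -> float:
    """Percentage of attempted reactions that yielded usable sequence."""
    return 100.0 * good_count / attempted_count


rates = {marker: success_rate(good[marker], attempted[marker]) for marker in good}

# The per-marker attempts should add up to the 183 reactions sent in
total_attempted = sum(attempted.values())

# "Around 135" passable reactions overall is an approximate figure from the text
approx_overall_rate = success_rate(135, total_attempted)

for marker, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{marker:5s} {good[marker]:2d}/{attempted[marker]:2d}  {rate:5.1f}%")
print(f"total attempted: {total_attempted}, overall roughly {approx_overall_rate:.0f}%")
```

Sorting by rate makes the spread easy to see at a glance, with tufA at the top and 18S at the bottom among the markers with exact counts.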

So, some strain-specific highlights not already covered: JIAC32 and JIAC33, as I predicted, were identified as Dictyosphaerium. However, only about half of the markers I tested had Dictyosphaerium sequences in their library; the others gave the closest match as Chlorella, which is closely related and far more heavily sequenced, so it makes sense that there would be Chlorella sequences for Dictyosphaerium to make a "default" match with. All of the "new Scenedesmus-like" strains (JIAC29, 31, 34, 35, and 36 - the generally larger-celled strains mostly from San Francisco) sequenced well, but identifications often varied between strains I thought were closely related if not identical, as well as between markers. Some identifications even varied between genera (most commonly Scenedesmus, Desmodesmus, Acutodesmus and Tetradesmus). As to why this is...further analysis will help me figure it out, but what I know for sure is that the family Scenedesmaceae has undergone tremendous shifts in the past few decades as what were once thousands of species in a single large genus have been split up into new taxa or eliminated altogether. The main reason for this, of course, is the discovery of phenotypic plasticity, the idea that algae in the family can change their physical appearance dramatically when their culture conditions change or if they detect predators in the ecosystem - however, the use of DNA-based species definitions has definitely had an impact as well, especially in recent years. The main problem is that database sequences are rarely updated to reflect taxonomic changes, so while their core information (the sequence) does not change, it is often confusing to see my specimen line up with names that might be obsolete synonyms, and to try to deduce what it is by looking at other sequence matches with a host of other taxonomic names.

And finally (after a large digression), what is OedoSF...I have no clue still, but so far the data seem to suggest it's something closely related to Cladophora, which would place it in the Ulvophyceae instead of the Chlorophyceae.

So.....lots of information, but the basic gist of it is a majority of the sequences turned out well, and worked as they should! As I work through the editing process and post more weekly or biweekly updates, I'll probably give more in-depth analyses of my sequences, especially the interesting ones and all, and maybe include some "practice" phylogenetic analysis with some close cousins to them I find. So stay tuned. Good stuff.

About This Project

The aim of this research is to find and test gene regions in the genome of freshwater green algae which can aid the identification of species in this taxon. These so-called molecular barcodes will be amplified by PCR and compared by sequence analysis, and their successful application will aid greatly in determining the current taxonomy of green algae, as well as conducting environmental surveys, identifying new species, or selecting strains for potential human use and applications.
