Bowen Jiang

Aug 16, 2017

Sequence Data Finally In! The Good, the Bad, and the Ugly

After all these months of buildup, unbelievably tedious PCRs, and meticulous culturing, some sequence data have finally arrived: 192 individual sequences so far, spanning all six barcodes and 27 strains of algae!

So first, the good news:

After running some initial tests, I've already been able to tentatively identify several of my strains that were previously unidentified. The large spheres characteristic of JIAC15, 21, and 24 have turned out to be (most likely) Chlorococcum strains, the very strange morphology of JIAC7 matches the genus Geminella, and JIAC23, initially suspected to be a terrestrial member of the Chaetopeltidales and later re-guessed as Klebsormidium, was indeed found to be Klebsormidium.

The bad:

Not every marker performed equally well, and not every strain yielded sequence data that could identify it. Closely tied to this is...

The ugly:

Some of the bacterial sequences that were unintentionally amplified and sequenced belong to some really gnarly genera, like Pseudomonas and Stenotrophomonas, which include species that can cause serious infections in people with weakened immune systems and are notoriously hard to treat with antibiotics. Yuck!

With that said, let's look at how well each barcode region worked, so we can start answering (at least in part) the original research question of this project, although of course there is a lot more work to be done (including many repeat sequencing runs and, ugh, repeat PCRs). Success here is defined as whether a sequence submitted to BLAST, the NCBI's tool for searching a query sequence against an enormous public database of sequences, returned a match that gave a reasonable guess for the sample's true species identity. The BLAST search itself is algorithmic and reports quantitative scores for each match, but deciding which scores count as a "reasonable guess" is a judgment call on my part. Still, it's a good first test of whether I really have algal DNA, bacterial DNA, or a sequence so mushy and low-quality that it doesn't resemble anything recognizable.
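To make that judgment call a little more explicit, here is a minimal sketch of how a read could be sorted into "algal", "bacterial", or "unidentified" from its top BLAST hit. It assumes the hit has already been summarized into an NCBI-style lineage string plus the percent identity, query coverage, and E-value that BLAST reports; the `TopHit` structure, the `classify` function, and all three cut-off values are hypothetical choices for illustration, not the thresholds actually used in this project.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cut-offs for illustration only; sensible values differ by marker.
MIN_IDENTITY = 90.0   # percent identity of the top hit
MIN_COVERAGE = 80.0   # percent of the query covered by the alignment
MAX_EVALUE = 1e-20    # largest E-value still treated as significant


@dataclass
class TopHit:
    """Summary of the best BLAST hit for one read (hypothetical structure)."""
    lineage: str  # assumed NCBI-style lineage, e.g. "Eukaryota; Viridiplantae; Chlorophyta"
    percent_identity: float
    query_coverage: float
    evalue: float


def classify(hit: Optional[TopHit]) -> str:
    """Sort a read into 'algal', 'bacterial', or 'unidentified'."""
    if hit is None:
        return "unidentified"  # BLAST found no significant match at all
    if (hit.percent_identity < MIN_IDENTITY
            or hit.query_coverage < MIN_COVERAGE
            or hit.evalue > MAX_EVALUE):
        return "unidentified"  # a match exists, but it is too weak to trust
    if hit.lineage.startswith("Bacteria"):
        return "bacterial"
    # Green algae fall under Chlorophyta or (e.g. Klebsormidium) Streptophyta;
    # land plants are also Streptophyta, so this check is deliberately coarse.
    if "Chlorophyta" in hit.lineage or "Streptophyta" in hit.lineage:
        return "algal"
    return "unidentified"


if __name__ == "__main__":
    hit = TopHit("Eukaryota; Viridiplantae; Chlorophyta; Chlorophyceae", 98.5, 96.0, 0.0)
    print(classify(hit))  # algal
```

Writing the cut-offs down as named constants does not remove the subjectivity, but it makes every choice visible and easy to revisit for each marker.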

With this in mind, the UPA and tufA markers performed terribly: both amplified mostly bacterial rather than algal DNA. That surprised me, because these markers were designed to target photosynthetic organisms (green algae and possibly cyanobacteria), yet they picked up sequences from some definitely ordinary, non-photosynthetic bacteria. The UPA marker returned 3 algal sequences, and the tufA marker returned 7 good sequences (technically 9 that could identify algae, although two of these were of low quality), even though both yielded far more PCR products; most of those products simply turned out to be bacterial in origin.

The ITS, 18S, and 26S regions, on the other hand, are all parts of the eukaryotic nuclear ribosomal DNA, so primers targeting them should not amplify bacterial DNA. None of these sequences were bacterial in origin, although the success rate was still below 100% because some low-quality sequences could not be matched by BLAST. Thirteen ITS sequences passed this first inspection and matched algal sequences, compared with 31 for the 18S marker (counting both forward and reverse reads) and 38 for the 26S marker.
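A reverse read comes from the opposite DNA strand, so it has to be reverse-complemented before it can be lined up base-for-base with the forward read of the same product. (BLAST itself searches both strands by default, so this step matters mainly when editing and combining reads.) The sketch below is a generic illustration with a hypothetical `reverse_complement` helper; it handles the standard IUPAC ambiguity codes that base-calling software can emit.

```python
# Complement table for IUPAC nucleotide codes, upper- and lowercase.
# R (A/G) <-> Y (C/T), K (G/T) <-> M (A/C), B <-> V, D <-> H; S, W, N map to themselves.
_COMPLEMENT = str.maketrans(
    "ACGTRYSWKMBDHVNacgtryswkmbdhvn",
    "TGCAYRSWMKVHDBNtgcayrswmkvhdbn",
)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    # Complement every base, then reverse the order so it reads 5' to 3'.
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGCN"))  # NGCAT
```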

Finally, the rbcL gene performed the best (which I'm definitely happy about, seeing as I designed the PCR primers for this gene region myself!). I think it worked so well because the primers were designed specifically for green algae, excluding not only all bacteria but also most other culture contaminants like molds, yeasts, diatoms, and protozoa. None of the working sequences showed evident bacterial contamination, and although a few sequences failed due to low-quality sequencing, 42 others successfully matched green algal sequences in the BLAST database.
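The counts above can be collected in one place and ranked, as in the small sketch below. These are raw counts of algal matches taken from this note, not success rates, because the number of reads attempted per marker isn't broken down here; the 18S figure includes both forward and reverse reads, and tufA uses the 7 good sequences rather than all 9. The `rank_markers` name is just an illustrative helper.

```python
# Reads per marker whose BLAST match was a plausible alga, as reported above.
algal_matches = {
    "UPA": 3,
    "tufA": 7,   # 7 good sequences (9 including two low-quality ones)
    "ITS": 13,
    "18S": 31,   # forward and reverse reads counted together
    "26S": 38,
    "rbcL": 42,
}


def rank_markers(counts: dict[str, int]) -> list[tuple[str, int]]:
    """Order markers from most to fewest algal matches."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for marker, n in rank_markers(algal_matches):
        print(f"{marker:>5}: {n}")
    total = sum(algal_matches.values())
    print(f"{total} algal matches out of 192 sequences returned")
```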

Now I have more tedious work to do: editing my sequences (essentially proofreading them by trimming the long, low-quality tails of mostly uncalled bases from each read and "correcting" a few miscalled bases), redoing the necessary PCRs, and, most importantly, figuring out how to get rid of the bacterial contamination so that I can give the UPA and tufA markers a fair appraisal. Stay tuned...things are getting exciting now!
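Trimming those low-quality tails can be done by hand in an editor, but it can also be automated from the per-base Phred quality scores that come with each chromatogram. The sketch below implements the modified Mott trimming algorithm, a widely used approach in which each base scores `limit - p_error` and the kept region is the stretch with the highest total score; it is not necessarily what the editing software used in this project does. The function names `mott_trim` and `trim_read` are hypothetical, and `limit = 0.05` is an illustrative default.

```python
def mott_trim(quals: list[int], limit: float = 0.05) -> tuple[int, int]:
    """Return (start, end) of the region to keep, with end exclusive.

    Each base scores `limit - p_error`, where p_error = 10 ** (-Q / 10) is the
    error probability implied by its Phred quality Q. The kept region is the
    run of bases with the largest total score (found with Kadane's algorithm).
    (0, 0) means no stretch of the read is worth keeping.
    """
    best_sum, best_start, best_end = 0.0, 0, 0
    run_sum, run_start = 0.0, 0
    for i, q in enumerate(quals):
        score = limit - 10 ** (-q / 10)
        if run_sum <= 0:
            # A non-positive running total can only hurt: start a new run here.
            run_sum, run_start = score, i
        else:
            run_sum += score
        if run_sum > best_sum:
            best_sum, best_start, best_end = run_sum, run_start, i + 1
    return best_start, best_end


def trim_read(seq: str, quals: list[int], limit: float = 0.05) -> str:
    """Trim low-quality ends (such as a long tail of 'N' calls) from one read."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lists must be the same length")
    start, end = mott_trim(quals, limit)
    return seq[start:end]


if __name__ == "__main__":
    read = "NNACGTACGTNNNN"
    quals = [2, 2] + [40] * 8 + [2] * 4
    print(trim_read(read, quals))  # ACGTACGT
```

If the chromatograms are `.ab1` files, Biopython's `SeqIO.read(path, "abi")` returns a record whose `letter_annotations["phred_quality"]` list could feed `trim_read`.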

Comments
  • Dr Kathryn Hall (Backer)
    Hi Bowen, I have been watching your study with interest. I am not sure who is advising you, but you are not accurate here with what you are saying. BLAST searching IS quantifiable, and IS objective. It is done algorithmically by a computer - there is absolutely no subjectivity involved and the output provides a number of metrics to assess the similarity of the sequences to others in the database. It is nice to see you having a go, but your work is not visible to peer-reviewers and has not been assessed for its scientific accuracy or the rigour of your interpretations. You really should not presume to be so didactic, especially when you are not correct in what you say. I find myself disagreeing with much of what you write ("unnecessary parts" of sequences...what are those?). Please listen to the advice of more experienced people. Enthusiasm is great, but the tortoise beat the hare in the end. Best wishes, Kathryn
    Aug 16, 2017
  • Bowen Jiang (Researcher)
    Hello Dr. Hall, Thank you very much for your quick comment and continued interest in my project! In most of my lab notes, I have struggled to strike a balance between keeping the vocabulary and length readable and interesting and going into more detailed or complicated explanations of my methods, and I apologize, as I do not think I did a very good job of explaining my use of BLAST and sequence-editing software in this lab note. By calling BLAST subjective, I was referring to the subjective choices I make about which sequences to consider editing, based on the alignments and scores that BLAST generates algorithmically and quantitatively, not to the BLAST process itself. The “unnecessary parts” you refer to in your comment are simply the long tails of low-quality, uncalled bases in each chromatogram, which I think are artefacts of the capillary electrophoresis runs (e.g. for a 700-bp product, I would trim off the additional 700-base “tail” of mostly “N” bases at the end). However, upon rereading my lab note, I can see how my wording could have led to such ambiguity and misinterpretation, so I will work to get it edited as soon as possible.
    Aug 19, 2017
  • Bowen Jiang (Researcher)
    Working on this project as a whole has definitely given me a greater appreciation for the difficulties of phylogenetic research. Most of my lab notes so far have been written on the assumption that the data presented are preliminary and not as accurate or precise as they will be after further processing, but I appreciate that you noticed my tone does not always reflect this, and I will make sure to correct it in future lab notes. I am currently trying to establish connections with a local college where I can get more direct technical support on my methods and data processing, and I am also in contact with a few algal phylogeneticists whom I plan to consult when I dive deeper into analyzing my data. Especially as I move beyond the routine algal isolation and PCR that I run at home, I will definitely seek more assistance with the computational portions of my project. I was wondering, however, whether you might be able to help me a little as well, perhaps by glancing over some of my lab notes or speaking with me about the goals and aims of my project. I would love to receive more of your input on how I can continue to improve this project and word my lab notes so that they are presented more clearly. Thanks again for your feedback, and I hope to hear from you more as I continue my research! Best regards, Bowen
    Aug 19, 2017

About This Project

The aim of this research is to find and test gene regions in the genomes of freshwater green algae that can aid species identification in this group. These so-called molecular barcodes will be amplified by PCR and compared by sequence analysis; if successful, they will greatly help in resolving the current taxonomy of green algae, as well as in conducting environmental surveys, identifying new species, and selecting strains for potential human use and applications.

Blast off!
