Riaan F. Rifkin

Feb 19, 2017

Bioinformatic analyses and population controls

Once aDNA has been sequenced, the resulting data must be processed before it can be interpreted. Bioinformatic (computational) analyses piece together sequenced aDNA fragments by mapping individual reads to reference genome databases. The nucleotide sequences obtained via shotgun sequencing are also identified by matching them against existing sequence databases, a procedure known as ‘blasting’ after the Basic Local Alignment Search Tool (BLAST). Related genome sequences are, however, required to detect endogenous DNA and exclude contaminating sequences. Besides recovering more sequences for analysis, a more closely related reference genome gives a more complete picture of the ancient genome by avoiding a bias against highly diverged regions. Conversely, when no comparative sequences from close relatives are available, the value of a genome project on an extinct species is reduced, because sequence comparison is restricted to genomic regions that are conserved enough for ancient DNA sequences to be detected reliably.
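The effect of reference choice on read recovery can be illustrated with a toy example. The sketch below is not how BLAST or any production read mapper works: it simply slides a read along a reference without gaps and accepts the best placement within a mismatch limit. The function name `best_hit`, the `max_mismatches` threshold and both short reference strings are hypothetical and chosen only for illustration.

```python
def best_hit(read, reference, max_mismatches=2):
    """Return (start, mismatches) for the best ungapped placement of `read`
    on `reference`, or None if every placement exceeds `max_mismatches`."""
    best = None
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (start, mismatches)
    return best


# Hypothetical toy sequences: the read is an exact substring of the close reference,
# while the distant reference differs from it at four sites in the same region.
close_ref = "ACGTTGCAAGGCTTACGGATCCA"
distant_ref = "ACGTTGTACGGATTGCGGATCCA"
read = "GCAAGGCTTACG"

hit_close = best_hit(read, close_ref)      # read is placed on the close reference
hit_distant = best_hit(read, distant_ref)  # read is lost against the distant reference
```

With the same mismatch limit, the read is recovered against the closely related reference and discarded against the more divergent one, which is the bias against highly diverged regions described above.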

Given the influence of contamination on the reliability of DNA sequences obtained from ancient samples, appropriate analytical and population controls are required during the analysis of extracted aDNA sequences. Analytical controls include software such as mapDamage (Jónsson et al., 2013), which computes nucleotide misincorporation and fragmentation patterns from next-generation sequencing (NGS) reads mapped against a reference genome, and EAGER, which performs quality control, mapping, authentication, contamination estimation and genotyping of NGS data. The EAGER pipeline incorporates methods for paired-end read merging, duplicate removal and mapping that are tailored to improve the analysis output for aDNA projects. The PALEOMIX pipeline also supports the quantification of post-mortem DNA damage through standard misincorporation and fragmentation patterns. When several genomes are available, PALEOMIX can reconstruct maximum likelihood phylogenomic trees and reveal the phylogenetic relationships among taxa. Finally, metaBIT is an integrative and automated metagenomic pipeline for analysing microbial profiles from high-throughput sequencing (HTS) shotgun data. This software can also be used to monitor laboratory contamination and detect microbial species, including pathogens.
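To make the idea of a misincorporation pattern concrete, the following minimal sketch tabulates how often a reference C is read as T at each of the first few read positions. It is not mapDamage's implementation, which operates on real alignment files and also models fragmentation; the function name `c_to_t_by_position`, the `max_pos` parameter and the toy read/reference pairs are all assumptions made for illustration, and the pairs are taken to be pre-aligned without gaps.

```python
from collections import Counter


def c_to_t_by_position(alignments, max_pos=5):
    """For each of the first `max_pos` read positions, return the fraction of
    reference C bases that appear as T in the read (None if no C was seen)."""
    ref_c = Counter()   # reference C observed at position i
    c_to_t = Counter()  # of those, read base is T
    for read, ref in alignments:
        for i, (read_base, ref_base) in enumerate(zip(read[:max_pos], ref[:max_pos])):
            if ref_base == "C":
                ref_c[i] += 1
                if read_base == "T":
                    c_to_t[i] += 1
    return [c_to_t[i] / ref_c[i] if ref_c[i] else None for i in range(max_pos)]


# Hypothetical (read, reference) pairs, aligned base-for-base from the read's 5' end.
toy_alignments = [
    ("TGACC", "CGACC"),
    ("TTACG", "CTACG"),
    ("CAGTC", "CAGTC"),
    ("AGTTA", "AGCTA"),
]

profile = c_to_t_by_position(toy_alignments)
```

In authentic aDNA, such a profile is typically expected to be elevated at the read ends and to fall off towards the interior, whereas modern contaminant reads tend to show a flat profile.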

Population controls involve the comparison of aDNA sequences to databases specific to the geographic region or temporal period from which the aDNA derives. Because one would expect to recover aDNA from a specific set of indigenous vertebrate and botanical species, the validity of the obtained sequence reads must be assessed with population controls particular to local (i.e. southern African) environmental parameters. This can be achieved via comparison with databases such as the International Barcode of Life (IBOL) and the African Centre for DNA Barcoding (ACDB). African and non-African human single-nucleotide polymorphism (SNP) data are available from the 1000 Genomes Project website (Auton et al., 2015) and the Online Ancient Genome Repository (OAGR). This enables the comparison of human aDNA sequences with known San hunter-gatherer and Bantu-speaking agro-pastoralist sequences. Some mutations are related to the San lifestyle, such as the VDR allele associated with higher bone mineral density, UGT1A3 (associated with increased metabolism of endo- and xenobiotics) and ACTN3 (associated with increased sprint and power performance). Others, such as the Bantu-speaker Duffy null (DARC) malaria-resistance allele, the European-derived lactase persistence allele and the SLC24A5 allele (associated with light-coloured skin), are indicative of geographically foreign human populations.
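A population control of this kind can be pictured as a simple screening step. The sketch below assumes that reads have already been assigned to taxa and that a list of taxa expected for the region has been compiled from a reference database; it does not query IBOL, ACDB or any other real resource. The function name `apply_population_control` and the taxon names are illustrative only and are not drawn from the project's data.

```python
def apply_population_control(assigned_taxa, regional_reference):
    """Split taxon assignments into those expected for the region and those
    that should be flagged for closer inspection as possible contamination."""
    expected, flagged = [], []
    for taxon in assigned_taxa:
        if taxon in regional_reference:
            expected.append(taxon)
        else:
            flagged.append(taxon)
    return expected, flagged


# Hypothetical regional reference list and read assignments, for illustration only.
regional_reference = {"Syncerus caffer", "Connochaetes taurinus"}
assigned_taxa = ["Syncerus caffer", "Gallus gallus", "Connochaetes taurinus"]

expected, flagged = apply_population_control(assigned_taxa, regional_reference)
```

A flagged taxon is not automatically a contaminant; it only marks an assignment that falls outside the regional expectation and therefore deserves further authentication.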

1 comment

  • Robert Hackman
    Is there convincing evidence that sufficiently intact DNA persists in soil samples from the age range to be studied?
    Feb 20, 2017

About This Project

'Pathocene' integrates the concepts of 'pathogens' and the 'Pleistocene' epoch. The project aims to discover ancient pathogenic DNA (apDNA) from southern African archaeological caves and rock-shelters spanning the past 100,000 years. While ancient hunter-gatherer groups could not sustain infectious agents like measles and influenza, it is nevertheless from this pre-65,000-year sub-Saharan African ‘Pleistocene disease baseline’ that most modern human diseases derive.
