Identifying patterns in coronavirus datasets using ontologies

$50 raised of $6,000 goal (1% funded)
Campaign ended on 12/20/20

About This Project

Using our recently developed COVID-19 Infectious Disease Ontology in combination with the existing Coronavirus Infectious Disease Ontology, we will annotate 400 articles in PubMed on coronavirus research. The resulting gold standard will be used to train algorithms for automated annotation tasks, which will be used to identify patterns in COVID-19 datasets, with applications including data mining and consolidation, pattern recognition, and hypothesis generation and testing.

What is the context of this research?

Researchers have rapidly accumulated data relevant to SARS-CoV-2 and COVID-19 from a variety of sources in efforts to combat the virus (see, for example, The Pivotal Role of TMPRSS2, Sex Differences in Mortality from COVID-19, and Gender Differences in Patients with COVID-19). Far more data is being generated than can be examined manually, and there is little time to waste. Even the most attentive researchers will inevitably overlook pathogenic patterns. One solution is to use computational support and ontologies to explore patterns across datasets. Ontologies are structured vocabularies, already widely used in bioinformatics and biomedical data standardization, which support data integration, sharing, reproducibility, and automated reasoning.

What is the significance of this project?

Our COVID-19 Infectious Disease Ontology (IDO-COVID-19), available on Bioportal, will enable rapid integration, discoverability, and reproducibility of COVID-19 data across disparate medical research datasets, and provide the data infrastructure needed to support machine learning and automated reasoning over such datasets. This will make the data easier to harvest for use in public health efforts to combat the COVID-19 pandemic. In that respect, our project has the potential to affect the COVID-19 pandemic response in much the way the Gene Ontology has been used, with enormous success, to facilitate data comparability in genome-related science.

What are the goals of the project?

Annotation of heterogeneous datasets using common ontologies links their data together by means of the semantically controlled properties of ontology terms, thus facilitating data integration. IDO-COVID-19 will be used to annotate approximately 400 articles in the National Library of Medicine COVID-19 corpus, which reports COVID-19 clinical trial, epidemiological, and pathogenesis data. This will provide a natural language processing ‘gold standard’ used to train algorithms for automated annotation tasks, which will, in turn, be used to identify useful patterns in COVID-19 datasets, with varying applications including data mining and consolidation, and pattern recognition.
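As a rough illustration of how a gold standard can be used to check automated annotation, the sketch below compares a set of machine-produced annotations against manual ones and reports precision, recall, and F1 on exact matches. The `Annotation` record, the `score` function, the document IDs, and the `EX:` term identifiers are all hypothetical placeholders, not part of IDO-COVID-19 or of our actual pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One ontology term attached to a text span (hypothetical record layout)."""
    doc_id: str
    start: int    # character offset where the span begins
    end: int      # character offset where the span ends
    term_id: str  # placeholder ontology term identifier


def score(gold, predicted):
    """Return (precision, recall, f1) for exact-match annotations."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up example: two manual annotations, two automated ones, one in common.
gold = [
    Annotation("doc1", 0, 10, "EX:0001"),
    Annotation("doc1", 25, 33, "EX:0002"),
]
predicted = [
    Annotation("doc1", 0, 10, "EX:0001"),
    Annotation("doc1", 40, 45, "EX:0003"),
]

precision, recall, f1 = score(gold, predicted)
print(precision, recall, f1)
```

Exact matching is the strictest choice; a real evaluation might also give credit for overlapping spans or for terms that are close in the ontology hierarchy.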


We are in the second stage of a three-stage project to leverage ontologies - structured vocabularies composed of scientific terms covering various domains - from the Open Biological and Biomedical Ontology (OBO) Foundry for coronavirus data coordination.

In stage one, we constructed ontologies supporting interoperability among OBO ontologies, focusing on COVID-19. Stage two involves manually annotating 400 articles with ontology terms. Stage three involves machine learning based on the results of the preceding stages.

Complete, accurate annotation of a document takes approximately 3 hours. We have trained 2 interns to annotate documents using our ontology, and our budget will pay them for this labor-intensive work. John will also be annotating, and is funded through Northwestern. That leaves 400 hours split between the 2 interns (200 hours each), which at $3,000 each comes to $15/hr.
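The budget figures above can be checked with a few lines of arithmetic. This is only a sketch using the numbers stated in this section; the variable names are ours.

```python
# Figures taken from the budget description above.
TOTAL_INTERN_HOURS = 400   # hours to be split between the interns
NUM_INTERNS = 2
PAY_PER_INTERN = 3000      # dollars

hours_per_intern = TOTAL_INTERN_HOURS / NUM_INTERNS   # 200 hours each
hourly_rate = PAY_PER_INTERN / hours_per_intern       # dollars per hour
total_budget = PAY_PER_INTERN * NUM_INTERNS           # matches the funding goal

print(f"{hours_per_intern:.0f} hours per intern at ${hourly_rate:.2f}/hr")
print(f"Total budget: ${total_budget:,}")
```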

Endorsed by

Ontology is an emerging integrative computer-interpretable AI platform that supports data standardization, integration, reasoning, and analysis. This project aims to use ontologies to identify patterns in coronavirus datasets, providing a promising solution for uncovering new insightful patterns among the complex datasets.
This is an extremely valuable project, and Babcock and Beverley do extremely good work.

Project Timeline

With the assistance of ontology interns, over the course of one month we will annotate approximately 400 articles for our NLP 'gold standard'.

Nov 20, 2020

Project Launched

Dec 30, 2020

Complete annotations

Meet the Team

Shane Babcock
Dr. Babcock
National Center for Ontological Research

John Beverley
Northwestern University

Team Bio

Our team is called the "COVID Cats" and is composed of several graduate and undergraduate students from Northwestern, as well as several undergraduate students from the School of the Art Institute of Chicago, all of whom have been trained in data science methods employing ontologies.

Shane Babcock

I am a philosophy professor at Niagara University where I mainly teach Ethics. My work in philosophy is centered around the metaphysics of modality and dispositions.

I am an ontologist affiliated with the National Center for Ontological Research in Buffalo. I work on infectious disease ontology. I have contributed work to the Infectious Disease Ontology (IDO) and its extensions, including the Virus Infectious Disease Ontology and the COVID-19 Infectious Disease Ontology, of which I was a co-developer along with John Beverley and Barry Smith.

John Beverley

I'm a logician presently working at the intersection of formal logic, applied ontology, and virology. Specifically, I'm developing logically well-structured vocabularies covering the domain of viruses in general and SARS-CoV-2 in particular.

I'm a graduate student at Northwestern University, an adjunct at the School of the Art Institute of Chicago, and a member of the National Center for Ontological Research in Buffalo. I've contributed to the recent revisions of the Infectious Disease Ontology (IDO) and its extensions, and am co-developer of the Virus Infectious Disease Ontology and the COVID-19 Infectious Disease Ontology with Shane Babcock and Barry Smith.

Additional Information

IDO-COVID-19 is an extension of the Infectious Disease Ontology (IDO) [Smith B, Cowell L, 2010], which is itself an extension of the Ontology for General Medical Science (OGMS) [21347182], which in turn extends the Basic Formal Ontology (BFO). BFO was recently accepted as an ISO standard.
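The chain of extensions described above can be pictured as a simple lookup from each ontology to the one it extends. The sketch below is a toy illustration only: the `EXTENDS` mapping and `extension_chain` function are hypothetical names, and the mapping is written by hand from the paragraph above rather than read from the ontology files.

```python
# Hand-written mapping: each ontology -> the ontology it extends.
EXTENDS = {
    "IDO-COVID-19": "IDO",
    "IDO": "OGMS",
    "OGMS": "BFO",
}


def extension_chain(ontology, extends=EXTENDS):
    """Follow the 'extends' links from an ontology up to the top level."""
    chain = [ontology]
    while chain[-1] in extends:
        chain.append(extends[chain[-1]])
    return chain


print(" -> ".join(extension_chain("IDO-COVID-19")))
```

Because every ontology in the chain inherits the terms of the one above it, data annotated with IDO-COVID-19 terms remains comparable with data annotated using any ontology built on the same upper levels.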

IDO-COVID-19 is freely available on Bioportal: https://bioportal.bioontology.... (as are all the related ontologies mentioned above).

A working version of the ontology is available on our GitHub site.

Related papers:

He, Y., Beverley, J., Smith, B., et al. (2020). CIDO, a community-based ontology for coronavirus disease knowledge and data integration, sharing, and analysis. Scientific Data, 7:181.

Babcock, S., Cowell, L., Beverley, J., Smith, B. (2020). The Infectious Disease Ontology in the Age of COVID-19. OSF Preprint. (under review at Journal of Biomedical Semantics)

Beverley, J., Babcock, S., Carvalho, G., Cowell, L., Duesing, S., Hurley, R., Smith, B. Coordinating Coronavirus Research: The COVID-19 Infectious Disease Ontology. OSF Preprint. (under review at PLOS ONE)

Project Backers

  • 4 Backers
  • 1% Funded
  • $50 Total Donations
  • $12.50 Average Donation