About This Project

Familial hypercholesterolaemia (FH) is a genetic disorder that causes high concentrations of "bad" cholesterol from birth. While genetic testing is the norm, it is expensive and not always available. We aim to create machine learning algorithms that predict the outcome of a genetic test from a blood sample and the patient's characteristics. We will then assess whether our algorithms have higher predictive power than clinical scoring systems.

What is the context of this research?

Familial hypercholesterolaemia (FH) is a disease that affects around 1 in 250 people, or around 30 million people worldwide. As it is a genetic condition, it is shared by multiple generations within affected families. It is currently under-diagnosed and treatments are sub-optimal.

Genetic testing is the ideal way to diagnose patients with FH but, where it is available within a country, it comes at a hefty price. With the help of Machine Learning techniques, we can now approximate a genetic test based on a patient's clinical phenotype and family history.

If our algorithms have predictive power equal or superior to that of clinical scoring systems, then they would provide a better approximation to genetic testing.

What is the significance of this project?

In recent years, many FH patient databases have been compiled and harmonized, providing new insights into the disease and how it is managed.

With these new data sets, the advent of big data analysis and the availability of novel machine learning techniques, we are in a good position to provide more accurate prediction algorithms.

If successful, these prediction algorithms will be instrumental in diagnosing more patients regardless of the availability of genetic testing within their countries.

The algorithm will be trained to diagnose patients with different characteristics. Younger patients could be diagnosed earlier and their treatments could start sooner in life. Eventually, we hope to reduce the burden of FH disease on a global scale.

What are the goals of the project?

We hypothesize that Machine Learning (ML) algorithms perform as well as or better than existing scoring systems. We will test this against the null hypothesis of equal performance (H0: predictionML = predictionScoring).

In the first step of the project, we will build, train and cross-validate various ML algorithms. We will then combine these algorithms into one super algorithm using Ensemble ML techniques.
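As a rough illustration of this step, the sketch below shows k-fold cross-validation splitting and one simple ensemble technique, soft voting (averaging the predicted probabilities of several models). The project does not say which ensemble method will be used, so soft voting is an assumption here, and the function names `k_fold_indices` and `soft_vote` are hypothetical.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Each sample appears in exactly one test fold.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


def soft_vote(probabilities_per_model):
    """Average per-patient probabilities across models (soft voting).

    probabilities_per_model: one list of predicted probabilities per model,
    all in the same patient order.
    """
    return [mean(patient_probs) for patient_probs in zip(*probabilities_per_model)]


if __name__ == "__main__":
    # Hypothetical predicted probabilities of a positive genetic test
    # from two base models for three patients.
    model_a = [0.2, 0.8, 0.5]
    model_b = [0.4, 0.6, 0.9]
    print(soft_vote([model_a, model_b]))
    for train_idx, test_idx in k_fold_indices(n_samples=10, k=5):
        print(sorted(test_idx))
```

In practice, each base model would be fitted on the training indices of a fold and scored on the held-out indices before its predictions are combined.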

In the last stage of the project, we will assess the performance of our individual and super algorithms against scoring systems on external databases such as UK Biobank. Metrics such as Area Under the Curve, predictive power, sensitivity and specificity will be used to conduct the assessment.
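The metrics named above can be sketched in a few lines. This is a minimal illustration, not the project's evaluation code: it assumes binary labels (1 = mutation found, 0 = no mutation), and the function names are hypothetical. The Area Under the (ROC) Curve is computed here as the probability that a randomly chosen positive patient receives a higher score than a randomly chosen negative one, with ties counted as one half.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels and model scores for four patients.
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.1]
    # Classify as positive when the score is at least 0.5 (assumed threshold).
    predictions = [1 if s >= 0.5 else 0 for s in scores]
    print(sensitivity_specificity(labels, predictions))
    print(auc(labels, scores))
```

The same functions could be applied to a clinical scoring system's output, which would allow a like-for-like comparison with the ML models.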

Budget

Tuition Fee - 1 year (PhD)

$2,000

Additional Research Data on Negative Genetic testing

$500

The largest portion of the proposed amount, $2,000, will contribute toward my tuition fees for the first year ($2,429/year for Imperial College London employees). With that contribution, I would be able to start the experiment and build ML models.

While we have access to a database of patients with genetic mutations (>61,000 individuals as of July 2019), it does not contain many patients without genetic mutations. The remaining $500 will be crucial to buy additional data sets on patients without genetic mutations from external data sources such as Discover UK. Without these additional data, it will not be possible to train ML algorithms.

Endorsed by

Christophe is a very capable and dedicated researcher. His background in Medical Statistics and Computer Sciences makes him a strong researcher to work on this project. This experiment, if successful, would eventually decrease the burden of cardiovascular disease on the global scale. Given Christophe's knowledge and experience, I am very confident that he would be able to appropriately implement the most efficient algorithms to accurately diagnose FH.

Mansour

Imperial College London

Project Timeline

In the first 6 months, we will draft a statistical and ML analysis plan and a protocol.

In the last 6 months, various models and algorithms will be developed and trained on the training and internal validation data sets.

Sep 26, 2019

Project Launched

Feb 01, 2020

Early Stage assessment: Feasibility and protocol

Oct 01, 2020

Development of Machine Learning Algorithms/Models

Feb 01, 2021

Publication of a peer-reviewed paper on ML model performance. The paper will be shared with backers.

Meet the Team

Christophe AT Stevens

Data Scientist in Healthcare

Affiliates

Imperial College London

Christophe AT Stevens

I am an Information Technology professional with a Bachelor's degree in Computer Science from EPHEC, Belgium. I am also a biostatistician, with a Master's degree in Medical Statistics from the University of Sheffield, United Kingdom.

My area of interest is health, and more specifically medicine, biostatistics and machine learning. Recently, I have been working on cardiovascular disease prevention and on the implementation of a global registry of patients with familial hypercholesterolaemia.

My aim is to make medicine and healthcare more efficient through the use of computer technologies and statistics. That is why I participate in data science competitions, where I usually rank in the top 2%.

Lab Notes

1 Lab Notes Posted

Mid-crowdfunding campaign and exciting news!

October 11, 2019

Project Backers

11 Backers
32% Funded
$795 Total Donations
$72.27 Average Donation