About This Project
Familial hypercholesterolemia (FH) is a genetic disorder that causes a high concentration of bad cholesterol from birth. While genetic testing is the norm, it is expensive and not always available. We aim to create machine learning algorithms that predict the outcome of a genetic test from a blood sample and the patient's characteristics. We will then assess whether our algorithms have greater predictive power than clinical scoring systems.
What is the context of this research?
Familial hypercholesterolaemia (FH) is a disease that affects around 1 in 250 people, or around 30 million people worldwide. As it is a genetic condition, it is shared by multiple generations within affected families. It is currently under-diagnosed and treatments are sub-optimal.
Genetic testing is the ideal way to diagnose patients with FH but, where it is available within a country, it comes at a hefty price. With the help of Machine Learning techniques, we can now approximate a genetic test based on a patient's clinical phenotype and family history.
If our algorithms have predictive power equal or superior to that of clinical scoring systems, then they would provide a better approximation to genetic testing.
What is the significance of this project?
In recent years, many FH patient databases have been compiled and harmonized, providing new insights into the disease and how it is managed.
With these new data sets, the advent of big data analysis and the availability of novel machine learning techniques, we are in a good position to provide more accurate prediction algorithms.
If successful, these prediction algorithms will be instrumental in diagnosing more patients, regardless of the availability of genetic testing within their countries.
The algorithms will be trained to diagnose patients with different characteristics. Younger patients could be diagnosed earlier and their treatments could start sooner in life. Eventually, we hope to reduce the burden of FH on a global scale.
What are the goals of the project?
The hypothesis of our project is that Machine Learning (ML) algorithms perform as well as or better than existing scoring systems. The corresponding null hypothesis is that the two perform equally (H0: predictionML = predictionScoring).
In the first step of the project, we will build, train and cross-validate various ML algorithms. We will then combine these algorithms into one super algorithm using Ensemble ML techniques.
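As a rough illustration of this step, the sketch below cross-validates two base learners and a stacked "super" model built from them. It assumes scikit-learn and uses synthetic data; the choice of learners, the number of folds and the feature count are placeholders for illustration, not the project's actual design.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for patient data (hypothetical: 8 numeric features,
# binary label = genetic test positive / negative).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, random_state=0
)

# Two illustrative base learners; the real project may use others.
base_learners = [
    ("logreg", LogisticRegression(max_iter=1000)),
    ("forest", RandomForestClassifier(n_estimators=50, random_state=0)),
]

# Stacking: out-of-fold predictions from the base learners are used
# to train a meta-learner that combines them.
super_learner = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

# Evaluate every candidate with the same stratified 5-fold split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
candidates = dict(base_learners, stacked=super_learner)
cv_auc = {
    name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
    for name, model in candidates.items()
}

for name, auc in cv_auc.items():
    print(f"{name}: mean cross-validated AUC = {auc:.3f}")
```

Stacking is only one of several ensemble techniques; averaging or voting over the base learners would be a simpler alternative under the same cross-validation scheme.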
In the last stage of the project, we will assess the performance of our individual and super algorithms against scoring systems on external databases such as UK Biobank. Metrics such as Area Under the Curve, predictive powers, sensitivity and specificity will be used to conduct the assessment.
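The metrics named above can be computed as in the following minimal sketch. The labels, scores and the 0.5 decision threshold are made up for illustration, and reading "predictive powers" as positive and negative predictive values is an assumption.

```python
def classification_metrics(y_true, scores, threshold=0.5):
    """Sensitivity, specificity, PPV and NPV at a given score threshold."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


def auc(y_true, scores):
    """Area under the ROC curve: the probability that a random positive
    case scores higher than a random negative case (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = mutation found by genetic test, 0 = not found.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

metrics = classification_metrics(y_true, scores)
model_auc = auc(y_true, scores)
print(metrics)    # every value is 0.75 for this toy data
print(model_auc)  # 0.9375
```

Applying the same functions to the output of a clinical scoring system on the same patients would give a like-for-like comparison between the two approaches.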
The biggest portion of the proposed amount, $2,000, will contribute toward my tuition fees for the first year ($2,429/year for Imperial College London employees). With that contribution, I would be able to start the experiment and build ML models.
While we have access to a database of patients with genetic mutations (>61,000 individuals as of July 2019), it does not contain many patients without genetic mutations. The remaining $500 will be crucial to buy additional data sets on patients without genetic mutations from external data sources such as Discover UK. Without these additional data, it will not be possible to train ML algorithms.
In the first 6 months, we will draft a statistical and ML analysis plan and a protocol.
In the last 6 months, various models and algorithms will be developed, trained and validated on the training and internal validation data sets.
Sep 26, 2019
Feb 01, 2020
Early Stage assessment: Feasibility and protocol
Oct 01, 2020
Development of Machine Learning Algorithms/Models
Feb 01, 2021
Publication of a peer-reviewed article on ML model performance. The publication will be shared with backers.
Meet the Team
Christophe AT Stevens
I am an Information Technology professional with a Bachelor's degree in Computer Science from EPHEC, Belgium. I am also a biostatistician with a Master's degree in Medical Statistics from the University of Sheffield, United Kingdom.
My area of interest is health, and more specifically medicine, biostatistics and machine learning. Recently, I have been working on cardiovascular disease prevention and on the implementation of a global registry of patients with familial hypercholesterolaemia.
My aim is to make medicine and healthcare more efficient through the use of computer technologies and statistics. That is why I participate in data science competitions, where I usually rank in the top 2%.
- $795 Total Donations
- $72.27 Average Donation