About
Many drugs that are approved to treat everybody with a given disease diagnosis work well for some people with that diagnosis while providing no benefit for others. One reason is that people with the same diagnosis, although they may have similar-seeming symptoms or signs, may not share the same cause of disease at the level of molecular or cellular changes. This may be because there are several molecular or cellular routes to similar manifestations of disease. This mismatch between diagnosis and molecular or cellular cause results not only in a lack of predictable benefit from marketed drugs but also in difficulty testing the efficacy of experimental drugs. Additionally, some molecular causes are never treated, because they do not fall neatly within a diagnosis.
Here we will attempt, using machine learning, to define disease not by diagnostic codes but by empirical measures such as genotype, biomedical imaging or more granular self-reported symptoms. We hypothesise that groups of participants with similar molecular causes of disease can be discerned within the UK Biobank cohort. We plan to define these groups of UK Biobank participants and then compare them with the diagnoses the participants have received. We will compare different ways of building the groups (from DNA sequence alone, or from imaging or clinical features as well) in terms of how predictive they are of diagnoses. We will also compare what we learn in UK Biobank with other ways of subclassifying disease. We will publish these findings and also use them to inform the drug discovery work within BenevolentAI, a UK-based biotechnology company.