Abstract
Background
Over the past few years, the considerable growth in the availability of population-scale genomic data has given a significant boost to quantitative, well-powered, data-driven approaches to drug target discovery. However, population-scale genomic biobanks often lack comprehensive longitudinal phenotyping and in-depth clinical annotation. In contrast, clinical trial data, rich in phenotypic detail, frequently lack accompanying omics information, hindering mechanistic understanding of clinical findings. To address these shortcomings, we propose a framework called "back-translation" that translates patient insights from clinical data to a biobank context, enabling discoveries that draw on the complementary strengths of both data types.

Methods
Our framework consists of two main steps. First, we identify a subgroup of interest within the clinical data and construct a classifier (risk score) to accurately identify patients in this subgroup. Second, we validate the derived risk score and then transfer it to the biobank data. The risk score serves as a proxy for characterizing the subgroup, which enables us to perform rare and common genetic variant association tests.

Results
We demonstrate the value of this approach in a pilot study using clinical trial data from the FIDELITY dataset combined with biobank data from the UK Biobank (UKBB) and the German Chronic Kidney Disease (GCKD) cohort, focusing on fast kidney disease progression in patients with chronic kidney disease (CKD). Our results show that the derived risk score accurately identifies high-risk patients in both FIDELITY and GCKD. Our genetic analysis of the clinical risk score in the UKBB identifies multiple genes that may serve as candidates for novel therapeutic target investigation.

Conclusion
We propose a generalizable, therapeutic area-agnostic framework for data-driven target identification.
This approach offers a novel opportunity to integrate clinical data into target identification via "back-translation," drawing on clinical insights that were previously underutilized in a research context. By bridging clinical and genetic data, our framework enhances the potential for discovering novel therapeutic targets and for advancing precision medicine.

Trial registration
NCT02540993, NCT02545049
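
The two-step framework outlined in Methods can be illustrated with a minimal sketch on synthetic data. Everything in it is an assumption made for illustration and is not taken from the study: the data are simulated, the classifier is a plain logistic regression (the abstract does not specify the model), validation uses a held-out split of the simulated trial rather than an external cohort, and a single-variant regression stands in for the rare and common variant association tests, which in practice would adjust for covariates and population structure. All function and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.5, n_iter=500):
    """Fit logistic regression by gradient descent; returns (weights, intercept)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_iter):
        resid = sigmoid(X @ w + b) - y
        w -= lr * (X.T @ resid) / len(y)
        b -= lr * resid.mean()
    return w, b

def auc(y, score):
    """Rank-based AUC (Mann-Whitney U); assumes scores have no ties."""
    ranks = np.argsort(np.argsort(score)) + 1
    n_pos = int(y.sum())
    n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def assoc_t(genotype, phenotype):
    """t-statistic of the slope in a simple regression of phenotype on genotype."""
    r = np.corrcoef(genotype, phenotype)[0, 1]
    return r * np.sqrt((len(phenotype) - 2) / (1.0 - r**2))

# Step 1: build a risk score for the subgroup of interest in (simulated) trial data.
X_trial = rng.normal(size=(2000, 3))           # three hypothetical clinical covariates
true_w = np.array([1.5, -1.0, 0.5])            # simulation-only ground truth
y_trial = (rng.random(2000) < sigmoid(X_trial @ true_w)).astype(float)
w, b = fit_logistic(X_trial[:1500], y_trial[:1500])

# Step 2a: validate the score on data not used for fitting.
val_auc = auc(y_trial[1500:], sigmoid(X_trial[1500:] @ w + b))

# Step 2b: transfer the score to a (simulated) biobank that has genotypes
# but no subgroup labels, then test each variant against the score.
G = rng.binomial(2, 0.3, size=(5000, 20)).astype(float)  # allele counts 0/1/2
X_bb = rng.normal(size=(5000, 3))
X_bb[:, 0] += 0.5 * G[:, 0]                    # variant 0 shifts one covariate
score_bb = sigmoid(X_bb @ w + b)               # risk score as phenotype proxy

t_stats = np.array([assoc_t(G[:, j], score_bb) for j in range(G.shape[1])])
top_variant = int(np.argmax(np.abs(t_stats)))
print(f"validation AUC: {val_auc:.2f}, top variant index: {top_variant}")
```

The point of the sketch is the data flow: labels exist only in the trial data, genotypes exist only in the biobank, and the transferred score is the link that lets association testing proceed in the biobank.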