Abstract
PURPOSE: The aims of this study were to train and evaluate deep learning models for automated segmentation of abdominal organs in whole-body magnetic resonance (MR) images from the UK Biobank (UKBB) and German National Cohort (GNC) MR imaging studies and to make these models available to the scientific community for analysis of these data sets.</p>
METHODS: A total of 200 T1-weighted MR image data sets of healthy volunteers each from UKBB and GNC (400 data sets in total) were available in this study. Liver, spleen, left and right kidney, and pancreas were segmented manually on all 400 data sets, providing labeled ground truth data for training of a previously described U-Net-based deep learning framework for automated medical image segmentation (nnU-Net). The trained models were tested on all data sets using a 4-fold cross-validation scheme. Qualitative analysis of automated segmentation results was performed visually; performance metrics between automated and manual segmentation results were computed for quantitative analysis. In addition, interobserver segmentation variability between 2 human readers was assessed on a subset of the data.</p>
RESULTS: Automated abdominal organ segmentation was performed with high qualitative and quantitative accuracy on UKBB and GNC data. In more than 90% of data sets, no or only minor visually detectable qualitative segmentation errors occurred. Mean Dice scores of automated segmentations compared with manual reference segmentations were well higher than 0.9 for the liver, spleen, and kidneys on UKBB and GNC data and around 0.82 and 0.89 for the pancreas on UKBB and GNC data, respectively. Mean average symmetric surface distance was between 0.3 and 1.5 mm for the liver, spleen, and kidneys and between 2 and 2.2 mm for pancreas segmentation. The quantitative accuracy of automated segmentation was comparable with the agreement between 2 human readers for all organs on UKBB and GNC data.</p>
CONCLUSION: Automated segmentation of abdominal organs is possible with high qualitative and quantitative accuracy on whole-body MR imaging data acquired as part of UKBB and GNC. The results obtained and deep learning models trained in this study can be used as a foundation for automated analysis of thousands of MR data sets of UKBB and GNC and thus contribute to tackling topical and original scientific questions.</p>