Comparison of Brain Age Algorithms in Bipolar Disorder

Hui Xin Ng
Lisa Eyler

Published on Nov 21, 2025

Abstract

Advances in computational methods have accelerated the application of machine learning to large, complex biological datasets. By applying machine learning algorithms to neuroimaging data, researchers have estimated the "biological age of the brain" (i.e., brain age) and used it as a composite metric of brain health, rather than relying on individual brain features extracted from neuroimaging data. These machine learning algorithms or models, often known as "brain age" algorithms or models, may take supervised or unsupervised approaches and may use one or many imaging modalities during training. We applied three regression-based algorithms and one neural network-based algorithm, trained on varying sample sizes of healthy comparison (HC) participants, to estimate the brain age of 73 HC participants and 44 individuals with bipolar disorder (BD) in our neuroimaging study. Of the four algorithms, three were pre-trained, off-the-shelf algorithms and one was developed and trained on multimodal neuroimaging data from a local cohort. The multimodal algorithm was trained on 51 age-matched HC participants and tested on the remaining 22 HC participants and the 44 individuals with BD. The brain-predicted age difference (brain-PAD) score was calculated by subtracting chronological age from predicted age. Of the four brain age prediction algorithms evaluated in HC participants, BrainageR and DenseNet demonstrated the highest predictive accuracy (r = 0.83 and 0.89, respectively) and the lowest mean absolute errors (MAE = 5.94 and 7.26, respectively). However, PHOTON (r = 0.65, MAE = 7.71) showed the greatest sensitivity to BD: in our logistic regression model, the PHOTON brain-PAD was a significant predictor of BD (beta = 0.064, p < 0.05).
Analyses using the intraclass correlation coefficient (ICC) revealed that agreement between algorithms varied. PHOTON achieved the highest ICCs, with DenseNet (0.78) and BrainageR (0.73), which suggests that these three algorithms may pick up similar brain features, in contrast to the multimodal algorithm (ICC = 0.17-0.43). These results suggest that regularized linear models trained on large samples that explicitly exclude individuals with psychiatric diagnoses (i.e., PHOTON in this case) may be most sensitive to case-control differences despite having lower predictive accuracy. Our findings can serve as a starting point and quantitative reference for researchers working with datasets that are similarly constrained by sample size but include unique combinations of imaging modalities.
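
The quantities reported in the abstract (brain-PAD, MAE, Pearson r, and ICC) can be illustrated with a minimal Python sketch. This is not the authors' analysis code: the function names, the toy ages, and the choice of the two-way random-effects, absolute-agreement, single-measure form ICC(2,1) are all assumptions made for illustration, since the abstract does not state which ICC form was used or whether it was computed on predicted ages or on brain-PAD scores.

```python
import numpy as np


def brain_pad(predicted_age, chronological_age):
    """Brain-PAD as defined in the abstract: predicted age minus chronological age."""
    return np.asarray(predicted_age, float) - np.asarray(chronological_age, float)


def mean_absolute_error(predicted_age, chronological_age):
    """Mean absolute difference between predicted and chronological age."""
    return float(np.mean(np.abs(brain_pad(predicted_age, chronological_age))))


def pearson_r(predicted_age, chronological_age):
    """Pearson correlation between predicted and chronological age."""
    return float(np.corrcoef(predicted_age, chronological_age)[0, 1])


def icc_2_1(scores):
    """ICC(2,1) for an (n_subjects, k_algorithms) array.

    Assumption: two-way random effects, absolute agreement, single measure.
    The source does not specify the ICC form.
    """
    scores = np.asarray(scores, float)
    n, k = scores.shape
    grand_mean = scores.mean()
    # Sums of squares for subjects (rows), algorithms (columns), and residual.
    ss_rows = k * np.sum((scores.mean(axis=1) - grand_mean) ** 2)
    ss_cols = n * np.sum((scores.mean(axis=0) - grand_mean) ** 2)
    ss_error = np.sum((scores - grand_mean) ** 2) - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return float(
        (ms_rows - ms_error)
        / (ms_rows + (k - 1) * ms_error + (k / n) * (ms_cols - ms_error))
    )


# Toy values (hypothetical, not data from the study).
chronological = np.array([25.0, 40.0, 55.0, 70.0])
predicted_a = np.array([27.0, 38.0, 60.0, 69.0])  # hypothetical algorithm A
predicted_b = np.array([30.0, 41.0, 58.0, 75.0])  # hypothetical algorithm B

pad_a = brain_pad(predicted_a, chronological)
mae_a = mean_absolute_error(predicted_a, chronological)
r_a = pearson_r(predicted_a, chronological)

# Agreement between the two hypothetical algorithms' predictions.
agreement = icc_2_1(np.column_stack([predicted_a, predicted_b]))
```

A positive brain-PAD means the brain is predicted to be older than the person's chronological age. Because the ICC form chosen here penalizes systematic offsets between algorithms, two algorithms that correlate strongly with each other can still show a modest ICC if one consistently predicts older ages than the other.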
