Parsimony and Machine Learning in Neuroimaging

Recommended citation: Nino Migineishvili, Peter J. Molfese, John A. Lee, Peter A. Bandettini, Phillip Shaw, Adam G. Thomas, and Dylan M. Nielson; Parsimony and Machine Learning in Neuroimaging https://osf.io/kdt68

Abstract

The disparity between an individual’s brain age and their chronological age can be an indicator of various neurological disorders. A previous brain-age prediction study investigated the ability of multimodal brain imaging data to predict age, relying on anatomical and functional brain data to build a machine learning model with over 10,000 features. In our preregistered study, we used anatomical MRI data from the NIMH/NHLBI Data Sharing Project (NNDSP) dataset to compare the age-prediction accuracy of a complex machine learning model with a large number of features against that of a simple machine learning model with only four features chosen a priori: white matter fraction, grey matter fraction, CSF fraction, and intracranial volume. Using a large lifespan sample (N=441, age 5-77) as our training and test data, we found that the predictive ability of the complex model was similar to that of the simple model on out-of-sample data. We also tested the generalizability of each model to novel data from the Human Connectome Project (HCP) databank (N=895, age 22-37) and found that the complex model outperformed the simple model. Both the simple and complex models performed worse than chance in predicting age on the HCP data, which is likely attributable to the limited age range of the data and our stringent, preregistered definition of chance performance. In our exploratory analysis, we tested the generalizability of each model on the Nathan Kline Institute (NKI) dataset (N=907, age 6-85) and again found no significant difference between the predictive ability of the simple and complex models. The performance of the simple age prediction model illustrates some of the trade-offs between parsimonious and complex models for predicting brain age. Given the comparable performance of the two approaches, the faster, parsimonious approach using only a few features is generally advantageous.
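
To make the four-feature input of the simple model concrete, the following is a minimal sketch of how such features might be derived from tissue volumes. It is an illustration under stated assumptions, not the pipeline used in the study: the function names, the example volumes, the definition of intracranial volume as the sum of the three tissue volumes, and the use of mean absolute error as an accuracy measure are all hypothetical choices made here.

```python
from statistics import mean


def simple_features(gm_ml, wm_ml, csf_ml):
    """Return the four features named in the abstract for a single scan.

    Assumption (not from the source): intracranial volume (ICV) is taken as
    the sum of grey matter, white matter, and CSF volumes, and each fraction
    is that tissue's volume divided by ICV.
    """
    icv = gm_ml + wm_ml + csf_ml
    if icv <= 0:
        raise ValueError("tissue volumes must sum to a positive value")
    return {
        "wm_fraction": wm_ml / icv,
        "gm_fraction": gm_ml / icv,
        "csf_fraction": csf_ml / icv,
        "icv_ml": icv,
    }


def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute gap between true and predicted ages, in years.

    Assumption: used here only as one plausible accuracy measure.
    """
    return mean(abs(t - p) for t, p in zip(true_ages, predicted_ages))


# Hypothetical tissue volumes in millilitres, chosen only for illustration.
features = simple_features(gm_ml=700.0, wm_ml=500.0, csf_ml=300.0)
```

Under these assumptions the three fractions sum to one, so the simple model effectively carries three independent pieces of information per scan, which underlines how small its feature space is relative to a model with over 10,000 features.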