Language models accurately infer correlations between psychological items and scales from text alone
Abstract
Behavioural scientists often disagree on core constructs and how they should be measured. Different literatures measure related constructs, but the connections are not always obvious to readers and meta-analysts. Many measures in behavioural science are based on agreement with survey items. Because these items are sentences, computerised language models can draw connections between disparate measures and constructs and help researchers regain an overview of the rapidly growing, fragmented literature. Our fine-tuned language model, the SurveyBot3000, accurately predicts correlations between survey items, the reliability of aggregated measurement scales, and intercorrelations between scales from the items' positions in semantic vector space. In our pilot study, out-of-sample accuracy was .71 for item correlations, .89 for reliabilities, and .89 for scale correlations. In our preregistered validation study using novel items, out-of-sample accuracy was lower at .59 for item correlations, .84 for reliabilities, and .84 for scale correlations. The synthetic item correlations showed an average prediction error of .17, with larger errors for middling correlations. Predictions generalised beyond the training data and across domains, with some variability in accuracy. Our work shows that language models can reliably predict psychometric relationships between survey items, enabling researchers to evaluate new measures against existing scales, reduce redundancy in measurement, and work towards a more unified behavioural science taxonomy.
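To make the approach concrete, the sketch below illustrates the general idea of predicting psychometric quantities from item embeddings. It is not the SurveyBot3000 pipeline: the generic pre-trained model "all-MiniLM-L6-v2", the example items, and the use of raw cosine similarity as a stand-in for a calibrated correlation prediction are all illustrative assumptions, and scale reliability is derived from the predicted inter-item correlations via the standardised Cronbach's alpha formula.

```python
# Minimal sketch, assuming a generic sentence-embedding model in place of
# the fine-tuned SurveyBot3000, and cosine similarity in place of its
# calibrated correlation output.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

# Hypothetical survey items, all keyed in the same direction.
items = [
    "I see myself as someone who is talkative.",
    "I see myself as someone who is outgoing, sociable.",
    "I see myself as someone who is full of energy.",
]

# Embed items; with unit-normalised embeddings, the dot product of the
# embedding matrix with its transpose gives pairwise cosine similarities,
# used here as a stand-in for predicted inter-item correlations.
emb = model.encode(items, normalize_embeddings=True)
pred_r = emb @ emb.T  # items x items similarity matrix

# Standardised Cronbach's alpha from the mean predicted inter-item
# correlation r_bar: alpha = k * r_bar / (1 + (k - 1) * r_bar).
k = len(items)
r_bar = pred_r[~np.eye(k, dtype=bool)].mean()
alpha = k * r_bar / (1 + (k - 1) * r_bar)

print(np.round(pred_r, 2))
print(f"standardised alpha ~ {alpha:.2f}")
```

A correlation between two scales could be approximated analogously by aggregating the predicted cross-scale item correlations; note that raw similarity cannot capture reverse-keyed items (which correlate negatively despite similar wording), which is one reason fine-tuning on observed psychometric data matters.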