Reconstructing Voice Identity from Noninvasive Auditory Cortex Recordings

Charly Lamothe
Etienne Thoret
Régis Trapeau
Bruno L. Giordano
Julien Sein
Sylvain Takerkart
Stéphane Ayache
Thierry Artières
Pascal Belin

10 evaluations Published on Jul 10, 2025


Abstract

The cerebral processing of voice information is known to engage, in human as well as non-human primates, “temporal voice areas” (TVAs) that respond preferentially to conspecific vocalizations. However, how voice information, and speaker identity information in particular, is represented by neuronal populations in these areas remains poorly understood. Here, we used a deep neural network (DNN) to generate a high-level, low-dimensional representational space for voice identity, the ‘voice latent space’ (VLS), and examined its linear relation with cerebral activity via encoding, representational similarity, and decoding analyses. We find that the VLS maps onto fMRI measures of cerebral activity in response to tens of thousands of voice stimuli from hundreds of different speaker identities, and that it better accounts for the representational geometry of speaker identity in the TVAs than in the primary auditory cortex (A1). Moreover, the VLS allowed TVA-based reconstructions of voice stimuli that preserved essential aspects of speaker identity, as assessed by both machine classifiers and human listeners. These results indicate that the DNN-derived VLS provides high-level representations of voice identity information in the TVAs.
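
To make the idea of a linear relation between a latent space and brain activity concrete, the following is a minimal sketch of an encoding model (latent space to voxels) and a decoding model (voxels to latent space). It is not the authors' pipeline: the use of ridge regression, the regularization strength, the array sizes, and the synthetic data standing in for VLS coordinates and fMRI responses are all assumptions made for illustration.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)


def columnwise_pearson(A, B):
    """Pearson correlation between matching columns of A and B."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    num = (A * B).sum(axis=0)
    den = np.sqrt((A ** 2).sum(axis=0) * (B ** 2).sum(axis=0))
    return num / den


# Synthetic stand-ins (assumed sizes, not taken from the study).
rng = np.random.default_rng(0)
n_stimuli, n_latent, n_voxels = 400, 16, 50
latents = rng.standard_normal((n_stimuli, n_latent))       # hypothetical VLS coordinates
true_weights = rng.standard_normal((n_latent, n_voxels))   # hypothetical ground truth
noise = 0.5 * rng.standard_normal((n_stimuli, n_voxels))
responses = latents @ true_weights + noise                  # hypothetical voxel responses

train, test = slice(0, 300), slice(300, 400)

# Encoding: predict each voxel's response from the latent coordinates.
W_enc = fit_ridge(latents[train], responses[train], alpha=1.0)
encoding_scores = columnwise_pearson(latents[test] @ W_enc, responses[test])

# Decoding: predict latent coordinates from the voxel pattern.
W_dec = fit_ridge(responses[train], latents[train], alpha=1.0)
decoded_latents = responses[test] @ W_dec
decoding_scores = columnwise_pearson(decoded_latents, latents[test])

print("mean encoding r:", encoding_scores.mean())
print("mean decoding r:", decoding_scores.mean())
```

In a reconstruction setting, decoded latent coordinates such as `decoded_latents` would then be passed through the generative part of the network to synthesize a stimulus; that step is omitted here because it depends on the specific model.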
