How do we recognise who is speaking?

Frontiers in Bioscience-Scholar (FBS) is published by IMR Press from Volume 13 Issue 1 (2021). Previous articles were published by another publisher on a subscription basis, and they are hosted by IMR Press on imrpress.com as a courtesy and upon agreement with Frontiers in Bioscience.

1 Jan 2014 | Review

Samuel R. Mathias¹,²,*, Katharina von Kriegstein¹,³

Affiliations

¹ MPRG Neural Mechanisms of Human Communication, Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstrasse 1, 04103 Leipzig, Germany

² Center for Computational Neuroscience and Neurotechnology, Boston University, 677 Beacon Street, Boston, MA 02215, USA

³ Department of Psychology, Humboldt University of Berlin, Rudower Chaussee 18, 12489 Berlin, Germany

*Author to whom correspondence should be addressed.

Abstract

The human brain effortlessly extracts a wealth of information from natural speech, which allows the listener to both understand the speech message and recognise who is speaking. This article reviews behavioural and neuroscientific work that has attempted to characterise how listeners achieve speaker recognition. Behavioural studies suggest that the action of a speaker's glottal folds and the overall length of their vocal tract carry important voice-quality information. Although these cues are useful for discriminating and recognising speakers under certain circumstances, listeners may use virtually any systematic feature for recognition. Neuroscientific studies have revealed that speaker recognition relies upon a predominantly right-lateralised network of brain regions. Specifically, the posterior parts of the superior temporal sulcus appear to perform some of the acoustical analyses necessary for the perception of speaker and message, whilst anterior portions may play a more abstract role in perceiving speaker identity. This voice-processing network is supported by direct, early connections to non-auditory regions, such as the visual face-sensitive area in the fusiform gyrus, which may serve to optimise person recognition.

Keywords

Speaker Recognition
Voice Perception
Psychophysics
Neuroimaging
Review