Background: The perception of basic emotional sounds, such as crying and laughter, is associated with effective interpersonal communication. At an early developmental age, difficulties with perceiving and analyzing sounds may complicate the understanding of emotions and contribute to communication deficits. Methods: This study focused on auditory nonverbal emotional perception, including emotional vocalizations with opposite valences (crying and laughter) and a neutral sound (the phoneme “Pᴂ”). We conducted an event-related potential analysis and compared peak alpha frequencies (PAFs) across conditions in children with autism spectrum disorder (ASD) and typically developing (TD) children aged 4 to 6 years (N = 25 per group). Results: Children with ASD had a lower amplitude of P100 and a higher amplitude of N200 for all types of sounds and a lower P270 in response to the neutral phoneme. During the perception of emotional sounds, children with ASD showed a single P270 electroencephalography (EEG) component instead of the P200–P300 complex specific to TD children. However, the most significant differences were associated with responses to the emotional valence of the stimuli. EEG differences between crying and laughter, expressed as a lower N400 amplitude and a higher PAF for crying than for laughter, were found only in TD children. Conclusions: Children with ASD showed not only abnormal acoustic perception but also altered emotional analysis of affective sounds.
Individuals with autism spectrum disorder (ASD) have trouble recognizing and responding to the emotional and psychological states of other people [1, 2]. People with ASD tend to ignore emotional prosody and focus solely on semantics [3]. Their emotional reactions are usually immature [4] and thus hard to interpret. The different modalities of their emotional expression, such as voice pitch and facial and body gestures, are often incongruent [5]. People with ASD differ from their neurotypical peers in their ability to process and understand emotional states conveyed through both facial expressions and voice intonation. Such differences have been found in both children [6] and adults [7]. The ability to properly understand emotional context is essential for adequate social perception and communication. Recent studies have shown that, when social information is processed, the nonverbal auditory component of a complex multimodal message prevails over the visual component in the perception of an interlocutor's emotional state [8]. Adults with ASD not only process emotional prosody abnormally [9] but also have difficulty expressing basic emotions in speech [10]. Neurophysiological research has revealed peculiarities of auditory perception in children with ASD [11, 12]; in particular, event-related potential (ERP) studies have shown altered sensory processing in people with ASD [13]. Compared with typically developing (TD) peers, children with ASD aged 3 to 9 years show greater variability in auditory ERPs [14]. Children with ASD also demonstrate impaired perception of emotional speech prosody [15], altered perception of nonverbal emotional sounds [16], and difficulty with the emotional assessment of musical fragments [17]. Other findings indicate that both early and late ERP components are well suited for investigating emotional perception and could therefore be successfully applied to clinical populations [18, 19, 20, 21].
Previous electroencephalography (EEG) studies have also demonstrated that the frequency characteristics of the EEG can be used to assess the perception of emotional sounds of different valences [22] and are sensitive to a subject’s emotional state during emotional stimulation [23]. Some clinical studies evaluating the association between peak alpha frequency (PAF) and emotional states have shown that PAF is significantly related to depressive symptomatology [24] and to the subjective perception of tonic pain [25].
The study had three goals. The first was to identify EEG traits of the nonspecific auditory perception of emotional and neutral stimuli that differ between children with ASD and TD children. The second was to identify EEG traits related to differences between emotionally significant stimuli. The third was to study differences between children with ASD and their TD peers in the ability to discriminate between emotional stimuli of different valences. The analysis approaches and stimuli were selected in accordance with these goals.
We recruited 25 children with ASD and 25 TD children (see Table 1). A clinical psychologist diagnosed ASD according to the criteria of the International Classification of Diseases, 10th revision, and examined the children to exclude cognitive or mental impairment. Parents were asked to complete the Childhood Autism Rating Scale (CARS) with the assistance of a clinical psychologist to exclude ASD. All children in the ASD group were diagnosed with early childhood autism, and its severity was assessed with the CARS. Children in the ASD group showed mainly motor, verbal, and play stereotypies and fewer social and dimensional stereotypies. None of the participants had a history of epilepsy or other seizures. The Wechsler Intelligence Scale for Children, Fourth Edition (WISC-IV) was used to assess intellectual disability (see Table 1).
| Group | n | Age min (years) | Age max (years) | Age mean | Age SD | Sex m | Sex f | CARS mean | CARS SD | WISC-IV mean | WISC-IV SD |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ASD | 25 | 4 | 5 | 4.84 | 1.8 | 14 | 11 | 40.4 | 9.3 | 87.2 | 10.2 |
| TD | 25 | 4 | 6 | 5.25 | 1.9 | 13 | 12 | 15.4 | 4.1 | 90.1 | 9.5 |
ASD, autism spectrum disorder; TD, typically developing; CARS, Childhood Autism Rating Scale; SD, standard deviation; m, male; f, female; WISC-IV, Wechsler Intelligence Scale for Children, Fourth Edition.
The inclusion criteria for participation in the study included a CARS score
The stimuli included recordings of infant crying and laughter vocalizations purchased from internet sound databases (Sound Jay, Sound Library, Freesound, Soundboard). The raw audio files were downsampled to 44.1 kHz and saved as mono waveform (WAV) files. All files were normalized to a common root mean square (RMS) amplitude and edited to adjust the stimulus length with WaveLab 10.0 (Steinberg, Hamburg, Germany).
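The downmix and RMS normalization described above can be sketched as follows. This is a minimal numpy illustration, not the WaveLab procedure used in the study; it assumes the audio has already been loaded as a float array of shape (n_samples, n_channels), and the target RMS of 0.1 is a hypothetical value chosen only for illustration.

```python
import numpy as np


def to_mono(samples: np.ndarray) -> np.ndarray:
    """Downmix a (n_samples, n_channels) array to mono by averaging channels."""
    samples = np.asarray(samples, dtype=float)
    if samples.ndim == 1:
        return samples
    return samples.mean(axis=1)


def rms(x: np.ndarray) -> float:
    """Root mean square amplitude of a waveform."""
    return float(np.sqrt(np.mean(np.square(x))))


def normalize_rms(x: np.ndarray, target_rms: float = 0.1) -> np.ndarray:
    """Rescale a waveform so that its RMS amplitude equals target_rms.

    target_rms = 0.1 is a hypothetical value, not taken from the study.
    """
    current = rms(x)
    if current == 0.0:
        raise ValueError("cannot normalize a silent signal")
    return x * (target_rms / current)


# Usage: two synthetic stereo "recordings" of different loudness end up with equal RMS.
rate = 44_100
t = np.arange(int(0.75 * rate)) / rate
quiet = np.column_stack([0.2 * np.sin(2 * np.pi * 970 * t)] * 2)
loud = np.column_stack([0.9 * np.sin(2 * np.pi * 960 * t)] * 2)
stimuli = [normalize_rms(to_mono(s)) for s in (quiet, loud)]
```

Equalizing RMS matches the average energy of the stimuli; peak and minimum levels can still differ between sounds.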
Twenty-four original audio files (13 crying and 11 laughing vocalizations) were selected for perceptual assessment by nineteen adults (students) in a pilot experiment (mean age, 20.1 years; standard deviation = 3.7; range = 19–25; 10 females; none of these participants took part in the main study). They were asked to rate each of the 24 stimuli, presented in random order, on the following 0–10 scales: “unpleasant–pleasant”, “calming–arousing”, and “hardly recognized–well recognized”. After the pilot study, we removed poorly recognized stimuli and selected the sounds with the highest ratings of pleasantness (laughter) and unpleasantness (crying) that had similar arousal ratings and physical characteristics (duration, pitch, and loudness). The phoneme “Pᴂ” was selected in the same pilot study because it was the easiest to recognize and had the most emotionally neutral prosody. One stimulus of each kind was used for all participants.
Finally, we presented crying, laughter, and phoneme vocalizations with the following physical parameters. The crying sound had a duration of 751 ms, an average pitch of 973 Hz, an average loudness of 39.8 dB (RMS), a maximum loudness of 45.0 dB (RMS), and a minimum loudness of 25.7 dB (RMS). The laughter sound was 755 ms long and had an average pitch of 961 Hz, an average loudness of 41.2 dB (RMS), a maximum loudness of 47.1 dB (RMS), and a minimum loudness of 26.1 dB (RMS). The phoneme “Pᴂ” had a duration of 403 ms, an average pitch of 967 Hz, an average loudness of 40.5 dB (RMS), a maximum loudness of 45.2 dB (RMS), and a minimum loudness of 35.9 dB (RMS). These parameters were obtained with WaveLab 6 (Steinberg Media Technologies GmbH, Hamburg, Germany). The sounds were presented using Presentation 22.0 (Neurobehavioral Systems, Inc., Berkeley, CA, USA).
Crying, laughter, and phoneme sounds were presented in randomized order, with each stimulus presented 50 times. The interstimulus interval was randomized between 1500 and 3000 ms. The sounds were played through loudspeakers, and all participants kept their eyes open. The whole procedure took about 20 min.
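A randomized sequence with these properties can be generated as in the sketch below (Python standard library only). It is not the Presentation 22.0 script used in the study: it assumes that the 1500–3000 ms interval runs from stimulus offset to the next onset and is drawn uniformly in whole milliseconds. With the stimulus durations reported above, the mean trial length is about (751 + 755 + 403)/3 + 2250 ≈ 2886 ms, so the 150 trials take roughly 7 min, which fits within the 20 min session.

```python
import random

# Stimulus durations in ms, as reported for the final stimuli.
DURATIONS_MS = {"crying": 751, "laughter": 755, "phoneme": 403}


def build_sequence(n_reps=50, isi_range_ms=(1500, 3000), seed=None):
    """Return a shuffled list of (stimulus, interval_ms) trials.

    Each stimulus appears n_reps times. The interval after each stimulus is
    drawn uniformly (integer ms, inclusive bounds) from isi_range_ms; treating
    it as an offset-to-onset interval is an assumption.
    """
    rng = random.Random(seed)
    trials = [name for name in DURATIONS_MS for _ in range(n_reps)]
    rng.shuffle(trials)
    return [(name, rng.randint(*isi_range_ms)) for name in trials]


def total_duration_s(sequence):
    """Total run time in seconds: stimulus durations plus intervals."""
    return sum(DURATIONS_MS[name] + isi for name, isi in sequence) / 1000


sequence = build_sequence(seed=1)
print(len(sequence))  # 150
print(f"{total_duration_s(sequence) / 60:.1f} min")  # about 7 min; varies with the seed
```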
During the recording, participants sat in an acoustically and electrically isolated chamber. In the resting-state session, they were instructed to close their eyes, remain calm, and avoid falling asleep or engaging in any movement, speech, or other activity. EEG was recorded using a 19-channel Encephalan EEG amplifier (Medicom MTD, Taganrog, Russian Federation) with a nominal bandpass of 0.05–70 Hz. Continuous EEG was recorded from 19 AgCl electrodes placed according to the International 10–20 system with an average mastoid reference, at a sampling rate of 250 Hz; electrode impedances were below 10 kΩ. Eye movements were recorded with additional electrodes placed above and below the left eye (vertical eye movements) and lateral to the outer canthi of both eyes (horizontal eye movements).
Eye movement artifacts were removed from the continuous resting-state EEG of each subject with an independent component analysis (ICA)-based algorithm implemented in EEGLAB [26] for MATLAB 7.11.0 (MathWorks, Natick, MA, USA). Muscle artifacts were removed by manual inspection of the data. Data from all 50 children were included in the final analysis. The continuous resting-state EEG of each subject was band-pass filtered at 0.5–30 Hz, and the power spectral density (PSD) of the artifact-free EEG epochs was then calculated with the fast Fourier transform (FFT).
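An FFT-based PSD estimate of this kind can be sketched as below: the signal is cut into non-overlapping epochs, the one-sided periodogram of each epoch is computed with numpy's `rfft`, and the periodograms are averaged. This is a generic illustration under stated assumptions, not the EEGLAB routine used in the study: the epoch length is a free parameter (the text does not give it for the resting-state analysis), no taper window is applied, and the input is assumed to be one already filtered, artifact-free channel.

```python
import numpy as np


def epoch_psd(signal, fs, epoch_len_s):
    """Average one-sided periodogram over non-overlapping epochs.

    signal: 1-D array for one channel, assumed already band-pass filtered
    and artifact-free. Returns (freqs, psd), with psd in units**2 / Hz.
    """
    signal = np.asarray(signal, dtype=float)
    n = int(round(epoch_len_s * fs))                       # samples per epoch
    n_epochs = signal.size // n
    if n_epochs == 0:
        raise ValueError("signal is shorter than one epoch")
    epochs = signal[: n_epochs * n].reshape(n_epochs, n)
    epochs = epochs - epochs.mean(axis=1, keepdims=True)   # remove DC per epoch
    psd = np.abs(np.fft.rfft(epochs, axis=1)) ** 2 / (fs * n)
    psd[:, 1:] *= 2.0                 # fold negative frequencies into one side
    if n % 2 == 0:
        psd[:, -1] /= 2.0             # the Nyquist bin has no mirror image
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, psd.mean(axis=0)


fs = 250
t = np.arange(20 * fs) / fs
alpha = np.sin(2 * np.pi * 10 * t)    # synthetic 10 Hz "alpha" rhythm
freqs, psd = epoch_psd(alpha, fs, epoch_len_s=2.0)
print(freqs[np.argmax(psd)])          # 10.0
```

With this scaling, summing the PSD over frequency (times the bin width) recovers the mean signal variance, which is a quick sanity check for the normalization.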
The next stage of analysis was performed with EEGLAB 14 (a MATLAB toolbox). The data were filtered with a 1.6 Hz high-pass filter, a 30 Hz low-pass filter, and a 50 Hz notch filter and re-referenced to a common average reference. Ocular and muscular artifacts were removed with ICA. The EEG was segmented into 1000 ms epochs starting 200 ms before stimulus onset. Individual ERP component traits (e.g., latency and amplitude) were extracted for further analyses. We measured and analyzed the amplitudes and latencies of the following ERP components: P100, N200, P200, P270, late positivity (LP), and N400. Each component was identified for each subject based on the topographical distribution of the grand-averaged ERP activity. If no peak was detected at the selected electrodes and latencies, the component was considered absent for that participant. Electrodes Fz, F3, F4, Cz, C3, and C4 were used to analyze the P100 (latency window 50–150 ms), N200 (120–220 ms), P200 (180–300 ms), and P3a (250–400 ms) components. Electrodes Cz, C3, C4, Pz, P3, and P4 were used to analyze the LP (450–650 ms) and N400 (400–600 ms) components. These electrodes were chosen because the ERP components of interest usually have fronto-central or centro-parieto-occipital distributions [27, 28].
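The averaging and peak-picking steps can be sketched for a single electrode as follows. This is a hypothetical numpy illustration, not the EEGLAB workflow used in the study. The epoch geometry (250 Hz, 1000 ms epochs starting 200 ms before onset) and the latency windows come from the text; the pre-stimulus baseline correction and the rule that an extremum lying on a window edge counts as "no peak" are assumptions added for illustration.

```python
import numpy as np

FS_HZ = 250        # sampling rate reported in the text
PRE_MS = 200       # epochs start 200 ms before stimulus onset
EPOCH_MS = 1000    # epoch length

# Latency windows (ms) from the text; +1 = positive peak, -1 = negative peak.
WINDOWS = {
    "P100": (50, 150, +1),
    "N200": (120, 220, -1),
    "P200": (180, 300, +1),
    "P3a": (250, 400, +1),
    "N400": (400, 600, -1),
    "LP": (450, 650, +1),
}


def epoch_times_ms():
    """Sample times of one epoch relative to stimulus onset (ms)."""
    n_samples = EPOCH_MS * FS_HZ // 1000
    return np.arange(n_samples) * 1000 / FS_HZ - PRE_MS


def average_erp(epochs):
    """Baseline-correct each trial on its pre-stimulus mean, then average.

    epochs: array of shape (n_trials, n_samples) for one electrode.
    The baseline correction is an assumption, not stated in the text.
    """
    epochs = np.asarray(epochs, dtype=float)
    pre = epoch_times_ms() < 0
    baseline = epochs[:, pre].mean(axis=1, keepdims=True)
    return (epochs - baseline).mean(axis=0)


def find_peak(erp, component):
    """Return (latency_ms, amplitude) of a component, or None if absent.

    Hypothetical rule: the component is absent when the extremum of the
    expected polarity falls on the window edge, i.e. is not a local peak.
    """
    lo, hi, sign = WINDOWS[component]
    t = epoch_times_ms()
    idx = np.flatnonzero((t >= lo) & (t <= hi))
    k = idx[np.argmax(sign * erp[idx])]
    if k == idx[0] or k == idx[-1]:
        return None
    return float(t[k]), float(erp[k])


# Usage with synthetic noisy trials: a positivity at 100 ms, a negativity at 180 ms.
t = epoch_times_ms()
template = 5 * np.exp(-((t - 100) / 15) ** 2) - 3 * np.exp(-((t - 180) / 15) ** 2)
trials = np.tile(template, (40, 1)) + np.random.default_rng(0).normal(0, 1, (40, t.size))
erp = average_erp(trials)
print(find_peak(erp, "P100"), find_peak(erp, "N200"))
```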
We could easily visualize the LP component in the individual ERPs of TD children only for crying sounds; we could not visualize individual LP components for laughter and phoneme sounds in the TD group or for any sound in children with ASD. To analyze differences in the LP component, we calculated the area under the curve (S
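The area measure can be sketched with the trapezoidal rule, as below. This is an illustrative numpy sketch: by default it computes the signed area between the ERP and y = 0 in the 450–650 ms window (in amplitude × ms units). Whether the authors integrated the signed curve or only its positive part is not stated, so the `positive_only` option is a hypothetical variant.

```python
import numpy as np


def area_under_curve(erp, times_ms, start_ms=450.0, end_ms=650.0,
                     positive_only=False):
    """Area between an ERP curve and y = 0 within [start_ms, end_ms].

    Uses the trapezoidal rule on the samples inside the window, so the
    result is in amplitude units x ms. With positive_only=True, negative
    deflections are clipped to zero first (a hypothetical variant).
    """
    erp = np.asarray(erp, dtype=float)
    times_ms = np.asarray(times_ms, dtype=float)
    mask = (times_ms >= start_ms) & (times_ms <= end_ms)
    y = erp[mask]
    x = times_ms[mask]
    if positive_only:
        y = np.clip(y, 0.0, None)
    return float(np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(x)))


# Usage on a 250 Hz epoch (-200 to 796 ms) with a synthetic 2-unit plateau at 500-600 ms.
times = np.arange(250) * 4.0 - 200.0
lp_like = np.where((times >= 500) & (times <= 600), 2.0, 0.0)
print(area_under_curve(lp_like, times))  # 208.0 (plateau plus two 4 ms trapezoid ramps)
```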
To calculate the PAF, we selected 1.25 s EEG fragments beginning at stimulus onset and finally obtained 47.8
PAF identification was conducted using the FFT. The PAF was estimated as the frequency with maximal PSD within the 8–13 Hz range, at the frequency resolution of the FFT. If no peak was present in this range, no PAF value was counted. Because resting-state PAFs (rsPAFs) calculated for resting-state EEG fragments of different durations did not differ (p = 0.92), we averaged them for each subject and used the mean rsPAF in further analyses.
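Peak picking within the alpha band can be sketched as follows. In this minimal numpy sketch, "no peak present" is interpreted as the maximum falling on an edge of the 8–13 Hz band (an assumption), and no window or zero-padding is applied. The sketch also makes the frequency resolution explicit: a 1.25 s fragment at 250 Hz contains about 312 samples, so FFT bins are about 250/312 ≈ 0.8 Hz apart and only seven bins (≈8.0 to ≈12.8 Hz) fall in the alpha range.

```python
import numpy as np


def peak_alpha_frequency(fragment, fs=250.0, band=(8.0, 13.0)):
    """Return the frequency (Hz) of maximal power within `band`, or None.

    The maximum is accepted only if it is a local peak, i.e. it does not lie
    on the first or last bin of the band (an assumed reading of "if no peak
    was present, it was not counted").
    """
    x = np.asarray(fragment, dtype=float)
    x = x - x.mean()                       # remove DC before the FFT
    power = np.abs(np.fft.rfft(x)) ** 2    # unscaled periodogram suffices for argmax
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    idx = np.flatnonzero((freqs >= band[0]) & (freqs <= band[1]))
    k = idx[np.argmax(power[idx])]
    if k == idx[0] or k == idx[-1]:
        return None
    return float(freqs[k])


fs = 250.0
n = int(1.25 * fs)                 # 312 samples per 1.25 s fragment
t = np.arange(n) / fs
bin13 = 13 * fs / n                # about 10.42 Hz, the 13th FFT bin
print(round(peak_alpha_frequency(np.sin(2 * np.pi * bin13 * t), fs), 2))  # 10.42
```

Because the bins are so coarse, a real analysis might zero-pad or average several fragments before picking the peak; that is a design choice, not something stated in the text.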
Statistical analyses were conducted with STATISTICA version 13 (StatSoft Inc., Tulsa, OK, USA). Differences between groups were assessed with repeated measures analysis of variance (ANOVA) with Tukey’s post hoc comparison (p
The ERPs of the TD and ASD groups had similar structures, but their components showed some specific differences (Fig. 1). The group differences in ERP components were most pronounced in the central, occipital, and parietal areas (see Fig. 1). The positive P100 component had a significantly higher amplitude in TD children for all types of stimuli (F(1, 47) = 13.982, p = 0.00009). The N200 component had a significantly higher amplitude in the ASD group (F(1, 47) = 12.874, p = 0.00088). At latencies of 200–400 ms, both groups showed positive components; however, for emotional sounds, TD children had two positive components (P200 and P3a), whereas children with ASD had a single positive component (listed as P3a in Table 2) (F(1, 47) = 13.272, p = 0.00012). During phoneme presentation, both groups had a single P3a component; however, its amplitude was significantly higher in TD children (F(1, 47) = 14.025, p = 0.00006). There were no significant group differences in peak latencies. Detailed information is presented in Table 2.
ERPs of children with ASD and TD children for both types of stimuli: phonemes (A) and emotional sounds (B). ERPs at the Pz electrode are averaged over all conditions (laughter and crying). Scalp maps indicate the localization of electrodes with significant differences for the N200 (A2) and P300 (A3, B2) components. Stars indicate significant group differences (ANOVA) in ERP component amplitudes. **p
| Group | Stimuli | ERP | P100 | N200 | P200 | P3a | LP | N400 | S | S | S |
|---|---|---|---|---|---|---|---|---|---|---|---|
| TD | phoneme | Amp | 2.23 | –3.17 | - | 6.22 | - | –0.95 | 7.3 | 725 | 377 |
| | | Lat | 101 | 178 | - | 308 | - | 543 | | | |
| | crying | Amp | 1.07 | –2.31 | 5.09 | 3.88 | 1.62 | –0.12 | 116.4 | 718 | |
| | | Lat | 98 | 169 | 269 | 347 | 540 | 589 | | | |
| | laughter | Amp | 0.99 | –2.82 | 4.47 | 3.71 | - | –4.86 | 1.8 | 755 | |
| | | Lat | 103 | 181 | 272 | 342 | - | 521 | | | |
| ASD | phoneme | Amp | 0.02 | –5.08 | - | 1.84 | - | –1.12 | 1.6 | –12.6 | 164 |
| | | Lat | 99 | 175 | - | 307 | - | 556 | | | |
| | crying | Amp | 0.39 | –3.72 | - | 3.33 | - | –3.90 | 1.3 | 487 | |
| | | Lat | 122 | 171 | - | 310 | - | 549 | | | |
| | laughter | Amp | 0.79 | –3.38 | - | 3.18 | - | 3.87 | 0.9 | 491 | |
| | | Lat | 119 | 168 | - | 302 | - | 562 | | | |
Amp, amplitude (µV); Lat, latency (ms); ERP, event-related potential; LP, late positivity; TD, typically developing; ASD, autism spectrum disorder.
We plotted ERP curves for emotionally different stimuli in both groups (Fig. 2). TD children had a significantly higher N400 amplitude for laughter than for crying (F(1, 47) = 12.119, p = 0.00051), with the difference located in the left frontal and temporal areas. Children with ASD showed no significant difference in N400 amplitude between crying and laughter; however, they showed a significantly higher N400 amplitude for emotional sounds than for the neutral phoneme. A similar difference was found in the TD group between laughter and the phoneme (F(1, 47) = 15.836, p = 0.00002). In addition, the emotional sound of crying induced an LP component (or an equivalent positivity) at latencies of 450–650 ms only in TD children, whereas for the other sounds (phoneme and laughter) a positive peak could hardly be detected. We also could not visualize individual LP components in children with ASD.
ERPs for crying and laughter in the two groups: (A) children in the control group; (B) children with ASD. Significant differences (ANOVA) between crying and laughter sounds were found for the amplitude of the P300 component; the localization of the differences is shown in maps A2 and B2. Stars indicate significant group differences (ANOVA) in ERP component amplitudes. **p
To analyze differences in LP components between sounds and groups, we calculated the area between the curve and y = 0 at latencies of 450–650 ms (S
The results showed that PAF increased significantly during the perception of emotional stimuli only in TD children, whereas children with ASD showed no significant difference between rest and stimulation (condition(3)
Peak alpha frequency (PAF) in both groups under different conditions, and its topography. (A) Group values of PAF averaged over all sites. Significant differences (Student’s t-test) were calculated within each group and are marked with curly brackets (**p
We also found significant differences between PAFs for crying and laughter (F(2, 94) = 10.9754, p = 0.00009) only in TD children.
During the assessment of individual ERPs, we found that when a child had a pronounced P2–P3a complex, the difference between crying and laughter at late latencies was also pronounced. We therefore hypothesized a relationship between positivity at latencies of 150–450 ms and differences in processing between crying and laughter at later latencies. The results showed that S
| | All children N | All children R | All children p-level | TD group N | TD group R | TD group p-level | ASD group N | ASD group R | ASD group p-level |
|---|---|---|---|---|---|---|---|---|---|
| S | 48 | 0.66 | | 25 | 0.72 | | 23 | 0.21 | 0.09 |
| S | 29 | 0.61 | | 25 | 0.69 | | - | - | - |
| S | 50 | 0.58 | 0.001 | 25 | 0.64 | 0.001 | 25 | 0.32 | 0.03 |
S
The results showed that, compared with TD children, children with ASD had specific features of sound perception. Some of these features were nonspecific and concerned both emotionally significant and neutral stimuli, whereas others were associated with the perception of laughter and crying sounds. In particular, children with ASD had a lower P100 amplitude and a higher N200 amplitude for all stimulus types (both emotionally significant and neutral) than the control group, which is consistent with previously identified features of the perception of emotionally significant stimuli and phonemes in children with ASD [16, 29] and with specific features of sound perception in individuals with autism [30]. Changes in the amplitude and latency of the P100 and N200 components in individuals with ASD have been studied extensively in the context of perceiving complex stimuli that demand considerable cognitive effort from children with autism, particularly the activation of attention and memory processes [31, 32]. In line with previous findings [32], the amplitude differences in these components identified in our study are more likely related to nonspecific features of sensory stimulus analysis than to emotional perception.
Another EEG peculiarity in children with ASD concerned the nonspecific response to emotional stimuli, regardless of their valence. In particular, careful analysis of individual ERPs revealed that a complex of positive components, which we labeled P200–P3a, was detected only in TD children presented with emotionally significant stimuli, whereas children with ASD showed only one component for these stimuli. It was difficult to distinguish unequivocally between P200 and P3a in children with ASD, but we considered P3a to be dominant over P200. Such an effect was previously shown in an oddball paradigm and is considered to be linked to hindered stimulus recognition [33]. In our paradigm, all stimuli were presented with equal probability, so this effect could be explained by difficulty in recognizing the stimuli (i.e., TD children recognized the recurring crying and laughter sounds as repetitions, whereas children with ASD found this difficult). This interpretation is supported by previous work showing greater trial-to-trial variability (and thus more difficult perception of repetitive stimuli) in individuals with ASD [34]. According to some studies, the presence of a double-positive complex in TD children can be regarded as a nonspecific response to emotional stimuli that results from the activation of cognitive processes necessary for stimulus analysis [33, 35]. The emotional nature of the double peak in TD children is also supported by our finding that a single positive component was detected when the neutral phoneme was presented to both TD children and children with ASD.
The presence of the P2–P3a complex in children in the control group appeared to be closely related to the formation of later components, such as the LP and N400, which are associated with the analysis of the valence of emotionally significant stimuli [36]. In particular, differences between laughter and crying sounds were observed only in TD children and manifested as a greater LP amplitude and a smaller N400 amplitude for crying than for laughter. Our results are consistent with previous findings that a higher N400 amplitude is associated with the processing of emotionally incongruent stimuli [37] and emotional vocal expressions [21]. These studies indicate that increases in N400 amplitude are typically observed when analyzing more complex emotional stimuli that either contain an incongruent or verbal component or require the analysis of subtle social relationships. The contrast between laughter and crying is one such case [38]. Regarding the differences between the two emotional valences, it has been shown that an increase in LP can be explained by a direct response to the unpleasantness of an emotionally significant sound stimulus [19, 39], that is, by the involvement of an emotional response to the stimulus.
Regarding the relationship between the presence of a double-positive P2–P3a peak and the S
The study had some limitations. First, communicating with children with ASD is difficult, so the extent to which they complied with the instructions to avoid thinking of anything specific and simply listen to the sounds (which were not intrinsically interesting) remains unknown. Second, the neutral phoneme itself may differ considerably from the emotional stimuli (for example, it lasted 403 ms, whereas the crying and laughter sounds lasted 751 and 755 ms); thus, a different control stimulus may be useful in further studies.
We compared auditory emotional perception in children with ASD and TD children with similar IQ levels, using ERP analysis of responses to crying, laughter, and a neutral phoneme. Our findings indicated three levels of differences between the TD and ASD groups, corresponding to different ERP latencies. First, children with ASD had a lower P100 and a higher N200 during the perception of both emotional and nonemotional sounds. Second, positivity at latencies of 150–450 ms was significantly more pronounced in TD children, whose ERP response to emotional sounds, unlike that of children with ASD, consisted of two components, P200 and P3a. Finally, differences between responses to crying and laughter were found only in TD children and were associated with the amplitudes of late ERP components (LP and N400) and with PAF. We also found a correlation between higher positivity at 150–450 ms and differences between the valences of emotional stimuli.
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
GP designed the study, analyzed the data, and wrote the first draft. IS participated in data collection and text revision. LM participated in data analysis and text revision. All authors read and approved the final manuscript. All authors participated sufficiently in the work and agreed to be accountable for all aspects of the work.
The study was approved by the Institute of Higher Nervous Activity and Neurophysiology of RAS (Protocol No. 2 of 20.04.2016). Participants’ parents provided written informed consent for study participation.
Not applicable.
The study was supported by a grant from the Russian Science Foundation, project № 22-15-00324, “Social tactile contacts and their role in psycho-emotional rehabilitation”. https://rscf.ru/en/project/22-15-00324/.
The authors declare no conflict of interest.
Publisher’s Note: IMR Press stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.