The Levels of Auditory Processing during Emotional Perception in Children with Autism

Background: The perception of basic emotional sounds, such as crying and laughter, is associated with effective interpersonal communication. Difficulties with the perception and analysis of sounds that complicate understanding emotions at an early developmental age may contribute to communication deficits. Methods: This study focused on auditory nonverbal emotional perception, including emotional vocalizations with opposite valences (crying and laughter) and a neutral sound (the phoneme "Pᴂ"). We conducted event-related potential analysis and compared peak alpha frequencies (PAFs) for different conditions in children with autism spectrum disorder (ASD) and typically developing (TD) children aged 4 to 6 years old (N = 25 for each group). Results: Children with ASD had a lower amplitude of P100 and a higher amplitude of N200 for all types of sounds and a lower amplitude of P270 in response to the neutral phoneme. During the perception of emotional sounds, children with ASD demonstrated a single P270 electroencephalography (EEG) component instead of the P200–P300 complex seen in TD children. However, the most significant differences were associated with the response to the emotional valences of the stimuli. The EEG differences between crying and laughter were expressed as a lower amplitude of N400 and a higher PAF for crying compared to laughter and were found only in TD children. Conclusions: Children with ASD showed not only abnormal acoustical perception but also altered emotional analysis of affective sounds.


Introduction
Individuals with autism spectrum disorder (ASD) have trouble recognizing and responding to the emotional and psychological states of other people [1,2]. People with ASD tend to ignore emotional prosody and focus solely on semantics [3]. Their emotional reactions are usually immature [4] and thus hard to interpret. Different modalities of emotional expression, such as voice pitch and facial and body gestures, are often incongruent [5]. People with ASD differ from their neurotypical peers in their ability to process and understand emotional states through both facial gestures and voice intonation. Such differences have been found in both children [6] and adults [7]. The ability to properly understand emotional context is essential for adequate social perception and communication. Recent studies have shown that, during the processing of social information, the nonverbal auditory component of a complex multimodal message prevails over the visual component in the perception of an interlocutor's emotional state [8]. Adults with ASD not only show abnormal processing of emotional prosody [9] but also have difficulty expressing basic emotions in speech [10]. Peculiarities of auditory perception in children with ASD have been shown in neurophysiological research [11,12]. In particular, event-related potential (ERP) studies have shown altered sensory processing in people with ASD [13]. Children with ASD also have larger variability in auditory ERPs than typically developing (TD) peers aged 3 to 9 years [14] and demonstrate impairment in the perception of emotional speech prosody [15], altered perception of nonverbal emotional sounds [16], and difficulty with the emotional assessment of musical fragments [17]. Other findings have confirmed that both late and early ERP components may be an ideal tool for investigating emotional perception and thus could be successfully applied to clinical populations [18][19][20][21]. 
Previous electroencephalography (EEG) studies have also demonstrated that the frequency characteristics of EEG can be used to assess the perception of different valences of emotional sounds [22] and are sensitive to a subject's emotional state during emotional stimulation [23]. Some clinical studies evaluating the association between peak alpha frequency (PAF) and emotional states have shown that PAF is significantly related to depressive symptomatology [24] and to the subjective perception of tonic pain [25].
The study had several goals. First, to identify the EEG traits related to the nonspecific auditory perception of emotional and neutral stimuli between children with ASD and TD children. Second, to identify the EEG traits related to differences in emotionally significant stimuli. Third, to study differences in the ability to discriminate among the emotional stimuli of different valences in children with ASD and their TD peers. The analysis approaches and stimuli were selected in accordance with the aforementioned goals.

Participants
We recruited 25 children with ASD and 25 TD children (see Table 1). ASD was diagnosed according to the International Classification of Diseases, 10th revision criteria by a clinical psychologist, who also excluded other cognitive or mental impairments. Parents of TD children were asked to complete the Childhood Autism Rating Scale (CARS) with the assistance of a clinical psychologist to rule out ASD. All children in the ASD group were diagnosed with early childhood autism. Severity was assessed with the CARS. Children within the ASD group showed mainly motor, verbal, and play stereotypies but fewer social and dimensional stereotypies. None of the subjects had a history of epilepsy or other seizures. The Wechsler Intelligence Scale for Children, fourth edition was used to classify intellectual disability (see Table 1).
The inclusion criteria for the TD group included a CARS score <25 (average score, 17.8 ± 2.4) and an age of 4 to 5 years. The exclusion criteria included a history of neurological or mental disorders other than ASD, a history of brain injury or other comorbid conditions, active psychopharmacotherapy, and epileptic activity on EEG. The research protocols were approved by the ethics commission of the Institute of High Nervous Activity and Neurophysiology of RAS (Protocol No. 2 from 20.04.2016). Participants' parents provided written informed consent for study participation.

Stimuli
The stimuli included recordings of infant crying and laughter vocalizations, which were purchased from internet sound databases (Sound Jay, Sound Library, Freesound, Soundboard). The raw audio files were converted to mono waveform (WAV) files at a sampling rate of 44.1 kHz. All files were normalized to a common root mean square (RMS) amplitude and adjusted in stimulus length with WaveLab 10.0 (Steinberg, Hamburg, Germany).
Twenty-four original audio files (13 crying and 11 laughing vocalizations) were selected for perceptual assessment by nineteen adults (students) in a pilot experiment (average age, 20.1 years; standard deviation = 3.7; range = 19-25; 10 females; none of these subjects participated further in the study). They were asked to rate each of the 24 stimuli, presented in random order, on the following scales (0-10): "unpleasant-pleasant", "calming-arousing", and "hardly recognized-well recognized". After the pilot study, we removed hardly recognized stimuli and selected the sounds with the highest ratings of pleasantness (laughter) and unpleasantness (crying) and similar ratings of arousal and physical characteristics (duration, pitch, and loudness). The phoneme "Pᴂ" was also selected based on the same pilot study, as it was the easiest to recognize and had the most emotionally neutral prosody. One stimulus of each kind was used for all participants.
Finally, we presented crying, laughter, and phoneme vocalizations with the following physical parameters: "crying" had a duration of 751 ms, average pitch of 973 Hz, average loudness of 39.

Procedure
Crying, laughter, and phoneme sounds were presented in randomized order. Each stimulus was presented 50 times. The interval between stimuli was randomized in the range of 1500-3000 ms. The sound stimuli were presented through loudspeakers; participants kept their eyes open. The whole procedure took about 20 min.
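The presentation scheme above (50 trials per stimulus in random order, with a jittered inter-stimulus interval) could be sketched as follows. This is an illustrative Python sketch only; the authors' actual presentation software is not described, and the function name and seeding are our assumptions.

```python
import random

def build_trial_sequence(stimuli=("crying", "laughter", "phoneme"),
                         n_per_stimulus=50, isi_ms=(1500, 3000), seed=1):
    """Randomized presentation order (50 trials per stimulus) with a
    uniformly jittered inter-stimulus interval of 1500-3000 ms."""
    rng = random.Random(seed)
    # 50 repetitions of each stimulus, shuffled into one sequence
    trials = [s for s in stimuli for _ in range(n_per_stimulus)]
    rng.shuffle(trials)
    # pair each trial with a random ISI drawn from the jitter range
    return [(s, rng.uniform(*isi_ms)) for s in trials]

sequence = build_trial_sequence()
print(len(sequence))  # 150 trials in total
```

Jittering the ISI in this way prevents the EEG response from becoming entrained to a fixed stimulation rhythm.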

EEG Registration
Participants were placed in a sitting position in an acoustically and electrically isolated chamber during the recording. In the resting-state session, they were instructed to close their eyes, remain calm, and avoid falling asleep or engaging in any movement, speech, or other activity. EEG was recorded using a 19-channel Encephalan EEG amplifier (Medicom MTD, Taganrog, Russian Federation). The amplifier bandpass filter was nominally set to 0.05-70 Hz. Continuous EEG was recorded with 19 AgCl electrodes located according to the International 10-20 system with an average mastoid reference. The sampling rate was 250 Hz, with impedances below 10 kilohms. Eye movements were recorded with additional electrodes located above and below the left eye (for vertical eye movements) and lateral to both lateral canthi (for horizontal eye movements).

EEG Preprocessing
An independent component analysis (ICA)-based algorithm from the EEGLAB [26] plugin for MATLAB 7.11.0 (MathWorks, Natick, MA, USA) was used to filter eye movement artifacts out of the continuous EEG corresponding to the resting-state session of each subject. Muscle artifacts were removed by manual data inspection. Finally, we analyzed the data of 50 children. The continuous resting-state EEG of each subject was filtered with a band-pass filter set to 0.5-30 Hz. The artifact-free EEG epochs then underwent fast Fourier transform (FFT), which was used to calculate the power spectral density (PSD).
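The band-pass filtering and FFT-based PSD steps were performed in EEGLAB/MATLAB; purely as an illustration, the same two steps could be sketched in Python with SciPy. The filter order and the 1-s Welch window are our assumptions, not parameters taken from the paper.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, welch

FS = 250  # sampling rate (Hz), as in the recording setup

def bandpass(data, low=0.5, high=30.0, fs=FS, order=4):
    """Zero-phase Butterworth band-pass, analogous to the 0.5-30 Hz filter."""
    sos = butter(order, [low, high], btype="band", fs=fs, output="sos")
    return sosfiltfilt(sos, data)

def psd(data, fs=FS):
    """Power spectral density via Welch's FFT-based method."""
    # 1-s segments give a 1 Hz frequency resolution
    freqs, pxx = welch(data, fs=fs, nperseg=fs)
    return freqs, pxx

# smoke test on a synthetic 10 Hz "alpha" signal
np.random.seed(0)
t = np.arange(0, 10, 1 / FS)
sig = np.sin(2 * np.pi * 10 * t) + 0.1 * np.random.randn(t.size)
f, p = psd(bandpass(sig))
print(f[np.argmax(p)])  # spectral peak, expected near 10 Hz
```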

ERP Analysis
The next stage included analysis with EEGLAB 14 (a MATLAB toolbox). The data were filtered with a 1.6 Hz high-pass filter, a 30 Hz low-pass filter, and a 50 Hz notch filter. The reference was changed to a common average reference. Ocular and muscular artifacts were removed with ICA. The EEG was segmented into 1000 ms epochs starting 200 ms before stimulus onset. Individual ERP component traits (e.g., latency and amplitude) were extracted for further analyses. We measured and analyzed the amplitudes and latencies of the following ERP components: P100, N200, P200, P3a, late positivity (LP), and N400. Each component was selected for each subject based on the topographical distribution of the grand-averaged ERP activity. In cases where a peak was not detected at the selected electrodes and latencies, we considered that the participant did not have the ERP component. Fz, F3, F4, Cz, C3, and C4 electrodes were used for the analysis of the P100 component (latency of 50-150 ms), N200 component (120-220 ms), P200 component (180-300 ms), and P3a (250-400 ms). Cz, C3, C4, Pz, P3, and P4 electrodes were selected for analysis of the LP (450-650 ms) and N400 (400-600 ms) components. The electrodes chosen for the ERP components of interest usually have frontocentral or central-parietal-occipital localizations [27,28].
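The epoching and per-component peak extraction described above can be sketched in Python. The epoch boundaries (-200 to +800 ms) and the component windows (e.g., P100 at 50-150 ms) come from the text; the function names and the mean-baseline correction are illustrative assumptions.

```python
import numpy as np

FS = 250  # sampling rate (Hz), as in the recording setup

def epoch(cont, onsets, fs=FS, pre=0.2, post=0.8):
    """Cut -200..+800 ms epochs around stimulus-onset samples and
    subtract the mean of the pre-stimulus interval as a baseline."""
    n_pre, n_post = int(pre * fs), int(post * fs)
    eps = np.stack([cont[o - n_pre:o + n_post] for o in onsets])
    return eps - eps[:, :n_pre].mean(axis=1, keepdims=True)

def peak_in_window(erp, fs=FS, pre=0.2, lo_ms=50, hi_ms=150, positive=True):
    """Peak amplitude and post-stimulus latency (ms) of a component
    within a latency window (defaults match the P100 window)."""
    i0 = int((pre + lo_ms / 1000) * fs)
    i1 = int((pre + hi_ms / 1000) * fs)
    seg = erp[i0:i1]
    idx = np.argmax(seg) if positive else np.argmin(seg)
    latency_ms = (i0 + idx) / fs * 1000 - pre * 1000
    return seg[idx], latency_ms
```

For negative components such as N200 or N400, the same function is called with `positive=False` so the minimum rather than the maximum is taken.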
We could easily visualize the LP components in individual ERPs of TD children only for crying sounds; we could not visualize individual LP components for sounds of laughter and the phoneme in the TD group, or for any sound in children with ASD. To analyze differences in the LP component, we calculated the area under the curve above y = 0 for a latency of 450-650 ms (S_LP). If no area above y = 0 could be found, it was set to 0. To quantify ERP differences between crying and laughter, we also calculated the area between the crying and laughter curves at a latency of 400-650 ms (S_diff). If the curve at a latency of 400-650 ms was more positive for laughter than for crying (e.g., in some children with ASD), S_diff was negative. S_P150-450 was calculated as the area under the curve above y = 0 for a latency of 150-450 ms. Once the ERP components were discriminated for each participant (or the absence of a component was established), we also calculated the latency and amplitude of each component at each electrode to evaluate the topography of the differences.
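The area measures S_LP, S_diff, and S_P150-450 defined above can be sketched as follows, assuming a simple rectangle rule on the sampled curve (the paper does not specify the numerical integration scheme, and the function names are hypothetical).

```python
import numpy as np

FS = 250      # sampling rate (Hz)
PRE_MS = 200  # baseline length before stimulus onset (ms)

def area_above_zero(erp, lo_ms, hi_ms, fs=FS, pre_ms=PRE_MS):
    """Area between the ERP curve and y = 0 where the curve is positive,
    over a post-stimulus latency window (e.g., S_LP: 450-650 ms).
    Returns 0 if the curve never rises above zero in the window."""
    i0 = (pre_ms + lo_ms) * fs // 1000
    i1 = (pre_ms + hi_ms) * fs // 1000
    seg = np.clip(erp[i0:i1], 0, None)  # keep only the positive part
    return seg.sum() * (1000 / fs)      # rectangle rule, in uV*ms

def area_between(erp_cry, erp_laugh, lo_ms=400, hi_ms=650, fs=FS, pre_ms=PRE_MS):
    """Signed area between the crying and laughter curves (S_diff);
    negative when laughter is more positive than crying."""
    i0 = (pre_ms + lo_ms) * fs // 1000
    i1 = (pre_ms + hi_ms) * fs // 1000
    return (erp_cry[i0:i1] - erp_laugh[i0:i1]).sum() * (1000 / fs)
```

S_P150-450 is then simply `area_above_zero(erp, 150, 450)` on the same averaged curve.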

Peak Alpha Frequency (PAF)
To calculate the PAF, we selected 1.25 s EEG fragments beginning from stimulus onset and obtained 47.8 ± 1.4 EEG fragments (trials) for laughter and 46.9 ± 1.6 fragments for crying. In the same way, we selected 48.2 ± 1.2 fragments for phonemes. After visual inspection, 42 to 50 trials of each type of stimulus per participant were used for further analysis. Selecting fragments of equal length reduced the possible effect of the difference in duration between the emotional sounds and phonemes. We also selected 50 resting-state EEG fragments of 1.25 s and 50 fragments of 0.9 s. The calculations were made with MATLAB (MathWorks, Natick, MA, USA) using a Hamming window with 50% overlap between contiguous sections for each trial separately, and the results were then averaged.
PAF identification was conducted using FFT. The PAF was estimated as a value of frequency with maximal PSD from the range of 8-13 Hz based on the frequency discretization data. If no peak was present, it was not counted. Due to the absence of differences between resting-state PAFs (rsPAFs) calculated for resting-state EEG fragments with different durations (p = 0.92), we calculated the mean rsPAFs for each subject and used them for further analysis.
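The PAF estimation described in this section (per-trial Welch PSD with a Hamming window and 50% overlap, averaged over trials, peak picked in the 8-13 Hz band) could be sketched in Python as below. The authors worked in MATLAB; the window length of up to 1 s is our assumption.

```python
import numpy as np
from scipy.signal import welch

FS = 250  # sampling rate (Hz)

def peak_alpha_frequency(trials, fs=FS, band=(8.0, 13.0)):
    """Estimate PAF: compute a Welch PSD per trial (Hamming window,
    50% overlap), average the PSDs, and return the frequency of
    maximal power within the alpha band."""
    psds = []
    for tr in trials:
        freqs, pxx = welch(tr, fs=fs, window="hamming",
                           nperseg=min(len(tr), fs))  # default noverlap = 50%
        psds.append(pxx)
    mean_psd = np.mean(psds, axis=0)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return freqs[mask][np.argmax(mean_psd[mask])]
```

In this sketch a peak is always returned; per the text, a subject without a discernible alpha peak would instead be excluded from the PAF analysis.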

Statistical Analyses
Statistical analyses were conducted with STATISTICA version 13 (StatSoft Inc, Tulsa, OK, USA). Differences between groups were assessed with repeated measures analysis of variance (ANOVA) with Tukey's post hoc comparison (p < 0.05). Repeated measures ANOVA on the amplitude and latency of each component was conducted for emotional vocalizations (crying and laughter) and phonemes. Degrees of freedom for F-ratios were corrected according to the Bonferroni method. For statistics on the PAF, repeated measures ANOVA for merged PAF values was applied. Correlation analysis between the EEG parameters (S_diff, S_P150-450, S_LP, PAF) was conducted for all children and for each group separately. Spearman's rank correlation was used to evaluate the relationships between EEG values (p < 0.05). Post hoc comparisons were adjusted for multiple comparisons by Bonferroni correction. All analytical steps were performed with STATISTICA version 13 and scripts implemented in MATLAB R2018b.
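The Spearman correlation step between EEG-derived measures, with a Bonferroni-adjusted significance threshold, could be sketched as below. This is an illustrative Python version of an analysis the authors ran in STATISTICA/MATLAB; the helper name and dictionary interface are our own.

```python
from scipy.stats import spearmanr

def spearman_table(measures, alpha=0.05):
    """Pairwise Spearman's rank correlations between EEG measures
    (e.g., S_diff, S_P150-450, S_LP, PAF difference), flagging
    significance against a Bonferroni-adjusted threshold."""
    names = list(measures)
    n_tests = len(names) * (len(names) - 1) // 2
    out = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            rho, p = spearmanr(measures[a], measures[b])
            # significant only if p clears the corrected threshold
            out[(a, b)] = (rho, p, p < alpha / n_tests)
    return out
```

Each entry of the returned table corresponds to one cell of a correlation matrix such as Table 3.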

Group Differences in ERPs
The ERPs of the TD and ASD groups had similar structures; however, the ERP components of children with ASD and TD children showed some specific differences (Fig. 1). The differences in ERP components between children with ASD and TD children were most pronounced in the central, occipital, and parietal areas (see Fig. 1). The positive component P100 had a significantly higher amplitude in TD children for all types of stimuli (F(1, 47) = 13.982, p = 0.00009). The N200 had a significantly higher amplitude in the ASD group (F(1, 47) = 12.874, p = 0.00088). At latencies of 200-400 ms, both groups of children had positive components; however, for the emotional sounds, TD children had two positive components (P200 and P3a), whereas children with ASD had a single P200 component (F(1, 47) = 13.272, p = 0.00012). During phoneme presentation, both groups had a single P3a; however, its amplitude was significantly higher in TD children (F(1, 47) = 14.025, p = 0.00006). There were no significant differences in peak latencies. Detailed information is presented in Table 2.

Differences between Emotional Tones of Stimuli
We plotted ERP curves for emotionally different stimuli in both groups (Fig. 2). TD children had a significantly higher amplitude of the N400 component during laughter compared to crying (F(1, 47) = 12.119, p = 0.00051), localized to the left frontal and temporal areas. Children with ASD did not show significant differences in N400 amplitude between crying and laughter. At the same time, children with ASD showed a significantly higher amplitude of the N400 component for emotional sounds compared to neutral phonemes. Similar differences were found in the TD group between laughter and phonemes (F(1, 47) = 15.836, p = 0.00002). Moreover, the emotional sound of crying induced the LP component (or an equivalent) at latencies from 450 to 650 ms only in TD children, whereas for the other sounds (phonemes and laughter), the positive peak could hardly be detected. We also could not visualize individual LP components in children with ASD.
To analyze differences in the LP component across sounds and groups, we calculated the area under the curve above y = 0 for a latency of 450-650 ms (S_LP). The results showed that S_LP in TD children was significantly higher for crying sounds compared to laughter and phonemes (F(1, 47) = 14.239, p = 0.00006). In the ASD group, S_LP values for different sounds did not differ statistically. S_diff was significantly higher in TD children compared to children with ASD (F(1, 47) = 17.016, p < 0.00001). S_P150-450 was significantly higher in TD children compared to children with ASD for each type of sound: phoneme (F(1, 49) = 12.064, p = 0.00049) and crying and laughter (F(1, 47) = 16.228, p < 0.00001; F(1, 47) = 18.354, p < 0.00001). No significant differences were found for any of the stimuli between boys and girls within groups.

Peak Alpha Frequency (PAF)
The results showed that PAF significantly increased during emotional stimuli perception only in TD children, whereas children with ASD did not show any significant difference between rest and stimulation (condition(3) × group effect F(2, 94) = 9.675, p = 0.0001, post hoc Bonferroni p < 0.0072 in TD children and p > 0.19 in children with ASD). The PAFs also did not differ between the ASD and TD groups during the resting state and phoneme perception (Fig. 3). We also found significant differences between PAFs for crying and laughter only in TD children (condition(2) × group effect F(1, 49) = 10.975, p < 0.0001, post hoc Bonferroni p < 0.0035 in TD children).

Correlation between EEG Parameters
During the assessment of individual ERPs, we found that when a child had a pronounced P2-P3a complex, the difference between crying and laughter at the late latencies was also pronounced. We therefore hypothesized a relationship between positivity at a latency of 150-450 ms and the differences in processing between crying and laughter at later latencies. The results showed that S_P150-450 was positively correlated with S_diff, S_LP, and the differences in PAFs between crying and laughter. The results of the correlation analysis are presented in Table 3.

Discussion
The results showed that, compared to TD children, children with ASD were characterized by specific features of the perception of sound stimuli. Some of these features were nonspecific and concerned both emotionally significant and neutral stimuli, whereas others were associated with peculiarities of the perception of laughter and crying sounds. In particular, children with ASD had a lower amplitude of the P100 component and a higher amplitude of the N200 component for all stimulus types (both emotionally significant and neutral) compared to the control group. This corresponds with previously identified features of the perception of emotionally significant stimuli and phonemes in children with ASD [16,29] and indicates specific features of sound perception in individuals with autism [30]. Changes in the amplitude and latency of the P100 and N200 components in subjects with ASD have been extensively studied in the context of the perception of complex stimuli that require considerable cognitive effort from children with autism, particularly the activation of attention and memory processes [31,32]. As previously shown, the differences in the amplitudes of these components identified in our study were more likely related to nonspecific features of sensory stimulus analysis than to emotional perception [32].
The other peculiarity of the EEG in children with ASD concerned the nonspecific response to emotional stimuli regardless of valence. In particular, careful analysis of individual ERPs revealed that a complex of positive components, which we labeled P200-P3a, was detected only in TD children presented with emotionally significant stimuli, whereas children with ASD had only one component under the same conditions. It was difficult to unequivocally distinguish between P200 and P3a in individuals with ASD, but we considered P3a to be dominant over P200. Such an effect was previously shown in an oddball paradigm and is considered to be linked to challenged stimulus recognition [33]. In our paradigm, all stimuli had the same frequency, so this effect could be explained by challenged recognition of the stimuli (i.e., TD children recognized the repeated sounds of crying and laughter as repetition, while children with ASD found this complicated). This is consistent with previous work showing greater trial-to-trial variability (and thus complicated perception of repetitive stimuli) in individuals with ASD [34]. According to some studies, the presence of a double-positive complex in TD children can be regarded as a nonspecific response to emotional stimuli and a consequence of the activation of cognitive processes necessary for analysis of the stimulus [33,35]. The emotional nature of the double peak in TD children is also supported by our results, according to which a single positive complex was detected when a neutral phoneme was presented to both TD children and children with ASD.

Table 3. Spearman's rank correlation coefficients (N, number of subjects; R, Spearman's rank correlation coefficient) in all children (over all subjects), the TD group, and the ASD group.
The presence of the P2-P3a complex in children in the control group appeared to be closely related to the formation of later components, such as the LP and N400, which are associated with the analysis of the valence of emotionally significant stimuli [36]. In particular, differences between laughter and crying sounds were observed only in TD children and manifested as a greater amplitude of the LP component and a smaller amplitude of the N400 component for crying sounds compared to laughter. Our results are consistent with previous findings showing that a higher N400 amplitude is associated with the processing of emotionally incongruent stimuli [37] and emotional vocal expressions [21]. The results of these studies indicate that increases in N400 amplitude are typically observed when analyzing more complex emotional stimuli that have either an incongruent or a verbal component or that require analysis of subtle social relationships; the contrast between laughter and crying is just such a case [38]. Regarding the differences between the valences of the two emotional states, it has been shown that an increase in LP can be explained by a direct response to the unpleasantness of an emotionally significant sound stimulus [19,39], that is, the involvement of an emotional response to the stimulus.
Regarding the relationship between the presence of a double-positive P2-P3a peak and the S_diff value reflecting differences between stimulus valences at the ERP level, we suggest that the analysis of emotionally significant sound stimuli is a sequential activation of different brain structures, and any change in this sequence may be accompanied by disturbances in the perception of emotionally significant sounds. Deviations in the early stages of stimulus analysis, which we see as changes in the early ERP components [40], lead to disturbances in the later stages of analysis and, consequently, to the activation of other brain structures. As a result, the analysis of sound stimuli of different valences in subjects with ASD engages other neural networks. In particular, functional magnetic resonance imaging studies have shown significant differences between the neural networks engaged during the processing of sad and happy auditory stimuli in individuals with ASD and typical participants [41,42]. Early impairments in the processing of emotionally significant stimuli that form in the early stages of child development result in features of auditory emotional perception that can be observed even in high-functioning autistic individuals [43]. However, by beginning intervention in the early stages of perceptual formation, we may be able to modify the incorrect stimulus analysis process and influence the formation of later and more specific emotional perception processes.
The study had some limitations. First, it is difficult to communicate with children with ASD, so the extent of their compliance with the instructions to avoid thinking of anything specific and just listen to the sounds (that were not interesting by themselves) remains unknown. Second, the phoneme [p] itself may be quite different from the emotional stimulus; thus, it may be useful to use another control stimulus in further studies.

Conclusions
We compared auditory emotional perception in children with ASD and TD children with similar IQ levels, using ERP analysis of the sounds of crying, laughter, and a neutral phoneme. Our findings indicated three levels of differences between the TD and ASD groups associated with the latency of the ERP response. First, children with ASD had a lower P100 and a higher N200 during the perception of both emotional and nonemotional sounds. Second, positivity at a latency of 150-450 ms was significantly more pronounced in TD children, and their ERP response to emotional sounds consisted of two components, P200 and P3a, unlike that of children with ASD. Finally, the difference in ERP response between crying and laughter was found only in TD children and was associated with the amplitudes of the late components (LP and N400) and with the PAF. We also found a correlation between greater positivity in the period of 150-450 ms and the differences between the valences of the emotional stimuli.

Availability of Data and Materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Author Contributions
GP has designed the study, analyzed the data, and produced the first draft. IS has participated in data collection and text revision. LM has participated in data analysis and text revision. All authors have read and approved the final manuscript. All authors have participated sufficiently in the work and agreed to be accountable for all aspects of the work.

Ethics Approval and Consent to Participate
The study was approved by the ethics commission of the Institute of High Nervous Activity and Neurophysiology of RAS (Protocol No. 2 from 20.04.2016). Participants' parents provided written informed consent for study participation.