1 School of Software, South China Normal University, 528200 Foshan, Guangdong, China
2 Research Center for Brain–Computer Interface, Pazhou Laboratory, 510330 Guangzhou, Guangdong, China
Abstract
Background: Sleep spindles have emerged as valuable biomarkers for assessing cognitive abilities and related disorders, underscoring the importance of their detection in clinical research. However, template matching-based algorithms that use fixed templates may not fully adapt to spindles of different durations. Inspired by multiscale feature extraction in image analysis, multiscale feature extraction methods can be used to better adapt to spindles of different frequencies and durations. Methods: This study proposes a novel automatic spindle detection algorithm based on elastic time windows and spatial pyramid pooling (SPP) for extracting multiscale features. The algorithm utilizes elastic time windows to segment electroencephalogram (EEG) signals, enabling the extraction of features across multiple scales. This approach accommodates significant variations in spindle duration and polarization positioning during different EEG epochs. Additionally, spatial pyramid pooling is integrated into a depthwise separable convolutional (DSC) network to perform multiscale pooling on the segmented spindle signal features at different scales. Results: Compared with existing template matching algorithms, the spindle wave polarization positioning of this algorithm is more consistent with the real situation. Experiments on the public DREAMS dataset show that the average accuracy of this algorithm reaches 95.75%, with an average negative predictive value (NPV) of 96.55%, indicating its advanced performance. Conclusions: The effectiveness of each module was verified through thorough ablation experiments. More importantly, the algorithm shows strong robustness to changes across experimental subjects. 
This feature makes the algorithm more accurate at identifying sleep spindles and is expected to help experts automatically detect spindles in sleep EEG signals, reduce the workload and time of manual detection, and improve efficiency.
Keywords
- sleep spindle
- multiscale
- elastic time window
- spatial pyramid pooling
Sleep spindles are sinusoidal periodic pulse trains that occur during sleep with a frequency range of 11–16 Hz. The American Academy of Sleep Medicine (AASM) [1] noted that sleep spindles are the hallmark feature of non-rapid eye movement (NREM) sleep [2, 3]. Identifying spindles is of utmost importance in the fields of sleep staging research and clinical disease diagnosis [4, 5]. Extensive studies have revealed significant correlations between the quantity and density of spindles and a range of diseases, including obstructive apnea-hypopnea syndrome [6] and schizophrenia [7, 8]. Moreover, sleep spindles are closely related to sleep maintenance and learning and memory [9, 10, 11]. Electroencephalograms (EEGs) carry important information about brain electrical activity and may reveal many pathologies. EEG analysis has proven to be a powerful tool in sleep research and neurological disease diagnosis [12], with different frequencies broken down into several waveforms, such as alpha [13] and beta [14], as well as events, such as K-complexes [15] and sleep spindles. Currently, extracting appropriate high-level features from EEG signals and detecting sleep spindles based on these features can be roughly divided into three methods.
In clinical diagnosis, spindle detection relies mainly on experienced doctors identifying spindles by counting the number of times the oscillation waveform reaches a peak within a period of time through naked eye observation based on the definition of spindle waves [16], which is the so-called golden rule. However, manual detection is costly and time-consuming, and the accuracy of detection relies heavily on the subjective experience of doctors [17]; therefore, detecting spindles using the golden rule is highly challenging.
Presently, the predominant approach for automatic spindle detection relies on template matching rules [18]. These rules involve identifying the amplitude or time-frequency characteristics of spindle events and employing fixed or adaptable thresholds for prediction purposes. Scafa et al. [19] proposed a personalized sleep spindle wave detection program (PSSD), which determines the optimal set of spindle wave features and uses a support vector machine algorithm to distinguish between spindle waves and nonspindle waves; on the DREAMS dataset, its sensitivity and specificity reached 98.6% and 98.1%, respectively. Kinoshita et al. [20] proposed a sleep spindle detection method that combines the synchrosqueezed wavelet transform (SST) and random undersampling boosting (RUSBoost). Its sensitivity on the MASS dataset is 76.9%, and its accuracy is 61.2%. Wang et al. [21] proposed an automatic spindle detection algorithm based on matching pursuit (MP) and least squares boosting (LSBoost), which reached an accuracy of 94.7% on the DREAMS dataset with high sensitivity.
In recent years, deep learning has become increasingly prominent in big data analysis tasks. Deep learning methods are highly dependent on data and have low interpretability [22], and they require careful algorithm selection, model parameter adjustment, and network layer setting, which takes considerable time and demands substantial expertise from developers. Nevertheless, deep learning remains a mainstream approach to classification problems because of its strong generalization and adaptive capabilities [23]. You et al. [24] proposed a spindle detection method with an attention module (SpindleU-Net) based on the U-Net framework, which achieved a sensitivity of 86.6%. With the exploration and development of convolutional neural network (CNN) technology, CNNs have been widely used in speech recognition, image processing, machine vision, natural language processing and other fields. In automatic sleep spindle detection, Kulkarni et al. [25] fused a CNN and a recurrent neural network (RNN) and proposed a deep learning method (SpindleNet) for real-time sleep spindle detection. Its sensitivity on the MASS dataset was 90.07%, and its specificity was 96.19%.
When template matching rules are used to detect spindles [26, 27], a fixed-length time window is usually used to extract EEG signals. However, because spindles vary in duration and formation across events and individuals, a fixed-length window may not be able to accurately locate and completely segment all spindles, resulting in missed or false detections. Using elastic time windows to extract multiscale features [28] and processing inputs of arbitrary length [29] makes it possible to adaptively adjust the length of the time window according to the local characteristics of the signal, decompose the signal into subsignals of different scales, and extract corresponding features at each scale, thereby providing a more comprehensive description of the time-frequency characteristics of the signal and improving the sensitivity and accuracy of spindle wave detection. In addition, spindle signals are rare, and nonspindle signals account for the majority of EEG signals [30]. The automatic detection of sleep spindles using deep learning methods may therefore be affected by imbalanced data [31], resulting in poor spindle detection results. The multiscale pooling of spatial pyramid pooling [32] can increase the attention given to spindle wave signals and improve the accuracy of spindle wave detection. The multiscale depthwise separable convolutional network structure [33] has a strong representation learning ability and can automatically learn more discriminative feature representations from data.
Huo et al. [34] used the multiscale depthwise separable convolution (DSC)-spatial pyramid pooling (SPP) method for smoke image detection and achieved good results, which suggests that the multiscale DSC-SPP method performs well in the field of image recognition. Building on this research, we applied the multiscale DSC-SPP method to spindle detection and proposed a spindle detection algorithm that combines elastic time windows and spatial pyramid pooling to extract multiscale features. An elastic time window is utilized to accommodate the widely varying durations of spindles in EEG. This enables the extraction of regulated deep features from EEG epochs with variable lengths using DSC with spatial pyramid pooling. The effectiveness of the proposed method for automatic sleep spindle detection has been demonstrated through experimental results on the publicly available DREAMS dataset. Specifically, our contributions in this article can be summarized as follows.
• We design elastic time windows to extract multiscale features of the EEG signals and to reduce the damage to the overall structural information of the spindles.
• We integrate spatial pyramid pooling into depthwise separable convolutions and perform multiscale pooling on spindle features to enhance their representation and to mitigate the imbalance between spindle and nonspindle samples in EEG signals.
• We propose the Multiscale DSC-SPP method as a solution for automatic sleep spindle detection, and its efficiency has been validated on the widely used DREAMS dataset.
In this section, we present a detailed overview of the multiscale DSC-SPP architecture, which consists of several interrelated modules that work together to extract EEG signal features from multiscale inputs using spatial pyramid pooling. The DSC effectively extracts the deep features of sleep spindles and achieves excellent spindle detection performance.
The proposed multiscale DSC-SPP architecture is shown in Fig. 1. First, multiscale feature extraction is performed on the preprocessed EEG signals through 0.5 s, 0.75 s and 1 s elastic time windows, and the signals are subsequently input into a depthwise separable convolution based on spatial pyramid pooling to obtain the depth features of signals of different scales. The fused features generated by the concatenate layer are subsequently fed to the sigmoid layer for spindle detection. In Depthwise Separable Net (DSNet), we incorporate dense connections, which help to minimize the gap between the input and output layers. This approach facilitates smoother propagation of gradients and feature information throughout the network. In addition, when there is less training data, dense connections also play a role in regularizing the model and reducing the risk of overfitting. The following sections demonstrate the proposed fusion architecture and its modules in detail.
Fig. 1. The framework of the proposed multiscale DSC-SPP model. The network first consists of three different DSC networks, running different context input sizes at the same time. The output of each part is then passed through the maximum pooling layer and regularization. The SPP layer is installed between the convolutional layer and the fully connected layer. Then, the three different feature outputs are combined at the end of the network to obtain rich spindle features. Finally, the sigmoid function is installed in the last fully connected layer to determine whether spindles exist in the feature. DSC, depthwise separable convolutional; SPP, spatial pyramid pooling.
The sliding window is a common data processing technique that is easy to implement and understand. By defining the window size and sliding step, the spindle signal can be segmented into multiple windows for processing. Feature extraction algorithms can be applied to each window to extract spindle-related features such as frequency, amplitude, and duration. However, when using sliding windows for spindle detection, it is important to select an appropriate window size and sliding step, as the duration of spindles can vary, typically ranging from 0.5 seconds to 3 seconds. This ensures that the windows adequately cover the time period of the spindle. The computational complexity of sliding windows increases with increasing window size.
In time series data, the elastic time window [35, 36] can be viewed as a sliding window of variable length. Unlike traditional fixed-size windows, elastic time windows can adaptively adjust their length according to the characteristics of the data, thereby better capturing key information in the data. This approach allows the use of different strides to split consecutive time windows in the same dataset, thus creating multiple scale datasets. The calculation formula for the elastic time window is as follows:
where M and S are the fixed window size and step size, respectively, and K is the coefficient adjusted according to different scales.
Multiple datasets can be generated by slicing and segmenting the raw signal and labels using different step sizes and window sizes. Specifically, for the i-th scale, the window size is M * i, and the step size is S * i.
In the process of data slicing and scaling, the time interval between windows is controlled by adjusting the step size, thereby achieving elastic partitioning of the dataset. Different step sizes will affect the degree of overlap between windows and thus affect the training effect of the model. The elastic time window can help the model capture more time series features and improve its generalizability.
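The slicing described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names, the base window M = 50 samples, the base step S = 25 samples and the coefficients 2, 3 and 4 are assumptions chosen so that the resulting windows are 0.5 s, 0.75 s and 1 s long at 200 Hz.

```python
def segment(signal, labels, window, step):
    """Slice a 1-D signal and its per-sample labels into equal-length windows."""
    windows = []
    for start in range(0, len(signal) - window + 1, step):
        end = start + window
        windows.append((signal[start:end], labels[start:end]))
    return windows


def elastic_datasets(signal, labels, base_window, base_step, coefficients):
    """Build one dataset per scale k, with window = M * k and step = S * k."""
    return {
        k: segment(signal, labels, base_window * k, base_step * k)
        for k in coefficients
    }


if __name__ == "__main__":
    fs = 200                       # sampling rate in Hz
    signal = [0.0] * (10 * fs)     # 10 s of placeholder samples
    labels = [0] * len(signal)     # placeholder per-sample spindle labels
    # M = 50 and S = 25 samples are illustrative assumptions, not values from the paper.
    datasets = elastic_datasets(signal, labels, 50, 25, (2, 3, 4))
    for k, windows in datasets.items():
        print(k, len(windows[0][0]), len(windows))
```

Because the step grows with the window, the relative overlap between consecutive windows stays the same at every scale in this sketch; other step choices would change the overlap and the number of training samples.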
According to recent research, multiscale feature learning [37] has demonstrated significant potential in diverse applications, including scene parsing and medical diagnosis [38, 39]. The fundamental concept behind multiscale feature learning involves building multiple neural networks simultaneously with varying context input sizes. The features extracted from these models are subsequently combined in a fully connected layer [40]. By analyzing kernels at different scales, multiscale feature learning aims to capture a broader range of pertinent features and estimate the spatial map associated with the input image [41].
To construct feature maps of different scales as inputs to the network, this paper uses elastic time windows to segment the EEG signals. Slices with different step sizes (0.5 s, 0.75 s, and 1 s) were used to generate three training sets of different scales. The window sizes of the three training sets are 100, 150 and 200 samples, respectively, which correspond to 0.5 s, 0.75 s and 1 s at a sampling rate of 200 Hz.
Depthwise separable convolution has proven to be a successful technique in neural image classification because it helps eliminate redundant features and significantly reduces the number of parameters required [42]. Unlike standard convolution, depthwise separable convolution breaks down the feature extraction process into two simpler steps: depthwise convolution and pointwise convolution [43]. The entire process is illustrated in Fig. 2.
Fig. 2. Depthwise separable convolution. DConv, Depthwise Convolution; PConv, Pointwise Convolution.
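The parameter saving of the two-step factorization can be checked with simple arithmetic. The sketch below counts weights only (bias terms are ignored); the 3 * 3 kernel matches the kernel size reported later in the paper, while the channel counts 32 and 64 are illustrative assumptions.

```python
def standard_conv_params(kernel_h, kernel_w, c_in, c_out):
    """Weights in a standard convolution: every output channel sees every input channel."""
    return kernel_h * kernel_w * c_in * c_out


def depthwise_separable_params(kernel_h, kernel_w, c_in, c_out):
    """Weights in a depthwise convolution followed by a pointwise (1 x 1) convolution."""
    depthwise = kernel_h * kernel_w * c_in   # one spatial kernel per input channel
    pointwise = c_in * c_out                 # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    standard = standard_conv_params(3, 3, 32, 64)
    separable = depthwise_separable_params(3, 3, 32, 64)
    print(standard, separable, separable / standard)
```

For these values the separable layer needs 2336 weights instead of 18432, a ratio of 1/64 + 1/9, which is roughly 0.127.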
Convolutional layers that operate in a sliding window fashion are adaptable to varying input sizes, whereas fully connected layers mandate fixed-size inputs to function properly. Due to this limitation, traditional DSC usually requires that the input images be the same size when processing image data. To meet this requirement, images are typically resized using resampling operations such as compression or stretching so that they have the same dimensions [44]. However, this operation introduces certain errors and leads to the loss of useful information in the original image, which may affect the model’s recognition of the image.
To solve the above problems, this paper improves the DSC network structure by adding an SPP layer before the fully connected layer. The SPP layer can extract spatial feature information at different scales, increasing the robustness of the model to the spatial layout and object deformation of the image.
By combining the design of elastic time windows and SPP layers, our model can adapt to input images of different sizes, better handle time series data, and improve the performance of the model in the spindle detection task.
In terms of object recognition tasks, the SPP [45, 46] has shown significant advantages in practice. SPP can generate a fixed-length output regardless of the size and dimensions of the input image. Compared with the traditional sliding window method, SPP applies a multi-level spatial pyramid, which can consider object information at different scales and improve the performance of target detection and recognition, especially with good adaptability to targets of different scales. At the same time, the SPP network allows us to generate images for testing from images of any size and supports inputting images of different sizes and ratios during the training process. Therefore, by training with variable-sized input images, the model’s invariance to the input image size can be improved, and the risk of overfitting can be reduced.
Furthermore, the SPP can combine features obtained at variable scales with the flexibility of the input scale. By embedding SPP [47] before the fully connected layer in DSC, the input feature map is divided into several parts, and features are extracted from each part. As shown in Fig. 3, any given feature map can be split into n * n subsets using spatial bins of size n, and a fixed-size vector is generated by selecting the maximum value from each spatial bin. Each feature map is then pooled multiple times, and its output vectors are concatenated to produce a one-dimensional output vector of the feature map. The key principle is to assign different numbers of spatial bins to input feature maps of different sizes. The spatial bin-pooled features from all the filters are flattened and connected to create a final feature representation of consistent length. This approach enables the model to generate fixed-length representations irrespective of the size or scale of the input features [36].
Fig. 3. Spatial pyramid pooling diagram. The SPP module captures information from various subregions at different scales. By fusing the information from various subregions within these receptive fields, a more robust representation can be obtained. Here, 64 refers to the number of filters in the final convolutional layer.
Assuming that
In the SPP operation, the input feature map is divided into multiple subregions, and each subregion is pooled independently to form a fixed-size output vector.
As shown in Fig. 4, this paper uses three pyramid levels with 1 * 1, 2 * 2 and 4 * 4 spatial bins. For an input image of size N * N, L feature maps of size N’ * N’ are obtained in the last convolutional layer, where L is the number of filters [49]. To process input images of different sizes, the SPP layer divides each feature map into three levels of spatial bins (1 * 1, 2 * 2, and 4 * 4) and applies max pooling within each bin. A representation vector of length pL is then generated as the output of the SPP layer, where L and p are predefined hyperparameters and p is the total number of spatial bins (1 + 4 + 16 = 21 for these three levels). Therefore, regardless of the size of the input image, a fixed-length pL vector can be generated as the input of the fully connected layer. It has been reported in the literature that the performance of the SPP layer is not sensitive to the particular settings of the spatial bins [50].
Fig. 4. DSC network structure with the SPP layer. In the SPP layer, the feature map is partitioned into three levels of spatial bins with sizes of 1 * 1, 2 * 2 and 4 * 4. Each bin is then subjected to max pooling with the corresponding size. The SPP layer produces a representation vector of length pL as its output.
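The fixed-length property of the SPP layer can be illustrated with a small pure-Python sketch. It is not the network layer used in the paper: the function names are hypothetical, feature maps are plain nested lists, and the bin boundaries (floor for the start, ceiling for the end) are one common choice that is assumed here.

```python
import math


def spp_channel(fmap, levels=(1, 2, 4)):
    """Max-pool one 2-D feature map into sum(n * n for n in levels) values."""
    height, width = len(fmap), len(fmap[0])
    pooled = []
    for n in levels:
        for i in range(n):
            r0, r1 = (i * height) // n, math.ceil((i + 1) * height / n)
            for j in range(n):
                c0, c1 = (j * width) // n, math.ceil((j + 1) * width / n)
                pooled.append(
                    max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1))
                )
    return pooled


def spp(feature_maps, levels=(1, 2, 4)):
    """Concatenate the pooled vectors of all L feature maps (length 21 * L by default)."""
    output = []
    for fmap in feature_maps:
        output.extend(spp_channel(fmap, levels))
    return output


if __name__ == "__main__":
    small = [[r * 7 + c for c in range(7)] for r in range(5)]   # 5 x 7 map
    large = [[r * 8 + c for c in range(8)] for r in range(8)]   # 8 x 8 map
    # Two filters each: both inputs give a vector of length 21 * 2 = 42.
    print(len(spp([small, small])), len(spp([large, large])))
```

Feature maps of different sizes produce vectors of the same length, which is what allows the fully connected layer to follow convolutions applied to windows of different durations.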
We validated our approach on the DREAMS dataset (https://zenodo.org/records/2650142#.YRtw6o4zY2w) from the Sleep Laboratory of André Vésale Hospital in Belgium [51]. The DREAMS dataset comprises 30-minute polysomnography (PSG) excerpts obtained from eight subjects, including four males and four females aged 45.88
| Excerpt | Tagged channel | Length (s) | Sampling rate (Hz) | Nr. Spindles labeled by expert 1 | Nr. Spindles labeled by expert 2 | Nr. Interannotator-agreed spindles |
| Excerpt 1 | C3-A1 | 1800 | 100 | 52 | 115 | 135 |
| Excerpt 2 | CZ-A1 | 1800 | 200 | 60 | 52 | 77 |
| Excerpt 3 | C3-A1 | 1800 | 50 | 5 | 44 | 44 |
| Excerpt 4 | CZ-A1 | 1800 | 200 | 44 | 25 | 63 |
| Excerpt 5 | CZ-A1 | 1800 | 200 | 56 | 86 | 103 |
| Excerpt 6 | CZ-A1 | 1800 | 200 | 72 | 87 | 117 |
| Excerpt 7 | CZ-A1 | 1800 | 200 | 18 | - | - |
| Excerpt 8 | CZ-A1 | 1800 | 200 | 48 | - | - |
The table shows that each excerpt in the DREAMS dataset lasts 1800 s and that most excerpts are sampled at 200 Hz, giving 1800 * 200 samples per excerpt. In our work, we resampled excerpt 1 and excerpt 3 to 200 Hz using a configurable sampling algorithm for t-wise interaction sampling [52]. For training and validation, we utilized the first six excerpts from the dataset.
In the proposed framework, the Adam optimizer is used to optimize the model with an initial learning rate of 0.0001 and a batch size of 64 [53]. Additionally, dropout and early stopping techniques are implemented to prevent overfitting. The dropout strategy has demonstrated superior performance compared to other regularization methods [54]. A specified percentage of the input units in each layer (excluding the first layer) is randomly set to zero. Throughout our experiments, we empirically established a dropout rate of 0.2.
In our approach, we adopt a subject-independent methodology for training and testing the model. This implies that the data in the training set and the data in the test set are sourced from different subjects. To ensure nonoverlapping testing, we employed a 6-fold cross-validation scheme in which each fold involved the use of data from different subjects for testing, while the remaining subjects’ data were used for training. This process allowed us to thoroughly evaluate the model’s performance across all subjects in the dataset. As part of the 6-fold cross-validation, we divided the dataset such that 70% of all the samples were used for training and the remaining 30% were used for validation in each fold. During training, we also included 30% of the data as independent validation samples for early stopping purposes. We then averaged the results obtained from each fold to evaluate the overall performance of the model. This approach is described in detail in third order cumulant function (ToC) [55].
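A subject-independent split of this kind can be sketched as follows. This is only an illustration under assumptions: the function names are hypothetical, each fold holds out one excerpt for testing, and the 30% validation share is taken at random from the training samples with a fixed seed.

```python
import random


def leave_one_subject_out(subject_ids):
    """Yield (train_subjects, test_subject) pairs, one fold per subject."""
    for held_out in subject_ids:
        train = [s for s in subject_ids if s != held_out]
        yield train, held_out


def split_train_validation(samples, validation_fraction=0.3, seed=0):
    """Shuffle the samples and set aside a fraction of them for early stopping."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_val = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]


if __name__ == "__main__":
    excerpts = [1, 2, 3, 4, 5, 6]           # the six excerpts used in the experiments
    for train, test in leave_one_subject_out(excerpts):
        print("train:", train, "test:", test)
```

Since the test excerpt never appears in the training list of its fold, no subject contributes data to both sides of a split.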
The model utilizes the sigmoid function to produce a probability ranging from 0 to 1, indicating whether a given EEG epoch belongs to the spindle class. During the training phase, the probability obtained from the model is applied to each data point within the epoch. In the testing phase, the EEG signal is divided into segments three times, with each segmentation using a different time window. The time windows for segmentation were 0.5 seconds, 0.75 seconds, and 1 second. As a result, each data point will have three distinct probabilities. For a given excerpt x consisting of n data points, the final output for each data point is determined by selecting the highest probability value out of the three distinct probabilities associated with that data point:
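$$\hat{p}_j = \max\left(p_j^{(0.5)},\ p_j^{(0.75)},\ p_j^{(1)}\right), \quad j = 1, \ldots, n,$$
where $p_j^{(w)}$ denotes the probability assigned to the $j$-th data point of excerpt $x$ when the signal is segmented with the $w$-second time window, and $\hat{p}_j$ is the final output for that data point.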
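The per-point maximum over the three segmentations can be sketched as follows. The helper names are hypothetical, and the sketch assumes non-overlapping windows at each scale, so that every data point receives exactly one probability per scale.

```python
def expand_to_samples(window_probs, window, n_samples):
    """Assign each window's probability to every sample that the window covers."""
    per_sample = [0.0] * n_samples
    for idx, prob in enumerate(window_probs):
        start = idx * window
        for t in range(start, min(start + window, n_samples)):
            per_sample[t] = prob
    return per_sample


def fuse_max(per_scale_probs):
    """Keep, for every sample, the highest probability across the scales."""
    return [max(values) for values in zip(*per_scale_probs)]


if __name__ == "__main__":
    n = 12  # toy excerpt of 12 samples with windows of 2, 3 and 4 samples
    p_a = expand_to_samples([0.1, 0.9, 0.1, 0.1, 0.1, 0.1], 2, n)
    p_b = expand_to_samples([0.2, 0.2, 0.8, 0.2], 3, n)
    p_c = expand_to_samples([0.3, 0.3, 0.3], 4, n)
    print(fuse_max([p_a, p_b, p_c]))
```

Taking the maximum favors detection: a data point is scored as highly as its most confident scale, which suits a setting in which spindles are the rare class.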
The spindles in the DREAMS dataset occurred during stage 2 or stage 3 NREM (N2, N3) sleep. The density and duration of these spindles varied depending on the subject and their respective sleep pathology. Because spindles differ in length, the predicted spindles and the annotated spindles do not match exactly. When the overlap between a predicted spindle and an annotated spindle exceeded a specific threshold, the prediction was treated as a correctly detected spindle. Intersection-over-union (IoU) is utilized to quantify the overlap between predicted and annotated spindles, providing a measure of their similarity.
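$$\mathrm{IoU} = \frac{|P \cap G|}{|P \cup G|},$$
where $P$ is the time interval of a predicted spindle and $G$ is the time interval of an annotated (ground truth) spindle.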
Fig. 5. A simplified diagram illustrating the ground truth and detection process. The “ground truth” corresponds to the “annotated”, while “detection” corresponds to the “predicted”. The intersection of the ground truth and detection data is referred to as the “intersection”, while the union of the two is called the “union”. Predicted spindle waves with an IoU greater than the threshold are labeled “TP”. TP, true positive; IoU, intersection-over-union.
We marked true positives by calculating the IoU overlap between the predicted spindle waves and expert-labeled spindle waves. Therefore, our model labels the predicted spindle waves and outputs the number of predicted spindle waves rather than a spindle length identifier.
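The IoU-based matching can be sketched for one-dimensional time intervals as follows. The function names are hypothetical, spindles are represented as (start, end) pairs in seconds, and the threshold is passed in explicitly because its value is a free parameter in this sketch.

```python
def interval_iou(pred, truth):
    """IoU of two time intervals given as (start, end) in seconds."""
    intersection = max(0.0, min(pred[1], truth[1]) - max(pred[0], truth[0]))
    union = (pred[1] - pred[0]) + (truth[1] - truth[0]) - intersection
    return intersection / union if union > 0 else 0.0


def count_true_positives(predicted, annotated, threshold):
    """Count predictions whose IoU with any annotated spindle exceeds the threshold."""
    return sum(
        1
        for p in predicted
        if any(interval_iou(p, a) > threshold for a in annotated)
    )


if __name__ == "__main__":
    predicted = [(1.0, 2.0), (5.0, 5.5)]
    annotated = [(1.5, 2.5)]
    print(interval_iou(predicted[0], annotated[0]))          # 0.5 s overlap over a 1.5 s union
    print(count_true_positives(predicted, annotated, 0.3))   # only the first prediction matches
```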
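When working with imbalanced datasets, a high accuracy may not necessarily indicate superior performance of the spindle detection model due to the larger proportion of negative samples. We therefore calculate several commonly used evaluation indicators, namely accuracy, sensitivity, precision, F1-score and negative predictive value (NPV), whose standard definitions are as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \mathrm{Sensitivity} = \frac{TP}{TP + FN}, \quad \mathrm{Precision} = \frac{TP}{TP + FP},$$
$$\mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Sensitivity}}{\mathrm{Precision} + \mathrm{Sensitivity}}, \quad \mathrm{NPV} = \frac{TN}{TN + FN},$$
where TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively.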
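These indicators can be computed from the four confusion counts, as in the sketch below. It uses the standard textbook definitions; the function name and the example counts are illustrative and are not results from the paper.

```python
def detection_metrics(tp, fp, tn, fn):
    """Standard accuracy, sensitivity, precision, NPV and F1-score from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    npv = tn / (tn + fn) if tn + fn else 0.0
    denominator = precision + sensitivity
    f1 = 2 * precision * sensitivity / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "npv": npv,
        "f1": f1,
    }


if __name__ == "__main__":
    # Illustrative imbalanced case: 50 spindle samples among 1000.
    for name, value in detection_metrics(tp=40, fp=10, tn=940, fn=10).items():
        print(name, round(value, 4))
```

In this toy case the accuracy is 0.98 while precision and sensitivity are both 0.8, which shows why accuracy alone can overstate performance when negatives dominate.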
(1) Advanced Performance Solutions: In recent years, many scholars have proposed effective spindle detection algorithms based on the DREAMS dataset. Likewise, we further propose a new spindle detection method for multiscale feature extraction. Compared with the solutions of other scholars, our proposed solution has the best overall performance.
(a) Sun et al. [57] proposed a multi-instance learning framework based on convolutional neural networks, CNN-Multiple Instance Learning (CNN-MIL). This framework assumes that only a portion of each labeled spindle segment contains real spindle patterns and learns spindle-related features by distinguishing informative instances in the feature learning stage.
(b) Jiang et al. [58] proposed a two-stage method. In the predetection stage, the Teager energy operator with adaptive parameters is used to discover as many candidate sleep spindles as possible, and a classifier is used to further identify the true sleep spindles among all the candidate spindles.
(c) Kulkarni et al. [25] proposed a deep learning strategy, SpindleNet. The fixed-scale features learned by the CNN are further passed to the RNN, and the RNN output (from 50 time steps) of subnet 1 is combined with the RNN output feature of subnet 2 to achieve spindle multiscale feature fusion.
(d) Chen et al. [36] proposed a method of mixing EEG signal features using an elastic time window to adapt to significant changes in spindle duration and mixing depth features with the information entropy of EEG signals for spindle detection.
(2) Baseline models: For more effective verification, we implemented three baseline models to further compare with published solutions and to benchmark our proposed solutions.
(a) In the pre-detection stage, we use 0.5 s, 0.75 s and 1 s elastic time windows to segment the EEG signals to obtain EEG signals of different scales, which will be used to obtain multiscale characteristic spindles.
(b) We add the SPP layer in the feature fusion stage to fuse inputs of different scales into a fixed-size feature matrix. Specifically, three spatial bins of different sizes (1 * 1, 2 * 2, and 4 * 4) are set in the SPP layer, generating a 21 * N output.
(c) We use the DSC variant as the basic model of the CNN. The network consists of three repeated DSC layers, a concatenate layer, an SPP layer, and a fully connected layer. Specifically, each DSC uses 3 * 3 kernels with a stride of 1 and ReLU activation, and the final output is mapped to the range between 0 and 1 through the sigmoid function.
To gain a better understanding of the contributions of each component in our framework, we conducted a series of ablation and analysis experiments. These experiments aimed to determine the impact of different modules on the performance of the final model. Overall, our research findings provide evidence for the effectiveness of the proposed framework and emphasize the importance of considering multiple scales in spindle wave detection.
The performance of a model utilizing an elastic time window design is evaluated and compared to that of a model with a fixed-length window (0.5 s) in this series of experiments. As shown in Table 2, compared to the method using a fixed window, the model using an “elastic” time window has higher accuracy.
| Model | Accuracy (%) | Sensitivity (%) | Precision (%) | F1-score (%) | p value |
| DSC | 92.84 | 49.90 | 48.46 | 37.17 | 0.1093 |
| DSC+Multiscale | 95.28 | 49.61 | 49.08 | 49.31 | 0.0352 |
| DSC+SPP | 94.87 | 49.99 | 49.85 | 49.46 | 0.0462 |
| DSC+Multiscale+SPP | 95.75 | 50.02 | 50.14 | 49.47 | - |
Moreover, to observe the impact of the IoU threshold on performance, we used fixed windows and elastic windows to test the spindle proportion detected under different thresholds. As shown in Fig. 6, when the threshold is greater than 0.3, most spindles can be predicted, and compared with that of the fixed window in the elastic time window, the proportion of spindles with IoU
Fig. 6. Distribution of IoU on the predicted spindle epochs. The pink bar represents the elastic window, while the yellow bar represents the fixed window. The proportions of detected spindle waves were tested under different IoU ranges.
In these experiments, the performance of spindle multiscale pooling at the SPP layer is evaluated and compared with that of the DSC model without the SPP. As shown in Table 2, compared with a single DSC model, the DSC model based on spatial pyramid pooling can enhance the focus on spindle wave signals and improve the overall detection accuracy.
Finally, we verify the effectiveness of the spindle detection model based on DSC with an elastic time window and the SPP in this paper. The model simultaneously considers both the temporal and spectral features, captures complete spindle wave signal segments through the elastic time window, and enhances the focus on spindle wave signals using spatial pyramid pooling, thereby improving detection accuracy.
The results are shown in Table 2. Compared with the single DSC, the improved model has better performance in terms of accuracy, precision and F1-score, which are increased by approximately 3, 2 and 12 percentage points, respectively. The effectiveness of the elastic time window and SPP in the automatic spindle detection model was preliminarily verified.
To test whether the improvements of the proposed algorithm are statistically significant, we employed the Friedman test, a nonparametric statistical method recommended by Demšar [59]. The Friedman test compares multiple groups of paired samples. By calculating the ranks of each group, we determined whether there were significant differences among the evaluation metrics.
We calculated the p value for each row of DSC, DSC+Multiscale, and DSC+SPP compared to DSC+Multiscale+SPP separately. Table 2 shows that the p values for DSC+Multiscale and DSC+SPP are both less than 0.05, indicating that the improvements are statistically significant.
Fig. 7 illustrates the outcomes of the detection approach by employing expert-annotated events as the ground truth in the DREAMS dataset. Fig. 7a shows that within the 20-second raw EEG segment, the experts identified a total of 3 spindles. On the other hand, Fig. 7b shows the spindle candidates detected using the elastic time window after data filtering (the brown area represents the detected spindles). Compared with the spindle waves (red rectangular area segments) marked by experts, 5 spindle waves were detected in this step. Fig. 7c shows the final recognition result after the SPP combines the spindle wave features of each candidate. The final detection results are approximately consistent with the spindles annotated by experts. A comparison of the results in Fig. 7b,c reveals that multiscale feature extraction in elastic time windows can be used to search for ambiguous candidate spindles, whereas SPP performs multiscale feature fusion to identify accurate results.
Fig. 7. The detection results of each step are analyzed using the proposed method. (a) The blue data represent a sample of unfiltered 20-second raw EEG segments, where the red rectangular window refers to the expert-annotated spindles. (b) The blue data represent filtered EEG data, the brown rectangle represents a large number of sleep spindle candidate data points searched in the elastic time window, and the red rectangle represents the real spindle. (c) The brown rectangle represents the sleep spindle identified by the SPP, and the red rectangle represents the real situation. EEG, electroencephalogram.
To assess our proposed method, we compare its performance with that of the four previously mentioned state-of-the-art methods: a multi-instance learning framework (CNN-MIL), a deep learning strategy (SpindleNet), a two-stage method (Two-Stage), and label noise identification (Labelfix).
When using other machine learning-based methods, all training, validation, and test sets in each experiment were the same as those used by this model to ensure a fair comparison. Table 3 (Ref. [25, 57, 58, 60]) shows the evaluation results of our proposed method and the abovementioned methods on the DREAMS dataset.
| Paper | Method | Cross-validation | Accuracy (%) | Precision (%) | NPV (%) | F1-score (%) |
| --- | --- | --- | --- | --- | --- | --- |
| Proposed Method | Multiscale DSC-SPP | Sixfold | 95.75 | 50.14 | 96.55 | 49.47 |
| Sun et al. [57] | CNN-MIL | Fivefold | 95.38 | 47.65 | 98.36 | 47.28 |
| Kulkarni et al. [25] | SpindleNet | Fivefold | 95.37 | 44.26 | 98.12 | 44.02 |
| Jiang et al. [58] | Two-Stage | Sixfold | 95.11 | 45.37 | 98.07 | 42.46 |
| Atkinson and Metsis [60] | Labelfix | - | 95.24 | 47.60 | 97.67 | 40.09 |
NPV, negative predictive value; CNN, convolutional neural network; MIL, multi-instance learning framework.
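For reference, the four metrics reported in Table 3 follow from the entries of a confusion matrix. The sketch below uses the standard definitions; the function name and the counts are hypothetical and chosen only to show how a high accuracy and NPV can coexist with a moderate precision and F1-score when spindle samples are rare.

```python
def detection_metrics(tp, fp, tn, fn):
    """Return accuracy, precision, NPV and F1-score (as fractions) from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    npv = tn / (tn + fn) if (tn + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "npv": npv, "f1": f1}


# Hypothetical counts for an imbalanced recording (illustrative only).
metrics = detection_metrics(tp=50, fp=50, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```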
Fig. 8 shows several examples of detection results from different methods. In each subfigure, the first row is a 20 s EEG signal, and the shaded rectangle represents the real situation. As shown in Fig. 8a,b, the predictions in this article include all true spindles, and the polarization positioning of spindle waves is more consistent with the real situation than is that of other methods.
Fig. 8. Visual comparison of the spindle detection results with existing methods. The shaded rectangle and the first black line represent the adopted ground truth (GT), which comes from expert annotations. The colored lines correspond to the occurrence of spindles predicted by the automatic methods. (a) The EEG signal between 7935 s and 7955 s. (b) The EEG signal between 13,560 s and 13,580 s.
In this study, we develop a multiscale feature extraction method for spindle detection and validate it on the DREAMS dataset with simulated and real annotations. First, our method captures the characteristics of spindle signals more comprehensively by using elastic time windows to extract multiscale features from filtered EEG data. Then, multilevel pooling is performed in the spatial pyramid pooling layer to enhance the attention given to the spindle signal, which helps to better capture the temporal and spatial variations in spindle signals. This multiscale feature extraction strategy allows spindle signals to be segmented more accurately. Fig. 7b and Fig. 7c show the complementary advantages of the two components. Comparing the two subfigures, the elastic time window detects more spindle wave candidates but has a higher rate of false positives. To ensure the reliability of spindle identification, the SPP technique is used to eliminate non-spindle waves, which filters out false positives and improves the accuracy of spindle detection.
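The property of spatial pyramid pooling that matters here is that candidates of different lengths are mapped to feature vectors of the same length. The following is a minimal one-dimensional sketch of that idea. It assumes max pooling and pyramid levels of 1, 2 and 4 bins, both of which are choices made for this illustration rather than the configuration of the network in this study, and it operates on a plain list rather than on convolutional feature maps.

```python
def spp_1d(features, levels=(1, 2, 4)):
    """Max-pool a 1-D sequence into sum(levels) values, whatever its length."""
    length = len(features)
    if length == 0:
        raise ValueError("features must be non-empty")
    pooled = []
    for n_bins in levels:
        for i in range(n_bins):
            # Bin i covers [floor(i*L/n), ceil((i+1)*L/n)), so it is never empty.
            start = (i * length) // n_bins
            end = ((i + 1) * length + n_bins - 1) // n_bins
            pooled.append(max(features[start:end]))
    return pooled


# Two hypothetical candidates of different durations (illustrative values).
short_candidate = [1, 5, 2, 8, 3]
long_candidate = [0, 2, 4, 6, 8, 6, 4, 2, 0]

# Both are reduced to 1 + 2 + 4 = 7 pooled values.
print(len(spp_1d(short_candidate)), len(spp_1d(long_candidate)))
```

Because the output length depends only on the pyramid levels, candidates produced by windows of any duration can be passed to the same fully connected classifier.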
Because spindles vary, both inherently and between individuals, in duration and deflection positioning, a fixed-length time window may not be able to accurately locate and completely segment all spindles, resulting in missed or false detections. Multiscale feature extraction of spindle waves is therefore advantageous. We compared the impacts of the fixed time window and the elastic time window on the ground truth proportion under different IoU thresholds. The results are shown in Fig. 6. Compared with that of the fixed window, the elastic time window has an IoU
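The IoU criterion used in this comparison can be sketched for one-dimensional time intervals as follows. The function names, the matching rule (a ground-truth spindle counts as covered if any candidate reaches the threshold) and the example intervals are assumptions made for illustration; they are not taken from the study.

```python
def interval_iou(a, b):
    """Intersection over union of two (start, end) intervals in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def covered_fraction(ground_truth, candidates, threshold):
    """Fraction of ground-truth events matched by a candidate with IoU >= threshold."""
    if not ground_truth:
        return 0.0
    hits = sum(
        1 for gt in ground_truth
        if any(interval_iou(gt, c) >= threshold for c in candidates)
    )
    return hits / len(ground_truth)


# Hypothetical annotations and candidates (illustrative only).
annotated = [(10.0, 11.0), (20.0, 21.5)]
candidates = [(10.1, 11.0), (20.0, 20.5)]
for thr in (0.3, 0.5, 0.7):
    print(thr, covered_fraction(annotated, candidates, thr))
```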
In addition, our method generalizes across subjects: we trained it on 6 subjects in DREAMS and evaluated the model through sixfold cross-validation, which reduces the risk of overfitting and avoids the bias that a single, possibly unbalanced data partition would introduce. As shown in Table 3, in contrast to the state-of-the-art spindle wave detection algorithm based on template matching, our model combines multiscale and SPP techniques to fully utilize the multilevel feature information in the data. It also incorporates an elastic time window approach to capture complete spindle wave signal segments, thereby avoiding false or missed detections caused by incomplete signal fragments. This approach achieves the highest accuracy (95.75%) and the highest F1-score (49.47%) among the methods compared in Table 3.
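If each of the six folds holds out the data of one subject, the partitioning can be sketched as below. This one-subject-per-fold reading is an assumption, since the text states only that six subjects and sixfold cross-validation were used; the function name and subject labels are hypothetical.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# Hypothetical subject label for each EEG segment (illustrative only).
segment_subjects = ["s1", "s1", "s2", "s3", "s3", "s4", "s5", "s6"]
for subject, train_idx, test_idx in leave_one_subject_out(segment_subjects):
    print(subject, len(train_idx), len(test_idx))
```

Splitting by subject rather than by segment keeps all data from the test subject out of training, so the reported scores reflect performance on an unseen individual.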
Our method could also be applied to related clinical tasks, such as detecting other EEG signature waves and predicting epilepsy. In these tasks, events that are split by fixed windows may lead a model to learn ineffective features. With our method, labeled features can be learned at different scales, allowing more complete and accurate features to be extracted.
Although multiscale DSC-SPP has shown promising results, the individual differences in spindle waves and the various factors that can influence these differences (e.g., age, sex, sleep stage, and sleep quality) limit the adaptability of the algorithm. Our study is based on the DREAMS dataset, which still has a relatively small sample size, further limiting the adaptability of the algorithm. Therefore, to improve the adaptability and robustness of the algorithm, our next step could be to validate it on both self-collected experimental data and online datasets. In addition, spindles in subjects with sleep diseases or sleep disorders may differ greatly in duration and frequency, which is also an issue worth exploring in the future when determining the threshold for continuous data points on spindles.
We propose a spindle detection algorithm that combines an elastic time window and spatial pyramid pooling to extract multiscale features. The elastic time window allows for the extraction of multiscale features from the EEG signal, addressing the issue of fixed time windows being unable to fully segment spindle waves. The multiscale pooling of spatial pyramid pooling increases the focus on spindle wave signals and improves the accuracy of spindle wave detection. Additionally, the depthwise separable convolutional network architecture has strong representation learning capabilities, as it automatically learns more discriminative feature representations from the data. The experimental results on the DREAMS dataset for automatic spindle wave detection demonstrate that the proposed method is comparable to state-of-the-art approaches. In the future, this method is expected to be applied to clinical sleep research related to spindle characteristics.
DSC, depthwise separable convolution; SPP, spatial pyramid pooling; EEG, electroencephalogram; EOG, electro-oculogram; EMG, electromyography; NPV, negative predictive value; IoU, intersection over union; NREM, non-rapid eye movement.
The datasets generated or analyzed during the current study are available: https://zenodo.org/records/2650142#.YRtw6o4zY2w.
YO, JP and FW designed and conceptualized the research project. BF and LT analyzed and interpreted the data. YO, BF and LT drafted the manuscript. JP and FW played a supportive role throughout the study, offering professional optimization and guidance related to the manuscript. All the authors contributed to editorial changes in the manuscript. All the authors read and approved the final manuscript. All the authors have participated sufficiently in the work and agreed to be accountable for all the aspects of the work.
Not applicable.
We sincerely appreciate the publicly available DREAMS dataset provided by the sleep laboratory of André Vésale Hospital in Belgium. We thank all of the subjects except the authors from the School of Software in South China Normal University.
This research was funded by the Guangdong Basic and Applied Basic Research Foundation (Grant No. 2021A1515011853), the National Natural Science Foundation of China (Grant Nos. 61906019 and 62006082), and the STI 2030-Major Projects (Grant No. 2022ZD0208900).
The authors declare no conflict of interest. Jiahui Pan is serving as one of the Guest editors of this journal. We declare that Jiahui Pan had no involvement in the peer review of this article and has no access to information regarding its peer review. Full responsibility for the editorial process for this article was delegated to Gernot Riedel.
References
Publisher’s Note: IMR Press stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.








