IMR Press / JIN / Volume 24 / Issue 6 / DOI: 10.31083/JIN39023
Open Access Original Research
EEG-Based Classification of Parkinson’s Disease With Freezing of Gait Using Midfrontal Beta Oscillations
Affiliation
1 Biomedical and Translational Sciences, University of South Dakota, Vermillion, SD 57069, USA
2 Neuroergonomics and Cognitive Engineering Lab, Oklahoma State University, Stillwater, OK 74078, USA
3 Department of Psychology, University of South Dakota, Vermillion, SD 57069, USA
4 Department of Neurology, Ludwig Maximilian University, 81377 Munich, Germany
5 Applied AI Research Lab, Department of Computer Science, The University of South Dakota, Vermillion, SD 57069, USA
6 Department of Neuroscience, Sanford School of Medicine, University of South Dakota, Sioux Falls, SD 57104, USA
*Correspondence: arun.singh@usd.edu (Arun Singh)
J. Integr. Neurosci. 2025, 24(6), 39023; https://doi.org/10.31083/JIN39023
Submitted: 14 March 2025 | Revised: 11 April 2025 | Accepted: 27 April 2025 | Published: 20 June 2025
(This article belongs to the Special Issue Advancing Neurological Care with AI and Digital Twins)
Copyright: © 2025 The Author(s). Published by IMR Press.
This is an open access article under the CC BY 4.0 license.
Abstract
Background:

Freezing of gait (FOG) is a debilitating motor symptom of Parkinson’s disease (PD) that significantly affects patient mobility and quality of life. Identifying reliable biomarkers to distinguish between PD patients with freezing of gait (PDFOG+) and those without FOG (PDFOG–) is essential for early intervention and treatment planning. This study investigates the potential of electroencephalographic (EEG) signals, focusing on the well-studied midfrontal beta oscillatory feature, to classify PDFOG+ and PDFOG– using machine learning (ML) and deep learning (DL) approaches.

Methods:

Resting-state EEG data were collected from the midfrontal ‘Cz’ and nearby channels (Cz-cluster) from 41 PDFOG+ and 41 PDFOG– subjects. A range of ML and DL models, including logistic regression (LR), random forest (RF), extreme gradient boosting (XGBoost), categorical boosting (CatBoost), and long short-term memory (LSTM) models were evaluated using leave-one-subject-out (LOSO), 10-fold, and stratified cross-validation (CV).

Results:

Outcomes demonstrate that while LR achieved an area under the receiver-operating characteristic (AUC-ROC) score of 0.63, LSTM outperformed all models, achieving an AUC-ROC of 0.68 and accuracy of 0.63, particularly with the Cz-cluster configuration.

Conclusions:

These findings support the potential of midfrontal beta oscillations, particularly in combination with LSTM temporal modeling, as a promising EEG-based biomarker for distinguishing PDFOG+ from PDFOG–. This work contributes to the development of more effective diagnostic tools and treatment strategies for PD-related gait impairments.

Keywords
Parkinson disease
freezing of gait
electroencephalography
beta rhythm
machine learning
deep learning
1. Introduction

Freezing of gait (FOG) is a debilitating lower-extremity motor symptom of Parkinson’s disease (PD) in which people with PD feel their feet glued to the ground and lose their ability to step forward [1]. FOG in people with PD can be seen at the early stage of the disease and in up to 60% of people with PD in the advanced stages of the disease [2, 3]. It is difficult to investigate FOG in PD patients due to its paroxysmal nature. Normally, FOG episodes can be noticed most often during turning, passing through doorways, and when performing dual tasks with high cognitive load [4]. PD patients exhibit severe gait abnormalities with the progression of disease; and at the advanced stage of the disease, levodopa and deep brain stimulation therapies become less efficient at improving gait dysfunction including FOG [5, 6]. Normally, in the clinical setting, the accuracy of differentiating PD patients with FOG (PDFOG+) from those without FOG (PDFOG–) depends on clinical features such as the presence of FOG events during gait and/or 360° turning characteristics [7, 8]. However, FOG events and turning issues may not be seen in patients at the early stage of the disease or may not be present during the clinical testing day. Also, some limitations can be noted in these studies such as small sample size, analysis of the turning phases, turning in the preferred direction, or using fewer inertial sensors. In addition, these methods may not include some of the variables that can be most sensitive to disease progression. Notably, these studies implemented clinical assessments that require patients to perform gait or turning tasks and may induce freezing episodes with higher probability of falls. Therefore, electrophysiological recordings during the resting state could be a relevant alternative approach to classify PDFOG+. 
Moreover, it is crucial to improve the diagnosis of PDFOG+ for prognostic and therapeutic management viewpoints with additional simpler methodologies such as electroencephalographic (EEG) recordings along with clinical examinations or assessments.

A previous study classified people with PD and healthy age-matched controls using several machine learning (ML) methods [9]. Classification methods using linear features from EEG signals in different frequency bands show up to 82% accuracy [10], whereas nonlinear measures from EEG datasets of PD and healthy controls yield up to 95% accuracy [11]. A linear-predictive-coding EEG algorithm has also been reported to detect PD versus healthy controls in a computationally fast manner with 85% accuracy [9]. Although these studies classified people with PD from EEG signals, classification algorithms have not been extensively applied to EEG signals to differentiate PDFOG+ from PDFOG–. EEG has been shown to be an effective method for understanding the pathophysiological features of PD in clinical settings because it is cost-effective, easy to perform, and available in all neurology clinics.

Although FOG is a movement phenomenon, recent studies suggest that resting-state midfrontal beta-band activity reflects disruptions in executive and cognitive-motor integration networks involved in FOG [12, 13]. Resting EEG can therefore serve as a neurophysiological marker to distinguish FOG subtypes even in the absence of overt movement. While wearable inertial measurement units (IMUs) are effective for detecting FOG during gait tasks, our focus was to identify intrinsic neural biomarkers that may predict FOG risk independent of movement. Recent work demonstrated high FOG prediction accuracy using multimodal wearable sensors [14]. Our approach complements this by leveraging EEG to explore underlying neurophysiological contributors to FOG beyond behavioral manifestations. In particular, EEG provides insight into cortical dynamics, especially in midfrontal regions, and offers early-detection opportunities that behavioral sensors may miss.

Overall, we hypothesize that resting-state EEG could be essential for the reliable classification of PDFOG+ and PDFOG–, and we anticipate that the cortical oscillatory dynamics may offer a novel therapeutic intervention for the prediction and alleviation of severe gait dysfunction in PDFOG+. Interestingly, studies have suggested that PDFOG+ exhibits abnormal beta oscillations in the prefrontal and midfrontal cortical and subcortical regions when performing motor and dual tasks [5, 12, 15, 16]. Thus, the resting-state cortical oscillations in these frequencies may be categorized to advance the classification methods of PDFOG+ from PDFOG–. Therefore, the aim of the current study was to examine key aspects of classification of PDFOG+ from resting-state EEG recordings. Based on our previous reports [12, 13, 15], we tested our hypothesis that midfrontal beta oscillations may be suitable features for ML models.

2. Material and Methods
2.1 Participants

For this study, we used a dataset from 82 subjects with PD (41 PDFOG+ and 41 PDFOG–). Resting-state EEG signals from these subjects have been reported in previous studies [9, 17]. The dataset is also available online at http://predict.cs.unm.edu/downloads.php. All procedures were authorized in accordance with the Helsinki Declaration. Participants were recruited as part of a broader PD study conducted at the University of Iowa. All participants were clinically diagnosed with idiopathic PD based on the UK PD Society Brain Bank criteria and were classified as either PDFOG+ or PDFOG– by movement disorder specialists using standardized FOG assessment protocols. The selection process ensured demographic and clinical balance between groups. Exclusion criteria included the presence of atypical parkinsonism, comorbid neurological or psychiatric conditions, and poor EEG signal quality. All participants with PD were tested while ON their usually prescribed dose of anti-parkinsonian medication, approximately 60 to 90 minutes after the last dose, because fall risk and motor instability are higher in unmedicated PDFOG+ subjects and the ON state reflects how people with PD function in daily life [4, 12]. The motor portion of the Unified Parkinson’s Disease Rating Scale (mUPDRS) [18] and the FOG questionnaire were used to evaluate disease severity and FOG status in PD subjects, respectively [19]. Similar to our earlier research [4, 12, 16], participants who reported issues starting, stopping, and turning while walking and whose score on item 3 of the freezing of gait questionnaire (FOGQ) was >0, indicating at least one FOG episode in the previous month, were classified as PDFOG+. Furthermore, FOG in PD subjects was confirmed by an expert movement disorder specialist. All individual clinical characteristics are listed in Table 1.

Table 1. Clinical characteristics.
| Characteristic | PDFOG– (n = 41) | PDFOG+ (n = 41) |
| --- | --- | --- |
| Age (yrs.) | 68 ± 1.20 | 69 ± 1.30 (n.s.) |
| Disease duration (yrs.) | 4 ± 0.48 | 6 ± 0.68* |
| LEDD (mg/day) | 722 ± 60.24 | 1003 ± 70.88** |
| mUPDRS | 10 ± 0.78 | 17 ± 1.00** |
| FOGQ score | 2 ± 0.20 | 11 ± 0.65** |
| MoCA | 28 ± 10 | 24 ± 0.61* |

LEDD, Levodopa Equivalent Daily Dose; mUPDRS, Motor part of Unified Parkinson’s Disease Rating Scale; FOGQ, Freezing of Gait Questionnaire; MoCA, Montreal Cognitive Assessment; PDFOG+, PD patients with freezing of gait; PDFOG–, PD patients without FOG.

n.s. = not significant, Chi-square test.

* p < 0.05, independent t-test.

** p < 0.01, independent t-test.

2.2 EEG Recordings and Processing

A 64-channel EEG cap (actiCAP, EasyCap, Inc., Herrsching, Germany) was used to collect signals during a resting-state task in which participants sat and stared forward with their eyes open for 120–180 seconds. The data were collected using a 0.1 Hz filter and sampled at 500 Hz with Pz as the reference electrode. EEG recordings were processed using EEGLAB 2022.1 (Swartz Center for Computational Neuroscience, La Jolla, CA, USA) [20]. Because of potential contamination from muscle artifacts, the electrodes most susceptible to movement-based and muscle artifacts (TP9, TP10, FT9, FT10, Fp1, and Fp2) were removed. Data were re-referenced to the average and divided into consecutive 3-second epochs across the entire dataset. Bad epochs and artifacts were removed using a combination of the FASTER and ADJUST algorithms, as well as the “pop_rejchan” function in MATLAB R2021a (The MathWorks, Inc., Natick, MA, USA) [21, 22].

EEG recordings of 90 seconds in duration were segmented into 30 non-overlapping epochs, each lasting 3 seconds. For each individual 3-second epoch, power spectral features using fast Fourier transform were extracted separately. These features were not averaged across epochs, allowing the model to utilize the variability present across different segments of the data. This approach preserves the temporal dynamics within the 90-second window and supports robust feature representation for classification tasks.

We focused our analyses on the midfrontal beta band (13–30 Hz). Spectral power values in the beta frequency band were extracted from a midfrontal Cz electrode and cluster of electrodes surrounding midfrontal Cz including Fz, FC1, FCz, FC2, and Cz. These values were then utilized in different ML and deep learning (DL) models.
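The epoch-wise feature extraction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the Hann window, the power normalization, the use of mean power within the band, and the function name `beta_power_per_epoch` are all choices made here for illustration, and the input is a synthetic signal rather than real EEG. Only the sampling rate (500 Hz), the epoch length (3 s), and the band limits (13–30 Hz) come from the text.

```python
import numpy as np

FS = 500             # sampling rate in Hz (from the recording description)
EPOCH_SEC = 3        # epoch length in seconds
BETA = (13.0, 30.0)  # beta band limits in Hz

def beta_power_per_epoch(signal, fs=FS, epoch_sec=EPOCH_SEC, band=BETA):
    """Return one mean band-power value per non-overlapping epoch."""
    samples_per_epoch = int(fs * epoch_sec)
    n_epochs = len(signal) // samples_per_epoch
    # Drop trailing samples and reshape to (n_epochs, samples_per_epoch).
    epochs = np.asarray(signal[: n_epochs * samples_per_epoch], dtype=float)
    epochs = epochs.reshape(n_epochs, samples_per_epoch)
    # Assumption: Hann-windowed FFT; the windowing used in the study is not stated.
    window = np.hanning(samples_per_epoch)
    spectrum = np.fft.rfft(epochs * window, axis=1)
    power = np.abs(spectrum) ** 2 / np.sum(window ** 2)
    freqs = np.fft.rfftfreq(samples_per_epoch, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    # Features are kept per epoch, not averaged across epochs.
    return power[:, in_band].mean(axis=1)

# Synthetic 90-second trace: a 20 Hz sinusoid plus a little noise.
rng = np.random.default_rng(0)
t = np.arange(90 * FS) / FS
synthetic = np.sin(2 * np.pi * 20 * t) + 0.1 * rng.standard_normal(t.size)
beta_features = beta_power_per_epoch(synthetic)  # shape (30,)
```

A 90-second recording at 500 Hz yields 30 epochs of 1500 samples each, so the function returns 30 values per channel, which matches the 30 data points per participant used later.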

2.3 Feature Transformation

As features, we used the spectral power of the beta frequency band in thirty 3-second epochs obtained from signals recorded at the electrodes of interest (Cz, and the cluster of electrodes surrounding Cz: Fz, FC1, FCz, FC2, and Cz). To capture midfrontal beta oscillations with greater reliability, we focused on the “Cz-cluster”, comprising electrode Cz and its immediate neighbors (Fz, FC1, FCz, and FC2). Signals from these electrodes were averaged to form a composite midfrontal channel. This approach was adopted to reduce channel-specific variability and enhance the signal-to-noise ratio while retaining spatial specificity to the midfrontal cortical region, which has previously been associated with cognitive and motor dysfunctions in PD [12, 16]. Thus, for each condition (PDFOG+ or PDFOG–), there were 2 separate feature sets, Cz and Cz-cluster, per participant. We used the MinMaxScaler function in the Python sklearn (version 1.2.2, Scikit-learn, Python Software Foundation, Wilmington, DE, USA) preprocessing module to scale the features for each participant’s data, transforming the beta features to a range between 0 and 1. This ensured that all features contributed equally to model development, preventing bias towards features with larger ranges. Thus, for each electrode of interest (Cz or Cz-cluster) there were 1230 (30 epochs × 41 participants) data points per condition (PDFOG+ or PDFOG–). Then, for each electrode of interest, we aimed to distinguish between PDFOG+ and PDFOG– based on a total of 50,430 data points (1230 data points for beta frequency band × 41 participants).
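The per-participant scaling step can be illustrated with a short sketch. It applies the same formula that MinMaxScaler uses by default, (x − min)/(max − min), but is written in plain NumPy so that it is self-contained. The helper name `min_max_scale` and the random placeholder values are assumptions for illustration only.

```python
import numpy as np

def min_max_scale(values):
    """Rescale a 1-D array to [0, 1] using (x - min) / (max - min)."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        # A constant input has no range; map it to zeros.
        return np.zeros_like(values)
    return (values - lo) / (hi - lo)

# Placeholder beta-power values: 82 participants x 30 epochs (not real data).
rng = np.random.default_rng(1)
n_participants, n_epochs = 82, 30
beta = rng.gamma(shape=2.0, scale=1.5, size=(n_participants, n_epochs))

# Scale each participant's 30 epoch values independently.
scaled = np.vstack([min_max_scale(row) for row in beta])

# Epoch-level data points per group, as stated in the text: 30 epochs x 41 participants.
points_per_group = n_epochs * 41  # 1230
```

One consequence of scaling within each participant is that every participant's values span the full 0 to 1 range, so between-participant differences in absolute beta power are removed and only the relative epoch-to-epoch pattern remains.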

All data preprocessing and statistical analyses were performed using Python (version 3.7.6, Python Software Foundation, San Francisco, CA, USA), with relevant packages including NumPy 1.18.1 (https://pypi.org/project/numpy/1.18.1/) and SciPy 1.4.1 (https://pypi.org/project/scipy/1.4.1/). Deep learning analyses were conducted using TensorFlow 2.1.0 (https://pypi.org/project/tensorflow/2.1.0/).

2.4 Model Development

Our approach employs ML and DL models to classify PDFOG+ and PDFOG–, with hyperparameter optimization using Grid Search and model evaluation through leave-one-subject-out cross-validation (LOSO-CV), 10-fold-CV and stratified-CV. This methodology consists of several key stages (Fig. 1). We chose logistic regression (LR), Random Forest (RF), extreme gradient boosting (XGBoost), categorical boosting (CatBoost) as ML algorithms and long short term memory (LSTM) as DL algorithm for this study because prior studies have successfully applied them to PD datasets [23, 24, 25, 26, 27, 28, 29, 30]. We employed a parameter grid to fine-tune each model by experimenting with various hyperparameter values.

Fig. 1.

Flow diagram of the methodology for classifying PDFOG+ and PDFOG– using EEG data, covering participant enrollment, EEG preprocessing, feature extraction, data transformation, model development, and evaluation. EEG, electroencephalographic; LR, logistic regression; RF, random forest; XGBoost, extreme gradient boosting; CatBoost, categorical boosting; LSTM, long short term memory; AUC-ROC, area under the receiver-operating characteristic; LOSO, leave-one-subject-out; CV, cross-validation.

2.4.1 LR

LR hyperparameters include penalty and solver. The penalty term specifies the type of regularization used, such as L1 (Lasso), L2 (Ridge), or Elastic Net, which determines how the model’s coefficients are penalized. The solver optimizes the LR model by finding the coefficients that minimize the loss function. The liblinear solver, which supports both L1 and L2 regularization and is well-suited for small to medium-sized datasets, has been used in prior study to discriminate cognitive status in PD [31]. In this study, the parameter grid for LR included regularization strength (0.1, 1, 10, 100), penalty (L1, L2, Elastic Net), and solver (liblinear).
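The LR grid above can be enumerated explicitly. The sketch below is an illustration, not the authors' code: it lists the combinations named in the text and then filters them against the penalties that scikit-learn's `LogisticRegression` accepts for each solver (liblinear accepts L1 and L2, while Elastic Net requires the saga solver). How the Elastic Net entries were handled in the study is not stated, so the filtering step is an assumption.

```python
from itertools import product

# Values taken from the parameter grid described in the text.
C_VALUES = [0.1, 1, 10, 100]
PENALTIES = ["l1", "l2", "elasticnet"]
SOLVERS = ["liblinear"]

# Penalties accepted per solver in scikit-learn (only relevant solvers listed).
SUPPORTED = {
    "liblinear": {"l1", "l2"},
    "saga": {"l1", "l2", "elasticnet"},
}

all_combos = list(product(C_VALUES, PENALTIES, SOLVERS))
# Assumption: combinations the solver cannot fit are skipped.
valid_combos = [(c, p, s) for c, p, s in all_combos if p in SUPPORTED[s]]
```

Under this assumption, 8 of the 12 listed combinations can actually be fitted with liblinear.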

2.4.2 RF

Key hyperparameters in an RF model include the number of decision trees (n_estimators), the maximum number of features considered at each split (max_features), the maximum depth of each tree (max_depth), and the criterion used to evaluate the quality of splits. n_estimators sets the number of decision trees in the forest, max_features defines the number of features considered at each split, max_depth controls the maximum depth of each tree, and criterion specifies the function used to assess the quality of a split. In this study, the parameter grid for RF included n_estimators (100, 200, 300), max_features (sqrt, log2), max_depth (None, 4, 6, 8, 10), and criterion (gini, entropy).

2.4.3 XGBoost

XGBoost hyperparameters include maximum depth of trees (max_depth), subsample ratio (subsample), column subsampling (e.g., colsample_bytree), and learning rate (eta). max_depth restricts the maximum depth of each tree, subsample specifies the portion of training data used for each tree, colsample_bytree sets the fraction of features randomly chosen for each tree, and eta regulates the impact of each tree on the final model. In this study, we used a parameter grid that included n_estimators (100, 200, 300), max_depth (3, 4, 5, 6), learning_rate (0.01, 0.1, 0.2), subsample (0.6, 0.8, 1.0) and colsample_bytree (0.6, 0.8, 1.0).

2.4.4 CatBoost

CatBoost is another gradient boosting algorithm that handles categorical features exceptionally well. It is designed to provide high performance with minimal hyperparameter tuning, which is beneficial for complex datasets like EEG signals [26]. CatBoost hyperparameters include the number of boosting iterations (iterations), tree depth (depth), learning rate, and L2 regularization for leaf values (l2_leaf_reg). The iterations parameter determines the number of boosting rounds, where more iterations allow for increased accuracy at the risk of potential overfitting. The depth parameter controls the complexity of individual trees, allowing the model to capture intricate relationships in the data. Learning rate adjusts the contribution of each tree to the final model, balancing the trade-off between convergence speed and stability. Finally, the l2_leaf_reg parameter adds a regularization term to reduce overfitting by penalizing overly complex models. In this study, the parameter grid for CatBoost included iterations (100, 200, 300), depth (3, 4, 5, 6), learning rate (0.01, 0.1, 0.2), and l2_leaf_reg (1, 3, 5, 7, 9).
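To give a sense of the search cost, the sketch below counts the candidate configurations in the RF, XGBoost, and CatBoost grids listed in Sections 2.4.2 to 2.4.4. The final figure assumes that every candidate is refit once in each of the 82 LOSO folds; the text does not state how grid search and cross-validation were nested, so that multiplier is illustrative only.

```python
from math import prod

# Grids as listed in the text.
GRIDS = {
    "RF": {
        "n_estimators": [100, 200, 300],
        "max_features": ["sqrt", "log2"],
        "max_depth": [None, 4, 6, 8, 10],
        "criterion": ["gini", "entropy"],
    },
    "XGBoost": {
        "n_estimators": [100, 200, 300],
        "max_depth": [3, 4, 5, 6],
        "learning_rate": [0.01, 0.1, 0.2],
        "subsample": [0.6, 0.8, 1.0],
        "colsample_bytree": [0.6, 0.8, 1.0],
    },
    "CatBoost": {
        "iterations": [100, 200, 300],
        "depth": [3, 4, 5, 6],
        "learning_rate": [0.01, 0.1, 0.2],
        "l2_leaf_reg": [1, 3, 5, 7, 9],
    },
}

# Number of candidate configurations per model.
grid_sizes = {name: prod(len(v) for v in grid.values()) for name, grid in GRIDS.items()}

# Assumption: one fit per candidate per LOSO fold (82 folds).
N_LOSO_FOLDS = 82
loso_fits = {name: size * N_LOSO_FOLDS for name, size in grid_sizes.items()}
```

The grids contain 60 (RF), 324 (XGBoost), and 180 (CatBoost) candidates, which is why LOSO-CV is noticeably more expensive than 10-fold-CV when combined with an exhaustive search.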

2.4.5 LSTM

LSTM networks are a type of recurrent neural network (RNN) specifically designed to model long-term dependencies in sequential data. Given the time-dependent nature of EEG signals, LSTMs are well suited to capturing temporal patterns that may indicate differences between PDFOG+ and PDFOG– in PD. To prepare the EEG data for the LSTM model, the beta-band data for each subject were organized into a sequence of 30 data points, forming an input suitable for time-series analysis. The sequences were then reshaped to meet the LSTM input requirements, structured as (samples, time steps, features), with each sequence representing a single subject. The target labels (PDFOG+ and PDFOG–) were encoded into binary values (0 and 1) and then one-hot encoded to make them compatible with the LSTM’s output requirements for binary classification. The LSTM model used in this study includes a layer with 32 units and a tanh activation function, which manages the temporal information in the input sequence. This is followed by a dropout layer with a 0.2 dropout rate to mitigate overfitting by randomly deactivating neurons during training. The final dense layer has 1 unit with a sigmoid activation function to perform binary classification into PDFOG+ and PDFOG– (Fig. 2). The parameter grid for LSTM included units (50, 100), dropout rates (0.2, 0.3), batch sizes (16, 32), and epochs (50, 100), enabling optimization through grid search during LOSO-CV, 10-fold-CV and stratified-CV.

Fig. 2.

Architecture of the long short-term memory (LSTM) network used for classifying PDFOG+ and PDFOG– based on EEG data. The model takes input sequences shaped as (samples, 30, 1) representing 30 time steps with 1 feature each. It includes an LSTM layer with a tanh activation function to capture temporal dependencies, followed by a Dropout layer to prevent overfitting. A Dense layer further processes the extracted features, and the output layer with a sigmoid activation function performs binary classification into PDFOG+ and PDFOG– groups.
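The input preparation for the LSTM can be sketched in NumPy. The example only covers reshaping and label encoding, not the network itself. The placeholder values, the ordering of the groups, and the choice of PDFOG+ as class 1 are assumptions made for illustration.

```python
import numpy as np

n_subjects, n_steps, n_features = 82, 30, 1

# Placeholder for the scaled beta features: one row of 30 epochs per subject.
rng = np.random.default_rng(2)
beta_scaled = rng.random((n_subjects, n_steps))

# Reshape to (samples, time steps, features), the layout expected by an LSTM layer.
X = beta_scaled.reshape(n_subjects, n_steps, n_features)

# Assumption: first 41 subjects are PDFOG-, last 41 are PDFOG+; PDFOG+ is class 1.
group = np.array(["PDFOG-"] * 41 + ["PDFOG+"] * 41)
y = (group == "PDFOG+").astype(int)   # binary labels, shape (82,)
y_onehot = np.eye(2, dtype=int)[y]    # one-hot labels, shape (82, 2)
```

The binary vector `y` is the form that pairs with a single sigmoid output unit, whereas the one-hot matrix `y_onehot` pairs with a two-unit output; the text mentions both encodings, and which one was passed to the network is not specified.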

2.5 Model Evaluation

We used the grid search technique to explore a range of hyperparameter values that would optimize each model’s performance. We compared three different strategies to assess a trained model’s prediction outcomes on unseen data: LOSO-CV, 10-fold-CV, and stratified-CV.

2.5.1 LOSO-CV

LOSO-CV ensures that data from one subject is completely isolated during testing, which minimizes overfitting and evaluates the model’s ability to generalize across different individuals. Previous studies have employed LOSO-CV for EEG-based PD classification [32, 33, 34]. In this study, we divided the data into 82 subsets, each containing all the data from a single participant. Each model was trained on the data from 81 participants, leaving out the data from 1 participant as the test set. This process was repeated 82 times, with each participant taking a turn as the test set. For each model, after all 82 iterations, the performance metric from each test set was averaged to provide an overall assessment of that model’s ability to generalize across all participants.
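The splitting logic of LOSO-CV can be sketched as below. This is a minimal illustration assuming an epoch-level table in which every row carries the ID of the participant it came from; the function name `loso_splits` is hypothetical. scikit-learn's `LeaveOneGroupOut` provides equivalent behavior.

```python
import numpy as np

def loso_splits(subject_ids):
    """Yield (train_idx, test_idx) pairs, holding out one subject at a time."""
    subject_ids = np.asarray(subject_ids)
    for held_out in np.unique(subject_ids):
        test_mask = subject_ids == held_out
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# 82 participants x 30 epochs each, labelled by participant index.
subject_ids = np.repeat(np.arange(82), 30)
splits = list(loso_splits(subject_ids))

# Every fold tests on one participant's 30 epochs and trains on the other 81.
train_idx, test_idx = splits[0]
```

Because the split is defined by participant rather than by row, no epoch from the held-out participant can appear in the training set.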

2.5.2 10-fold-CV

We randomly split the entire dataset into 10 approximately equal folds [35]. For each of the 10 folds: we used one fold as the test set, combined the remaining 9 folds to form the training set, trained a model using the training set, and evaluated the model’s performance on the test set. Each fold was used as the test set exactly once. This approach is less computationally intensive compared to LOSO-CV.

2.5.3 Stratified-CV

We determined the proportions of PDFOG+ and PDFOG– in the dataset. In this study, the proportion was 50%:50%. We chose k = 10 as the number of folds for cross-validation. Then, we split the dataset into 10 folds such that each fold maintained the 50:50 ratio of PDFOG+ and PDFOG–. In each iteration, one fold was used as the test set, and the remaining k − 1 folds were combined as the training set. We repeated this process k times, with each fold serving as the test set once. We trained each model on the training folds and evaluated its performance on the test fold.
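A stratified fold assignment of this kind can be sketched as follows. The example assumes that folds are formed over epoch-level rows (1230 per group), which is one reading of "the dataset"; the unit of splitting is not stated in the text. The function name `stratified_folds` and the seed are illustrative.

```python
import numpy as np

def stratified_folds(labels, k=10, seed=0):
    """Assign each row to one of k folds while preserving class proportions."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(labels.size, dtype=int)
    for cls in np.unique(labels):
        # Shuffle the rows of this class, then deal them out round-robin.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        fold_of[idx] = np.arange(idx.size) % k
    return fold_of

# Assumption: 1230 epoch-level rows per group (0 = PDFOG-, 1 = PDFOG+).
labels = np.array([0] * 1230 + [1] * 1230)
fold_of = stratified_folds(labels, k=10)

# Rows per class in each fold.
per_fold = [(int(((fold_of == f) & (labels == 0)).sum()),
             int(((fold_of == f) & (labels == 1)).sum())) for f in range(10)]
```

With row-level folds, epochs from the same participant can fall in both the training and test sets, which is the main practical difference from LOSO-CV.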

We employed 5 performance metrics in this study. They are accuracy, precision, sensitivity, F1 score, and area under the receiver-operating characteristic (AUC-ROC) curve. We computed accuracy, which measures the proportion of correctly classified PDFOG+ and PDFOG– out of all instances, as ((true positive (TP) + true negative (TN))/(TP + false positive (FP) + false negative (FN) + TN)), where TP, TN, FP, and FN denote true positive, true negative, false positive, and false negative, respectively. We computed precision, which measures how many of the instances predicted as PDFOG+ actually are PDFOG+, as (TP/(TP + FP)). We computed sensitivity (or recall), which measures how many of the actual PDFOG+ instances were correctly identified by a model, as (TP/(TP + FN)). We calculated the F1 score, a metric that combines precision and sensitivity into a single value to balance both, as (2TP/(2TP + FP + FN)). Further, the AUC-ROC measures a model’s ability to discriminate between PDFOG+ and PDFOG– across various classification thresholds.
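The four threshold-based metrics defined above translate directly into code. The sketch below uses hypothetical confusion-matrix counts chosen only to exercise the formulas; they are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, sensitivity, and F1 from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Guard against empty denominators (e.g., no positive predictions).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
    }

# Hypothetical counts for illustration only.
m = classification_metrics(tp=30, tn=25, fp=16, fn=11)
```

The F1 expression 2TP/(2TP + FP + FN) is algebraically the harmonic mean of precision and sensitivity, so both forms give the same value.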

3. Results

This section summarizes the classification outcomes for distinguishing PDFOG+ from PDFOG– subjects using EEG data. The analysis was conducted for both the Cz channel and the Cz-cluster, across the beta frequency band, with four ML models and one DL model: LR, RF, XGBoost, CatBoost, and LSTM.

For the Cz channel, as shown in Table 2, in the beta band, LR achieved an AUC-ROC score of 0.63 using LOSO-CV, with an accuracy of 0.40, precision of 0.34, and F1-score of 0.34. However, the relatively low sensitivity (0.40) suggests a trade-off between precision and recall. Under 10-fold- and stratified-CV, the AUC-ROC of LR was lower (0.50), although accuracy was higher (0.50–0.51). RF, XGBoost, and CatBoost exhibited comparable performance, with AUC-ROC values ranging between 0.44 and 0.50, showing limited discriminatory power. LSTM reached an AUC-ROC of 0.55 and an accuracy of 0.51 in both LOSO- and 10-fold-CV, the highest AUC-ROC on this channel apart from LR under LOSO-CV.

Table 2. Overall results for all models based on midfrontal channel Cz for beta band to classify PDFOG+ versus PDFOG–.
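| Model | Cross-validation | Accuracy | Precision | Sensitivity | F1-score | AUC-ROC |
| --- | --- | --- | --- | --- | --- | --- |
| LR | LOSO | 0.40 | 0.34 | 0.40 | 0.34 | 0.63 |
| LR | 10-fold | 0.51 | 0.51 | 0.51 | 0.49 | 0.50 |
| LR | Stratified | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 |
| RF | LOSO | 0.45 | 0.45 | 0.45 | 0.45 | 0.47 |
| RF | 10-fold | 0.49 | 0.49 | 0.49 | 0.48 | 0.49 |
| RF | Stratified | 0.50 | 0.50 | 0.50 | 0.50 | 0.49 |
| XGBoost | LOSO | 0.46 | 0.46 | 0.46 | 0.46 | 0.46 |
| XGBoost | 10-fold | 0.49 | 0.49 | 0.49 | 0.48 | 0.49 |
| XGBoost | Stratified | 0.49 | 0.49 | 0.49 | 0.49 | 0.50 |
| CatBoost | LOSO | 0.43 | 0.43 | 0.43 | 0.43 | 0.44 |
| CatBoost | 10-fold | 0.49 | 0.49 | 0.49 | 0.49 | 0.50 |
| CatBoost | Stratified | 0.50 | 0.50 | 0.50 | 0.50 | 0.49 |
| LSTM | LOSO | 0.51 | 0.51 | 0.51 | 0.51 | 0.55 |
| LSTM | 10-fold | 0.51 | 0.51 | 0.51 | 0.51 | 0.55 |
| LSTM | Stratified | 0.49 | 0.49 | 0.49 | 0.49 | 0.48 |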

In the Cz-cluster configuration, as seen in Table 3, LR demonstrated an AUC-ROC of 0.47 using LOSO, with accuracy, precision, and F1-score all at 0.47. The performance improved slightly under 10-fold (AUC-ROC = 0.51, accuracy = 0.51) and stratified-CV (AUC-ROC = 0.51, accuracy = 0.51). RF, XGBoost, and CatBoost showed near-chance performance, with AUC-ROC values ranging from 0.41 to 0.51. LSTM, however, achieved the best performance, with an AUC-ROC of 0.68 and accuracy of 0.63 in LOSO-CV. Its performance was lower under 10-fold-CV (AUC-ROC = 0.62, accuracy = 0.57) and stratified-CV (AUC-ROC = 0.57, accuracy = 0.54) but still exceeded that of the other models, consistent with an ability to exploit temporal dependencies in EEG data.

Table 3. Overall results for all models based on midfrontal channel Cz-cluster values for beta band to classify PDFOG+ versus PDFOG–.
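| Model | Cross-validation | Accuracy | Precision | Sensitivity | F1-score | AUC-ROC |
| --- | --- | --- | --- | --- | --- | --- |
| LR | LOSO | 0.47 | 0.47 | 0.47 | 0.47 | 0.47 |
| LR | 10-fold | 0.51 | 0.51 | 0.51 | 0.51 | 0.51 |
| LR | Stratified | 0.51 | 0.51 | 0.51 | 0.51 | 0.51 |
| RF | LOSO | 0.47 | 0.47 | 0.47 | 0.47 | 0.50 |
| RF | 10-fold | 0.49 | 0.49 | 0.49 | 0.49 | 0.50 |
| RF | Stratified | 0.49 | 0.49 | 0.49 | 0.49 | 0.50 |
| XGBoost | LOSO | 0.47 | 0.47 | 0.47 | 0.46 | 0.48 |
| XGBoost | 10-fold | 0.50 | 0.50 | 0.50 | 0.50 | 0.51 |
| XGBoost | Stratified | 0.48 | 0.48 | 0.48 | 0.48 | 0.49 |
| CatBoost | LOSO | 0.44 | 0.44 | 0.44 | 0.44 | 0.41 |
| CatBoost | 10-fold | 0.49 | 0.49 | 0.49 | 0.49 | 0.50 |
| CatBoost | Stratified | 0.50 | 0.50 | 0.50 | 0.50 | 0.51 |
| LSTM | LOSO | 0.63 | 0.65 | 0.63 | 0.63 | 0.68 |
| LSTM | 10-fold | 0.57 | 0.58 | 0.57 | 0.56 | 0.62 |
| LSTM | Stratified | 0.54 | 0.54 | 0.54 | 0.53 | 0.57 |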

Figs. 3 and 4 illustrate the confusion matrices and AUC-ROC curves for the LR and LSTM models under LOSO-CV for the Cz channel and Cz-cluster configurations, respectively. Fig. 3 shows the performance of LR on the Cz channel in the beta band, with an AUC-ROC of 0.63. The confusion matrix reveals a large number of false positives (364) and false negatives (1115), reflecting the model’s difficulty in distinguishing PDFOG+ from PDFOG–. In contrast, Fig. 4 shows the performance of LSTM on the Cz-cluster, with an AUC-ROC of 0.68; the confusion matrix indicates better performance, with 32 true positives and 21 false positives, supporting the improved discriminative ability of LSTM.

Fig. 3.

Classification performance of LR on midfrontal Cz beta band. (a,b) show the confusion matrix and AUC-ROC curve for LR with leave-one-subject-out cross-validation on the midfrontal (Cz) beta band. The confusion matrix illustrates the distribution of true positives, false positives, true negatives, and false negatives for the PDFOG+ and PDFOG– groups. The AUC-ROC curve demonstrates the ability of the model to discriminate between PDFOG+ and PDFOG– across varying classification thresholds, with the AUC indicating the model’s discriminative performance (0.63 for LR).

Fig. 4.

Classification performance of LSTM on midfrontal Cz-cluster beta band. (a,b) show the confusion matrix and AUC-ROC curve for LSTM with leave-one-subject-out cross-validation on the midfrontal Cz-cluster beta band. The confusion matrix illustrates the distribution of true positives, false positives, true negatives, and false negatives for the PDFOG+ and PDFOG– groups. The AUC-ROC curve demonstrates the ability of the model to discriminate between PDFOG+ and PDFOG– across varying classification thresholds, with the AUC indicating the model’s discriminative performance (0.68 for LSTM).
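The confusion-matrix counts quoted for Figs. 3 and 4 can be checked against the reported accuracies with simple arithmetic. The sketch below rests on two assumptions that are not stated explicitly in the text: that the LR matrix counts epoch-level predictions (82 participants × 30 epochs = 2460 rows) and that the LSTM matrix counts subject-level predictions (41 per group). Under those assumptions, the remaining cells follow from the quoted counts.

```python
# LR on Cz (LOSO-CV): assumed epoch-level confusion matrix.
n_rows_lr = 82 * 30              # 2460 epoch-level predictions (assumption)
lr_fp, lr_fn = 364, 1115         # counts quoted in the text
lr_accuracy = (n_rows_lr - lr_fp - lr_fn) / n_rows_lr

# LSTM on Cz-cluster (LOSO-CV): assumed subject-level confusion matrix.
n_pos = n_neg = 41               # 41 PDFOG+ and 41 PDFOG- subjects
lstm_tp, lstm_fp = 32, 21        # counts quoted in the text
lstm_fn = n_pos - lstm_tp        # derived under the assumption above
lstm_tn = n_neg - lstm_fp        # derived under the assumption above
lstm_accuracy = (lstm_tp + lstm_tn) / (n_pos + n_neg)

print(round(lr_accuracy, 2), round(lstm_accuracy, 2))  # prints 0.4 0.63
```

Both values agree, after rounding, with the accuracies of 0.40 (LR, Table 2) and 0.63 (LSTM, Table 3), which supports this reading of the two matrices.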

Overall, the midfrontal beta oscillatory features from Cz and Cz-cluster provided moderate discriminatory power. Among the traditional models, LR achieved the highest AUC-ROC (0.63, Cz channel under LOSO-CV) but was limited by low accuracy and F1-scores. LSTM outperformed the traditional models, particularly in the Cz-cluster configuration, achieving an AUC-ROC of 0.68 and accuracy of 0.63. These results highlight the importance of temporal dependencies in EEG analysis and suggest that the midfrontal beta band may serve as a candidate biomarker for PDFOG+ classification.

4. Discussion

The results of this study demonstrate that LSTM was the most effective model for distinguishing between PDFOG+ and PDFOG– subjects using EEG data from the midfrontal Cz and Cz-cluster signals in the beta band. LSTM achieved the highest AUC-ROC of 0.68 for the beta band using the Cz-cluster configuration, with an accuracy of 0.63, precision of 0.65, and F1-score of 0.63. These results highlight the ability of LSTM to capture temporal dependencies in EEG data, offering superior performance compared to traditional ML models. In contrast, LR, despite achieving a relatively high AUC-ROC of 0.63 for the Cz configuration, showed much lower accuracy (0.40), precision (0.34), and F1-score (0.34). This discrepancy illustrates the challenge of translating a good ranking of predicted probabilities into effective classifications at a fixed threshold: for LR, the relatively high AUC-ROC did not carry over to classification accuracy.

The relatively lower performance metrics for LR, RF, XGBoost, and CatBoost (AUC-ROC values between 0.41 and 0.51 across both configurations, apart from the LR result of 0.63 on Cz under LOSO-CV) suggest that these models capture little discriminatory information from the beta band and are less effective at distinguishing between PDFOG+ and PDFOG– subjects. This could be due to overlapping distributions between the two groups, which makes it harder for these models to find clear decision boundaries. However, LSTM, by leveraging the temporal structure of the data, outperformed these traditional models, particularly with the Cz-cluster data, which likely enabled it to capture more complex patterns and interactions in the EEG signals. The inclusion of the Cz-cluster, as seen in Table 3, provided some improvement, with LR’s F1-score under LOSO-CV increasing to 0.47 compared to 0.34 with Cz alone. This suggests that using a cluster of nearby electrodes may improve the model’s ability to detect subtle spatial patterns relevant to the classification of PD with FOG. This is consistent with prior research showing that cluster-based approaches can enhance the stability of EEG signals by averaging activity across neighboring electrodes, reducing noise, and improving generalizability [10, 12]. However, the improvements were modest, and the performance of traditional ML models remained limited compared to LSTM.

LSTM demonstrated an AUC-ROC of 0.68 and accuracy of 0.63 with the Cz-cluster, reflecting its ability to capture the temporal dynamics of EEG signals more effectively. The improvement of LSTM’s performance with temporal modeling highlights the importance of understanding the sequential nature of EEG data in clinical contexts. To further validate the reliability of the observed AUC-ROC value, we conducted a post-hoc power analysis. The analysis confirmed that our sample size (n = 82) provided 83% power to detect an AUC of 0.68 at α = 0.05, exceeding the conventional 80% threshold for statistical power and supporting the robustness of this result. While beta oscillations may not be as effective as other frequency bands in distinguishing PDFOG+ and PDFOG– in resting-state EEG, LSTM models have the potential to leverage both the spatial and temporal dynamics of EEG to enhance classification accuracy [36].
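The method used for the post-hoc power analysis is not described, so the following is only a sketch of one common approximation, based on the Hanley-McNeil standard error of the AUC and a normal approximation with a two-sided α of 0.05. The function names are hypothetical, and the value this sketch returns need not match the reported figure exactly.

```python
from math import erf, sqrt

def hanley_mcneil_se(auc, n_pos, n_neg):
    """Hanley-McNeil approximation to the standard error of an AUC."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return sqrt(var)

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def auc_power(auc, n_pos, n_neg, z_alpha=1.959964):
    """Approximate power to detect `auc` against a null AUC of 0.5.

    z_alpha defaults to the two-sided 5% critical value; the opposite
    tail is ignored because its contribution is negligible here.
    """
    se_null = hanley_mcneil_se(0.5, n_pos, n_neg)
    se_alt = hanley_mcneil_se(auc, n_pos, n_neg)
    return normal_cdf(((auc - 0.5) - z_alpha * se_null) / se_alt)

# Subject-level sample: 41 PDFOG+ and 41 PDFOG-.
power = auc_power(0.68, n_pos=41, n_neg=41)
```

The approximation treats the 82 subjects as independent observations, which corresponds to the subject-level LSTM evaluation rather than to epoch-level predictions.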

Additionally, the use of LOSO-CV in this study helped to account for subject-specific variability, a common challenge in EEG-based classification. Because each test subject is held out entirely from training, the AUC-ROC values observed under LOSO-CV suggest that the models, especially LSTM, generalize across individuals rather than fitting subject-specific patterns. Given the modest performance metrics for some models, the findings also underscore the need for multimodal approaches that integrate EEG with other modalities, such as functional near-infrared spectroscopy (fNIRS) or eye-tracking, to capture complementary information and improve classification accuracy [37, 38]. The relatively lower performance of traditional models like LR could partly reflect a mismatch between the AUC-ROC score and the default classification threshold. Since AUC-ROC measures the ranking ability of the model across all thresholds, performance at a fixed threshold (e.g., 0.5) may be poor when the model’s predicted probabilities are not centered on that threshold. Adjusting the classification threshold or using threshold optimization techniques could improve precision and recall, mitigating the trade-off observed in models like LR. Furthermore, the LSTM results highlight the importance of optimizing architectures for EEG data. While LSTM outperformed traditional methods, it still requires further refinement to address issues like subject-specific variability and performance consistency. Hybrid models combining convolutional neural networks with LSTMs could be explored to better capture both spatial and temporal features, potentially enhancing classification accuracy [39].
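The distinction between ranking ability and fixed-threshold performance can be made concrete with a toy example. The scores below are invented for illustration and are not study data. The sketch computes AUC-ROC through its rank interpretation and selects a threshold with Youden's J statistic, which is one of several possible threshold optimization criteria.

```python
def auc_roc(labels, scores):
    """Probability that a random positive scores above a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy_at(labels, scores, threshold):
    """Accuracy when scores at or above the threshold are labeled positive."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def youden_threshold(labels, scores):
    """Observed score that maximizes sensitivity + specificity - 1."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t


# Toy scores: the ranking is perfect, but every predicted probability is below 0.5.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45]

best = youden_threshold(labels, scores)
print("AUC-ROC:", auc_roc(labels, scores))
print("accuracy at 0.5:", accuracy_at(labels, scores, 0.5))
print("accuracy at Youden threshold:", accuracy_at(labels, scores, best))
```

In practice the threshold should be selected on the training folds only and then applied to the held-out subject; tuning it on test data would give optimistically biased estimates.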

Several limitations should also be noted. One limitation of the current study is the absence of confidence intervals for AUC values. While AUC was used to assess model performance, the limited sample size may result in unstable or misleading interval estimates. Future work will incorporate confidence intervals and larger sample sizes to better quantify the robustness and generalizability of classification outcomes. The dataset size may also have constrained the generalizability of results, particularly for deep learning models. Larger datasets with greater variability are needed to validate these findings. The EEG data used in this study were collected at the University of Iowa and are fully independent of the current laboratory’s internal datasets. The dataset has been previously published [9, 17] and is publicly accessible (http://predict.cs.unm.edu/downloads.php), ensuring full transparency and reproducibility. Although external datasets were not used for validation in this study, we recognize the importance of generalizability and plan to validate the current findings using independent datasets in future research. Another notable limitation of this study is that all EEG recordings were conducted while participants were in the ON-medication state. Levodopa and related dopaminergic medications are known to alter cortical oscillatory activity, and their effects can vary significantly between individuals with and without FOG. As such, limiting data collection to the ON state may reduce the ability to capture pathological neural signatures that emerge or intensify during the OFF-medication state, potentially introducing bias into our findings. However, this design choice was made to ensure participant safety and comfort, as OFF-state assessments can increase the risk of falls and other adverse events during data acquisition. Moreover, the ON state still provides clinically relevant insights, especially in cases where FOG persists despite medication. 
In future studies, we plan to include both ON- and OFF-state recordings to better understand state-dependent neural dynamics and to enhance the robustness of FOG classification models. Healthy controls were not included in the present classification task; future studies will use three-way comparisons (PDFOG+, PDFOG–, and age-matched healthy controls) to better isolate disease-specific EEG signatures and aid clinical interpretation. The reliance on resting-state EEG may limit applicability to task-based paradigms, where neural dynamics are more pronounced. Future research should include lower-limb motor task-based EEG recordings to assess whether dynamic neural responses enhance classification performance, and should aim to acquire EEG during active FOG episodes using task-based paradigms, gait simulations, or wearable-triggered recordings. Integrating additional features such as functional connectivity or spectral entropy could further improve discrimination. Combining EEG with other modalities (e.g., fNIRS, motion capture, eye-tracking) may also yield synergistic insights into FOG dynamics [40, 41, 42]. Future work should also consider threshold optimization and metrics better aligned with clinical objectives, as well as techniques such as Generative Adversarial Networks to augment datasets and reduce biases [43]. In this study, we focused on power spectral features because of their relevance to FOG-related beta oscillations [12, 13, 44]; future work could extend this approach to nonlinear EEG features. Additionally, while we show that LSTM, a deep learning approach, can be effective even with limited data, further data collection will support exploration of more complex deep architectures. 
Moreover, future analyses could integrate features from other frequency bands (such as theta, alpha, and gamma) with beta-band data to improve classification accuracy and capture broader neurophysiological patterns associated with FOG. Statistical and transformation-based fusion methods, including principal component analysis (PCA), kernel fusion, or neural attention layers, could also be considered to enhance multi-band integration.
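The limitations above note the absence of confidence intervals for AUC. One way such intervals could be obtained is a percentile bootstrap that resamples subjects within each group. The sketch below is an assumption about how this might be done, not a method used in this study, and the scores are synthetic values drawn from two overlapping normal distributions rather than model outputs.

```python
import random


def auc_roc(labels, scores):
    """Probability that a random positive scores above a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for AUC, resampling subjects within each class."""
    rng = random.Random(seed)
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    stats = []
    for _ in range(n_boot):
        boot_pos = rng.choices(pos, k=len(pos))  # sample with replacement
        boot_neg = rng.choices(neg, k=len(neg))
        boot_labels = [1] * len(boot_pos) + [0] * len(boot_neg)
        stats.append(auc_roc(boot_labels, boot_pos + boot_neg))
    stats.sort()
    lo_idx = int((1.0 - level) / 2.0 * n_boot)
    hi_idx = int((1.0 + level) / 2.0 * n_boot) - 1
    return stats[lo_idx], stats[hi_idx]


# Synthetic per-subject scores (NOT study data): 41 subjects per group.
rng = random.Random(1)
labels = [1] * 41 + [0] * 41
scores = ([rng.gauss(0.55, 0.15) for _ in range(41)]
          + [rng.gauss(0.45, 0.15) for _ in range(41)])

point = auc_roc(labels, scores)
lower, upper = bootstrap_auc_ci(labels, scores)
print(f"AUC = {point:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

With 41 subjects per group the interval is expected to be wide, which is consistent with the caution expressed above about interval estimates at this sample size.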

5. Conclusion

This study demonstrates the potential of using EEG data, particularly from the midfrontal beta frequency band, to classify PDFOG+ and PDFOG– subjects. The results indicate that while traditional ML models such as LR, RF, XGBoost, and CatBoost provide limited discriminatory power, LSTM, a DL model, outperformed all other models, particularly with the Cz-cluster configuration. The capability of LSTM to capture temporal dependencies in EEG signals proved crucial in achieving the highest performance, with an AUC-ROC of 0.68 and accuracy of 0.63. These findings suggest that midfrontal beta oscillations may hold promise as a biomarker for distinguishing PDFOG+ from PDFOG– subjects, especially when combined with the temporal modeling capabilities of LSTM. The study emphasizes the importance of integrating both spatial and temporal features to improve classification accuracy and generalizability, making LSTM a useful tool for this task. While the results are promising, further optimization of model architectures, including hybrid models combining convolutional networks with LSTM, and the incorporation of multimodal data may lead to better performance in clinical applications. Future research should focus on refining the models to handle subject-specific variability more effectively and on exploring task-based EEG recordings to capture more dynamic neural responses. These improvements could enhance the reliability and clinical applicability of EEG-based classification for PD with FOG and potentially improve diagnostic tools for PD.

Availability of Data and Materials

The datasets and code generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Author Contributions

SR, JN, and AS designed, analyzed, and wrote the manuscript. TJB and RB organized and preprocessed the data. MS, TK, and KS analyzed the code and reviewed the manuscript. TJB and RB also reviewed the manuscript. All authors read and approved the final manuscript. All authors have participated sufficiently in the work and agreed to be accountable for all aspects of the work.

Ethics Approval and Consent to Participate

Not applicable.

Acknowledgment

Not applicable.

Funding

This work was supported by SSOM Biomedical AI and Computation Idea Competition (BACIC).

Conflict of Interest

The authors declare no conflict of interest.

References
[1]
Giladi N, Kao R, Fahn S. Freezing phenomenon in patients with parkinsonian syndromes. Movement Disorders: Official Journal of the Movement Disorder Society. 1997; 12: 302–305. https://doi.org/10.1002/mds.870120307.
[2]
Giladi N, Treves TA, Simon ES, Shabtai H, Orlov Y, Kandinov B, et al. Freezing of gait in patients with advanced Parkinson’s disease. Journal of Neural Transmission. 2001; 108: 53–61. https://doi.org/10.1007/s007020170096.
[3]
Bloem BR, Hausdorff JM, Visser JE, Giladi N. Falls and freezing of gait in Parkinson’s disease: a review of two interconnected, episodic phenomena. Movement Disorders: Official Journal of the Movement Disorder Society. 2004; 19: 871–884. https://doi.org/10.1002/mds.20115.
[4]
Scholl JL, Espinoza AI, Rai W, Leedom M, Baugh LA, Berg-Poppe P, et al. Relationships between Freezing of Gait Severity and Cognitive Deficits in Parkinson’s Disease. Brain Sciences. 2021; 11: 1496. https://doi.org/10.3390/brainsci11111496.
[5]
Vercruysse S, Devos H, Munks L, Spildooren J, Vandenbossche J, Vandenberghe W, et al. Explaining freezing of gait in Parkinson’s disease: motor and cognitive determinants. Movement Disorders: Official Journal of the Movement Disorder Society. 2012; 27: 1644–1651. https://doi.org/10.1002/mds.25183.
[6]
Nutt JG, Horak FB, Bloem BR. Milestones in gait, balance, and falling. Movement Disorders: Official Journal of the Movement Disorder Society. 2011; 26: 1166–1174. https://doi.org/10.1002/mds.23588.
[7]
Snijders AH, Haaxma CA, Hagen YJ, Munneke M, Bloem BR. Freezer or non-freezer: clinical assessment of freezing of gait. Parkinsonism & Related Disorders. 2012; 18: 149–154. https://doi.org/10.1016/j.parkreldis.2011.09.006.
[8]
Park H, Shin S, Youm C, Cheon SM, Lee M, Noh B. Classification of Parkinson’s disease with freezing of gait based on 360° turning analysis using 36 kinematic features. Journal of Neuroengineering and Rehabilitation. 2021; 18: 177. https://doi.org/10.1186/s12984-021-00975-4.
[9]
Anjum MF, Dasgupta S, Mudumbai R, Singh A, Cavanagh JF, Narayanan NS. Linear predictive coding distinguishes spectral EEG features of Parkinson’s disease. Parkinsonism & Related Disorders. 2020; 79: 79–85. https://doi.org/10.1016/j.parkreldis.2020.08.001.
[10]
Cavanagh JF, Kumar P, Mueller AA, Richardson SP, Mueen A. Diminished EEG habituation to novel events effectively classifies Parkinson’s patients. Clinical Neurophysiology: Official Journal of the International Federation of Clinical Neurophysiology. 2018; 129: 409–418. https://doi.org/10.1016/j.clinph.2017.11.023.
[11]
Lainscsek C, Hernandez ME, Weyhenmeyer J, Sejnowski TJ, Poizner H. Non-linear dynamical analysis of EEG time series distinguishes patients with Parkinson’s disease from healthy individuals. Frontiers in Neurology. 2013; 4: 200. https://doi.org/10.3389/fneur.2013.00200.
[12]
Singh A, Cole RC, Espinoza AI, Brown D, Cavanagh JF, Narayanan NS. Frontal theta and beta oscillations during lower-limb movement in Parkinson’s disease. Clinical Neurophysiology: Official Journal of the International Federation of Clinical Neurophysiology. 2020; 131: 694–702. https://doi.org/10.1016/j.clinph.2019.12.399.
[13]
Singh A. Oscillatory activity in the cortico-basal ganglia-thalamic neural circuits in Parkinson’s disease. The European Journal of Neuroscience. 2018; 48: 2869–2878. https://doi.org/10.1111/ejn.13853.
[14]
Bajpai R, Khare S, Joshi D. A multimodal model-fusion approach for improved prediction of freezing of gait in Parkinson’s disease. IEEE Sensors Journal. 2023; 23: 16168–16175.
[15]
Singh A, Plate A, Kammermeier S, Mehrkens JH, Ilmberger J, Bötzel K. Freezing of gait-related oscillatory activity in the human subthalamic nucleus. Basal Ganglia. 2013; 3: 25–32. https://doi.org/10.1016/j.baga.2012.10.002.
[16]
Bosch TJ, Barsainya R, Ridder A, Santosh KC, Singh A. Interval timing and midfrontal delta oscillations are impaired in Parkinson’s disease patients with freezing of gait. Journal of Neurology. 2022; 269: 2599–2609. https://doi.org/10.1007/s00415-021-10843-9.
[17]
Bosch TJ, Espinoza AI, Mancini M, Horak FB, Singh A. Functional Connectivity in Patients With Parkinson’s Disease and Freezing of Gait Using Resting-State EEG and Graph Theory. Neurorehabilitation and Neural Repair. 2022; 36: 715–725. https://doi.org/10.1177/15459683221129282.
[18]
Movement Disorder Society Task Force on Rating Scales for Parkinson’s Disease. The Unified Parkinson’s Disease Rating Scale (UPDRS): status and recommendations. Movement Disorders: Official Journal of the Movement Disorder Society. 2003; 18: 738–750. https://doi.org/10.1002/mds.10473.
[19]
Nieuwboer A, Rochester L, Herman T, Vandenberghe W, Emil GE, Thomaes T, et al. Reliability of the new freezing of gait questionnaire: agreement between patients with Parkinson’s disease and their carers. Gait & Posture. 2009; 30: 459–463. https://doi.org/10.1016/j.gaitpost.2009.07.108.
[20]
Delorme A, Makeig S. EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. Journal of Neuroscience Methods. 2004; 134: 9–21. https://doi.org/10.1016/j.jneumeth.2003.10.009.
[21]
Nolan H, Whelan R, Reilly RB. FASTER: Fully Automated Statistical Thresholding for EEG artifact Rejection. Journal of Neuroscience Methods. 2010; 192: 152–162. https://doi.org/10.1016/j.jneumeth.2010.07.015.
[22]
Mognon A, Jovicich J, Bruzzone L, Buiatti M. ADJUST: An automatic EEG artifact detector based on the joint use of spatial and temporal features. Psychophysiology. 2011; 48: 229–240. https://doi.org/10.1111/j.1469-8986.2010.01061.x.
[23]
Wang P, Jiang A, Liu X, Shang J, Zhang L. LSTM-Based EEG Classification in Motor Imagery Tasks. IEEE Transactions on Neural Systems and Rehabilitation Engineering: a Publication of the IEEE Engineering in Medicine and Biology Society. 2018; 26: 2086–2095. https://doi.org/10.1109/TNSRE.2018.2876129.
[24]
Avuçlu E, Elen A. Evaluation of train and test performance of machine learning algorithms and Parkinson diagnosis with statistical measurements. Medical & Biological Engineering & Computing. 2020; 58: 2775–2788. https://doi.org/10.1007/s11517-020-02260-3.
[25]
Vidya B, Sasikumar P. Gait based Parkinson’s disease diagnosis and severity rating using multi-class support vector machine. Applied Soft Computing. 2021; 113: 107939. https://doi.org/10.1016/j.asoc.2021.107939.
[26]
Jiwani N, Gupta K, Afreen N. Automated Seizure Detection using Theta Band. In 2022 International Conference on Emerging Smart Computing and Informatics (ESCI) (pp. 1–4). 2022. https://doi.org/10.1109/ESCI53509.2022.9758331.
[27]
Guo X, Tinaz S, Dvornek NC. Characterization of Early Stage Parkinson’s Disease From Resting-State fMRI Data Using a Long Short-Term Memory Network. Frontiers in Neuroimaging. 2022; 1: 952084. https://doi.org/10.3389/fnimg.2022.952084.
[28]
Bdaqli M, Shoeibi A, Moridian P, Sadeghi D, Pouyani MF, Shalbaf A, et al. Diagnosis of Parkinson Disease from EEG Signals Using a CNN-LSTM Model and Explainable AI. In Ferrández Vicente JM, Val Calvo M, Adeli H (eds.) Artificial Intelligence for Neuroscience and Emotional Systems (pp. 128–138). Springer Nature Switzerland: Cham. 2024.
[29]
Alshammri R, Alharbi G, Alharbi E, Almubark I. Machine learning approaches to identify Parkinson’s disease using voice signal features. Frontiers in Artificial Intelligence. 2023; 6: 1084001. https://doi.org/10.3389/frai.2023.1084001.
[30]
Li K, Ao B, Wu X, Wen Q, Ul Haq E, Yin J. Parkinson’s disease detection and classification using EEG based on deep CNN-LSTM model. Biotechnology & Genetic Engineering Reviews. 2024; 40: 2577–2596. https://doi.org/10.1080/02648725.2023.2200333.
[31]
Abós A, Baggio HC, Segura B, García-Díaz AI, Compta Y, Martí MJ, et al. Discriminating cognitive status in Parkinson’s disease through functional connectomics and machine learning. Scientific Reports. 2017; 7: 45347. https://doi.org/10.1038/srep45347.
[32]
Aljalal M, Aldosari SA, Molinas M, AlSharabi K, Alturki FA. Detection of Parkinson’s disease from EEG signals using discrete wavelet transform, different entropy measures, and machine learning techniques. Scientific Reports. 2022; 12: 22547. https://doi.org/10.1038/s41598-022-26644-7.
[33]
Sugden RJ, Diamandis P. Generalizable electroencephalographic classification of Parkinson’s disease using deep learning. Informatics in Medicine Unlocked. 2023; 42: 101352. https://doi.org/10.1016/j.imu.2023.101352.
[34]
Khosla A, Kumar N, Khera P. Machine learning approach for predicting state transitions via shank acceleration data during freezing of gait in Parkinson’s disease. Biomedical Signal Processing and Control. 2024; 92: 106053. https://doi.org/10.1016/j.bspc.2024.106053.
[35]
Koch M, Geraedts V, Wang H, Tannemaat M, Back T. Automated Machine Learning for EEG-Based Classification of Parkinson’s Disease Patients. In 2019 IEEE International Conference on Big Data (Big Data) (pp. 4845–4852). 2019. https://doi.org/10.1109/BigData47090.2019.9006599.
[36]
Bayot M, Gérard M, Derambure P, Dujardin K, Defebvre L, Betrouni N, et al. Functional networks underlying freezing of gait: a resting-state electroencephalographic study. Neurophysiologie Clinique. 2022; 52: 212–222. https://doi.org/10.1016/j.neucli.2022.03.003.
[37]
Li Y, Zhang X, Ming D. Early-stage fusion of EEG and fNIRS improves classification of motor imagery. Frontiers in Neuroscience. 2023; 16: 1062889. https://doi.org/10.3389/fnins.2022.1062889.
[38]
Arif A, Wang Y, Yin R, Zhang X, Helmy A. EF-Net: Mental State Recognition by Analyzing Multimodal EEG-fNIRS via CNN. Sensors. 2024; 24: 1889. https://doi.org/10.3390/s24061889.
[39]
Huang Z, Ma Y, Wang R, Li W, Dai Y. A Model for EEG-Based Emotion Recognition: CNN-Bi-LSTM with Attention Mechanism. Electronics. 2023; 12: 3188. https://doi.org/10.3390/electronics12143188.
[40]
Khan H, Naseer N, Yazidi A, Eide PK, Hassan HW, Mirtaheri P. Analysis of Human Gait Using Hybrid EEG-fNIRS-Based BCI System: A Review. Frontiers in Human Neuroscience. 2021; 14: 613254. https://doi.org/10.3389/fnhum.2020.613254.
[41]
Vortmann LM, Ceh S, Putze F. Multimodal EEG and Eye Tracking Feature Fusion Approaches for Attention Classification in Hybrid BCIs. Frontiers in Computer Science. 2022; 4: 780580. https://doi.org/10.3389/fcomp.2022.780580.
[42]
Huang Q, Ding J, Wang X. A Method to Extract Task-Related EEG Feature Based on Lightweight Convolutional Neural Network. Neuroscience Bulletin. 2024; 40: 1915–1930. https://doi.org/10.1007/s12264-024-01247-6.
[43]
Chen W, Liao Y, Dai R, Dong Y, Huang L. EEG-based emotion recognition using graph convolutional neural network with dual attention mechanism. Frontiers in Computational Neuroscience. 2024; 18: 1416494. https://doi.org/10.3389/fncom.2024.1416494.
[44]
Marquez JS, Hasan SMS, Siddiquee MR, Luca CC, Mishra VR, Mari Z, et al. Neural Correlates of Freezing of Gait in Parkinson’s Disease: An Electrophysiology Mini-Review. Frontiers in Neurology. 2020; 11: 571086. https://doi.org/10.3389/fneur.2020.571086.

Publisher’s Note: IMR Press stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
