TS-CNN-LSTM: EEG-Based Emotion Recognition Across Mixed-Subject and Cross-Subject Protocols Using Variational Mode Decomposition

Çelebı, Muharrem; ÖZTÜRK, SITKI; KAPLAN, KAPLAN

doi:10.1109/access.2026.3692665

Çelebı M., ÖZTÜRK S., KAPLAN K.

IEEE Access, vol.14, pp.72668-72682, 2026 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 14
Publication Date: 2026
DOI: 10.1109/access.2026.3692665
Journal Name: IEEE Access
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
Pages: pp.72668-72682
Keywords: convolutional neural network (CNN), Electroencephalogram (EEG), long short-term memory (LSTM), triple-stream (TS), variational mode decomposition (VMD)
Kocaeli University Affiliated: Yes

Abstract

Emotion recognition using electroencephalography (EEG) signals is a key research area in affective computing. Existing EEG-based methods often rely on limited feature representations and underutilize deep learning architectures, limiting their ability to capture the complex temporal, spatial, and spectral characteristics of EEG signals. To address these challenges, this study proposes a novel multi-feature EEG-based emotion recognition framework, TS-CNN-LSTM. EEG signals are first decomposed into modes using Variational Mode Decomposition (VMD), enabling enhanced time–frequency analysis. Frequency-domain features are extracted from the VMD results, complemented by time-domain and nonlinear features. Nine distinct feature types are combined and transformed into nine-dimensional multi-spectral representations to preserve temporal–spatial–spectral relationships. These representations are processed through a Triple-Stream (TS) architecture comprising parallel 1D, 2D, and 3D convolutional neural networks (CNNs) and a long short-term memory (LSTM) network. The 1D-CNN captures spectral features, the 2D-CNN extracts spatial dependencies, and the 3D-CNN learns joint spatial–spectral representations, while the LSTM models sequential dynamics. The model was evaluated on the SEED and DEAP benchmark datasets using both mixed-subject and cross-subject testing protocols. Results show that TS-CNN-LSTM achieves strong performance in mixed-subject evaluations while highlighting the challenges of cross-subject generalization due to inter-subject variability in EEG signals. The proposed framework provides a comprehensive approach for EEG-based emotion recognition, integrating multi-domain feature extraction with a triple-stream CNN-LSTM architecture.
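
The difference between the two evaluation protocols named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the splits were constructed, so this is an assumption: the mixed-subject protocol is shown as a random split over pooled trials, and the cross-subject protocol as leave-one-subject-out. The function names, the `subject`/`trial` fields, and the toy data are hypothetical.

```python
import random


def mixed_subject_split(samples, test_fraction=0.2, seed=0):
    """Pool trials from all subjects, shuffle, and split.

    The same subject can appear in both the training and the test set.
    """
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def cross_subject_splits(samples):
    """Leave-one-subject-out (assumed variant of the cross-subject protocol).

    Each fold tests on one subject whose trials never appear in training.
    """
    subjects = sorted({s["subject"] for s in samples})
    for held_out in subjects:
        train = [s for s in samples if s["subject"] != held_out]
        test = [s for s in samples if s["subject"] == held_out]
        yield held_out, train, test


# Hypothetical toy data: 4 subjects with 5 trials each (no real EEG involved).
samples = [{"subject": subj, "trial": t} for subj in range(4) for t in range(5)]

train, test = mixed_subject_split(samples)
folds = list(cross_subject_splits(samples))
```

In the mixed-subject case a model may see other trials from the test subject during training, which is one plausible reason such evaluations tend to score higher than cross-subject ones, where inter-subject variability has to be bridged without any data from the held-out person.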