VoxVeritasNet: A new feature engineering model leveraging iterative feature selection for detecting fake or real speech

Çelik, Burak; Zeybek, Burcu; Karadeniz, Mahmut; Kocyigit, Adem; Arsalı, Onur; Efeoglu, Ebru; Türetken, Bahattin

doi:10.1016/j.aej.2026.01.009

Çelik B., Zeybek B., Karadeniz M. B., Kocyigit A., Arsalı O., Efeoglu E., ...

ALEXANDRIA ENGINEERING JOURNAL, vol. 136, pp. 89-104, 2026 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 136
Publication Date: 2026
DOI: 10.1016/j.aej.2026.01.009
Journal Name: ALEXANDRIA ENGINEERING JOURNAL
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
Pages: pp. 89-104
Open Archive Collection: AVESİS Open Access Collection
Affiliated with Kocaeli University: Yes

Abstract

This study introduces VoxVeritasNet, a high-precision and computationally efficient feature engineering framework for deepfake audio detection. The methodology uses a nine-level Multi-Level Discrete Wavelet Transform (MDWT) to capture intricate time-frequency artifacts. A key innovation is the quantum-inspired dual-path mapping algorithm, which models parallel signal dependencies and embeds features into a high-dimensional Hilbert space to enhance geometric separability. To optimize performance, an iterative ensemble selection strategy based on Neighborhood Component Analysis (NCA), Chi2, and ReliefF is employed alongside Support Vector Machines and k-Nearest Neighbors. The framework was evaluated on three public datasets with varying class distributions, achieving state-of-the-art peak accuracies of 99.96% with db4 and 99.71% with sym8 wavelets. Even with the computationally efficient sym4 baseline, the model maintained detection rates above 98.99% and an equal error rate (EER) as low as 0.14%. VoxVeritasNet operates with a processing throughput of 6.45 segments per second on standard CPU hardware with a negligible storage footprint, offering a lightweight and explainable alternative to resource-intensive deep learning architectures.
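
The EER quoted above is the operating point at which the false acceptance rate (spoofed speech accepted as real) equals the false rejection rate (real speech rejected). The abstract does not say how the authors computed it, so the following is only a minimal sketch of the standard definition. It assumes that a higher score means "more likely real"; the function name `equal_error_rate` and the toy scores are hypothetical and are not taken from the paper.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumption (not from the paper): higher score = more likely real speech.
    """
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection rate: real samples scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance rate: spoofed samples scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores for illustration only (hypothetical, not results from the paper).
real = [0.9, 0.8, 0.7, 0.4]
fake = [0.1, 0.2, 0.3, 0.6]
print(f"EER = {equal_error_rate(real, fake):.2%}")
```

With these toy scores, one real and one spoofed sample fall on the wrong side of the best threshold, so both error rates are 25%. On a finite test set the two rates rarely cross exactly, which is why the sketch reports the midpoint at the closest threshold.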