VoxVeritasNet: A new feature engineering model leveraging iterative feature selection for detecting fake or real speech


Creative Commons License

Çelik B., Zeybek B., Karadeniz M. B., Kocyigit A., Arsalı O., Efeoglu E., ...

ALEXANDRIA ENGINEERING JOURNAL, vol.136, pp.89-104, 2026 (SCI-Expanded, Scopus)

Abstract

This study introduces VoxVeritasNet, a high-precision and computationally efficient feature engineering framework for deepfake audio detection. The methodology uses a nine-level Multi-Level Discrete Wavelet Transform (MDWT) to capture intricate time-frequency artifacts. A key innovation is a quantum-inspired dual-path mapping algorithm, which models parallel signal dependencies and embeds features into a high-dimensional Hilbert space to enhance geometric separability. To optimize performance, an iterative ensemble selection strategy based on Neighborhood Component Analysis (NCA), Chi2, and ReliefF is combined with Support Vector Machine and k-Nearest Neighbors classifiers. The framework was evaluated on three public datasets with varying class distributions, achieving state-of-the-art peak accuracies of 99.96% with db4 and 99.71% with sym8 wavelets. Even with the computationally efficient sym4 baseline, the model maintained detection rates above 98.99% and an equal error rate (EER) as low as 0.14%. VoxVeritasNet processes 6.45 segments per second on standard CPU hardware with a negligible storage footprint, offering a lightweight and explainable alternative to resource-intensive deep learning architectures.
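
To make the multi-level wavelet step more concrete, the following is a minimal sketch of nine-level sub-band feature extraction, not the authors' implementation. It makes two assumptions that the abstract does not confirm: the Haar wavelet stands in for the db4, sym8, and sym4 wavelets so that the example needs only the standard library, and the per-band statistics (mean, standard deviation, energy) are hypothetical placeholders for the actual feature set. All function names are illustrative.

```python
import math


def haar_step(x):
    """One level of the Haar DWT; returns (approximation, detail)."""
    if len(x) % 2:
        # Pad odd-length input by repeating the last sample.
        x = x + [x[-1]]
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def multilevel_dwt(x, levels=9):
    """Return [d1, d2, ..., dL, aL]: one detail band per level, then the
    final approximation band. Haar is an assumption made for brevity."""
    bands = []
    approx = list(x)
    for _ in range(levels):
        if len(approx) < 2:
            break  # signal too short to decompose further
        approx, detail = haar_step(approx)
        bands.append(detail)
    bands.append(approx)
    return bands


def band_features(band):
    """Hypothetical per-band statistics: mean, standard deviation, energy."""
    n = len(band)
    mean = sum(band) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in band) / n)
    energy = sum(v * v for v in band)
    return [mean, std, energy]


def extract_features(x, levels=9):
    """Concatenate the statistics of every sub-band into one feature vector."""
    feats = []
    for band in multilevel_dwt(x, levels):
        feats.extend(band_features(band))
    return feats


# Synthetic 1024-sample tone used as a stand-in for a speech segment.
signal = [math.sin(2 * math.pi * 5 * t / 1024) for t in range(1024)]
features = extract_features(signal, levels=9)
print(len(features))
```

With a 1024-sample input, nine levels produce nine detail bands plus one approximation band, so the feature vector has 30 entries. In a pipeline like the one described above, such vectors would then pass through feature ranking and selection before classification; those stages are not shown here.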