ALEXANDRIA ENGINEERING JOURNAL, vol. 136, pp. 89-104, 2026 (SCI-Expanded, Scopus)
This study introduces VoxVeritasNet, a high-precision and computationally efficient feature engineering framework for deepfake audio detection. The methodology uses a nine-level Multi-Level Discrete Wavelet Transform (MDWT) to capture intricate time-frequency artifacts. A key innovation is the quantum-inspired dual-path mapping algorithm, which models parallel signal dependencies and embeds features into a high-dimensional Hilbert space to enhance geometric separability. To optimize performance, an iterative ensemble selection strategy based on Neighborhood Component Analysis (NCA), chi-squared (Chi2), and ReliefF feature selection is employed alongside Support Vector Machine and k-Nearest Neighbors classifiers. The framework was evaluated on three public datasets with varying class distributions, achieving state-of-the-art peak accuracies of 99.96% with db4 and 99.71% with sym8 wavelets. Even with the computationally efficient sym4 baseline, the model maintained detection rates above 98.99% and an equal error rate (EER) as low as 0.14%. VoxVeritasNet processes 6.45 segments per second on standard CPU hardware with a negligible storage footprint, offering a lightweight and explainable alternative to resource-intensive deep learning architectures.
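For readers unfamiliar with the equal error rate reported above, the following is a minimal sketch of how an EER can be computed from classifier scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the convention that a higher score means "more likely genuine", and the toy scores are all assumptions made for illustration. The EER is taken at the threshold where the false acceptance rate (spoofed clips accepted) and the false rejection rate (genuine clips rejected) are closest.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Return (eer, threshold) for two lists of detection scores.

    Assumed convention (not from the paper): higher score = more likely genuine,
    and a clip is accepted as genuine when score >= threshold.
    """
    # Every observed score is a candidate decision threshold.
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # False rejection rate: genuine clips that fall below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance rate: spoofed clips at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Hypothetical toy scores, for illustration only.
genuine = [0.9, 0.8, 0.7, 0.6]
spoof = [0.1, 0.2, 0.3, 0.65]
eer, threshold = equal_error_rate(genuine, spoof)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

On this toy data one genuine clip and one spoofed clip are misclassified at the chosen threshold, so both error rates are 25%. With a finite set of scores the two rates may not cross exactly, which is why the sketch averages them at the closest point; production toolkits often interpolate along the ROC curve instead.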