A trio-based feature extraction framework for bird sounds classification



ÇELİK B., Akbal A.

APPLIED ACOUSTICS, vol.242, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 242
  • Publication Date: 2026
  • DOI: 10.1016/j.apacoust.2025.111064
  • Journal Name: APPLIED ACOUSTICS
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, Communication & Mass Media Index, Compendex, ICONDA Bibliographic, INSPEC, DIALNET
  • Open Archive Collection: AVESİS Open Access Collection
  • Kocaeli University Affiliated: Yes

Abstract

Bird species identification is crucial for environmental monitoring, ecological studies, and species tracking. Automated bird sound classification systems have been developed to achieve precise species detection. While deep learning models offer high accuracy, their computational complexity poses challenges in resource-limited environments. To address this, we propose a novel, lightweight, and highly accurate bird sound classification model built on a multilevel feature generation framework named AvisPat, derived from the Latin word "Avis" (bird) to emphasize its focus on avian bioacoustics. AvisPat applies a 7-level discrete wavelet transform (DWT) to decompose audio signals and extracts signum, upper ternary, and lower ternary features to capture diverse signal attributes. For feature selection, enhanced iterative versions of Neighborhood Component Analysis (NCA) and ReliefF are applied to select the most discriminative features, generating multiple feature subsets. These features are classified using k-Nearest Neighbor (k-NN) and Support Vector Machine (SVM) classifiers. In addition, the proposed model achieved 96.72% accuracy on a separate Xeno-Canto dataset containing 10 bird species from diverse geographic regions, demonstrating strong generalization capability. The 'trio' in the framework's name refers to the signum, upper ternary, and lower ternary features extracted via the 7-level DWT, whose combination comprehensively captures the time, frequency, and amplitude aspects of bird sounds and enhances the model's ability to distinguish between species with high accuracy.
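The abstract's feature-generation stage (multilevel DWT followed by signum and ternary coding) can be sketched as below. This is an illustrative reading, not the paper's implementation: the wavelet family is not stated in the abstract, so a Haar transform is assumed here, and the ternary threshold `eps` (standard deviation of the band) plus the function names `haar_dwt_levels` and `trio_features` are hypothetical choices for the sketch.

```python
import numpy as np

def haar_dwt_levels(signal, levels=7):
    """Multilevel Haar DWT (illustrative stand-in for the 7-level DWT).

    Returns the detail bands of each level plus the final approximation.
    """
    bands = []
    approx = np.asarray(signal, dtype=float)
    for _ in range(levels):
        if len(approx) % 2:                       # pad to even length
            approx = np.append(approx, approx[-1])
        pairs = approx.reshape(-1, 2)
        detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)
        approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)
        bands.append(detail)
    bands.append(approx)
    return bands

def trio_features(band, eps=None):
    """Hypothetical signum / upper-ternary / lower-ternary coding of a band.

    The threshold eps and the exact ternary scheme are assumptions; the
    abstract only names the three feature types.
    """
    band = np.asarray(band, dtype=float)
    if eps is None:
        eps = np.std(band)                        # assumed ternary threshold
    signum = np.sign(band)                        # -1 / 0 / +1 per coefficient
    upper = (band > eps).astype(int)              # 1 where strongly positive
    lower = -(band < -eps).astype(int)            # -1 where strongly negative
    return signum, upper, lower
```

With a 7-level decomposition of a length-128 signal, this yields seven detail bands (lengths 64 down to 2) and one length-1 approximation, each of which would feed the trio coding.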
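The selection stage pairs NCA with ReliefF applied iteratively to produce multiple candidate feature subsets. A minimal sketch of the ReliefF half is given below; the simplified scoring, the subset sizes in `iterative_select`, and both function names are assumptions for illustration, and the paper's enhanced iterative NCA/ReliefF procedure may differ substantially.

```python
import numpy as np

def relieff_scores(X, y, n_neighbors=3):
    """Simplified ReliefF feature weighting: features whose values separate
    classes (far from misses, close to hits) receive higher scores."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    rng = (X.max(axis=0) - X.min(axis=0)) + 1e-12  # per-feature range
    w = np.zeros(d)
    for i in range(n):
        diff = np.abs(X - X[i]) / rng              # normalized distances
        dist = diff.sum(axis=1)
        dist[i] = np.inf                           # exclude the sample itself
        same = np.where(y == y[i])[0]
        other = np.where(y != y[i])[0]
        hits = same[np.argsort(dist[same])][:n_neighbors]
        misses = other[np.argsort(dist[other])][:n_neighbors]
        w += diff[misses].mean(axis=0) - diff[hits].mean(axis=0)
    return w / n

def iterative_select(X, y, ks=(1, 2)):
    """Rank features by ReliefF score, then form several top-k candidate
    subsets (the iterative, multi-subset idea; sizes ks are assumed)."""
    order = np.argsort(relieff_scores(X, y))[::-1]
    return [order[:k] for k in ks]
```

In a full pipeline, each candidate subset would then be scored with the downstream k-NN or SVM classifier and the best-performing subset kept.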