Spoken Accent Detection in English Using Audio-Based Transformer Models


Öztürk O., Kilimci H., Kılınç H., Kilimci Z. H.

International Conference on Computer Science and Engineering (UBMK), Antalya, Türkiye, 26-28 October 2024, pp. 539-544, (Full Text Paper)

  • Publication Type: Conference Paper / Full Text Paper
  • DOI Number: 10.1109/ubmk63289.2024.10773414
  • City of Publication: Antalya
  • Country of Publication: Türkiye
  • Page Numbers: pp. 539-544
  • Kocaeli University Affiliated: Yes

Abstract

Accent detection is a critical aspect of developing sophisticated speech recognition systems, as it enables these systems to better understand and process the linguistic nuances of speakers from diverse backgrounds. By accurately identifying accents, speech recognition technologies can improve their accuracy and accessibility, providing more personalized and effective communication tools. Moreover, the study of accents contributes to a deeper understanding of linguistic diversity and cultural expression, offering insights into the variations and commonalities within the English language as spoken globally. In this study, the objective is to detect English accents spoken by individuals of different nationalities, including Indian, Arabic, Chinese, British, American, and African. To accomplish this, a dataset is curated from YouTube videos covering a wide range of contexts such as conferences and interviews. The recordings are selected to represent six distinct classes of English accents, providing a diverse and representative sample for analysis. The research employs audio-based transformer models, specifically UniSpeech, Wav2Vec2, SEW, AST, and HuBERT, to process and classify the audio data. These state-of-the-art models are chosen for their ability to capture complex acoustic features and linguistic patterns. The experimental results reveal that the Wav2Vec2 model, in particular, achieves significant performance in accurately identifying spoken accents. This highlights its potential as a valuable tool in the field of accent detection, offering promising applications in both academic research and practical technology development.
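
The abstract describes a six-class classification setup without giving implementation details. As a rough illustration only, the sketch below shows how a Wav2Vec2 sequence classifier with six output labels could be wired up using the Hugging Face `transformers` library. Everything specific here is an assumption rather than the authors' configuration: the model is deliberately tiny and randomly initialised (no pretrained checkpoint is loaded), the input is random noise standing in for 16 kHz audio, and the label order in `ACCENTS` is arbitrary.

```python
import torch
from transformers import Wav2Vec2Config, Wav2Vec2ForSequenceClassification

# The six accent classes named in the abstract (order chosen arbitrarily here).
ACCENTS = ["Indian", "Arabic", "Chinese", "British", "American", "African"]

# Assumption: a very small, randomly initialised configuration so the sketch
# runs quickly. The paper's actual checkpoint and hyperparameters are not
# stated in the abstract.
config = Wav2Vec2Config(
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    conv_dim=(32, 32, 32),
    conv_stride=(5, 2, 2),
    conv_kernel=(10, 3, 3),
    classifier_proj_size=32,
    num_labels=len(ACCENTS),
)

model = Wav2Vec2ForSequenceClassification(config).eval()

# Stand-in input: two one-second clips of noise at an assumed 16 kHz rate.
torch.manual_seed(0)
waveform = torch.randn(2, 16000)

with torch.no_grad():
    logits = model(input_values=waveform).logits  # shape: (batch, num_labels)

# Convert logits to class probabilities and pick the most likely accent.
probs = logits.softmax(dim=-1)
predicted = [ACCENTS[i] for i in probs.argmax(dim=-1).tolist()]
print(predicted)
```

Because the weights are random, the printed labels carry no meaning. In practice one would presumably start from a pretrained checkpoint and fine-tune it on labelled accent recordings, which is outside the scope of this sketch.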