Topic Classification of Text-Based Lesson Questions in Turkish with BERTurk


Doğan A. A., SAYAR A., Çetiner İ.

9th International Conference on Mining Intelligence and Knowledge Exploration, MIKE 2023, Kristiansand, Norway, 28-30 June 2023, vol. 13924 LNAI, pp. 87-94

  • Publication Type: Conference Paper / Full-Text Paper
  • Volume Number: 13924 LNAI
  • DOI: 10.1007/978-3-031-44084-7_9
  • City of Publication: Kristiansand
  • Country of Publication: Norway
  • Pages: pp. 87-94
  • Keywords: Bert Model, Data Preprocessing, Intention Classification, Natural Language Processing, NLP, Topic Classification
  • Affiliated with Kocaeli University: Yes

Abstract

During the coronavirus pandemic, social contact was kept to a minimum and school education was carried out remotely. As a result, distance education has gained importance. This study discusses natural language processing (NLP) and its effects on distance education, and proposes an NLP-based topic classification system. Classification is applied to text-based lesson questions in Turkish. With a system built on this classifier, questions asked by students can be routed quickly to teachers in the relevant specialty, which speeds up the process. In the data preparation phase, real-world lesson questions were collected and converted from image to text using the EasyOCR library, and topic classification was performed on the resulting data set using the BERTurk model. Because the data set was produced by image-to-text conversion, it contained some noise. To remove this noise, several data preprocessing and cleaning techniques were applied. Finally, the model was trained, and the accuracy rates are presented.
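
The abstract does not list the specific cleaning techniques, so the following is only a minimal sketch of what cleaning OCR noise in Turkish question text might look like, not the authors' method. The function name `clean_ocr_text`, the set of punctuation that is kept, and the rules themselves are assumptions made for illustration. The OCR step and the BERTurk fine-tuning step are not shown.

```python
import re
import unicodedata

# Punctuation kept in addition to letters, digits, and whitespace.
# This set is an assumption chosen for illustration.
_ALLOWED_PUNCT = set(".,;:?!()%+-=/'\"")


def clean_ocr_text(text: str) -> str:
    """Normalize and de-noise one OCR-extracted question string (hypothetical helper)."""
    # Compose characters first, so "g" + combining breve becomes "ğ"
    # instead of losing the diacritic in the filter below.
    text = unicodedata.normalize("NFC", text)
    # Keep letters and digits (Unicode-aware, so ş, ğ, ı, ö, ü, ç survive),
    # whitespace, and the allowed punctuation; drop everything else.
    text = "".join(
        ch for ch in text
        if ch.isalnum() or ch.isspace() or ch in _ALLOWED_PUNCT
    )
    # Collapse whitespace runs, including line breaks left by the OCR layout.
    text = re.sub(r"\s+", " ", text)
    return text.strip()
```

Lowercasing is deliberately left out of this sketch: Python's default `str.lower()` does not follow Turkish casing rules for dotted and dotless "i", so any case folding would need separate handling.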