Enhancing Arabic Information Retrieval for Question Answering


Alghamdi M., Abushawarib M., Ellouh M., Ghaleb M. M. S., Felemban M.

7th International Conference on Future Networks and Distributed Systems, ICFNDS 2023, Dubai, United Arab Emirates, 21-22 December 2023, pp. 366-371 (Full-Text Paper)

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI: 10.1145/3644713.3644763
  • City of Publication: Dubai
  • Country of Publication: United Arab Emirates
  • Pages: pp. 366-371
  • Keywords: Information Retrieval, Natural Language Processing
  • Kocaeli University Affiliation: Yes

Abstract

In the modern landscape of Natural Language Processing (NLP), intelligent chatbots such as ChatGPT 3.5 and Google's Bard have shown remarkable competence in generic question-answering (QA) tasks. However, their performance falters on domain-specific QA, particularly in Arabic, a language known for its complex morphology and syntax. This paper presents a comprehensive approach to these issues, with the aim of building a chatbot tailored to a university community. We first create an extensive Arabic Q&A dataset by extracting data from academic documents with state-of-the-art Optical Character Recognition (OCR) tools. We then evaluate multiple text similarity measures, including pooled FastText word embeddings, BM25 ranking functions, and various semantic sentence embedding models. A thorough performance assessment shows that the domain-specific model excels at both sentence-level similarity and context-relevance tasks. The resulting web-application chatbot is built with the LangChain library and Retrieval-Augmented Generation (RAG) methods. It outperforms existing chatbots in domain-specific Arabic QA scenarios.
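BM25 is one of the text similarity measures named above for matching a user's question against the Q&A dataset. The following is a minimal sketch of standard Okapi BM25 ranking. It is not the authors' implementation.

It rests on several assumptions:

- Tokenization is plain whitespace splitting. A real Arabic pipeline would likely add normalization or stemming.
- The parameters are the common defaults k1 = 1.5 and b = 0.75.
- The IDF uses a +1-smoothed variant.
- The three-question corpus is hypothetical and is not taken from the paper's dataset.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplifying assumption: whitespace tokenization only. Arabic systems
    # often also normalize letter variants and strip diacritics.
    return text.split()


class BM25:
    """Okapi BM25 scorer over a small in-memory corpus (illustrative sketch)."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1 = k1
        self.b = b
        self.docs = [tokenize(d) for d in docs]
        self.doc_len = [len(d) for d in self.docs]
        self.avgdl = sum(self.doc_len) / len(self.docs)
        # Per-document term frequencies.
        self.tf = [Counter(d) for d in self.docs]
        # Document frequency: number of documents containing each term.
        df = Counter(term for d in self.docs for term in set(d))
        n = len(self.docs)
        # IDF with +1 inside the log so it stays positive for common terms.
        self.idf = {
            t: math.log((n - f + 0.5) / (f + 0.5) + 1.0) for t, f in df.items()
        }

    def score(self, query: str, i: int) -> float:
        """BM25 score of document i for the given query."""
        tf, dl = self.tf[i], self.doc_len[i]
        s = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            f = tf[term]
            # Term-frequency saturation (k1) and length normalization (b).
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            s += self.idf[term] * f * (self.k1 + 1) / norm
        return s

    def rank(self, query: str, top_k: int = 3) -> list[tuple[int, float]]:
        """Return (document index, score) pairs, best first."""
        scores = [(i, self.score(query, i)) for i in range(len(self.docs))]
        return sorted(scores, key=lambda x: x[1], reverse=True)[:top_k]


# Hypothetical toy corpus of stored questions (not from the paper's dataset).
questions = [
    "ما هي شروط القبول في الجامعة",  # What are the university's admission requirements?
    "متى يبدأ الفصل الدراسي",  # When does the semester start?
    "كيف أسجل في المقررات",  # How do I register for courses?
]
bm25 = BM25(questions)
ranking = bm25.rank("شروط القبول", top_k=1)  # query: "admission requirements"
print(ranking)
```

In a retrieval setup like the one the abstract describes, the top-ranked entries would serve as the candidate context. Semantic sentence-embedding models are the alternative the paper compares against. Pure lexical matching like this is sensitive to Arabic morphological variation, which is one motivation for evaluating embedding-based measures alongside it.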