Enhancing Arabic Information Retrieval for Question Answering

Alghamdi M., Abushawarib M., Ellouh M., GHALEB M. M. S., Felemban M.

7th International Conference on Future Networks and Distributed Systems, ICFNDS 2023, Dubai, United Arab Emirates, 21 - 22 December 2023, pp.366-371, (Full Text Paper)

Publication Type: Conference Paper / Full Text Paper
DOI Number: 10.1145/3644713.3644763
City of Publication: Dubai
Country of Publication: United Arab Emirates
Page Numbers: pp.366-371
Keywords: Information Retrieval, Natural Language Processing
Kocaeli University Affiliated: Yes

Abstract

In the modern landscape of Natural Language Processing (NLP), intelligent chatbots like ChatGPT 3.5 and Google's Bard have shown remarkable competence in generic question-answering (QA) tasks. However, their performance falters on domain-specific QA, particularly in Arabic, a language known for its complex morphology and syntax. This paper presents a comprehensive approach to address these issues, with the aim of building a chatbot tailored for a university community. We first create an extensive Arabic Q&A dataset by extracting data from academic documents with state-of-the-art Optical Character Recognition (OCR) tools. We then evaluate multiple text similarity measures, including pooled FastText word embeddings, BM25 ranking functions, and various semantic sentence embedding models. A thorough performance assessment reveals that the domain-specific model excels at both sentence-level similarity and context-relevance tasks. The resulting web chatbot application, built with the LangChain library and Retrieval Augmented Generation (RAG) methods, outperforms existing chatbots in domain-specific, Arabic-language QA scenarios.
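
To make the BM25 ranking step named in the abstract concrete, here is a minimal sketch of Okapi BM25 scoring in plain Python. The abstract does not state which BM25 variant, tokenizer, or parameters the authors used, so everything here is an assumption for illustration: the function name `bm25_scores`, whitespace tokenization, the `k1` and `b` values, and the three sample passages are all hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score every tokenized passage against a tokenized query (Okapi BM25).

    k1 and b are common default values, not the paper's settings.
    """
    n_docs = len(corpus_tokens)
    if n_docs == 0:
        return []
    # Guard against a corpus made only of empty passages.
    avg_len = (sum(len(doc) for doc in corpus_tokens) / n_docs) or 1.0

    # Document frequency: number of passages containing each term.
    doc_freq = Counter()
    for doc in corpus_tokens:
        doc_freq.update(set(doc))

    scores = []
    for doc in corpus_tokens:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            df = doc_freq[term]
            # Smoothed IDF that stays non-negative for very common terms.
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
            # Term-frequency saturation with passage-length normalization.
            denom = tf + k1 * (1.0 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1.0) / denom
        scores.append(score)
    return scores


# Hypothetical university passages (not taken from the paper's dataset).
passages = [
    "مواعيد التسجيل في الجامعة",
    "شروط القبول في برامج الدراسات العليا",
    "مواعيد الاختبارات النهائية",
]
tokenized = [p.split() for p in passages]  # naive whitespace tokenization
query = "شروط القبول".split()

scores = bm25_scores(query, tokenized)
best = max(range(len(passages)), key=lambda i: scores[i])
```

In a retrieval-augmented setup, the top-scoring passages would be handed to the language model as context. Whitespace splitting ignores Arabic morphology (clitics, inflection), which is one reason a lexical scorer like this is typically compared against embedding-based similarity, as the abstract describes.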