Enhancing Arabic Information Retrieval for Question Answering


Alghamdi M., Abushawarib M., Ellouh M., GHALEB M. M. S., Felemban M.

7th International Conference on Future Networks and Distributed Systems, ICFNDS 2023, Dubai, United Arab Emirates, 21-22 December 2023, pp. 366-371

  • Publication Type: Conference Paper / Full Text
  • Doi Number: 10.1145/3644713.3644763
  • City: Dubai
  • Country: United Arab Emirates
  • Page Numbers: pp.366-371
  • Keywords: Information Retrieval, Natural Language Processing
  • Kocaeli University Affiliated: Yes

Abstract

In the modern landscape of Natural Language Processing (NLP), intelligent chatbots such as ChatGPT 3.5 and Google's Bard have shown remarkable competence in generic question-answering (QA) tasks. However, their performance falters on domain-specific QA, particularly in Arabic, a language known for its complex morphology and syntax. This paper presents a comprehensive approach to address these issues, with the aim of building a chatbot tailored to a university community. We first create an extensive Arabic Q&A dataset by extracting data from academic documents using state-of-the-art Optical Character Recognition (OCR) tools. We then evaluate multiple text similarity measures, including pooled FastText word embeddings, the BM25 ranking function, and various semantic sentence embedding models. A thorough performance assessment reveals that the domain-specific model excels at both sentence-level similarity and context-relevance tasks. The resulting web-based chatbot, which leverages the LangChain library and Retrieval-Augmented Generation (RAG) methods, outperforms existing chatbots in domain-specific, Arabic-language QA scenarios.
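
As an illustration of one of the similarity measures named in the abstract, the following is a minimal sketch of Okapi BM25 ranking over a toy Arabic corpus. It is not the authors' implementation: the function name `bm25_scores`, the parameter defaults `k1=1.5` and `b=0.75`, the whitespace tokenization, and the three sample sentences are all assumptions made for this example. A real Arabic pipeline would likely need normalization and morphological preprocessing that this sketch omits.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25.

    k1 and b are common default values, assumed here for illustration.
    """
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs_tokens for term in set(d))

    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Smoothed IDF variant that stays non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical university-style passages, tokenized on whitespace only.
docs = [
    "متى يبدأ التسجيل للفصل الدراسي".split(),
    "ما هي شروط القبول في الجامعة".split(),
    "مواعيد الاختبارات النهائية للفصل".split(),
]
query = "شروط القبول".split()

scores = bm25_scores(query, docs)
# Index of the highest-scoring passage for the query.
best = max(range(len(docs)), key=scores.__getitem__)
```

In a retrieval-augmented setup, the top-ranked passages from a scorer like this would be passed to the language model as context. Because BM25 matches exact tokens, inflected Arabic word forms that differ on the surface do not match each other, which is one reason a lexical measure of this kind is typically compared against embedding-based ones.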