Enhancing Arabic Information Retrieval for Question Answering
7th International Conference on Future Networks and Distributed Systems, ICFNDS 2023, Dubai, United Arab Emirates, 21 - 22 December 2023, pp.366-371, (Full Text Paper)
- Publication Type: Conference Paper / Full Text Paper
- DOI Number: 10.1145/3644713.3644763
- City of Publication: Dubai
- Country of Publication: United Arab Emirates
- Page Numbers: pp.366-371
- Keywords: Information Retrieval, Natural Language Processing
- Affiliated with Kocaeli University: Yes
Abstract
In the modern landscape of Natural Language Processing (NLP), intelligent chatbots such as ChatGPT 3.5 and Google's Bard have shown remarkable competence in generic question-answering (QA) tasks. However, their performance falters on domain-specific QA, particularly in Arabic, a language known for its complex morphology and syntax. This paper presents a comprehensive approach to address these issues. The aim of this research is to build a chatbot tailored for a university community. We first create an extensive Arabic Q&A dataset by extracting data from academic documents with state-of-the-art Optical Character Recognition (OCR) tools. We then evaluate multiple text similarity measures, including pooled FastText word embeddings, the BM25 ranking function, and various semantic sentence embedding models. A thorough performance assessment reveals that the domain-specific model excels at both sentence-level similarity and context-relevance tasks. The resulting web-based chatbot, built on the LangChain library and Retrieval-Augmented Generation (RAG) methods, outperforms existing chatbots in domain-specific Arabic QA scenarios.
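As an illustration of one of the lexical similarity measures named in the abstract, the following is a minimal sketch of Okapi BM25 scoring in plain Python. It is not the authors' implementation: the abstract does not state which BM25 variant or parameters were used, so the smoothed IDF formula, the values `k1=1.5` and `b=0.75`, the whitespace tokenization, and the three-sentence toy corpus are all assumptions made for the example. Real Arabic text would normally need normalization and morphological preprocessing before scoring.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query.

    k1 and b are commonly used defaults, assumed here for illustration.
    """
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs_tokens for term in set(d))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Smoothed IDF (always positive); one common BM25 variant.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus (hypothetical, not taken from the paper's dataset).
docs = [
    "مواعيد التسجيل في الجامعة".split(),
    "رسوم الدراسة في الكلية".split(),
    "مكتبة الجامعة مفتوحة يوميا".split(),
]
query = "مواعيد التسجيل".split()

scores = bm25_scores(query, docs)
best = max(range(len(docs)), key=scores.__getitem__)
```

Here `best` is the index of the highest-scoring document. In a retrieval-augmented setup, the top-ranked passages from a scorer like this would be passed to the language model as context, although how the paper combines BM25 with the embedding-based measures is not described in the abstract.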