Concept-LDA: Incorporating Babelfy into LDA for aspect extraction


Ekinci E., İLHAN OMURCA S.

Journal of Information Science, cilt.46, sa.3, ss.406-418, 2020 (SCI-Expanded) identifier identifier

  • Yayın Türü: Makale / Tam Makale
  • Cilt numarası: 46 Sayı: 3
  • Basım Tarihi: 2020
  • Doi Numarası: 10.1177/0165551519845854
  • Dergi Adı: Journal of Information Science
  • Derginin Tarandığı İndeksler: Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), Scopus, Academic Search Premier, FRANCIS, IBZ Online, ABI/INFORM, Aerospace Database, Analytical Abstracts, Applied Science & Technology Source, Business Source Elite, Business Source Premier, Communication Abstracts, Compendex, Computer & Applied Sciences, EBSCO Education Source, Education Abstracts, Index Islamicus, Information Science and Technology Abstracts, INSPEC, Library and Information Science Abstracts, Library Literature and Information Science, Metadex, Civil Engineering Abstracts, Library, Information Science & Technology Abstracts (LISTA)
  • Sayfa Sayıları: ss.406-418
  • Anahtar Kelimeler: Aspect extraction, Babelfy, latent Dirichlet allocation, semantic knowledge, topic modelling, ONLINE, MODEL
  • Kocaeli Üniversitesi Adresli: Evet

Özet

© The Author(s) 2019.Latent Dirichlet allocation (LDA) is one of the probabilistic topic models; it discovers the latent topic structure in a document collection. The basic assumption under LDA is that documents are viewed as a probabilistic mixture of latent topics; a topic has a probability distribution over words and each document is modelled on the basis of a bag-of-words model. The topic models such as LDA are sufficient in learning hidden topics but they do not take into account the deeper semantic knowledge of a document. In this article, we propose a novel method based on topic modelling to determine the latent aspects of online review documents. In the proposed model, which is called Concept-LDA, the feature space of reviews is enriched with the concepts and named entities, which are extracted from Babelfy to obtain topics that contain not only co-occurred words but also semantically related words. The performance in terms of topic coherence and topic quality is reported over 10 publicly available datasets, and it is demonstrated that Concept-LDA achieves better topic representations than an LDA model alone, as measured by topic coherence and F-measure. The learned topic representation by Concept-LDA leads to accurate and an easy aspect extraction task in an aspect-based sentiment analysis system.