Concept-LDA: Incorporating Babelfy into LDA for aspect extraction


Journal of Information Science, vol. 46, pp. 406-418, 2020 (journal indexed in SCI Expanded)

  • Volume: 46 Issue: 3
  • Publication Date: 2020
  • DOI: 10.1177/0165551519845854
  • Journal Name: Journal of Information Science
  • Pages: pp. 406-418


© The Author(s) 2019.

Latent Dirichlet allocation (LDA) is a probabilistic topic model that discovers the latent topic structure of a document collection. LDA assumes that each document is a probabilistic mixture of latent topics, that each topic is a probability distribution over words, and that each document is represented as a bag of words. Topic models such as LDA are effective at learning hidden topics, but they do not take into account the deeper semantic knowledge of a document. In this article, we propose a novel topic-modelling method for determining the latent aspects of online review documents. In the proposed model, called Concept-LDA, the feature space of the reviews is enriched with concepts and named entities extracted by Babelfy, so that the resulting topics contain not only co-occurring words but also semantically related words. Topic coherence and topic quality are reported on 10 publicly available datasets, and the results show that Concept-LDA yields better topic representations than LDA alone, as measured by topic coherence and F-measure. The topic representation learned by Concept-LDA enables accurate and straightforward aspect extraction in an aspect-based sentiment analysis system.
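The enrichment idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Babelfy is not called: a hand-written dictionary (`CONCEPT_MAP`, with invented `CONCEPT:` labels) stands in for its annotations, and "enrichment" is assumed to mean appending one concept token per annotated word. LDA is fitted with collapsed Gibbs sampling, one standard inference method that the paper may not use, and topics are scored with UMass coherence, one common coherence measure that the paper may not use either. All function names (`enrich`, `lda_gibbs`, `top_words`, `umass_coherence`) are hypothetical.

```python
import math
import random

# Hypothetical stand-in for Babelfy annotations: word -> concept/entity label.
# These labels are invented for illustration; Babelfy itself links mentions
# to BabelNet synsets.
CONCEPT_MAP = {
    "battery": "CONCEPT:power", "charge": "CONCEPT:power", "charger": "CONCEPT:power",
    "screen": "CONCEPT:display", "display": "CONCEPT:display", "pixels": "CONCEPT:display",
}


def enrich(tokens, concept_map):
    """Append a concept token for each word that has a (hypothetical) annotation."""
    return tokens + [concept_map[w] for w in tokens if w in concept_map]


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Plain LDA via collapsed Gibbs sampling; returns (vocab, topic-word probs)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w2id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * num_topics for _ in docs]        # document-topic counts
    nkw = [[0] * V for _ in range(num_topics)]    # topic-word counts
    nk = [0] * num_topics                         # total tokens per topic
    z = []                                        # topic assignment of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)         # random initial assignment
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w2id[w]] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = w2id[w], z[d][i]
                # Remove the current token from all counts.
                ndk[d][k] -= 1
                nkw[k][wid] -= 1
                nk[k] -= 1
                # p(z = t | rest) is proportional to (n_dt + alpha)(n_tw + beta) / (n_t + V*beta).
                weights = [(ndk[d][t] + alpha) * (nkw[t][wid] + beta) / (nk[t] + V * beta)
                           for t in range(num_topics)]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wid] += 1
                nk[k] += 1
    # Smoothed topic-word distributions; each row sums to 1.
    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)]
           for k in range(num_topics)]
    return vocab, phi


def top_words(vocab, phi, n=5):
    """Return the n most probable words of every topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in phi]


def umass_coherence(top, docs):
    """UMass coherence: sum of log((D(w_m, w_j) + 1) / D(w_j)) over pairs j < m."""
    doc_sets = [set(doc) for doc in docs]

    def df(*words):
        # Number of documents containing all of the given words.
        return sum(1 for s in doc_sets if all(w in s for w in words))

    return sum(math.log((df(top[m], top[j]) + 1) / df(top[j]))
               for m in range(1, len(top)) for j in range(m))


if __name__ == "__main__":
    reviews = [
        "battery lasts long charge fast".split(),
        "charger broke battery died".split(),
        "screen bright display sharp".split(),
        "pixels crisp screen".split(),
    ]
    docs = [enrich(r, CONCEPT_MAP) for r in reviews]
    vocab, phi = lda_gibbs(docs, num_topics=2, iters=100)
    for words in top_words(vocab, phi, n=4):
        print(words, round(umass_coherence(words, docs), 3))
```

The design intuition is that words mapped to the same concept (here "battery" and "charger") now co-occur with a shared concept token, which nudges LDA to place them in the same topic even when they rarely appear together. A real pipeline would also remove stop words and use Babelfy's actual output instead of the toy dictionary.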