MGRank: A keyword extraction system based on multigraph GoW model and novel edge weighting procedure


GÖZ F., MUTLU A.

KNOWLEDGE-BASED SYSTEMS, vol. 251, 2022 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 251
  • Publication Date: 2022
  • DOI: 10.1016/j.knosys.2022.109292
  • Journal Name: KNOWLEDGE-BASED SYSTEMS
  • Journal Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Applied Science & Technology Source, Computer & Applied Sciences, INSPEC, Library and Information Science Abstracts, Library, Information Science & Technology Abstracts (LISTA)
  • Keywords: Keyword extraction, Multigraph, Complete graph, Window size, Edge weighting
  • Kocaeli University Affiliation: Yes

Abstract

Keyword extraction is the process of extracting the most descriptive words from a textual document. State-of-the-art graph-based keyword extraction systems generally represent a text document as a simple graph, called a graph-of-words (GoW), built with a sliding window. This representation requires choosing a suitable window size, models the document only at a local scale, and allows only a single relation between two candidate keywords. In this study, we address these problems and propose MGRank, a keyword extraction system that represents a text document with a GoW model built on a complete multigraph. The completeness of the proposed GoW model represents the document globally and eliminates the window-size parameter. Parallel edges allow multiple relations to be established between candidate keywords. We also propose a new edge-weighting procedure based on the positional distance between candidate keywords. To evaluate MGRank, we performed experiments on seven benchmark datasets and compared the results with those of six baseline methods. The experimental results show that MGRank statistically outperforms the baseline methods in precision, recall, and F1-score in almost all cases. In terms of mean average precision and mean reciprocal rank, MGRank performs statistically better than the node ranking-based and statistical baseline methods and achieves results on par with the topic-based baseline methods. Furthermore, the experimental results show that MGRank extracts the most relevant keywords. (C) 2022 Elsevier B.V. All rights reserved.
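The abstract's core idea, a complete multigraph in which every pair of candidate occurrences contributes its own parallel edge weighted by positional distance, can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the input is an already-filtered list of candidate words, each parallel edge gets the hypothetical weight `1 / |i - j|` (inverse positional distance), and nodes are ranked by their summed incident edge weight as a stand-in for MGRank's actual ranking step. The function names `build_complete_multigraph` and `rank_candidates` are illustrative only.

```python
from collections import defaultdict
from itertools import combinations


def build_complete_multigraph(tokens):
    """Build a multigraph as {(u, v): [w1, w2, ...]} over candidate words.

    Every pair of occurrences of two distinct candidates adds one parallel
    edge, so the whole document is connected and no window size is needed.
    ASSUMPTION: edge weight = 1 / positional distance (illustrative only;
    the paper defines its own weighting procedure).
    """
    edges = defaultdict(list)
    # combinations() yields index pairs with i < j, so j - i is always >= 1.
    for (i, u), (j, v) in combinations(enumerate(tokens), 2):
        if u == v:
            continue  # skip self-loops between repeated occurrences
        key = tuple(sorted((u, v)))  # undirected edge key
        edges[key].append(1.0 / (j - i))
    return edges


def rank_candidates(edges):
    """Score each node by the sum of all incident parallel-edge weights.

    ASSUMPTION: weighted degree is used here as a simple stand-in for
    MGRank's ranking step. Ties are broken alphabetically.
    """
    scores = defaultdict(float)
    for (u, v), weights in edges.items():
        total = sum(weights)
        scores[u] += total
        scores[v] += total
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


tokens = ["keyword", "extraction", "graph", "keyword", "ranking"]
edges = build_complete_multigraph(tokens)
ranked = rank_candidates(edges)
print([word for word, _ in ranked])  # ['keyword', 'graph', 'extraction', 'ranking']
```

Because "keyword" occurs twice, the pair ("extraction", "keyword") receives two parallel edges (distances 1 and 2), which a simple sliding-window graph would collapse into a single relation. The trade-off of completeness is that a document with n candidate occurrences produces on the order of n² edges.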