A Method for Similarity Detection in Vector Space by Summarizing News Articles (Haber Metinlerinin Özetlenerek Vektör Uzayında Benzerlik Tespiti İçin Bir Yöntem)


Torun H., İNNER A. B.

30th Signal Processing and Communications Applications Conference, SIU 2022, Safranbolu, Türkiye, 15-18 May 2022

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI: 10.1109/siu55565.2022.9864677
  • City of Publication: Safranbolu
  • Country of Publication: Türkiye
  • Keywords: annoy, approximate nearest neighbor search, automatic text summarization, doc2vec, similarity detection
  • Affiliated with Kocaeli University: Yes

Abstract

© 2022 IEEE. With the spread of internet journalism, traditional media outlets are undergoing a major transformation. Online news is often published by several sources with similar content, which makes it difficult for visitors to reach the substance of a story quickly. Clickbait headlines are used to attract more visitors, making it harder for readers to find the basic elements of journalism: what, when, where, how, and who. In this study, 28,000 Turkish-language news articles published in January 2022 by 5 different news sources were collected with a web scraper that we developed and were summarized with the TextRank algorithm. A method was developed to detect similar news articles from different sources by creating news vectors from both the raw and the summarized news texts and performing approximate nearest neighbor search among those vectors. To measure the success of the system, a web-based voting system was developed and the results were evaluated by experts.
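
The pipeline described in the abstract (summarize, vectorize, search for neighbors) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. The keywords indicate that the study used doc2vec for the news vectors and Annoy for approximate nearest neighbor search; here, as loudly stated assumptions, plain term-count vectors stand in for doc2vec, an exact brute-force cosine search stands in for the approximate index, and the sentence ranking is a simplified TextRank that uses cosine similarity as the edge weight. All function names and the three sample articles are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a real system would add Turkish-aware preprocessing."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def textrank_summary(text, n_sentences=2, damping=0.85, iterations=50):
    """Simplified TextRank: rank sentences by PageRank over a similarity graph."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if len(sentences) <= n_sentences:
        return " ".join(sentences)
    vectors = [Counter(tokenize(s)) for s in sentences]
    n = len(sentences)
    # Edge weights: pairwise sentence similarity, no self-loops.
    weights = [[cosine(vectors[i], vectors[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Weighted PageRank update: each sentence shares its score with neighbors.
        scores = [
            (1 - damping) / n + damping * sum(
                scores[j] * weights[j][i] / out_sums[j]
                for j in range(n) if out_sums[j] > 0)
            for i in range(n)
        ]
    # Keep the top-ranked sentences, restored to their original order.
    top = sorted(sorted(range(n), key=lambda i: scores[i], reverse=True)[:n_sentences])
    return " ".join(sentences[i] for i in top)


def most_similar(query_index, vectors):
    """Exact nearest neighbor by cosine similarity (stand-in for an ANN index)."""
    candidates = [(cosine(vectors[query_index], v), i)
                  for i, v in enumerate(vectors) if i != query_index]
    return max(candidates)[1]


# Hypothetical sample data: two reports of the same event and one unrelated story.
articles = [
    "The central bank raised interest rates on Thursday. Analysts expected "
    "the rate decision. Markets reacted calmly to the rates.",
    "Interest rates were raised by the central bank. The decision on rates "
    "surprised few analysts. The currency was stable.",
    "The football team won the league title. Fans celebrated in the streets. "
    "The coach praised the players.",
]

summaries = [textrank_summary(a) for a in articles]
vectors = [Counter(tokenize(s)) for s in summaries]
nearest = most_similar(0, vectors)
print("Article 0 is closest to article", nearest)
```

Exact search compares the query against every vector, which is fine for a handful of documents. At the scale reported in the abstract, an approximate index trades a small loss in accuracy for much faster lookups, which is presumably why the study relies on approximate nearest neighbor search.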