A Comparative Assessment of Various Embeddings for Keyword Extraction

Ashqar G., MUTLU A.

5th International Congress on Human-Computer Interaction, Optimization and Robotic Applications, HORA 2023, İstanbul, Türkiye, 8 - 10 June 2023, (Full Text Paper)

Publication Type: Conference Paper / Full Text Paper
DOI Number: 10.1109/hora58378.2023.10156762
City of Publication: İstanbul
Country of Publication: Türkiye
Keywords: comparative study, keyword extraction, word embeddings
Kocaeli University Affiliated: Yes

Abstract

Automatic keyword extraction from a text document is the problem of identifying the in-text words or phrases that best describe the content of that document. Recently, word embeddings have found application in keyword extraction, as they improve performance by incorporating semantic information. In this study, we focus on various embeddings and compare their performance in keyword extraction. To this end, we first modified a keyword extraction system called KeyBERT to work with different embeddings. Then, we ran the modified application using ten models on seven benchmark datasets. The experimental findings show that all-mpnet-base-v2 achieved statistically better results than the other models in precision, recall, and F1 score. Moreover, all-mpnet-base-v2 achieved the highest scores for MAP and MRR and also retrieved the largest number of relevant keywords on average.
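
The evaluation measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes exact string matching between extracted and gold keywords, a ranked and de-duplicated list of predictions, and average precision normalised by the number of gold keywords (other normalisations are in use). The function names and the sample keywords are hypothetical.

```python
from typing import Sequence, Set, Tuple


def precision_recall_f1(predicted: Sequence[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for one document (exact-match assumption)."""
    hits = sum(1 for kw in predicted if kw in gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def average_precision(predicted: Sequence[str], gold: Set[str]) -> float:
    """Average precision for one ranked list.

    Assumption: the sum of precision values at each relevant rank is
    divided by the number of gold keywords.
    """
    hits = 0
    total = 0.0
    for rank, kw in enumerate(predicted, start=1):
        if kw in gold:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(gold) if gold else 0.0


def reciprocal_rank(predicted: Sequence[str], gold: Set[str]) -> float:
    """1 / rank of the first relevant keyword, or 0.0 if none is relevant."""
    for rank, kw in enumerate(predicted, start=1):
        if kw in gold:
            return 1.0 / rank
    return 0.0


# Hypothetical ranked output of an extractor and a hypothetical gold set.
predicted = ["keyword extraction", "neural network", "word embeddings", "dataset"]
gold = {"keyword extraction", "word embeddings", "comparative study"}

p, r, f1 = precision_recall_f1(predicted, gold)
ap = average_precision(predicted, gold)
rr = reciprocal_rank(predicted, gold)
print(p, r, f1, ap, rr)
```

MAP and MRR are then the means of `average_precision` and `reciprocal_rank` over all documents in a dataset.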