An Annotated Corpus for Turkish Sentiment Analysis at Sentence Level

Omurca S., Ekinci E., Turkmen H.

2017 International Artificial Intelligence and Data Processing Symposium (IDAP), Malatya, Türkiye, 16-17 September 2017 (Full-Text Paper)

Publication Type: Conference Paper / Full-Text Paper
Volume Number:
DOI: 10.1109/idap.2017.8090212
City of Publication: Malatya
Country of Publication: Türkiye
Keywords: Aspect based sentiment analysis, Turkish Language, text mining, morphological analysis, annotation, JSON data
Affiliated with Kocaeli University: Yes

Abstract

With the rapid growth of unstructured data accessible via the web, managing these data and discovering hidden information in huge datasets has become a necessary task. Consequently, text mining, which can be defined as gleaning important information from natural language text, has emerged. In this study, to facilitate information management for aspect-based sentiment analysis studies, a Turkish sentiment corpus is constructed; it consists of user reviews and is annotated semi-automatically. For each word, the corpus records the root form, the usage (aspect/multiaspect/seedsentiment/absent), the Part of Speech (POS) tag, and the polarity. The Turkish hotel review dataset used in this study, which contains 1000 reviews and 5364 sentences, was crawled from a web source. The system takes the reviews together with aspect and seedsentiment lists as input and returns JSON data structures of the annotated corpus. This paper thus provides a ready-to-use dataset for developing aspect-based sentiment analysis applications, and the JSON format makes the dataset easy to use in Java applications.
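
To make the annotation layers named in the abstract more concrete, the following is a minimal sketch of what one annotated sentence might look like as JSON and how it could be read back. It is not the paper's actual schema: the field names (`review_id`, `sentence_id`, `text`, `tokens`, `word`, `root`, `usage`, `pos`, `polarity`), the POS and polarity values, the helper functions, and the example sentence are all assumptions made for illustration. Only the four usage labels come from the abstract. Python is used here purely because its standard library includes a JSON parser; the same structure could be read from Java with any JSON library.

```python
import json

# The four usage labels listed in the abstract.
USAGE_LABELS = {"aspect", "multiaspect", "seedsentiment", "absent"}


def make_token(word, root, usage, pos, polarity):
    """Build one token annotation (hypothetical field names)."""
    if usage not in USAGE_LABELS:
        raise ValueError(f"unknown usage label: {usage}")
    return {
        "word": word,          # surface form as it appears in the review
        "root": root,          # root form from morphological analysis
        "usage": usage,        # aspect / multiaspect / seedsentiment / absent
        "pos": pos,            # Part of Speech tag (tag set assumed)
        "polarity": polarity,  # polarity value (value set assumed)
    }


# Illustrative sentence, not taken from the corpus: "Oda temizdi" ("The room was clean").
sentence = {
    "review_id": 1,
    "sentence_id": 1,
    "text": "Oda temizdi",
    "tokens": [
        make_token("Oda", "oda", "aspect", "Noun", "neutral"),
        make_token("temizdi", "temiz", "seedsentiment", "Adj", "positive"),
    ],
}

# Serialize to JSON; ensure_ascii=False keeps Turkish characters readable.
encoded = json.dumps(sentence, ensure_ascii=False, indent=2)

# Parse the JSON back, as a consuming application would.
decoded = json.loads(encoded)


def roots_with_usage(record, usage):
    """Return the root forms of all tokens carrying the given usage label."""
    return [t["root"] for t in record["tokens"] if t["usage"] == usage]


aspects = roots_with_usage(decoded, "aspect")
sentiments = roots_with_usage(decoded, "seedsentiment")
```

Keeping the annotations per token, as in this sketch, would let an application pair aspect words with nearby sentiment words without re-running morphological analysis; whether the published corpus is organized this way is not stated in the abstract.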