Application of Paragraph Vectors to News and Tweet Data

Celenli H. I.

26th IEEE Signal Processing and Communications Applications Conference (SIU), İzmir, Türkiye, 2-5 May 2018


Machine learning methods have become easy and quick to apply thanks to rapid technological progress and powerful, inexpensive hardware. However, these methods become insufficient as the amount of data grows. Deep learning algorithms, regarded as a sub-branch of machine learning, can produce better results where conventional machine learning falls short by optimizing on the data quickly and effectively. As the amount of data has increased, so has the number of text analysis studies. In our study, we analyzed the Doc2Vec model, which uses Paragraph Vectors, on news and tweet data. The Doc2Vec model is compared with Term Frequency-Inverse Document Frequency (TF-IDF) using K-Nearest Neighbors (KNN), Multinomial Naive Bayes (MNB), Support Vector Machines (SVM) and Nearest Centroid (CN) classifiers.
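
To make the TF-IDF side of this comparison concrete, the following is a minimal sketch of a TF-IDF representation paired with a Nearest Centroid classifier, written with the Python standard library only. It is not the author's implementation: the toy documents, the labels, the smoothed IDF formula, the use of cosine similarity, and every function name (`fit_idf`, `tfidf_vector`, `fit_centroids`, `predict`) are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; real tweets need more care).
    return text.lower().split()


def fit_idf(docs):
    # Smoothed inverse document frequency: log((1 + n) / (1 + df)) + 1.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(doc, idf):
    # Sparse, L2-normalized TF-IDF vector; terms unseen in training are dropped.
    tf = Counter(term for term in tokenize(doc) if term in idf)
    vec = {term: count * idf[term] for term, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {term: v / norm for term, v in vec.items()} if norm else {}


def fit_centroids(docs, labels, idf):
    # Average the TF-IDF vectors of each class to obtain one centroid per class.
    sums = defaultdict(Counter)
    counts = Counter(labels)
    for doc, label in zip(docs, labels):
        for term, value in tfidf_vector(doc, idf).items():
            sums[label][term] += value
    return {
        label: {term: value / counts[label] for term, value in total.items()}
        for label, total in sums.items()
    }


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(value * b.get(term, 0.0) for term, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(doc, idf, centroids):
    # Assign the class whose centroid is most similar to the document vector.
    vec = tfidf_vector(doc, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Invented toy data standing in for labeled news or tweet texts.
train_docs = [
    "the team won the football match",
    "the striker scored a late goal",
    "the central bank raised interest rates",
    "stock markets fell as inflation rose",
]
train_labels = ["sports", "sports", "economy", "economy"]

idf = fit_idf(train_docs)
centroids = fit_centroids(train_docs, train_labels, idf)

print(predict("a late goal decided the match", idf, centroids))
print(predict("inflation and interest rates", idf, centroids))
```

In a Doc2Vec variant of the same experiment, `tfidf_vector` would be replaced by a learned paragraph vector for each document, while the classifier stage could stay unchanged, which is what makes the two representations directly comparable.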