N-Gram Pattern Recognition using Multivariate-Bernoulli Model with Smoothing Methods for Text Classification


Kilimci Z. H., Akyokus S.

24th Signal Processing and Communication Application Conference (SIU), Zonguldak, Türkiye, 16-19 May 2016, pp. 597-600

  • Publication Type: Conference Paper / Full-Text Paper
  • Volume:
  • DOI: 10.1109/siu.2016.7495811
  • City of Publication: Zonguldak
  • Country of Publication: Türkiye
  • Pages: pp. 597-600
  • Keywords: Naive Bayes, text pattern recognition, smoothing methods, classification, n-gram, LANGUAGE, PROBABILITIES
  • Kocaeli University Affiliated: No

Abstract

In this paper, we study n-gram models in the text classification domain. To measure the impact of n-gram models on classification performance, we apply the Naive Bayes classifier with various smoothing methods. Naive Bayes classifiers generally use one of two event models for text classification: the Bernoulli model or the multinomial model. Researchers usually adopt the multinomial model with Laplace smoothing in text classification and similar domains. The objective of this study is to examine the classification performance of both Naive Bayes event models by analyzing them with different smoothing methods and n-gram models, offering a different perspective. To find patterns across the two event models, we carry out experiments on a large Turkish dataset. Experimental results indicate that the Bernoulli event model with an appropriate smoothing method can outperform the multinomial model on most of the n-gram models.
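As a rough illustration of the setup the abstract describes, the following is a minimal sketch of a multivariate Bernoulli Naive Bayes classifier over word n-gram presence features. It is not the authors' implementation. The abstract does not say which smoothing methods were evaluated, so additive (Laplace/Lidstone) smoothing with a parameter `alpha` is assumed here purely for illustration. The names `word_ngrams` and `BernoulliNB`, the whitespace tokenizer, and the toy English documents are likewise hypothetical.

```python
import math
from collections import defaultdict


def word_ngrams(text, n):
    """Return the set of word n-grams in text (presence only, no counts)."""
    tokens = text.lower().split()  # assumption: simple whitespace tokenization
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


class BernoulliNB:
    """Multivariate Bernoulli Naive Bayes with additive smoothing (assumed)."""

    def __init__(self, n=1, alpha=1.0):
        self.n = n          # n-gram order
        self.alpha = alpha  # smoothing strength; must be > 0 to avoid log(0)

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        doc_count = defaultdict(int)                        # documents per class
        feat_count = defaultdict(lambda: defaultdict(int))  # docs per class containing g
        self.vocab = set()
        for text, label in zip(docs, labels):
            doc_count[label] += 1
            for g in word_ngrams(text, self.n):
                feat_count[label][g] += 1
                self.vocab.add(g)
        total = len(docs)
        self.log_prior = {c: math.log(doc_count[c] / total) for c in self.classes}
        # P(g present | c) = (docs in c containing g + alpha) / (docs in c + 2 * alpha)
        self.prob = {
            c: {g: (feat_count[c][g] + self.alpha) / (doc_count[c] + 2 * self.alpha)
                for g in self.vocab}
            for c in self.classes
        }
        return self

    def predict(self, text):
        present = word_ngrams(text, self.n) & self.vocab
        best, best_score = None, -math.inf
        for c in self.classes:
            score = self.log_prior[c]
            # The Bernoulli model scores every vocabulary n-gram:
            # present ones contribute log p, absent ones log(1 - p).
            for g in self.vocab:
                p = self.prob[c][g]
                score += math.log(p) if g in present else math.log(1.0 - p)
            if score > best_score:
                best, best_score = c, score
        return best


# Hypothetical toy data, not taken from the paper's Turkish dataset.
train_docs = ["good great film", "great acting good plot",
              "bad boring film", "boring plot bad acting"]
train_labels = ["pos", "pos", "neg", "neg"]

unigram_model = BernoulliNB(n=1, alpha=1.0).fit(train_docs, train_labels)
bigram_model = BernoulliNB(n=2, alpha=1.0).fit(train_docs, train_labels)
print(unigram_model.predict("good great plot"))
print(bigram_model.predict("bad boring film"))
```

Unlike the multinomial model, which only scores the terms that occur in a document, this sketch also penalizes the absence of each vocabulary n-gram. That is why the choice of smoothing affects every feature at prediction time, and changing `n` or `alpha` is one way to explore the kind of comparison the abstract outlines.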