A Novel Semantic Smoothing Method based on Higher Order Paths for Text Classification

12th IEEE International Conference on Data Mining (ICDM), Brussels, Belgium, 10-13 December 2012, pp. 615-624

Publication Type: Conference Paper / Full-Text Paper
Volume Number:
DOI Number: 10.1109/icdm.2012.109
City of Publication: Brussels
Country of Publication: Belgium
Page Numbers: pp. 615-624
Affiliated with Kocaeli Üniversitesi: No

Abstract

It has been shown that Latent Semantic Indexing (LSI) takes advantage of implicit higher-order (or latent) structure in the association of terms and documents. Higher-order relations in LSI capture "latent semantics". Inspired by this, a novel Bayesian framework for classification named Higher Order Naive Bayes (HONB), which can explicitly make use of these higher-order relations, has been introduced previously. We present a novel semantic smoothing method named Higher Order Smoothing (HOS) for the Naive Bayes algorithm. HOS is built on a graph-based data representation similar to that of HONB, which allows semantics in higher-order paths to be exploited. Additionally, we take the concept one step further in HOS and exploit the relationships between instances of different classes in order to improve parameter estimation when labeled data is insufficient. As a result, we are able to move beyond not only instance boundaries but also class boundaries to exploit the latent information in higher-order paths. The results of our extensive experiments demonstrate the value of HOS on several benchmark datasets.
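
As a rough illustration of what a higher-order path between terms can look like, the sketch below finds term pairs joined by a second-order path: term a shares a document with term b, and b shares a different document with term c. This is not the HOS or HONB algorithm from the paper. The path definition used here (three distinct terms, two distinct documents) is an assumption made for the example, and the function names `first_order_pairs` and `second_order_pairs` are hypothetical.

```python
from itertools import combinations


def first_order_pairs(docs):
    """Map each unordered term pair to the set of document indices containing both terms."""
    pairs = {}
    for idx, terms in enumerate(docs):
        for a, b in combinations(sorted(set(terms)), 2):
            pairs.setdefault((a, b), set()).add(idx)
    return pairs


def second_order_pairs(docs):
    """Return term pairs (a, c) joined by a path a - d1 - b - d2 - c with d1 != d2.

    Assumption for this sketch: a, b, c are distinct terms and d1, d2 are
    distinct documents. Pairs are returned in sorted order.
    """
    # neighbours[b][a] = documents in which a and b co-occur
    neighbours = {}
    for (a, b), doc_ids in first_order_pairs(docs).items():
        neighbours.setdefault(a, {})[b] = doc_ids
        neighbours.setdefault(b, {})[a] = doc_ids

    result = set()
    for linked in neighbours.values():
        for a, c in combinations(sorted(linked), 2):
            # Two distinct documents can be chosen exactly when the union
            # of the two (non-empty) document sets has at least two members.
            if len(linked[a] | linked[c]) >= 2:
                result.add((a, c))
    return result


docs = [["bayes", "naive"], ["bayes", "smoothing"]]
print(second_order_pairs(docs))
```

In this toy corpus "naive" and "smoothing" never appear in the same document, yet they are connected through "bayes" across two documents, which is the kind of relation that crosses instance boundaries.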