IFIT: an unsupervised discretization method based on the Ramer-Douglas-Peucker algorithm


Mutlu A., Göz F., Akbulut O.

TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, vol.27, no.3, pp.2344-2360, 2019 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 27, Issue: 3
  • Publication Date: 2019
  • DOI: 10.3906/elk-1806-192
  • Journal Name: TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, TR DİZİN (ULAKBİM)
  • Pages: pp.2344-2360
  • Keywords: Unsupervised discretization, the Ramer-Douglas-Peucker algorithm, polyline simplification, the standard error of the estimate
  • Kocaeli University Affiliated: Yes

Abstract

Discretization is the process of converting continuous values into discrete values. It is a preprocessing step for several machine learning and data mining algorithms, and the quality of the discretization may drastically affect the performance of these algorithms. In this study, we propose a discretization algorithm, namely line fitting-based discretization (IFIT), based on the Ramer-Douglas-Peucker algorithm. It is a static, univariate, unsupervised, splitting-based, global, and incremental discretization method in which intervals are determined by the Ramer-Douglas-Peucker algorithm and the quality of the partitioning is assessed by the standard error of the estimate. To evaluate the performance of the proposed method, a set of experiments was conducted on ten benchmark datasets, and the results were compared with those obtained by eight state-of-the-art discretization methods. The experimental results show that IFIT achieves higher predictive accuracy and produces fewer inconsistencies, although it generates a larger number of intervals. The results were also validated with Friedman's test and Holm's post hoc test, which showed that the discretization schemes produced by IFIT are statistically comparable to those of both supervised and unsupervised discretization methods.
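
The general idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how the polyline is built or how the standard error of the estimate is applied, so the sketch assumes that the sorted attribute values are plotted against their rank, that the vertices kept by the Ramer-Douglas-Peucker algorithm serve as cut points, and that the error uses the conventional n - 2 denominator of simple linear regression. The names `rdp`, `cut_points`, `discretize`, `standard_error`, and the tolerance `epsilon` are hypothetical.

```python
import bisect
import math


def perpendicular_distance(point, start, end):
    """Distance from `point` to the line through `start` and `end`."""
    (x, y), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / length


def rdp(points, epsilon):
    """Ramer-Douglas-Peucker: return the indices of the vertices that are kept."""
    if len(points) < 3:
        return list(range(len(points)))
    # Find the interior point farthest from the chord between the endpoints.
    max_dist, max_idx = 0.0, 0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > max_dist:
            max_dist, max_idx = d, i
    if max_dist <= epsilon:
        return [0, len(points) - 1]
    # Split at the farthest point and simplify both halves recursively.
    left = rdp(points[: max_idx + 1], epsilon)
    right = rdp(points[max_idx:], epsilon)
    return left + [max_idx + j for j in right[1:]]


def simplify(values, epsilon):
    """Assumption: the polyline is (rank, sorted value)."""
    ordered = sorted(values)
    points = [(float(i), float(v)) for i, v in enumerate(ordered)]
    return ordered, rdp(points, epsilon)


def cut_points(values, epsilon):
    """Assumption: interior vertices kept by RDP act as cut points."""
    ordered, kept = simplify(values, epsilon)
    return [ordered[i] for i in kept[1:-1]]


def discretize(value, cuts):
    """Map a value to an interval index; intervals are right-closed."""
    return bisect.bisect_left(cuts, value)


def standard_error(values, epsilon):
    """Standard error of the estimate of the simplified polyline,
    sqrt(SSE / (n - 2)), measured against the sorted values."""
    ordered, kept = simplify(values, epsilon)
    n = len(ordered)
    if n <= 2:
        return 0.0
    # Linearly interpolate between consecutive kept vertices.
    estimates = []
    for a, b in zip(kept, kept[1:]):
        for i in range(a, b):
            t = (i - a) / (b - a)
            estimates.append(ordered[a] + t * (ordered[b] - ordered[a]))
    estimates.append(ordered[kept[-1]])
    sse = sum((y - e) ** 2 for y, e in zip(ordered, estimates))
    return math.sqrt(sse / (n - 2))
```

Under these assumptions, a smaller `epsilon` keeps more vertices, which gives more intervals and a lower standard error. That trade-off is in line with the abstract's remark that the method generates a larger number of intervals.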