A Neighborhood Merging Policy Based on Shannon’s Entropy and Symmetric Relative Entropy for Non-convex Cluster Detection


Ünver M., Erginel N.

International Conference on Intelligent and Fuzzy Systems, INFUS 2021, İstanbul, Turkey, 24 - 26 August 2021, vol.307, pp.44-51 identifier

  • Publication Type: Conference Paper / Full Text
  • Volume: 307
  • Doi Number: 10.1007/978-3-030-85626-7_6
  • City: İstanbul
  • Country: Turkey
  • Page Numbers: pp.44-51
  • Keywords: Clustering, Jensen-Shannon Divergence, Neighborhood merging, Non-convex clusters, Shannon’s entropy
  • Kocaeli University Affiliated: No

Abstract

© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.Detection of non-convex and/or linearly inseparable clusters in datasets is one of the essentials for the robustness of a clustering algorithm. Some density based approaches and graph theory based approaches are able to handle that issue along with an input parameter of estimation problem. Contrarily to these approaches, in order to overcome arbitrary shaped cluster detection problem, a parameter free methodology is proposed based on neighborhood merging strategy upon statistical concepts: Shannon’s Entropy and Jensen-Shannon Divergence. Entropy refers to disorder in information theory and it is defined as average rate at which data is produced by a stochastic source. Similarly, entropy increment or information loss between two distributions can be defined as relative entropy (Kullback-Leibler Divergence). In this study, a novel clustering methodology is proposed in which the main idea is that determining a set of vectors in minimal entropy level and extending it with similar neighbor sets without surpassing a dynamic threshold value in iterations to construct clusters. The similarity between a set and its neighborhoods is measured by symmetric relative entropy. (Jensen-Shannon Divergence) Then, the proposed clustering methodology is experimentally analyzed on synthetic datasets, that have non-convex and/or linearly inseparable clusters and its performance is tested by some performance metrics. The experimental analysis demonstrated that the proposed methodology presents promising results in terms of clustering schema quality and validation indexes.