CLUSTERING HIGH-DIMENSIONAL DATA USING HUBNESS-BASED APPROACHES

International Journal of Computer Science (IJCS Journal) Published by SK Research Group of Companies (SKRGC) Scholarly Peer Reviewed Research Journals

Format: Volume 3, Issue 1, No 5, 2015.

Copyright: All Rights Reserved ©2015

Year of Publication: 2015

Author: SENTHILKUMAR.G, NANTHINI.M

Reference: IJCS-094

Abstract

Clustering is an unsupervised process of grouping elements so that elements assigned to the same cluster are more similar to each other than to the remaining data points. High-dimensional data arise in many fields. Clustering such data is difficult because of sparsity and because distances between data points become increasingly hard to distinguish. This paper takes a new perspective on the problem of clustering high-dimensional data. Instead of trying to avoid the curse of dimensionality by examining a lower-dimensional feature subspace, we embrace dimensionality by taking advantage of inherently high-dimensional phenomena, specifically hubness. We validate our hypothesis by demonstrating that hubness is a good measure of point centrality within a high-dimensional data cluster, and by proposing several hubness-based clustering algorithms, showing that major hubs can be used effectively as cluster prototypes or as guides during the search for centroid-based cluster configurations. The proposed method, called "Neighbor clustering", takes as input measures of similarity between pairs of data points. Experimental results demonstrate good performance of our algorithms in multiple settings.
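As an illustration of the centrality measure mentioned above, hubness is commonly quantified by the k-occurrence count: the number of times a point appears in the k-nearest-neighbor lists of other points. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names `k_occurrences` and `top_hubs`, the use of Euclidean distance, the brute-force neighbor search, and the index-based tie-breaking are all choices made here for illustration.

```python
import math
import random


def k_occurrences(points, k):
    """Count, for each point, how many other points list it among
    their k nearest neighbors (Euclidean distance assumed)."""
    n = len(points)
    counts = [0] * n
    for i in range(n):
        # Rank all other points by distance to point i; ties broken by index.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (math.dist(points[i], points[j]), j),
        )
        for j in others[:k]:
            counts[j] += 1
    return counts


def top_hubs(points, k, n_hubs):
    """Return the indices of the n_hubs points with the highest
    k-occurrence counts (candidate cluster prototypes)."""
    scores = k_occurrences(points, k)
    return sorted(range(len(points)), key=lambda i: (-scores[i], i))[:n_hubs]


if __name__ == "__main__":
    # Synthetic 50-dimensional Gaussian data, for demonstration only.
    rng = random.Random(0)
    data = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(200)]
    scores = k_occurrences(data, k=5)
    print("largest k-occurrence count:", max(scores))
    print("top hub indices:", top_hubs(data, k=5, n_hubs=3))
```

Points with the highest counts could then serve as candidate cluster prototypes, in the spirit of the hub-based approach described in the abstract. The brute-force search is quadratic in the number of points, so a spatial index or approximate neighbor search would be needed for larger data sets.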


Keywords

Clustering, curse of dimensionality, nearest neighbors, hubs.

This work is licensed under a Creative Commons Attribution 3.0 Unported License.   
