An Improved Feature Subset Selection with Correlation Measures using Clustering Technique

International Journal of Computer Science (IJCS Journal) Published by SK Research Group of Companies (SKRGC) Scholarly Peer Reviewed Research Journals

Format: Volume 2, Issue 1, No 4, 2014.

Copyright: All Rights Reserved ©2014

Year of Publication: 2014

Author: Ms. S. Sasikala

Reference: IJCS-050

Abstract

Clustering techniques partition transaction data into groups. Vector-based similarity models are suitable for low-dimensional data, whereas high-dimensional data are clustered using subspace clustering methods. Feature selection involves identifying a subset of the most useful features that produces results compatible with those of the original feature set. A feature selection algorithm is judged on two criteria: efficiency, the time required to find a subset of features, and effectiveness, the quality of that subset. Based on these criteria, a fast clustering-based feature selection algorithm (FAST) is proposed and experimentally evaluated in this paper. FAST is used to cluster high-dimensional data, and the feature selection process is improved with correlation measures. A redundant feature filtering mechanism removes similar features, and a custom threshold is used to improve clustering accuracy.
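As a rough illustration of correlation-based redundant feature filtering with a threshold, the sketch below groups features whose pairwise correlation is high and keeps one representative per group. It is not the FAST algorithm evaluated in the paper: the use of Pearson correlation, the single-linkage grouping, the default threshold of 0.9, and the names `pearson` and `select_features` are all assumptions made for this example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def select_features(columns, target, threshold=0.9):
    """Return one representative feature name per group of redundant features.

    columns: dict mapping feature name -> list of numeric values.
    target:  list of numeric class/target values.
    Assumption: two features are redundant when |correlation| >= threshold;
    redundant features are merged into groups (single linkage), and each
    group is represented by its feature most correlated with the target.
    """
    names = list(columns)
    parent = {n: n for n in names}  # union-find forest over feature names

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path compression
            n = parent[n]
        return n

    # Merge every pair of features whose correlation reaches the threshold.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) >= threshold:
                parent[find(a)] = find(b)

    # Collect the resulting groups of mutually redundant features.
    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)

    # Keep the most target-relevant feature from each group.
    return sorted(
        max(group, key=lambda n: abs(pearson(columns[n], target)))
        for group in clusters.values()
    )


if __name__ == "__main__":
    # Hypothetical toy data: "b" is almost a scaled copy of "a".
    columns = {
        "a": [1, 2, 3, 4, 5],
        "b": [2, 4, 6, 8, 10.5],
        "c": [5, 1, 4, 2, 3],
    }
    target = [1, 2, 3, 4, 5]
    print(select_features(columns, target, threshold=0.9))
```

Lowering the threshold merges more features into each group and so keeps fewer of them; raising it keeps more. The pairwise comparison costs time quadratic in the number of features, which is the efficiency concern the abstract raises.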

Keywords

Feature clustering, Feature subset selection, redundant filtering, filter method.

This work is licensed under a Creative Commons Attribution 3.0 Unported License.   
