Feature selection using feature dissimilarity measure and density-based clustering: Application to biological data
Debarka Sengupta Indranil Aich Sanghamitra Bandyopadhyay
Reduction of dimensionality has emerged as a routine process in modelling complex biological systems. A large number of feature selection techniques have been reported in the literature to improve model performance in terms of accuracy and speed. In the present article, an unsupervised feature selection technique is proposed that uses the maximum information compression index as the dissimilarity measure and the well-known density-based clustering technique DBSCAN to identify the largest natural group of dissimilar features. The algorithm is fast and relatively insensitive to its user-supplied parameters. Moreover, the method automatically determines the required number of features and identifies them. We applied the proposed method to reduce the dimensionality of a number of benchmark data sets of varying sizes, and compared its performance extensively with that of other well-known feature selection methods.
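As a rough illustration of the dissimilarity measure named above, the maximum information compression index of two features is commonly defined as the smallest eigenvalue of their 2×2 covariance matrix; it is zero when the features are linearly dependent and grows as they become less redundant. The sketch below assumes that definition and uses population (divide-by-n) variances, which is a choice made here for illustration. The function names `mici` and `dissimilarity_matrix` are hypothetical and are not taken from the article.

```python
import math


def mici(x, y):
    """Smallest eigenvalue of the 2x2 covariance matrix of features x and y.

    Assumption: population (divide-by-n) variance and covariance are used.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Closed form for the eigenvalues of [[var_x, cov_xy], [cov_xy, var_y]]:
    # lambda = ((var_x + var_y) -/+ sqrt((var_x - var_y)^2 + 4 cov_xy^2)) / 2
    root = math.sqrt((var_x - var_y) ** 2 + 4.0 * cov_xy ** 2)
    # Clamp at zero to absorb tiny negative values from rounding.
    return max(0.0, 0.5 * (var_x + var_y - root))


def dissimilarity_matrix(features):
    """Pairwise index for a list of feature columns (hypothetical helper)."""
    k = len(features)
    d = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            d[i][j] = d[j][i] = mici(features[i], features[j])
    return d
```

A matrix built this way could then be handed to any density-based clustering routine that accepts precomputed pairwise dissimilarities; the clustering parameters and the rule for picking the final feature group are not fixed by this sketch.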
Volume 48, 2023