• Single pass kernel 𝑘-means clustering method

• # Keywords

Data mining; unsupervised classiﬁcation; kernel 𝑘-means clustering method; gradient descent method.

• # Abstract

In unsupervised classiﬁcation, kernel 𝑘-means clustering method has been shown to perform better than conventional 𝑘-means clustering method in identifying non-isotropic clusters in a data set. The space and time requirements of this method are $O(n^2)$, where 𝑛 is the data set size. Because of this quadratic time complexity, the kernel 𝑘-means method is not applicable to work with large data sets. The paper proposes a simple and faster version of the kernel 𝑘-means clustering method, called single pass kernel k-means clustering method. The proposed method works as follows. First, a random sample $\mathcal{S}$ is selected from the data set $\mathcal{D}$. A partition $\Pi_{\mathcal{S}}$ is obtained by applying the conventional kernel 𝑘-means method on the random sample $\mathcal{S}$. The novelty of the paper is, for each cluster in $\Pi_{\mathcal{S}}$, the exact cluster center in the input space is obtained using the gradient descent approach. Finally, each unsampled pattern is assigned to its closest exact cluster center to get a partition of the entire data set. The proposed method needs to scan the data set only once and it is much faster than the conventional kernel 𝑘-means method. The time complexity of this method is $O(s^2+t+nk)$ where 𝑠 is the size of the random sample $\mathcal{S}$, 𝑘 is the number of clusters required, and 𝑡 is the time taken by the gradient descent method (to ﬁnd exact cluster centers). The space complexity of the method is $O(s^2)$. The proposed method can be easily implemented and is suitable for large data sets, like those in data mining applications. Experimental results show that, with a small loss of quality, the proposed method can signiﬁcantly reduce the time taken than the conventional kernel 𝑘-means clustering method. The proposed method is also compared with other recent similar methods.

• # Author Affiliations

1. Department of Computer Science and Engineering, Srinivasa Ramanujan Institute of Technology, Anantapur 515701, India
2. Department of Computer Science and Engineering, Rajeev Gandhi Memorial College of Engineering and Technology, Nandyal 518501, India
3. Department of Computer Science and Engineering, Jawaharlal Nehru Technological University, Anantapur College of Engineering, Anantapur 515002, India