Articles written in Sadhana
Volume 45, Published: 15 May 2020, Article ID 0125
Clustering tweets is useful for analyzing people's attitudes towards a particular product, and companies can use this analysis to modify their products to meet people's needs. K-means clustering has recently been widely used to cluster tweets, with a bag of words as the feature set. The key factors contributing to cluster quality and clustering performance are dimensionality reduction and the initial selection of centroids. This paper addresses these issues using a newly proposed Tag Score (T_S) model with improved K-means, in which semantically similar features from the bag of words are grouped into tags, scores are modified based on sentiment polarity values, and the initial centroids are selected with the help of sentiment scores. The performance of the proposed T_S model with improved K-means is compared with that of the T_S model with random K-means and of conventional word vectors with random K-means on three labeled datasets and three unlabeled datasets. The results show that the proposed method produces significant results in approximately 70% of the cases in terms of purity, F-measure, intra-cluster distance and inter-cluster distance.
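To make two of the ideas above concrete, the following is a minimal sketch of K-means started from sentiment-informed centroids, together with the purity measure. The abstract does not say how sentiment scores pick the initial centroids, so the rule used here (sort tweets by score and take one tweet from the middle of each of k equal rank bands) is an assumption for illustration only. The function names `seed_by_sentiment`, `kmeans` and `purity`, and the toy data, are likewise hypothetical and not taken from the paper.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def seed_by_sentiment(vectors, sentiment_scores, k):
    """Assumed seeding rule (not the paper's): rank points by sentiment
    score and take the point at the middle of each of k equal rank bands."""
    order = sorted(range(len(vectors)), key=lambda i: sentiment_scores[i])
    picks = [order[(2 * j + 1) * len(order) // (2 * k)] for j in range(k)]
    return [list(vectors[i]) for i in picks]


def kmeans(vectors, centroids, max_iter=50):
    """Standard Lloyd iterations from the given initial centroids."""
    centroids = [list(c) for c in centroids]
    assign = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_assign = [
            min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        if new_assign == assign:
            break  # assignments are stable, so the centroids are too
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


def purity(cluster_ids, true_labels):
    """Fraction of points that carry the majority label of their cluster."""
    by_cluster = {}
    for c, y in zip(cluster_ids, true_labels):
        by_cluster.setdefault(c, Counter())[y] += 1
    return sum(max(cnt.values()) for cnt in by_cluster.values()) / len(true_labels)


# Toy data (made up): 2-D feature vectors with one sentiment score per tweet.
vectors = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
scores = [-0.9, -0.8, 0.8, 0.9]
labels = ["neg", "neg", "pos", "pos"]

init = seed_by_sentiment(vectors, scores, k=2)
assignments, final_centroids = kmeans(vectors, init)
print(assignments, purity(assignments, labels))
```

Deterministic seeding of this kind removes the run-to-run variation of random initialization, which is one plausible reason to prefer it. Purity needs ground-truth labels, so it applies only to the labeled datasets, whereas intra-cluster and inter-cluster distances can be computed for unlabeled ones as well.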