Articles written in Sadhana
Volume 43 Issue 11 November 2018 Article ID 0186
Sentiment analysis has become a very useful tool in recent times for studying people’s opinions, sentiments and subjective evaluations of events of social and economic relevance, in particular policy decisions. The present paper proposes a framework for sentiment analysis of Twitter data on the ‘demonetization’ effort of the Government of India. The data are obtained through the Twitter API. The methodology involves collecting tweets from different cities of India using geolocation, preprocessing them, and then applying a lexicon-based approach to analyse users’ sentiments over a period of five weeks preceding the policy announcement. In addition, the paper attempts to analyse the sentiments of specific groups of people representing diverse interest groups.
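As a rough illustration of the lexicon-based step described in this abstract, the following minimal sketch scores a tweet by summing word polarities after simple preprocessing. The lexicon, the preprocessing rules and the function names are hypothetical and are not taken from the paper; a real study would use an established sentiment lexicon and would obtain the tweets through the Twitter API.

```python
import re

# Hypothetical toy lexicon (assumption); not the lexicon used in the paper.
LEXICON = {
    "good": 1, "great": 1, "support": 1, "relief": 1,
    "bad": -1, "chaos": -1, "failure": -1, "hardship": -1,
}

def preprocess(tweet):
    """Lowercase the text, drop URLs and @mentions, keep hashtag words, tokenize."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"@\w+", " ", text)           # remove user mentions
    text = text.replace("#", " ")               # keep the word behind a hashtag
    return re.findall(r"[a-z']+", text)

def sentiment_score(tweet, lexicon=LEXICON):
    """Sum the polarity of every token found in the lexicon."""
    return sum(lexicon.get(token, 0) for token in preprocess(tweet))

def label(score):
    """Map a numeric score to a coarse sentiment class."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Scores computed this way could then be aggregated per city or per week, which is the kind of grouping the abstract mentions.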
Volume 45 Published: January 2020 Article ID 0011 Original Article (Computer Sciences)
Feature selection is a critical research problem in data science. The need for feature selection has become more pressing with the advent of high-dimensional data sets, especially those related to text, image and microarray data. In this paper, a graph-theoretic approach with step-by-step visualization is proposed in the context of supervised feature selection. The mutual information criterion is used to evaluate the relevance of the features with respect to the class. A graph-based representation of the input data set, named the feature information map (FIM), is created, highlighting the vertices that represent the less informative features. Amongst the more informative features, the inter-feature similarity is measured, and edges are drawn between features having high similarity. Finally, minimal vertex cover is applied on the connected vertices to identify a subset of features potentially having less similarity among each other. Results of the experiments conducted with standard data sets show that the proposed method gives better results than the competing algorithms for most of the data sets. The proposed algorithm also makes a novel contribution by rendering a visualization of features in terms of relevance and redundancy.
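The pipeline outlined in this abstract (relevance by mutual information, edges between similar features, then a vertex cover) can be sketched as follows for discrete features. Several choices here are assumptions, not details from the paper: inter-feature similarity is measured with mutual information, the two thresholds are hypothetical parameters, and a greedy heuristic stands in for the minimal vertex cover. The abstract does not say whether the cover or its complement forms the selected subset, so the sketch only returns the cover.

```python
from collections import Counter
from itertools import combinations
from math import log2

def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def build_fim(features, labels, relevance_min, similarity_min):
    """Return the relevant features and the edges between similar ones.

    `features` maps a feature name to its column of discrete values.
    Both thresholds are hypothetical tuning parameters.
    """
    relevant = [name for name, column in features.items()
                if mutual_information(column, labels) >= relevance_min]
    edges = [(a, b) for a, b in combinations(relevant, 2)
             if mutual_information(features[a], features[b]) >= similarity_min]
    return relevant, edges

def greedy_vertex_cover(edges):
    """Greedy heuristic: repeatedly pick the vertex covering most remaining edges."""
    remaining = list(edges)
    cover = set()
    while remaining:
        degree = Counter(v for edge in remaining for v in edge)
        best = max(sorted(degree), key=degree.get)  # ties broken by name
        cover.add(best)
        remaining = [edge for edge in remaining if best not in edge]
    return cover
```

The greedy cover is not guaranteed to be minimum; it is used here only because it keeps the sketch short.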
Volume 45 Published: March 2020 Article ID 0066 Original Article (Computer Sciences)
The Single-Linkage algorithm is a distance-based hierarchical clustering method that can find arbitrarily shaped clusters but is unsuitable for large datasets because of its high time complexity. The paper proposes an efficient accelerated technique for the algorithm with a merging threshold. It is a two-stage algorithm. The first stage is an incremental pre-clustering step that uses the triangle inequality to eliminate unnecessary distance computations. The incremental approach makes it suitable for partial clustering of streaming data while the data are being collected. The second stage uses a property of the Single-Linkage algorithm itself to take a clustering decision without comparing all the patterns. The method shows how the neighbourhood between the input patterns can be used as a tool to accelerate the algorithm without hampering the cluster quality. Experiments are conducted with various standard and large real datasets, and the results confirm its effectiveness for large datasets.
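To illustrate how the triangle inequality can remove distance computations in threshold-based single linkage, here is a minimal sketch. It is not the two-stage algorithm of the paper: it uses a single pivot (an assumption) and the bound |d(x, p) - d(y, p)| <= d(x, y), so any pair whose pivot distances differ by more than the merging threshold is skipped without computing d(x, y). Clusters are tracked with a union-find structure.

```python
from math import dist  # Euclidean distance, Python 3.8+

def threshold_single_linkage(points, threshold):
    """Group points whose chain of pairwise distances stays <= threshold.

    Returns (labels, computed), where labels[i] is a cluster representative
    and computed counts the pairwise distances actually evaluated.
    """
    n = len(points)
    if n == 0:
        return [], 0
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pivot = points[0]                       # hypothetical pivot choice
    to_pivot = [dist(p, pivot) for p in points]
    computed = 0
    for i in range(n):
        for j in range(i + 1, n):
            if find(i) == find(j):
                continue                    # already in the same cluster
            if abs(to_pivot[i] - to_pivot[j]) > threshold:
                continue                    # lower bound already exceeds threshold
            computed += 1
            if dist(points[i], points[j]) <= threshold:
                parent[find(i)] = find(j)   # merge the two clusters
    return [find(i) for i in range(n)], computed
```

Because the skipped pairs are provably farther apart than the threshold, the pruning does not change the resulting clusters.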
Volume 45 Published: 23 September 2020 Article ID 0242
In this work, a graph-based approach has been adopted for feature selection in the case of high-dimensional data. Feature selection aims to identify an optimal feature subset to solve the given learning problem. In an optimal feature subset, only relevant features are selected as “members”, and features that have redundancy are considered “non-members”. This concept of “membership” and “non-membership” of a feature in an optimal feature subset has been represented by a strong intuitionistic fuzzy graph. The algorithm proposed in this work first maps the feature set of the data to the vertex set of a strong intuitionistic fuzzy graph. The association between features, represented as an edge set, is then decided by the degree of hesitation between the features. Based on the feature association, the Strong Intuitionistic Fuzzy Feature Association Map (SIFFAM) is developed for the datasets. A sub-graph of SIFFAM is then derived to identify features with maximal non-redundancy and relevance. Finally, the SIFFAM-based feature selection algorithm is applied to very high-dimensional datasets having features of the order of a thousand. Empirically, the proposed SIFFAM-based feature selection algorithm is found to be competitive with several benchmark feature selection algorithms in the context of high-dimensional data.
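For readers unfamiliar with the terms used in this abstract, the sketch below shows the standard textbook quantities only: for a membership degree mu and a non-membership degree nu with mu + nu <= 1, the degree of hesitation is 1 - mu - nu, and in a strong intuitionistic fuzzy graph an edge takes the minimum of its endpoints' memberships and the maximum of their non-memberships. How the paper derives mu and nu for a feature, and how it thresholds hesitation to form SIFFAM, is not stated in the abstract; the vertex values in the usage note are hypothetical.

```python
def hesitation(mu, nu):
    """Degree of hesitation of an intuitionistic fuzzy pair (mu, nu)."""
    if not (0.0 <= mu <= 1.0 and 0.0 <= nu <= 1.0 and mu + nu <= 1.0):
        raise ValueError("need 0 <= mu, nu <= 1 and mu + nu <= 1")
    return 1.0 - mu - nu

def strong_edge(u, v):
    """Edge degrees in a strong intuitionistic fuzzy graph.

    u and v are (mu, nu) pairs of the two end vertices. The edge membership
    is the minimum of the vertex memberships and the edge non-membership is
    the maximum of the vertex non-memberships.
    """
    mu = min(u[0], v[0])
    nu = max(u[1], v[1])
    return mu, nu, hesitation(mu, nu)
```

For example, with hypothetical vertices (0.8, 0.1) and (0.3, 0.6), the edge gets membership 0.3 and non-membership 0.6, leaving a hesitation of about 0.1.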
Volume 46 Published: 26 February 2021 Article ID 0045
The Single Linkage algorithm is a hierarchical clustering method that is unsuitable for large datasets because of its high convergence time. The paper proposes an efficient accelerated technique for the algorithm for clustering univariate data with a merging threshold. It is a two-stage algorithm. The first stage is an incremental pre-clustering step that uses the farthest neighbour principle to partially cluster the database by scanning it only once. The algorithm uses the Segment Addition Postulate as a major tool for accelerating the pre-clustering stage. The incremental approach makes it suitable for partial clustering of streaming data while the data are being collected. The second stage merges these pre-clusters to produce the final set of Single Linkage clusters by comparing only the biggest and the smallest values of each pre-cluster, thereby converging faster than methods in which all the members of the clusters are used for a clustering action. The algorithm is also suitable for fast-changing dynamic databases, as it can cluster newly added data without using all the data of the database. Experiments are conducted with various datasets, and the results confirm that the proposed algorithm outperforms its well-known variants.
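The idea of comparing only the smallest and biggest values of each pre-cluster can be sketched for univariate data as below. This is a simplified illustration under stated assumptions, not the paper's algorithm: it does not implement the farthest neighbour principle or the Segment Addition Postulate step, and the function names are hypothetical. Each pre-cluster is kept as an interval [lo, hi]; a value joins the first interval it lies within the threshold of, and the second stage merges intervals whose end points are within the threshold.

```python
def precluster(values, threshold):
    """One scan over the data; each pre-cluster is [lo, hi, members]."""
    intervals = []
    for x in values:
        for iv in intervals:
            # Only the smallest and biggest members are compared with x.
            if iv[0] - threshold <= x <= iv[1] + threshold:
                iv[0] = min(iv[0], x)
                iv[1] = max(iv[1], x)
                iv[2].append(x)
                break
        else:
            intervals.append([x, x, [x]])   # x starts a new pre-cluster
    return intervals

def merge_preclusters(intervals, threshold):
    """Merge pre-clusters whose end points lie within the threshold."""
    clusters = []
    for lo, hi, members in sorted(intervals, key=lambda iv: iv[0]):
        if clusters and lo - clusters[-1][1] <= threshold:
            clusters[-1][1] = max(clusters[-1][1], hi)
            clusters[-1][2].extend(members)
        else:
            clusters.append([lo, hi, list(members)])
    return [sorted(c[2]) for c in clusters]
```

A possible use is `merge_preclusters(precluster(stream, t), t)`, where `stream` is any iterable of numbers and `t` is the merging threshold; newly arriving values only need to be compared with interval end points, not with every stored value.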