Articles written in Sadhana
Volume 45, Published: March 2020, Article ID 0066, Original Article (Computer Sciences)
The Single-Linkage algorithm is a distance-based hierarchical clustering method that can find arbitrarily shaped clusters, but it is ill-suited to large datasets because of its high time complexity. The paper proposes an efficient technique for accelerating the algorithm with a merging threshold. It is a two-stage algorithm. The first stage is an incremental pre-clustering step that uses the triangle inequality to eliminate unnecessary distance computations. The incremental approach makes it suitable for partially clustering streaming data as the data are collected. The second stage uses a property of the Single-Linkage algorithm itself to make clustering decisions without comparing all the patterns. The method shows how the neighbourhood relations among input patterns can be used to accelerate the algorithm without degrading cluster quality. Experiments on various standard and large real datasets confirm its effectiveness for large datasets.
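To illustrate how the triangle inequality can skip distance computations under a merging threshold, here is a minimal Python sketch. It is not the authors' two-stage algorithm. It assumes Euclidean distance, uses a single reference point (hypothetically, the first pattern), and computes threshold Single-Linkage clusters as connected components with union-find. Because |d(x, r) - d(y, r)| <= d(x, y) for any reference point r, a pair whose reference distances differ by more than the threshold `h` cannot be linked, so after sorting by reference distance the inner loop can stop early. The name `single_linkage_threshold` is illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def _find(parent: List[int], i: int) -> int:
    # Path halving keeps the union-find trees shallow.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def single_linkage_threshold(points: Sequence[Point], h: float) -> Tuple[List[int], int]:
    """Threshold Single-Linkage clustering with triangle-inequality pruning.

    Returns (labels, number of pairwise distances actually computed).
    """
    n = len(points)
    if n == 0:
        return [], 0
    ref = points[0]  # hypothetical choice: the first pattern is the reference point
    ref_dist = [math.dist(p, ref) for p in points]
    order = sorted(range(n), key=lambda i: ref_dist[i])
    parent = list(range(n))
    computed = 0
    for a in range(n):
        i = order[a]
        for b in range(a + 1, n):
            j = order[b]
            # Triangle inequality: d(i, j) >= ref_dist[j] - ref_dist[i] >= 0 here,
            # so once this lower bound exceeds h, no later j can link to i.
            if ref_dist[j] - ref_dist[i] > h:
                break
            computed += 1
            if math.dist(points[i], points[j]) <= h:
                parent[_find(parent, j)] = _find(parent, i)  # union the two components
    # Relabel components as 0, 1, 2, ... in order of first appearance.
    mapping: Dict[int, int] = {}
    labels: List[int] = []
    for i in range(n):
        root = _find(parent, i)
        labels.append(mapping.setdefault(root, len(mapping)))
    return labels, computed
```

For example, with the points `(0, 0), (0, 1), (0, 2), (10, 10), (10, 11), (30, 30)` and `h = 1.5`, the sketch computes only 3 of the 15 pairwise distances and returns the labels `[0, 0, 0, 1, 1, 2]`. How much is pruned depends on how well the reference distances separate the data.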
Volume 46, Published: 26 February 2021, Article ID 0045
The Single Linkage algorithm is a hierarchical clustering method that is ill-suited to large datasets because of its long convergence time. The paper proposes an efficient technique for accelerating the algorithm when clustering univariate data with a merging threshold. It is a two-stage algorithm. The first stage is an incremental pre-clustering step that uses the farthest-neighbour principle to partially cluster the database in a single scan. The algorithm uses the Segment Addition Postulate as its main tool for accelerating the pre-clustering stage. The incremental approach makes it suitable for partially clustering streaming data while the data are being collected. The second stage merges these pre-clusters into the final set of Single Linkage clusters by comparing only the largest and smallest values of each pre-cluster. It therefore converges faster than methods that use all the members of the clusters for each clustering action. The algorithm is also suitable for fast-changing dynamic databases, because it can cluster newly added data without using all the data in the database. Experiments on various datasets confirm that the proposed algorithm outperforms its well-known variants.
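For univariate data, a Single Linkage cluster at threshold `h` can be summarised by its smallest and largest values, which is why a min/max comparison can suffice. The sketch below assumes that reading. Its first pass is a hypothetical interval-based pre-clustering. It is not the paper's farthest-neighbour procedure or its use of the Segment Addition Postulate. Each incoming value joins the first pre-cluster whose range, widened by `h` on both sides, contains it. Every pre-cluster therefore stays a chain in which consecutive sorted values are at most `h` apart. The second pass sorts the pre-clusters by their smallest value and merges neighbours whose gap is at most `h`. All function names are illustrative.

```python
from typing import List, Sequence, Tuple


def precluster(stream: Sequence[float], h: float) -> Tuple[List[List[float]], List[int]]:
    """Single pass over the data: each pre-cluster is stored only as [lo, hi]."""
    intervals: List[List[float]] = []
    assignment: List[int] = []  # pre-cluster index of each input value
    for x in stream:
        for k, (lo, hi) in enumerate(intervals):
            # Inside [lo, hi], x lies between two members at most h apart;
            # within h of an end, it is within h of lo or hi. Either way it links.
            if lo - h <= x <= hi + h:
                intervals[k] = [min(lo, x), max(hi, x)]
                assignment.append(k)
                break
        else:  # no pre-cluster accepted x, so open a new one
            intervals.append([x, x])
            assignment.append(len(intervals) - 1)
    return intervals, assignment


def merge_preclusters(intervals: List[List[float]], h: float) -> List[int]:
    """Map each pre-cluster to a final label using only its smallest and largest values."""
    order = sorted(range(len(intervals)), key=lambda k: intervals[k][0])
    label_of = [0] * len(intervals)
    label = -1
    current_hi = float("-inf")
    for k in order:
        lo, hi = intervals[k]
        if lo - current_hi > h:  # gap exceeds the merging threshold: new cluster
            label += 1
            current_hi = hi
        else:  # overlapping or close enough: same Single Linkage cluster
            current_hi = max(current_hi, hi)
        label_of[k] = label
    return label_of


def univariate_single_linkage(data: Sequence[float], h: float) -> List[int]:
    intervals, assignment = precluster(data, h)
    label_of = merge_preclusters(intervals, h)
    return [label_of[k] for k in assignment]
```

With `h = 0.7`, the stream `[5.0, 1.0, 1.4, 9.0, 2.0, 5.5, 0.5]` yields the labels `[1, 0, 0, 2, 0, 1, 0]`. The values around 1, the values around 5, and the isolated 9 form three clusters, labelled in ascending order of value. A newly arriving value only needs the stored `[lo, hi]` pairs, not the earlier raw data.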