Graduation Year

2004

Document Type

Thesis

Degree

M.S.C.S.

Degree Granting Department

Computer Science

Major Professor

Lawrence O. Hall, Ph.D.

Committee Member

Dmitry B. Goldgof, Ph.D.

Committee Member

Sudeep Sarkar, Ph.D.

Keywords

Cluster Analysis, Swarm Intelligence, Ant Colony Optimization, Fuzzy C Means Algorithm, Hard C Means Algorithm

Abstract

We present two swarm-intelligence-based approaches to data clustering. The first algorithm, Fuzzy Ants, clusters data without prior knowledge of the number of clusters. It is a two-stage algorithm. In the first stage, the ants move individual objects to form heaps, which serve as raw clusters, and the centroids of these heaps are refined by the Fuzzy C Means (FCM) algorithm. In the second stage, the fuzzy partition obtained from FCM is hardened according to the maximum membership criterion to form new heaps, and the ants then move these new heaps. The final clusters are again refined with FCM. Experiments on 13 datasets show that the partitions produced are competitive with those from FCM.

The second algorithm, Fuzzy ant clustering with centroids, is also a two-stage algorithm, but it requires the number of clusters in the data to be known in advance. In the first stage, ants move the cluster centers in feature space, and the cluster centers found by the ants are evaluated using a reformulated Fuzzy C Means criterion. In the second stage, the best cluster centers found are used as the initial cluster centers for FCM. Results on 18 datasets show that the partitions found by FCM using the ant initialization are better than those from randomly initialized FCM. Hard C Means (HCM) was also used in the second stage, and the partitions obtained with the ant initialization are better than those from randomly initialized HCM.

The Fuzzy Ants algorithm is a novel method for finding the number of clusters in the data, and it also provides good initializations for the FCM and HCM algorithms. A sensitivity analysis of the controlling parameters showed the Fuzzy Ants algorithm to be very sensitive to the Tcreateforheap parameter. The FCM and HCM algorithms, when randomly initialized, can get stuck in poor extrema; the Fuzzy ant clustering with centroids algorithm successfully avoids them.
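Two steps named in the abstract can be illustrated with a short sketch: scoring a set of candidate cluster centers with a reformulated FCM criterion, and hardening a fuzzy partition by maximum membership. The abstract does not give the formulas, so this sketch assumes the commonly used Hathaway-Bezdek reformulation, R_m(V) = sum_k (sum_i d_ik^(1/(1-m)))^(1-m), with squared Euclidean distances and fuzzifier m > 1. All function names and the toy data are hypothetical and are not taken from the thesis.

```python
def squared_distance(x, v):
    """Squared Euclidean distance between a data point and a center."""
    return sum((a - b) ** 2 for a, b in zip(x, v))


def reformulated_fcm_criterion(data, centers, m=2.0):
    """Assumed reformulated criterion: depends on the centers only (lower is better)."""
    total = 0.0
    for x in data:
        dists = [squared_distance(x, v) for v in centers]
        if min(dists) == 0.0:
            continue  # a point lying on a center contributes zero in the limit
        inner = sum(d ** (1.0 / (1.0 - m)) for d in dists)
        total += inner ** (1.0 - m)
    return total


def fcm_memberships(data, centers, m=2.0):
    """Standard FCM membership update for fixed centers (requires m > 1)."""
    rows = []
    for x in data:
        dists = [squared_distance(x, v) for v in centers]
        zeros = [i for i, d in enumerate(dists) if d == 0.0]
        if zeros:
            # Point coincides with one or more centers: share full membership.
            rows.append([1.0 / len(zeros) if i in zeros else 0.0
                         for i in range(len(dists))])
            continue
        weights = [d ** (-1.0 / (m - 1.0)) for d in dists]
        norm = sum(weights)
        rows.append([w / norm for w in weights])
    return rows


def harden(memberships):
    """Assign each object to the cluster in which its membership is largest."""
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]


# Toy data (hypothetical): two well-separated groups in the plane.
data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
good_centers = [(0.0, 0.5), (10.0, 10.5)]  # one center per group
poor_centers = [(5.0, 5.0), (5.0, 6.0)]    # both centers between the groups

good_score = reformulated_fcm_criterion(data, good_centers)
poor_score = reformulated_fcm_criterion(data, poor_centers)
labels = harden(fcm_memberships(data, good_centers))
```

Because the criterion is a function of the centers alone, candidate center positions can be compared without computing a membership matrix first, which is one plausible reason to use it when the search moves centers directly.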
