Graduation Year

2009

Document Type

Dissertation

Degree

Ph.D.

Degree Granting Department

Computer Science and Engineering

Major Professor

Lawrence O. Hall, Ph.D.

Co-Major Professor

Dimitry B. Goldgof, Ph.D.

Keywords

Partitioning, Fuzzy C means, Cluster validity, Ant colony, Ensemble

Abstract

Clustering is actively used in several research fields, such as pattern recognition, machine learning and data mining. This dissertation focuses on clustering algorithms in the data mining area. Clustering algorithms can be applied to solve the unsupervised learning problem, which deals with finding clusters in unlabeled data. Most clustering algorithms require the number of cluster centers be known in advance. However, this is often not suitable for real world applications, since we do not know this information in most cases. Another question becomes, once clusters are found by the algorithms, do we believe the clusters are exactly the right ones or do there exist better ones? In this dissertation, we present two new Swarm Intelligence based approaches for data clustering to solve the above issues.

Swarm based approaches to clustering have been shown to be able to skip local extrema by doing a form of global search, our two newly proposed ant clustering algorithms take advantage of this. The first algorithm is a kernel-based fuzzy ant clustering algorithm using the Xie-Beni partition validity metric, it is a two stage algorithm, in the first stage of the algorithm ants move the cluster centers in feature space, the cluster centers found by the ants are evaluated using a reformulated kernel-based Xie-Beni cluster validity metric. We found when provided with more clusters than exist in the data our new ant-based approach produces a partition with empty clusters and/or very lightly populated clusters. Then the second stage of this algorithm was applied to automatically detect the number of clusters for a data set by using threshold solutions.

The second ant clustering algorithm, using chemical recognition of nestmates is a combination of an ant based algorithm and a consensus clustering algorithm. It is a two-stage algorithm without initial knowledge of the number of clusters. The main contributions of this work are to use the ability of an ant based clustering algorithm to determine the number of cluster centers and refine the cluster centers, then apply a consensus clustering algorithm to get a better quality final solution. We also introduced an ensemble ant clustering algorithm which is able to find a consistent number of clusters with appropriate parameters. We proposed a modified online ant clustering algorithm to handle clustering large data sets. To our knowledge, we are the first to use consensus to combine multiple ant partitions to obtain robust clustering solutions.

Experiments were done with twelve data sets, some of which were benchmark data sets, two artificially generated data sets and two magnetic resonance image brain volumes. The results show how the ant clustering algorithms play an important role in finding the number of clusters and providing useful information for consensus clustering to locate the optimal clustering solutions. We conducted a wide range of comparative experiments that demonstrate the effectiveness of the new approaches.

Share

COinS