Graduation Year


Document Type




Degree Name

Doctor of Philosophy (Ph.D.)

Degree Granting Department

Electrical Engineering

Major Professor

Ji-En Morris Chang, Ph.D.

Committee Member

Zhuo Lu, Ph.D.

Committee Member

Xinming Ou, Ph.D.

Committee Member

Kaiqi Xiong, Ph.D.

Committee Member

Yasin Yilmaz, Ph.D.


Keywords

Data Mining, Deep Learning, Differential Privacy, Network Security, Social Network Analysis


Peer-to-peer (P2P) botnets have become one of the major threats in network security, serving as the infrastructure responsible for various cyber-crimes. Though a few existing works claim to detect traditional botnets effectively, detecting P2P botnets involves more challenges. In this dissertation, we present two P2P botnet detection systems, PeerHunter and Enhanced PeerHunter. PeerHunter starts with a P2P host detection component. Then, it uses mutual contacts as the main feature to cluster bots into communities. Finally, it uses community behavior analysis to detect potential botnet communities and further identify bot candidates. Enhanced PeerHunter is an extension of PeerHunter that uses network-flow-level community behaviors to detect waiting-stage P2P botnets, even when P2P bots and legitimate P2P applications are running on the same set of hosts. Extensive experiments with real and simulated network traces show that both PeerHunter and Enhanced PeerHunter achieve very high detection rates and low false positive rates.
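The idea of clustering hosts by mutual contacts can be illustrated with a minimal sketch. This is not PeerHunter's actual implementation: the Jaccard-style overlap measure, the 0.5 threshold, the use of connected components as communities, and all host names and addresses below are assumptions chosen only for illustration.

```python
from itertools import combinations


def mutual_contact_ratio(contacts_a, contacts_b):
    """Share of contacted peers that two hosts have in common (Jaccard overlap)."""
    union = contacts_a | contacts_b
    if not union:
        return 0.0
    return len(contacts_a & contacts_b) / len(union)


def cluster_by_mutual_contacts(contacts, threshold=0.5):
    """Group hosts into connected components of a mutual-contact graph.

    contacts maps each host to the set of peers it contacted. Two hosts are
    linked when their overlap reaches the (assumed) threshold.
    """
    neighbors = {host: set() for host in contacts}
    for a, b in combinations(contacts, 2):
        if mutual_contact_ratio(contacts[a], contacts[b]) >= threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)

    communities, seen = [], set()
    for host in contacts:
        if host in seen:
            continue
        # Depth-first search collects every host reachable from this one.
        stack, component = [host], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        communities.append(component)
    return communities


# Hypothetical traffic summary: h1-h3 share most contacts, h4 shares none.
contacts = {
    "h1": {"10.0.0.1", "10.0.0.2", "10.0.0.3"},
    "h2": {"10.0.0.1", "10.0.0.2", "10.0.0.4"},
    "h3": {"10.0.0.1", "10.0.0.2", "10.0.0.3"},
    "h4": {"172.16.0.9"},
}
communities = cluster_by_mutual_contacts(contacts, threshold=0.5)
```

In this toy input, hosts that talk to largely the same peers end up in one community, while the unrelated host stays isolated. A real system would still need the behavior analysis step to decide which communities are botnets.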

The major component of our P2P botnet detection systems is a community detection algorithm. Community detection is of great importance for online social network analysis. The volume, variety and velocity of data generated by today's online social networks are advancing the way researchers analyze those networks. For instance, real-world networks such as Facebook, LinkedIn and Twitter grow rapidly and expand aggressively over time. However, most studies so far have focused on detecting communities in static networks. It is computationally expensive to repeatedly apply a well-studied static algorithm to the network snapshots of a dynamic network. We propose DynaMo, a novel modularity-based dynamic community detection algorithm that aims to detect communities in dynamic networks as effectively as repeatedly applying static algorithms, but more efficiently. In the experimental evaluation, we comprehensively compare DynaMo with Louvain (static) and 5 other dynamic algorithms. Extensive experiments were conducted on 6 real-world networks and 10,000 synthetic networks. Our results show that DynaMo outperforms all 5 other dynamic algorithms in terms of effectiveness, and is 2 to 5 times faster (on average) than the Louvain algorithm.
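For readers unfamiliar with modularity, the quantity that modularity-based methods such as Louvain try to maximize, the following sketch computes it for a fixed partition of an undirected, unweighted graph. It shows only the static score, not DynaMo's incremental update procedure, which the abstract does not describe. The function name and the toy graph are assumptions for illustration.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m - (d_c / (2 m)) ** 2
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the sum of the degrees of the nodes in c.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Toy graph (hypothetical): two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}

q_split = modularity(edges, two_groups)   # splitting at the bridge scores higher
q_merged = modularity(edges, one_group)   # a single community scores 0
```

Recomputing a score like this from scratch on every snapshot is what makes repeated static detection expensive. A dynamic algorithm aims to update the partition using only the parts of the network that changed.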

In the big data era, many real-world applications, e.g., botnet detection, community detection and image recognition, require collecting a large amount of data from individuals, which raises privacy concerns. The collected data could be repurposed by different data users for entirely different purposes that were not envisioned by the data publisher at the data collection stage and that might jeopardize someone's privacy. To provide strong privacy guarantees for the collected data and to give data users greater flexibility in conducting the required data analysis, it is of great importance to enable privacy-enhancing technologies in such analysis. In this dissertation, we present several privacy-enhancing technologies for data mining and machine learning applications, utilizing the concepts of dimensionality reduction and differential privacy, including (i) a privacy-preserving facial recognition approach utilizing dimensionality reduction techniques; (ii) a perturbation-based, utility-aware, privacy-preserving data release framework; and (iii) a locally differentially private distributed deep learning framework via knowledge distillation.
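As a concrete illustration of local differential privacy, the sketch below uses binary randomized response, a textbook mechanism. It is not the dissertation's framework, whose details the abstract does not give. The function names, the privacy budget of 1.0 and the simulated population are assumptions for illustration.

```python
import math
import random


def keep_probability(epsilon):
    """Probability of reporting the true bit under epsilon-LDP randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomize_bit(bit, epsilon, rng):
    """Each individual perturbs their own bit locally before sharing it."""
    return bit if rng.random() < keep_probability(epsilon) else 1 - bit


def estimate_true_fraction(reports, epsilon):
    """Unbiased estimate of the true fraction of 1s from noisy reports.

    Requires epsilon > 0; with epsilon == 0 the reports carry no signal.
    """
    p = keep_probability(epsilon)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


# Simulated population (hypothetical): 30% of 10,000 individuals hold bit 1.
rng = random.Random(0)
true_bits = [1] * 3000 + [0] * 7000
reports = [randomize_bit(b, epsilon=1.0, rng=rng) for b in true_bits]
estimate = estimate_true_fraction(reports, epsilon=1.0)
```

The data collector never sees any individual's true bit, yet the aggregate estimate stays close to the true fraction. A smaller epsilon gives stronger privacy at the cost of a noisier estimate.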