Graduation Year

2014

Document Type

Dissertation

Degree

Ph.D.

Degree Name

Doctor of Philosophy (Ph.D.)

Degree Granting Department

Industrial and Management Systems Engineering

Major Professor

Bo Zeng, Ph.D.

Co-Major Professor

Xiaoning Qian, Ph.D.

Committee Member

Jose Zayas-Castro, Ph.D.

Committee Member

Tapas Das, Ph.D.

Committee Member

Kendra Vehik, M.P.H., Ph.D.

Keywords

Biomarker Identification, Column Generation, Combinatorial Optimization, Integer Programming, Maximum Clique Problem

Abstract

We introduce and study a novel graph optimization problem that searches for multiple cliques with the maximum overall weight, which we call the Maximum Weighted Multiple Clique Problem (MWMCP). This problem arises in research involving network-based data mining, specifically in bioinformatics, where complex diseases, such as various types of cancer and diabetes, are conjectured to be triggered and influenced by a combination of genetic and environmental factors. To integrate potential effects from interplay among the underlying candidate factors, we propose a new network-based framework that identifies effective biomarkers by searching for "groups" of synergistic risk factors with high predictive power for disease outcome. An interaction network is constructed in which each vertex weight represents the individual predictive power of a candidate factor and each edge weight represents the pairwise synergistic interaction between two factors. This network-based biomarker identification problem is then formulated as an MWMCP. To obtain near-optimal solutions for large-scale networks, we derive an analytical algorithm based on column generation as well as a fast greedy heuristic. To obtain exact solutions, we design a branch-price-and-cut algorithm based on a study of the properties of the problem. Our algorithms for MWMCP have been implemented and tested on random graphs, with promising results. They are also used to analyze two biomedical datasets: a Type 1 Diabetes (T1D) dataset from the Diabetes Prevention Trial-Type 1 (DPT-1) Study, and a breast cancer genomics dataset for metastasis prognosis. The results demonstrate that our network-based methods can identify important biomarkers with better prediction accuracy than conventional feature selection, which considers only individual effects.
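To make the weighted-clique idea concrete, the following is a minimal illustrative sketch in Python. It is not the dissertation's algorithm. It assumes, without confirmation from the abstract, that a clique's weight is the sum of its vertex weights and internal edge weights, that the cliques are vertex-disjoint, and that the number of cliques `k` is given. All function and variable names are hypothetical.

```python
from itertools import combinations


def clique_weight(clique, vertex_w, edge_w):
    """Assumed objective: vertex weights plus weights of all edges inside the clique."""
    total = sum(vertex_w[v] for v in clique)
    total += sum(edge_w[frozenset(pair)] for pair in combinations(clique, 2))
    return total


def greedy_clique(candidates, vertex_w, edge_w):
    """Grow one clique, starting from the heaviest candidate vertex."""
    ordered = sorted(candidates)  # fixed order keeps tie-breaking deterministic
    if not ordered:
        return []
    clique = [max(ordered, key=lambda v: vertex_w[v])]
    while True:
        best, best_gain = None, 0.0
        for v in ordered:
            if v in clique:
                continue
            # v may join only if it is adjacent to every current member.
            if all(frozenset((v, u)) in edge_w for u in clique):
                gain = vertex_w[v] + sum(edge_w[frozenset((v, u))] for u in clique)
                if gain > best_gain:
                    best, best_gain = v, gain
        if best is None:  # no vertex improves the clique weight
            return clique
        clique.append(best)


def greedy_multiple_cliques(vertex_w, edge_w, k):
    """Extract up to k vertex-disjoint cliques, one greedy clique at a time."""
    remaining = set(vertex_w)
    cliques = []
    for _ in range(k):
        if not remaining:
            break
        clique = greedy_clique(remaining, vertex_w, edge_w)
        cliques.append(clique)
        remaining -= set(clique)
    return cliques


# Toy interaction network: vertices are candidate factors, edges are synergies.
vertex_w = {"a": 3.0, "b": 2.0, "c": 1.0, "d": 2.5, "e": 0.5}
edge_w = {
    frozenset(("a", "b")): 1.0,
    frozenset(("a", "c")): 0.5,
    frozenset(("b", "c")): 0.2,
    frozenset(("d", "e")): 0.4,
}
cliques = greedy_multiple_cliques(vertex_w, edge_w, k=2)
print(cliques, [clique_weight(c, vertex_w, edge_w) for c in cliques])
```

A greedy pass like this gives no optimality guarantee. According to the abstract, near-optimal and exact solutions are obtained with the column generation and branch-price-and-cut algorithms.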
