Graduation Year

2016

Document Type

Dissertation

Degree

Ph.D.

Degree Name

Doctor of Philosophy (Ph.D.)

Degree Granting Department

Mathematics and Statistics

Major Professor

Lesław A. Skrzypek, Ph.D.

Committee Member

Gangaram S. Ladde, Ph.D.

Committee Member

Yuncheng You, Ph.D.

Committee Member

Marcus McWaters, Ph.D.

Keywords

Statistical Modeling, Survival Analysis, Parametric Analysis, Probability Distribution, Decision Trees, Artificial Neural Networks, Classification.

Abstract

Survival analysis today is widely implemented in the fields of medical and biological sciences, social sciences, econometrics, and engineering. The basic principle behind the survival analysis implies to a statistical approach designed to take into account the amount of time utilized for a study period, or the study of time between entry into observation and a subsequent event. The event of interest pertains to death and the analysis consists of following the subject until death. Events or outcomes are defined by a transition from one discrete state to another at an instantaneous moment in time. In the recent years, research in the area of survival analysis has increased greatly because of its large usage in areas related to bio sciences and the pharmaceutical studies. After identifying the probability density function that best characterizes the tumors and survival times of breast cancer women, one purpose of this research is to compare the efficiency between competing estimators of the survival function. Our study includes evaluation of parametric, semi-parametric and nonparametric analysis of probability survival models.

Artificial Neural Networks (ANNs), recently applied to a number of clinical, business, forecasting, time series prediction, and other applications, are computational systems consisting of artificial neurons called nodes arranged in different layers with interconnecting links. The main interest in neural networks comes from their ability to approximate complex nonlinear functions. Among the available wide range of neural networks, most research is concentrated around feed forward neural networks called Multi-layer perceptrons (MLPs). One of the important components of an artificial neural network (ANN) is the activation function. This work discusses properties of activation functions in multilayer neural networks applied to breast cancer stage classification. There are a number of common activation functions in use with ANNs. The main objective in this work is to compare and analyze the performance of MLPs which has back-propagation algorithm using various activation functions for the neurons of hidden and output layers to evaluate their performance on the stage classification of breast cancer data.

Survival analysis can be considered a classification problem in which the application of machine-learning methods is appropriate. By establishing meaningful intervals of time according to a particular situation, survival analysis can easily be seen as a classification problem. Survival analysis methods deals with waiting time, i.e. time till occurrence of an event. Commonly used method to classify this sort of data is logistic regression. Sometimes, the underlying assumptions of the model are not true. In model building, choosing an appropriate model depends on complexity and the characteristics of the data that affect the appropriateness of the model. Two such strategies, which are used nowadays frequently, are artificial neural network (ANN) and decision trees (DT), which needs a minimal assumption. DT and ANNs are widely used methodological tools based on nonlinear models. They provide a better prediction and classification results than the traditional methodologies such as logistic regression. This study aimed to compare predictions of the ANN, DT and logistic models by breast cancer survival. In this work our goal is to design models using both artificial neural networks and logistic regression that can precisely predict the output (survival) of breast cancer patients. Finally we compare the performances of these models using receiver operating characteristic (ROC) analysis.

Share

COinS