Degree Granting Department

Mathematics and Statistics

Major Professor

Chris P. Tsokos


Keywords

Decision Tree, Drug Efficiency, Random Forest, Survival Analysis, Variable Rank


The aim of the present study is to develop statistical algorithms and models associated with breast and lung cancer patients. In this study, we developed several pieces of statistical software, R packages, and models using our new statistical approach.

In the present study, we used the five-parameter logistic model to determine the optimal doses of pharmaceutical drugs, including dynamic initial points, an automatic process for outlier detection, and an algorithm that develops a graphical user interface (GUI) program. The developed statistical procedure helps medical scientists by reducing the time needed to determine the optimal dose of new drugs, and it can also readily identify which drugs need more experimentation.
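As a minimal sketch of the kind of dose-response curve referred to above, the Python function below evaluates one common parameterization of the five-parameter logistic model. The function name `five_pl` and the parameter names `a`, `b`, `c`, `d`, and `g` are assumptions made for illustration; the study's own parameterization, its dynamic initial points, and its outlier detection are not reproduced here.

```python
def five_pl(x, a, b, c, d, g):
    """Evaluate a five-parameter logistic curve at dose x (illustrative only).

    a: response as the dose approaches zero (for b > 0)
    b: slope parameter
    c: scale (mid-range) parameter, must be positive
    d: response as the dose grows without bound (for b > 0)
    g: asymmetry parameter, must be positive; g = 1 gives the
       symmetric four-parameter logistic curve
    """
    if x <= 0 or c <= 0 or g <= 0:
        raise ValueError("x, c, and g must all be positive")
    return d + (a - d) / (1.0 + (x / c) ** b) ** g


# With g = 1 the curve is symmetric, so the response at x = c lies
# halfway between the two asymptotes a and d.
midpoint = five_pl(2.0, a=0.0, b=1.5, c=2.0, d=100.0, g=1.0)
```

In practice the five parameters would be estimated from assay data by nonlinear least squares; the sketch only shows the curve itself.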

Second, we developed a new classification method that is very useful in the health sciences. We used a new decision tree algorithm and a random forest method to rank our variables and to build a final decision tree model. The decision tree can identify and communicate complex data systems to scientists with minimal knowledge of statistics.

Third, we developed statistical packages using the Johnson SB probability distribution, which is important in parametrically studying a variety of health, environmental, and engineering problems. Scientists often have difficulty obtaining estimates for the four parameters of this probability distribution. The developed algorithm combines several statistical procedures, such as the Newton-Raphson method, the bisection method, least squares estimation, and regression, to develop our R package. This R package has functions that generate random numbers, calculate probabilities and inverse probabilities, and estimate the four parameters of the Johnson SB probability distribution. Researchers can use the developed R package to build their own statistical models or perform desired statistical simulations.
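To illustrate the kind of functions such a package provides, the following Python sketch computes probabilities, inverse probabilities, and random numbers for the Johnson SB distribution using its standard normal transformation, z = gamma + delta * ln((x - xi) / (xi + lambda - x)). It is not the R package described above: the function names `sb_cdf`, `sb_quantile`, and `sb_random` are hypothetical, and parameter estimation is not covered.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def sb_cdf(x, gamma, delta, xi, lam):
    """P(X <= x) for a Johnson SB variable supported on (xi, xi + lam)."""
    if x <= xi:
        return 0.0
    if x >= xi + lam:
        return 1.0
    z = gamma + delta * math.log((x - xi) / (xi + lam - x))
    return _STD_NORMAL.cdf(z)


def sb_quantile(p, gamma, delta, xi, lam):
    """Inverse probability: the x with P(X <= x) = p, for 0 < p < 1."""
    z = _STD_NORMAL.inv_cdf(p)
    return xi + lam / (1.0 + math.exp(-(z - gamma) / delta))


def sb_random(n, gamma, delta, xi, lam, seed=None):
    """Draw n Johnson SB variates by transforming standard normal draws."""
    rng = random.Random(seed)
    return [
        xi + lam / (1.0 + math.exp(-(rng.gauss(0.0, 1.0) - gamma) / delta))
        for _ in range(n)
    ]


# Example with assumed parameter values (gamma, delta, xi, lam).
params = (0.5, 1.2, 10.0, 5.0)
x30 = sb_quantile(0.3, *params)       # 30th percentile
p_back = sb_cdf(x30, *params)         # recovers 0.3 up to rounding
sample = sb_random(1000, *params, seed=1)
```

The shape parameters are gamma and delta, while xi and lam set the lower bound and the width of the support, so every generated value falls between xi and xi + lam.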

The final aspect of the study involves building a statistical model for lung cancer survival time. In developing this statistical model, we have taken into consideration the number of cigarettes the patient smoked per day, the duration of smoking, and the age at diagnosis of lung cancer. The response variable is the survival time. The significant factors include interactions. The probability density function of the survival times has been obtained, and the survival function is determined. The analysis is based on groups that involve gender and other factors. A comparison with the ordinary survival function is given.