Graduation Year


Document Type




Degree Granting Department

Computer Science

Major Professor

Goldgof, Dmitry

Co-Major Professor

Kallergi, Maria


neural network, filtering, segmentation, detection, shape analysis, feature selection, receiver operating characteristics (ROC)


Breast cancer is the second leading cause of cancer deaths among women in the United States, and microcalcification clusters are one of the most important indicators of breast disease. Computer methodologies help in the detection of lesions and in the differentiation between benign and malignant lesions, and they have the potential to significantly improve radiologists' performance and breast cancer diagnosis. A Computer-Aided Diagnosis (CAD-Dx) algorithm was previously developed to assist radiologists in the diagnosis of mammographic clusters of calcifications. It comprises four modules: (a) detection of all calcification-like areas, (b) false-positive reduction and segmentation of the detected calcifications, (c) selection of morphological and distributional features, and (d) classification of the clusters. Classification was based on an artificial neural network (ANN) that took 14 input features and assigned a likelihood of malignancy to each cluster.

The purpose of this work was threefold: (a) optimize the existing algorithm and test it on a large database, (b) rank classification features and select the best feature set, and (c) determine the impact of single-view and two-view feature estimation on classification and feature ranking. Classification performance was evaluated with the NevProp4 artificial neural network trained with the leave-one-out resampling technique. Sequential forward selection was used for feature selection and ranking. Mammograms from 136 patients, each containing one or two views of a breast with a calcification cluster, were digitized at 60 microns and 16 bits per pixel. To build the single-view dataset, 260 regions of interest (ROIs) centered on calcification clusters were defined. Of the 136 patients, 100 had a two-view mammogram; these yielded 202 ROIs that formed the two-view dataset. Classification and feature selection were evaluated on both datasets.
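The combination of sequential forward selection and leave-one-out resampling described above can be illustrated with a minimal sketch. It is not the procedure used in this work: a nearest-centroid rule stands in for the NevProp4 network, leave-one-out accuracy stands in for the ROC-based criterion, and all function names and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def nearest_centroid_predict(train_x: List[List[float]], train_y: List[int],
                             x: Sequence[float]) -> int:
    """Stand-in classifier (an assumption, not the ANN): pick the class with the closest mean."""
    best_label, best_dist = -1, float("inf")
    for label in sorted(set(train_y)):
        members = [row for row, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(x: List[List[float]], y: List[int], features: List[int]) -> float:
    """Leave-one-out: hold out each sample once and train on all the others."""
    proj = [[row[f] for f in features] for row in x]
    hits = 0
    for i in range(len(proj)):
        train_x = proj[:i] + proj[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += nearest_centroid_predict(train_x, train_y, proj[i]) == y[i]
    return hits / len(proj)


def sequential_forward_selection(x: List[List[float]],
                                 y: List[int]) -> List[Tuple[int, float]]:
    """Greedily add the feature that most improves the leave-one-out score.

    Returns (feature index, score after adding it) in order of selection,
    which doubles as a feature ranking. Ties go to the lower index.
    """
    selected: List[int] = []
    remaining = list(range(len(x[0])))
    ranking: List[Tuple[int, float]] = []
    while remaining:
        scored = [(loo_accuracy(x, y, selected + [f]), -f, f) for f in remaining]
        best_score, _, best_f = max(scored)
        selected.append(best_f)
        remaining.remove(best_f)
        ranking.append((best_f, best_score))
    return ranking


# Toy data (invented): feature 0 separates the classes, feature 1 does not.
toy_x = [[0.10, 5.0], [0.20, 1.0], [0.15, 3.0],
         [0.90, 4.0], [1.00, 2.0], [0.95, 6.0]]
toy_y = [0, 0, 0, 1, 1, 1]
ranking = sequential_forward_selection(toy_x, toy_y)
print(ranking)
```

In practice the ranking would be truncated at the point where adding further features no longer improves the criterion, which is one way a reduced feature set can be obtained from an initial pool.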

To decide on the optimal features for two-view feature estimation, several combinations of craniocaudal (CC) and mediolateral oblique (MLO) view features were tested. On the single-view dataset, the classifier achieved an area under the ROC curve of Az = 0.8891, with 88% sensitivity and 77% specificity at an operating point of 0.4; 12 features were selected as the most important. On the two-view dataset, the classifier achieved higher performance, with Az = 0.9580 and sensitivity and specificity of 98% and 80%, respectively, at an operating point of 0.4; 10 features were selected as the most important.
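How sensitivity, specificity, and an area under the ROC curve follow from per-cluster likelihoods of malignancy can be sketched as below. Several assumptions apply: a cluster is called malignant when its score is at or above the operating point, the area is computed with the non-parametric rank (Mann-Whitney) estimate rather than whatever curve-fitting method produced the Az values reported here, and the scores and labels are invented for illustration.

```python
from typing import List, Tuple


def sensitivity_specificity(scores: List[float], labels: List[int],
                            threshold: float = 0.4) -> Tuple[float, float]:
    """Sensitivity and specificity when score >= threshold is called malignant (label 1)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc_rank(scores: List[float], labels: List[int]) -> float:
    """Rank-based ROC area: fraction of (malignant, benign) pairs ordered correctly.

    Ties count as half. This is a non-parametric estimate and may differ
    from an area obtained by fitting a smooth ROC curve.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented likelihood-of-malignancy scores; 1 = malignant, 0 = benign.
toy_scores = [0.90, 0.70, 0.35, 0.60, 0.30, 0.20, 0.10, 0.45]
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]

sens, spec = sensitivity_specificity(toy_scores, toy_labels, threshold=0.4)
area = auc_rank(toy_scores, toy_labels)
print(sens, spec, area)
```

Moving the operating point trades sensitivity against specificity, whereas the area under the curve summarizes performance over all possible operating points.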