Graduation Year

2003

Document Type

Thesis

Degree

M.S.

Degree Granting Department

Physics

Major Professor

Ph.D, David A. Rabson

Keywords

probability, single peak constrained least squares, bump-hunting

Abstract

One often wishes to understand the probability distribution of stochastic data from experiment or computer simulations. However, where no model is given, practitioners must resort to parametric or non-parametric methods in order to gain information about the underlying distribution. Others have used initially a nonparametric estimator in order to understand the underlying shape of a set of data, and then later returned with a parametric method to locate the peaks. However they are interested in estimating spectra, which may have multiple peaks, where in this work we are interested in approximating the peak position of a single-peak probability distribution. One method of analyzing a distribution of data is by fitting a curve to, or smoothing them. Polynomial regression and least-squares fit are examples of smoothing methods. Initial understanding of the underlying distribution can be obscured depending on the degree of smoothing.

Problems such as under and oversmoothing must be addressed in order to determine the shape of the underlying distribution.Furthermore, smoothing of skewed data can give a biased estimation of the peak position. We propose two new approaches for statistical mode estimation based on the assumption that the underlying distribution has only one peak. The first method imposes the global constraint of unimodality locally, by requiring negative curvature over some domain. The second method performs a search that assumes a position of the distribution's peak and requires positive slope to the left, and negative slope to the right.

Each approach entails a constrained least-squares fit to the raw cumulative probability distribution.We compare the relative efficiencies [12] of finding the peak location of these two estimators for artificially generated data from known families of distributions Weibull, beta, and gamma. Within each family a parameter controls the skewness or kurtosis, quantifying the shapes of the distributions for comparison. We also compare our methods with other estimators such as the kernel-density estimator, adaptive histogram, and polynomial regression. By comparing the effectiveness of the estimators, we can determine which estimator best locates the peak position. We find that our estimators do not perform better than other known estimators. We also find that our estimators are biased.

Overall, an adaptation of kernel estimation proved to be the most efficient.The results for the work done in this thesis will be submitted, in a different form, for publication by D.A. Rabson and J.K. Looper.

Share

COinS