Big data, conical behavior, high dimension low sample size, PCA
The aim of this paper is to establish several deep theoretical properties of principal component analysis for multiple-component spike covariance models. Our new results reveal an asymptotic conical structure in critical sample eigendirections under the spike models with distinguishable (or indistinguishable) eigenvalues, when the sample size and/or the number of variables (or dimension) tend to infinity. The consistency of the sample eigenvectors relative to their population counterparts is determined by the ratio between the dimension and the product of the sample size with the spike size. When this ratio converges to a nonzero constant, the sample eigenvector converges to a cone, with a certain angle to its corresponding population eigenvector. In the High Dimension, Low Sample Size case, the angle between the sample eigenvector and its population counterpart converges to a limiting distribution. Several generalizations of the multi-spike covariance models are also explored, and additional theoretical results are presented.
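The role of the ratio between the dimension and the product of the sample size with the spike size can be explored numerically. The sketch below is not taken from the paper. It assumes a single-spike model with mean-zero Gaussian data and covariance diag(spike, 1, ..., 1). The function name sample_angle_deg and the chosen values of d, n, and the ratio are hypothetical, picked only for illustration. It measures the angle between the leading sample eigenvector and the first coordinate axis, which is the population eigenvector under this assumed model.

```python
import numpy as np


def sample_angle_deg(d, n, spike, rng):
    """Angle in degrees between the leading sample eigenvector and e_1.

    Assumed model (illustrative only): n i.i.d. mean-zero Gaussian rows
    in dimension d with covariance diag(spike, 1, ..., 1).
    """
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(spike)  # first coordinate gets variance `spike`
    # The leading right singular vector of X is the leading eigenvector
    # of the (uncentered) sample covariance X^T X / n.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    u1 = vt[0]
    # The sign of an eigenvector is arbitrary, so use the absolute cosine.
    cos = min(abs(float(u1[0])), 1.0)
    return float(np.degrees(np.arccos(cos)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n = 2000, 50  # hypothetical HDLSS-style setting
    for ratio in (0.01, 1.0, 10.0):
        spike = d / (n * ratio)  # so that d / (n * spike) equals `ratio`
        angles = [sample_angle_deg(d, n, spike, rng) for _ in range(20)]
        print(f"d/(n*spike) = {ratio:>5}: mean angle = {np.mean(angles):.1f} deg")
```

Under these assumptions, a small ratio should give an angle near zero, while larger ratios should push the sample eigenvector away from its population counterpart. The simulation only illustrates that trend and does not reproduce the limiting results established in the paper.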
Citation / Publisher Attribution
Statistica Sinica, v. 26, issue 4, p. 1747-1770
Scholar Commons Citation
Shen, Dan; Shen, Haipeng; Zhu, Hongtu; and Marron, J. S., "The Statistics and Mathematics of High Dimension Low Sample Size Asymptotics" (2016). Mathematics and Statistics Faculty Publications. 5.