Field | Value |
---|---|
File: | Main Article |
Title: | Regularized k-means clustering of high-dimensional data and its asymptotic consistency |
Author(s): | Sun, Wei; Wang, Junhui; Fang, Yixin |
Subject(s): | K-means; diverging dimension; lasso; selection consistency |
Abstract: | K-means clustering is a widely used tool for cluster analysis due to its conceptual simplicity and computational efficiency. However, its performance can be distorted when clustering high-dimensional data where the number of variables becomes relatively large and many of them may contain no information about the clustering structure. This article proposes a high-dimensional cluster analysis method via regularized k-means clustering, which can simultaneously cluster similar observations and eliminate redundant variables. The key idea is to formulate the k-means clustering in a form of regularization, with an adaptive group lasso penalty term on cluster centers. In order to optimally balance the trade-off between the clustering model fitting and sparsity, a selection criterion based on clustering stability is developed. The asymptotic estimation and selection consistency of the regularized k-means clustering with diverging dimension is established. The effectiveness of the regularized k-means clustering is also demonstrated through a variety of numerical experiments as well as applications to two gene microarray examples. The regularized clustering framework can also be extended to the general model-based clustering. |
Issue Date: | 2012 |
Publisher: | Institute of Mathematical Statistics |
Citation Info: | Sun, W., Wang, J. H., and Fang, Y. X. (2012). "Regularized k-means clustering of high-dimensional data and its asymptotic consistency." Electronic Journal of Statistics 6: 148-167. DOI: 10.1214/12-EJS668 |
Type: | Article |
Description: | This is a copy of an article published in the Electronic Journal of Statistics © 2012 Institute of Mathematical Statistics at DOI: 10.1214/12-EJS668. |
URI: | http://hdl.handle.net/10027/10481 |
ISSN: | 1935-7524 |
Date Available in INDIGO: | 2013-11-14 |
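
As a rough illustration of the idea summarized in the abstract (k-means with a group lasso penalty on the cluster centers, so that uninformative variables are removed while observations are clustered), the following Python sketch alternates a nearest-center assignment step with a penalized center update. It is not the authors' algorithm: the function names, the proximal-gradient center update, the `weights` argument (where something like the inverse norm of unpenalized centers could serve as adaptive weights), and the exact scaling of the objective are all assumptions made for illustration. The stability-based selection of the tuning parameter described in the abstract is omitted.

```python
import math
import random


def group_soft_threshold(v, t):
    """Proximal operator of t * ||v||_2: shrink the whole group, or zero it."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= t:
        return [0.0] * len(v)
    scale = 1.0 - t / norm
    return [scale * x for x in v]


def regularized_kmeans(X, k, lam, weights=None, n_iter=20, n_inner=50, seed=0):
    """Illustrative k-means with a (weighted) group-lasso penalty on centers.

    Targets, by alternating steps, the objective
        sum_i ||x_i - c_{label(i)}||^2 + lam * sum_j w_j * ||c_(j)||_2,
    where c_(j) collects the j-th coordinate of all k centers. X is a list of
    rows and is assumed to be column-centered (hypothetical setup, not taken
    from the paper).
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    w = weights if weights is not None else [1.0] * p
    centers = [list(row) for row in rng.sample(X, k)]  # start at data points
    labels = [0] * n
    for _ in range(n_iter):
        # Assignment step: nearest center in squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centers[c])))
            for row in X
        ]
        sizes = [labels.count(c) for c in range(k)]
        # Cluster means; an empty cluster keeps its previous center as "mean".
        means = [list(centers[c]) for c in range(k)]
        for c in range(k):
            if sizes[c] > 0:
                members = [X[i] for i in range(n) if labels[i] == c]
                means[c] = [sum(col) / sizes[c] for col in zip(*members)]
        # Center step: proximal gradient on each feature's group of k values.
        step = 1.0 / (2.0 * max(sizes))
        for j in range(p):
            col = [centers[c][j] for c in range(k)]
            for _ in range(n_inner):
                v = [col[c] - step * 2.0 * sizes[c] * (col[c] - means[c][j])
                     for c in range(k)]
                col = group_soft_threshold(v, step * lam * w[j])
            for c in range(k):
                centers[c][j] = col[c]
    return centers, labels
```

On column-centered data, a variable whose center coordinates are all shrunk exactly to zero contributes the same amount to every point-to-center distance, so it no longer affects the cluster assignment. This is the sense in which the penalty can eliminate redundant variables while clustering.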