Regularized k-means clustering of high-dimensional data and its asymptotic consistency

Files in this item

File: euclid.ejs.1328280901.pdf (305KB)
Description: Main Article
Format: PDF
Title: Regularized k-means clustering of high-dimensional data and its asymptotic consistency
Author(s): Sun, Wei; Wang, Junhui; Fang, Yixin
Subject(s): K-means; diverging dimension; lasso; selection consistency
Abstract: K-means clustering is a widely used tool for cluster analysis due to its conceptual simplicity and computational efficiency. However, its performance can be distorted when clustering high-dimensional data, where the number of variables becomes relatively large and many of them may contain no information about the clustering structure. This article proposes a high-dimensional cluster analysis method via regularized k-means clustering, which can simultaneously cluster similar observations and eliminate redundant variables. The key idea is to formulate k-means clustering as a regularization problem, with an adaptive group lasso penalty term on the cluster centers. To optimally balance the trade-off between clustering model fit and sparsity, a selection criterion based on clustering stability is developed. The asymptotic estimation and selection consistency of regularized k-means clustering with diverging dimension are established. The effectiveness of regularized k-means clustering is also demonstrated through a variety of numerical experiments as well as applications to two gene microarray examples. The regularized clustering framework can also be extended to general model-based clustering.
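As a rough illustration of the key idea in the abstract, the sketch below evaluates a penalized k-means loss: the usual within-cluster squared error plus a group lasso penalty in which each variable's coordinates across all cluster centers form one group. The exact form of the objective, the averaging over observations, and the names `regularized_kmeans_objective`, `selected_variables`, `lam`, and `weights` are assumptions made for illustration and are not taken from the article; the adaptive weights in particular are left as a user-supplied vector.

```python
import numpy as np

def regularized_kmeans_objective(X, centers, lam, weights=None):
    """Assumed penalized k-means loss (illustrative, not the article's exact form).

    X       : (n, p) data matrix, ideally centered per variable
    centers : (K, p) matrix of cluster centers
    lam     : non-negative tuning parameter (hypothetical name)
    weights : (p,) per-variable penalty weights; defaults to all ones
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    p = X.shape[1]
    if weights is None:
        weights = np.ones(p)
    # Squared distance from every observation to every center: shape (n, K).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Each observation contributes its distance to the nearest center.
    fit = d2.min(axis=1).mean()
    # One group per variable: the j-th coordinate of all K centers.
    group_norms = np.linalg.norm(centers, axis=0)
    penalty = lam * np.dot(weights, group_norms)
    return fit + penalty

def selected_variables(centers, tol=1e-10):
    """Indices of variables whose group of center coordinates is not all zero."""
    centers = np.asarray(centers, dtype=float)
    return np.flatnonzero(np.linalg.norm(centers, axis=0) > tol)

# Two observations, two variables; the second variable carries little signal.
X = np.array([[2.0, 0.1], [-2.0, -0.1]])
centers = np.array([[2.0, 0.0], [-2.0, 0.0]])
objective = regularized_kmeans_objective(X, centers, lam=0.5)
kept = selected_variables(centers)
```

In this sketch a variable is treated as uninformative when its whole column of center coordinates is zero, which is how a group penalty on the centers can remove a variable from every cluster at once.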
Issue Date: 2012
Publisher: Institute of Mathematical Statistics
Citation Info: Sun, W., Wang, J. H., Fang, Y. X. (2012). "Regularized k-means clustering of high-dimensional data and its asymptotic consistency." Electronic Journal of Statistics 6: 148-167. DOI: 10.1214/12-EJS668
Type: Article
Description: This is a copy of an article published in the Electronic Journal of Statistics © 2012 Institute of Mathematical Statistics at DOI: 10.1214/12-EJS668.
ISSN: 1935-7524
Date Available in INDIGO: 2013-11-14


Country Code Views
United States of America 363
China 258
Russian Federation 28
Ukraine 20
United Kingdom 10

