
Regularized k-means clustering of high-dimensional data and its asymptotic consistency

Bookmark or cite this item: http://hdl.handle.net/10027/10481

Files in this item

File: euclid.ejs.1328280901.pdf (305KB)
Description: Main Article
Format: PDF
Title: Regularized k-means clustering of high-dimensional data and its asymptotic consistency
Author(s): Sun, Wei; Wang, Junhui; Fang, Yixin
Subject(s): K-means; diverging dimension; lasso; selection consistency
Abstract: K-means clustering is a widely used tool for cluster analysis due to its conceptual simplicity and computational efficiency. However, its performance can be distorted when clustering high-dimensional data where the number of variables becomes relatively large and many of them may contain no information about the clustering structure. This article proposes a high-dimensional cluster analysis method via regularized k-means clustering, which can simultaneously cluster similar observations and eliminate redundant variables. The key idea is to formulate the k-means clustering in a form of regularization, with an adaptive group lasso penalty term on cluster centers. In order to optimally balance the trade-off between the clustering model fitting and sparsity, a selection criterion based on clustering stability is developed. The asymptotic estimation and selection consistency of the regularized k-means clustering with diverging dimension is established. The effectiveness of the regularized k-means clustering is also demonstrated through a variety of numerical experiments as well as applications to two gene microarray examples. The regularized clustering framework can also be extended to the general model-based clustering.
Issue Date: 2012
Publisher: Institute of Mathematical Statistics
Citation Info: Sun, W., Wang, J. H., and Fang, Y. X. (2012). "Regularized k-means clustering of high-dimensional data and its asymptotic consistency." Electronic Journal of Statistics 6: 148-167. DOI: 10.1214/12-EJS668
Type: Article
Description: This is a copy of an article published in the Electronic Journal of Statistics © 2012 Institute of Mathematical Statistics at DOI: 10.1214/12-EJS668.
URI: http://hdl.handle.net/10027/10481
ISSN: 1935-7524
Date Available in INDIGO: 2013-11-14
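
Illustration: the abstract's key idea (k-means with a group lasso penalty on the cluster centers) can be sketched roughly as follows. This is a minimal sketch, not the authors' algorithm. It assumes the objective has the form (1/n) * sum_i min_k ||x_i - c_k||^2 + lam * sum_j w_j * ||c_(j)||, where c_(j) collects the j-th coordinate of all K centers, so a variable is dropped when its whole group is shrunk to zero. The function names, the farthest-first initialization, the proximal-gradient center update, and the choice of adaptive weights w_j from an unpenalized k-means fit are all assumptions made for illustration. The stability-based selection of lam mentioned in the abstract is not shown.

```python
import numpy as np


def assign(X, C):
    """Label each row of X with the index of its nearest center."""
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def farthest_first_init(X, K):
    """Deterministic initialization (an assumption, not from the paper)."""
    C = [X[np.argmax((X ** 2).sum(axis=1))]]
    for _ in range(K - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in C], axis=0)
        C.append(X[np.argmax(d2)])
    return np.array(C)


def update_centers(X, labels, C, lam, w, inner=50):
    """Proximal-gradient update of the centers for fixed labels.

    The smooth part is sum_k (n_k / n) * ||c_k - mean_k||^2 and the penalty
    is lam * sum_j w_j * ||C[:, j]||, handled by group soft-thresholding.
    """
    n = X.shape[0]
    K = C.shape[0]
    counts = np.bincount(labels, minlength=K).astype(float)
    means = C.copy()
    for k in range(K):
        if counts[k] > 0:  # an empty cluster keeps its previous center
            means[k] = X[labels == k].mean(axis=0)
    frac = counts / n
    step = 1.0 / (2.0 * frac.max())  # 1 / Lipschitz constant of the gradient
    for _ in range(inner):
        grad = 2.0 * frac[:, None] * (C - means)
        Z = C - step * grad
        norms = np.maximum(np.linalg.norm(Z, axis=0), 1e-12)
        shrink = np.maximum(0.0, 1.0 - step * lam * w / norms)
        C = Z * shrink[None, :]  # columns with shrink == 0 are dropped
    return C


def regularized_kmeans(X, K, lam, n_iter=20):
    """Alternate assignment and penalized center updates on centered data."""
    X = X - X.mean(axis=0)
    p = X.shape[1]
    C = farthest_first_init(X, K)
    # Unpenalized pass, used only to build the adaptive weights.
    for _ in range(n_iter):
        C = update_centers(X, assign(X, C), C, 0.0, np.ones(p))
    w = 1.0 / (np.linalg.norm(C, axis=0) + 1e-8)
    # Penalized pass: weakly separated variables get large weights.
    for _ in range(n_iter):
        C = update_centers(X, assign(X, C), C, lam, w)
    labels = assign(X, C)
    selected = np.flatnonzero(np.linalg.norm(C, axis=0) > 0)
    return labels, C, selected
```

A call such as `regularized_kmeans(X, K=2, lam=0.5)` returns the cluster labels, the shrunken centers, and the indices of the variables whose center coordinates were not all set to zero. Centering the data first means a zeroed column places every center at the overall mean for that variable, so the variable no longer affects the assignments.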
 

This item appears in the following Collection(s)

Statistics
