EEMCS EPrints Service


Worst-case and smoothed analysis of $k$-means clustering with Bregman divergences

Manthey, B. and Röglin, H. (2009) Worst-case and smoothed analysis of $k$-means clustering with Bregman divergences. In: Proceedings of the 20th International Symposium on Algorithms and Computation, ISAAC 2009, 16-18 Dec 2009, Honolulu, Hawaii, USA. pp. 1024-1033. Lecture Notes in Computer Science 5878. Springer. ISBN 978-3-642-10630-9

Full text available as: PDF (329 Kb), Univ. of Twente only

Official URL: http://dx.doi.org/10.1007/978-3-642-10631-6_103


Abstract

The $k$-means algorithm is the method of choice for clustering large-scale data sets, and it performs exceedingly well in practice. Most of the theoretical work is restricted to the case in which squared Euclidean distances are used as the similarity measure. In many applications, however, data is to be clustered with respect to other measures such as relative entropy, which is commonly used to cluster web pages. In this paper, we analyze the running time of the $k$-means method for Bregman divergences, a very general class of similarity measures including squared Euclidean distances and relative entropy. We show that the exponential lower bound known for the Euclidean case carries over to almost every Bregman divergence. To narrow the gap between theory and practice, we also study $k$-means in the semi-random input model of smoothed analysis. For the case that $n$ data points in $\mathbb{R}^d$ are perturbed by noise with standard deviation $\sigma$, we show that for almost arbitrary Bregman divergences the expected running time is bounded by $\mathrm{poly}(n^{\sqrt{k}}, 1/\sigma)$ and by $k^{kd} \cdot \mathrm{poly}(n, 1/\sigma)$.
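
For readers unfamiliar with the setting, the following is a minimal sketch of the $k$-means method with a pluggable Bregman divergence. It is not taken from the paper. The function names (`bregman_kmeans`, `squared_euclidean`, `generalized_i_divergence`), the caller-supplied initial centers, and the `max_iter` cap are all assumptions made for illustration. The sketch relies on the known fact that, for any Bregman divergence, the arithmetic mean of a cluster minimizes the total divergence from the cluster's points to a single center, so only the assignment step depends on the chosen divergence.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Point = Sequence[float]


def squared_euclidean(x: Point, c: Point) -> float:
    """Bregman divergence generated by Phi(x) = ||x||^2."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def generalized_i_divergence(x: Point, c: Point) -> float:
    """Bregman divergence generated by Phi(x) = sum_i x_i log x_i.

    Requires strictly positive coordinates. It coincides with relative
    entropy (Kullback-Leibler divergence) on probability vectors.
    """
    return sum(xi * math.log(xi / ci) - xi + ci for xi, ci in zip(x, c))


def mean(points: List[Point]) -> List[float]:
    """Coordinate-wise arithmetic mean of a non-empty list of points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def bregman_kmeans(
    points: List[Point],
    initial_centers: List[Point],
    divergence: Callable[[Point, Point], float],
    max_iter: int = 100,  # illustrative safety cap, not part of the method
) -> Tuple[List[List[float]], Optional[List[int]]]:
    """Alternate assignment and mean-update steps until assignments stop changing."""
    centers = [list(c) for c in initial_centers]
    assignment: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: each point goes to the center of least divergence.
        new_assignment = [
            min(range(len(centers)), key=lambda j: divergence(x, centers[j]))
            for x in points
        ]
        if new_assignment == assignment:
            break  # no point changed its cluster, so the method has converged
        assignment = new_assignment
        # Update step: the mean is optimal for every Bregman divergence.
        for j in range(len(centers)):
            members = [x for x, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = mean(members)
    return centers, assignment
```

Calling `bregman_kmeans(points, initial_centers, generalized_i_divergence)` on strictly positive data clusters with respect to the generalized I-divergence, while passing `squared_euclidean` recovers the classical Euclidean method. The number of iterations of the outer loop is the quantity whose worst-case and smoothed behavior the abstract describes.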

Item Type: Conference or Workshop Paper (Full Paper, Talk)
Research Group: EWI-DMMP: Discrete Mathematics and Mathematical Programming
Research Program: CTIT-IE&ICT: Industrial Engineering and ICT
Uncontrolled Keywords: Machine learning, Smoothed analysis, $k$-Means, Clustering, Bregman divergence, Kullback-Leibler divergence, Relative entropy, Generalized I-Divergence, Mahalanobis Distance, Itakura-Saito Divergence
ID Code: 16970
Status: Published
Deposited On: 08 December 2009
Refereed: Yes
International: Yes