Compressed K-Means for large-scale clustering

Publication Type:
Conference Proceeding
Citation:
31st AAAI Conference on Artificial Intelligence, AAAI 2017, 2017, pp. 2527 - 2533
Issue Date:
2017-01-01
Filename: AAAI17.pdf (Adobe PDF)
Description: Published version
Size: 1.8 MB
Copyright © 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Large-scale clustering is widely used in many applications and has received much attention. Most existing clustering methods suffer from high computation and memory costs when applied to large-scale datasets. In this paper, we propose a novel clustering method, dubbed compressed k-means (CKM), for fast large-scale clustering. Specifically, high-dimensional data are compressed into short binary codes, which are well suited for fast clustering. CKM enjoys two key benefits: 1) storage can be significantly reduced by representing data points as binary codes; 2) distance computation is very efficient using the Hamming metric between binary codes. We propose to jointly learn binary codes and clusters within one framework. Extensive experimental results on four large-scale datasets, including two million-scale datasets, demonstrate that CKM outperforms the state-of-the-art large-scale clustering methods in terms of both computation and memory cost, while achieving comparable clustering accuracy.
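
The two benefits named in the abstract (compact binary storage and cheap Hamming distances) can be illustrated with a minimal sketch. This is not the CKM algorithm, which learns codes and clusters jointly. Here the codes come from the signs of fixed projections and the centers are updated by a per-bit majority vote; both choices are assumptions made for illustration. All function names (sign_code, hamming, assign, update_center, hamming_kmeans) are hypothetical.

```python
# Illustrative sketch only: k-means-style clustering on binary codes.
# The encoder and the center update are assumptions, not the paper's method.

def sign_code(vec, projections):
    """Pack the signs of len(projections) dot products into one integer."""
    code = 0
    for i, proj in enumerate(projections):
        dot = sum(v * p for v, p in zip(vec, proj))
        if dot >= 0:
            code |= 1 << i  # bit i is 1 when the projection is non-negative
    return code

def hamming(a, b):
    """Hamming distance: XOR the codes, then count the set bits."""
    return bin(a ^ b).count("1")

def assign(codes, centers):
    """Label each code with the index of its nearest center (ties go to the lowest index)."""
    return [min(range(len(centers)), key=lambda k: hamming(c, centers[k]))
            for c in codes]

def update_center(members, n_bits):
    """Per-bit majority vote, which minimizes the summed Hamming distance
    to the members. Ties are resolved toward 1."""
    center = 0
    for i in range(n_bits):
        ones = sum((m >> i) & 1 for m in members)
        if 2 * ones >= len(members):
            center |= 1 << i
    return center

def hamming_kmeans(codes, centers, n_bits, n_iters=10):
    """Alternate assignment and majority-vote updates for a fixed number of rounds."""
    centers = list(centers)
    for _ in range(n_iters):
        labels = assign(codes, centers)
        for k in range(len(centers)):
            members = [c for c, lab in zip(codes, labels) if lab == k]
            if members:  # leave an empty cluster's center unchanged
                centers[k] = update_center(members, n_bits)
    return centers, assign(codes, centers)

if __name__ == "__main__":
    codes = [0b0000, 0b0001, 0b1111, 0b1110]
    centers, labels = hamming_kmeans(codes, [0b0001, 0b1110], n_bits=4)
    print(centers, labels)
```

On the storage side, the arithmetic is simple: a 128-dimensional float32 vector occupies 512 bytes, while a 32-bit code occupies 4 bytes, so the codes in this example setting would be 128 times smaller. The actual code lengths and savings reported for CKM are in the paper.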