The Effectiveness of Lloyd-type Methods for the k-means .ppt
《The Effectiveness of Lloyd-type Methods for the k-means .ppt》由会员分享,可在线阅读,更多相关《The Effectiveness of Lloyd-type Methods for the k-means .ppt(20页珍藏版)》请在麦多课文档分享上搜索。
1、The Effectiveness of Lloyd-type Methods for the k-means Problem,Chaitanya Swamy University of Waterloo Joint work with Rafi Ostrovsky, Yuval Rabani, Leonard SchulmanUCLA Technion Caltech,The k-means Problem,Given: n points in d-dimensional spacepartition X into k clusters X1, Xk assign each point in
2、 Xi to a common center ciRd Goal: Minimize i xXi d(x,ci)2,d: L2 distance,X Rd : point set with |X| = n,k-means (contd.),Given the cis, best clustering is to assign each point to nearest center: Xi = xX: ci is ctr. nearest to x Given the Xis, best choice of centers is to setci = center of mass of Xi
3、= ctr(Xi) = xXi x / |Xi| Optimal solution satisfies both properties Problem is NP-hard even for k=2 (n, d not fixed),Related Work,k-means problem dates back to Steinhaus (1956). a) Approximation algorithms algorithms with provable guarantees,PTASs with varying runtime dependence on n, d, k: poly/lin
4、ear in n, could be exponential in d and/or k Matousek (poly(n), exp(d,k) Kumar, Sabharwal guarantees also translate to k-means Charikar, Guha, Tardos & Shmoys Arya et al. + Kanungo et al.: (9+e)-approximation,b) Heuristics: Lloyds method invented in 1957 and remains an extremely popular heuristic ev
5、en today,1) Start with k initial / “seed” centers c1, ck. 2) Iterate the following Lloyd step Assign each point to nearest center ci to obtain clustering X1, Xk. Update ci ctr(Xi) = xXi x/|Xi| .,1) Start with k initial / “seed” centers c1, ck. 2) Iterate the following Lloyd step Assign each point to
6、 nearest center ci to obtain clustering X1, Xk. Update ci ctr(Xi) = xXi x/|Xi| .,b) Heuristics: Lloyds method invented in 1957 and remains an extremely popular heuristic even today,1) Start with k initial / “seed” centers c1, ck. 2) Iterate the following Lloyd step Assign each point to nearest cente
7、r ci to obtain clustering X1, Xk. Update ci ctr(Xi) = xXi x/|Xi| .,b) Heuristics: Lloyds method invented in 1957 and remains an extremely popular heuristic even today,Some bounds on number of iterations of Lloyd-type methods: Inaba-Katoh-Imai; Har-Peled-Sadri; Arthur-Vassilvitskii (06) Performance v
8、ery sensitive to choice of seed centers; lot of literature on finding “good” seeding methods for LloydBut, almost no analysis that proves performance guarantees about quality of final solution for arbitrary k and dimension,Lloyds method: Whats known?,Our Goal: to analyze Lloyd and try to prove rigor
9、ous performance guarantees for Lloyd-type methods,Our Results,Introduce a clusterability or separation condition. Give a novel, efficient sampling process for seeding Lloyds method with initial centers. Show that if data satisfies our clusterabililty condition: seeding + 1 Lloyd step yields a consta
10、nt-approximation in time linear in n and d, poly(k): is potentially faster than Lloyd variants which require multiple reseedings seeding + KSS04-sampling gives a PTAS: algorithm is faster and simpler than PTAS in KSS04.,Main Theorem: If data has a “meaningful k-clustering”, then there is a simple, e
11、fficient seeding method s.t. Lloyd-type methods return a near-optimal solution.,“Meaningful k-Clustering”,Settings where one would NOT consider data to possess a meaningful k-clustering:,1) If near-optimum cost can be achieved by two very distinct k-partitions of data, then identity of an optimal k-
- 1.请仔细阅读文档,确保文档完整性,对于不预览、不比对内容而直接下载带来的问题本站不予受理。
- 2.下载的文档,不会出现我们的网址水印。
- 3、该文档所得收入(下载+内容+预览)归上传者、原创作者;如果您是本文档原作者,请点此认领!既往收益都归您。
下载文档到电脑,查找使用更方便
2000 积分 0人已下载
下载 | 加入VIP,交流精品资源 |
- 配套讲稿:
如PPT文件的首页显示word图标,表示该PPT已包含配套word讲稿。双击word图标可打开word文档。
- 特殊限制:
部分文档作品中含有的国旗、国徽等图片,仅作为作品整体效果示例展示,禁止商用。设计者仅对作品中独创性部分享有著作权。
- 关 键 词:
- THEEFFECTIVENESSOFLLOYDTYPEMETHODSFORTHEKMEANSPPT

链接地址:http://www.mydoc123.com/p-373252.html