    The Effectiveness of Lloyd-type Methods for the k-means .ppt

The Effectiveness of Lloyd-type Methods for the k-means Problem
Chaitanya Swamy (University of Waterloo)
Joint work with Rafi Ostrovsky (UCLA), Yuval Rabani (Technion), and Leonard Schulman (Caltech)

The k-means Problem
- Given: X ⊆ R^d, a set of n points in d-dimensional space (|X| = n).
- Partition X into k clusters X_1, …, X_k, assigning each point in X_i to a common center c_i ∈ R^d.
- Goal: minimize Σ_i Σ_{x ∈ X_i} d(x, c_i)², where d is the L2 distance.

k-means (contd.)
- Given the c_i's, the best clustering assigns each point to its nearest center: X_i = {x ∈ X : c_i is the center nearest to x}.
- Given the X_i's, the best choice of centers is c_i = center of mass of X_i = ctr(X_i) = Σ_{x ∈ X_i} x / |X_i|.
- An optimal solution satisfies both properties.
- The problem is NP-hard even for k = 2 (with n, d not fixed).

Related Work
The k-means problem dates back to Steinhaus (1956).
a) Approximation algorithms: algorithms with provable guarantees.
- PTASs with varying runtime dependence on n, d, and k (polynomial or linear in n, but possibly exponential in d and/or k): Matousek (poly(n), exp(d, k)); Kumar, Sabharwal, Sen (KSS04).
- Guarantees that also translate to k-means: Charikar, Guha, Tardos & Shmoys; Arya et al. + Kanungo et al.: a (9+ε)-approximation.
b) Heuristics: Lloyd's method, invented in 1957, remains an extremely popular heuristic even today:
  1) Start with k initial / "seed" centers c_1, …, c_k.
  2) Iterate the following Lloyd step:
     - Assign each point to its nearest center c_i to obtain a clustering X_1, …, X_k.
     - Update c_i ← ctr(X_i) = Σ_{x ∈ X_i} x / |X_i|.

Lloyd's method: What's known?
- Some bounds on the number of iterations of Lloyd-type methods: Inaba-Katoh-Imai; Har-Peled-Sadri; Arthur-Vassilvitskii ('06).
- Performance is very sensitive to the choice of seed centers, and there is a lot of literature on finding "good" seeding methods for Lloyd.
- But there is almost no analysis that proves performance guarantees on the quality of the final solution for arbitrary k and dimension.
- Our goal: analyze Lloyd and try to prove rigorous performance guarantees for Lloyd-type methods.

Our Results
- We introduce a clusterability, or separation, condition.
- We give a novel, efficient sampling process for seeding Lloyd's method with initial centers.
- We show that if the data satisfies our clusterability condition:
  - seeding + one Lloyd step yields a constant-factor approximation in time linear in n and d and polynomial in k; this is potentially faster than Lloyd variants that require multiple reseedings;
  - seeding + KSS04-sampling gives a PTAS that is faster and simpler than the PTAS in KSS04.
- Main Theorem: if the data has a "meaningful k-clustering", then there is a simple, efficient seeding method such that Lloyd-type methods return a near-optimal solution.

"Meaningful k-Clustering"
Settings where one would NOT consider the data to possess a meaningful k-clustering:
1) If near-optimum cost can be achieved by two very distinct k-partitions of the data, then the identity of an optimal k-partition carries little meaning: it provides an ambiguous classification.
2) If the cost of the best k-clustering ≈ the cost of the best (k−1)-clustering, then a k-clustering yields only marginal benefit over the best (k−1)-clustering: one should use a smaller value of k here. (Example: k = 3.)

We formalize 2). Let Δ_k²(X) = cost of the best k-clustering of X.
- Definition: X is ε-separated for k-means iff Δ_k²(X) / Δ_{k−1}²(X) ≤ ε².
- A simple condition: the drop in k-clustering cost is already used by practitioners to choose the right k.
- Can show (roughly): X is ε-separated for k-means ⟺ any two low-cost k-clusterings disagree on only a small fraction of the data.

The 2-means problem (k = 2)
- X*_1, X*_2: the optimal clusters; c*_i = ctr(X*_i); D* = d(c*_1, c*_2); n_i = |X*_i|.
- (r*_i)² = Σ_{x ∈ X*_i} d(x, c*_i)² / n_i = Δ_1²(X*_i) / n_i = average squared distance in cluster X*_i.
- Lemma: if X is ε-separated for 2-means, then for i = 1, 2, (r*_i)² ≤ ε²/(1−ε²) · D*².
- Proof idea: Δ_2²(X) / ε² ≤ Δ_1²(X) = Δ_2²(X) + (n_1 n_2 / n) · D*².

The 2-means algorithm
Assume X is ε-separated for 2-means.
1) Sampling-based seeding: pick two seed centers c_1, c_2 by randomly picking the pair x, y ∈ X with probability proportional to d(x, y)².
2) Lloyd step, or the simpler "ball k-means step": for each c_i, let B_i = {x ∈ X : d(x, c_i) ≤ d(c_1, c_2)/3}; update c_i ← ctr(B_i), and return these as the final centers.
The sampling can be implemented in O(nd) time, so the entire algorithm runs in O(nd) time.

2-means: Analysis
- Let core(X*_i) = {x ∈ X*_i : d(x, c*_i)² ≤ (r*_i)² / ρ}, where ρ = Θ(ε²) ≪ 1.
- Seeding lemma: with probability 1 − O(ρ), c_1 and c_2 lie in the cores of X*_1 and X*_2.
- Proof: |core(X*_i)| ≥ (1−ρ) n_i for i = 1, 2. Let A = Σ_{x ∈ core(X*_1), y ∈ core(X*_2)} d(x, y)², so that (roughly) A ≥ (1−ρ)² n_1 n_2 D*², and let B = Σ_{x, y ∈ X} d(x, y)² = n · Δ_1²(X) ≈ n_1 n_2 D*². The success probability is A / B ≥ (1−ρ)² (up to lower-order terms) = 1 − O(ρ).

2-means analysis (contd.)
- Recall that B_i = {x ∈ X : d(x, c_i) ≤ d(c_1, c_2)/3}.
- Ball-k-means lemma: for i = 1, 2, core(X*_i) ⊆ B_i ⊆ X*_i. Therefore d(ctr(B_i), c*_i)² ≤ ρ (r*_i)² / (1−ρ).
- Intuitively, since B_i ⊆ X*_i and B_i contains almost all of the mass of X*_i, ctr(B_i) must be close to ctr(X*_i) = c*_i.
- Theorem: with probability 1 − O(ρ), the cost of the final clustering is at most Δ_2²(X)/(1−ρ), i.e., we get a 1/(1−ρ)-approximation algorithm. Since ρ = O(ε²), the approximation ratio → 1 as ε → 0, and the probability of success → 1 as ε → 0.

Arbitrary k
- The algorithm and analysis follow the same outline as for 2-means.
- If X is ε-separated for k-means, one can again show that all clusters are well separated, that is, cluster radius ≪ inter-cluster distance: r*_i = O(ε) · d(c*_i, c*_j) for all i, j.
1) Seeding stage: choose k initial centers and ensure that they lie in the "cores" of the k optimal clusters; this exploits the fact that the clusters are well separated. After the seeding stage, each optimal center has a distinct seed center very "near" it.
2) Now run either a Lloyd step or a ball-k-means step.
- Theorem: if X is ε-separated for k-means, then one can obtain an α(ε)-approximation algorithm where α(ε) → 1 as ε → 0.

Schematic of the entire algorithm
Three ways to obtain k well-placed seeds:
- Simple sampling: pick k centers as follows: first pick two centers c_1, c_2 as in 2-means; to pick center c_{i+1}, pick x ∈ X with probability proportional to min_{j ≤ i} d(x, c_j)². Success probability: exp(−k).
- Oversampling + deletion: sample O(k) centers, then greedily delete until k remain. O(1) success probability; runs in O(nkd + k³d) time.
- Greedy deletion: start with n centers and keep deleting the center that causes the least cost increase until k centers remain. Runs in O(n³d) time.
Given k well-placed seeds, a ball-k-means or Lloyd step gives an O(1)-approximation, and KSS04-sampling gives a PTAS.

Open Questions
- Deeper analysis of Lloyd: are there weaker conditions under which one can prove performance guarantees for Lloyd-type methods?
- A PTAS for k-means with polynomial-time dependence on n, k, and d? Is the problem APX-hard in the geometric setting?
- A PTAS for k-means under our separation condition?
- Other applications of the separation condition?

Thank You.
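The Lloyd iteration described in the slides can be sketched in a few lines of pure Python. This is a minimal illustration, not the authors' implementation: points are tuples of floats, helper names such as `lloyd_step` are my own, and the naive random seeding here is exactly what the talk argues should be replaced by careful seeding.

```python
import random

def dist2(x, y):
    """Squared L2 distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def mean(points):
    """Center of mass ctr(X_i) of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def kmeans_cost(X, centers):
    """Sum over all points of the squared distance to the nearest center."""
    return sum(min(dist2(x, c) for c in centers) for x in X)

def lloyd_step(X, centers):
    """Assign each point to its nearest center, then recenter each cluster."""
    clusters = [[] for _ in centers]
    for x in X:
        i = min(range(len(centers)), key=lambda j: dist2(x, centers[j]))
        clusters[i].append(x)
    # Keep a center unchanged if its cluster becomes empty.
    return [mean(cl) if cl else c for cl, c in zip(clusters, centers)]

def lloyd(X, k, iters=20, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(X, k)  # naive seeding, for illustration only
    for _ in range(iters):
        centers = lloyd_step(X, centers)
    return centers
```

Each Lloyd step never increases the cost, but the quality of the local optimum it converges to depends heavily on the seeds, which is the talk's starting observation.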

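The 2-means algorithm described above (pair seeding with probability proportional to d(x, y)², followed by one ball-k-means step) might be sketched as follows. The function names are my own, and this naive version enumerates all pairs in O(n²) for clarity, whereas the talk states the sampling can be implemented in O(nd).

```python
import random

def dist2(x, y):
    """Squared L2 distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def mean(points):
    """Center of mass of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def seed_pair(X, rng):
    """Pick a pair (x, y) from X with probability proportional to d(x, y)^2."""
    pairs = [(i, j) for i in range(len(X)) for j in range(i + 1, len(X))]
    weights = [dist2(X[i], X[j]) for i, j in pairs]
    i, j = rng.choices(pairs, weights=weights, k=1)[0]
    return X[i], X[j]

def ball_2means(X, seed=0):
    rng = random.Random(seed)
    c1, c2 = seed_pair(X, rng)
    # Ball k-means step: recenter using only the points within d(c1, c2)/3
    # of each seed (comparing squared distances, so divide by 3^2 = 9).
    r2 = dist2(c1, c2) / 9.0
    B1 = [x for x in X if dist2(x, c1) <= r2]
    B2 = [x for x in X if dist2(x, c2) <= r2]
    # Each ball contains its own seed, so both are non-empty.
    return mean(B1), mean(B2)
```

On ε-separated data the balls capture the cores of the two optimal clusters while excluding points from the other cluster, which is why a single recentering already lands near the optimal centers.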

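The "simple sampling" seeding from the schematic (first a pair chosen as in 2-means, then each further center drawn with probability proportional to its squared distance to the nearest center chosen so far) might look like the sketch below; the O(n²) pair enumeration is again used purely for clarity, and `simple_sampling` is my own name for the routine.

```python
import random

def dist2(x, y):
    """Squared L2 distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def simple_sampling(X, k, seed=0):
    """Pick k seed centers: the first two as a pair with probability
    proportional to d(x, y)^2, then each next center x with probability
    proportional to min over the chosen centers c_j of d(x, c_j)^2."""
    rng = random.Random(seed)
    pairs = [(i, j) for i in range(len(X)) for j in range(i + 1, len(X))]
    weights = [dist2(X[i], X[j]) for i, j in pairs]
    i, j = rng.choices(pairs, weights=weights, k=1)[0]
    centers = [X[i], X[j]]
    while len(centers) < k:
        w = [min(dist2(x, c) for c in centers) for x in X]
        centers.append(rng.choices(X, weights=w, k=1)[0])
    return centers
```

This is close in spirit to the later k-means++ seeding of Arthur and Vassilvitskii; the schematic notes that its success probability degrades like exp(−k), which motivates the oversampling-plus-deletion variant.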