Association Rule Mining (II).ppt
《Association Rule Mining (II).ppt》由会员分享,可在线阅读,更多相关《Association Rule Mining (II).ppt(42页珍藏版)》请在麦多课文档分享上搜索。
1、Association Rule Mining (II),Instructor: Qiang Yang Thanks: J.Han and J. Pei,Frequent-pattern mining methods,2,Bottleneck of Frequent-pattern Mining,Multiple database scans are costly Mining long patterns needs many passes of scanning and generates lots of candidates To find frequent itemset i1i2i10
2、0 # of scans: 100 # of Candidates: (1001) + (1002) + + (110000) = 2100-1 = 1.27*1030 ! Bottleneck: candidate-generation-and-test Can we avoid candidate generation?,Frequent-pattern mining methods,3,FP-growth: Frequent-pattern Mining Without Candidate Generation,Heuristic: let P be a frequent itemset
3、, S be the set of transactions contain P, and x be an item. If x is a frequent item in S, x P must be a frequent itemset No candidate generation! A compact data structure, FP-tree, to store information for frequent pattern mining Recursive mining algorithm for mining complete set of frequent pattern
4、s,Frequent-pattern mining methods,4,Example,Min Support = 3,Frequent-pattern mining methods,5,Scan the database,List of frequent items, sorted: (item:support)The root of the tree is created and labeled with “” Scan the database Scanning the first transaction leads to the first branch of the tree: Or
5、der according to frequency,Frequent-pattern mining methods,6,Scanning TID=100,Transaction Database TID Items 100 f,a,c,d,g,i,m,p,f:1,c:1,a:1,m:1,p:1,Header TableNode Item count head f 1 c 1 a 1m 1p 1,root,Frequent-pattern mining methods,7,Scanning TID=200,Frequent Single Items: F1= TID=200 Possible
6、frequent items: Intersect with F1: f,c,a,b,m Along the first branch of , intersect: Generate two children , ,Frequent-pattern mining methods,8,Scanning TID=200,Transaction Database TID Items 200 f,c,a,b,m,f:2,c:2,a:2,m:1,p:1,Header TableNode Item count head f 1 c 1 a 1 b 1 m 2p 1,root,b:1,m:1,Freque
7、nt-pattern mining methods,9,The final FP-tree,Transaction Database TID Items 100 f,a,c,d,g,i,m,p 200 a,b,c,f,l,m,o 300 b,f,h,j,o 400 b,c,k,s,p 500 a,f,c,e,l,p,m,n Min support = 3,Frequent 1-items in frequency descending order:f,c,a,b,m,p,f:4,c:1,b:1,p:1,b:1,c:3,a:3,b:1,m:2,p:2,m:1,Header TableNode I
8、tem count head f 1 c 2 a 1 b 3 m 2 p 2,Frequent-pattern mining methods,10,FP-Tree Construction,Scans the database only twice Subsequent mining: based on the FP-tree,Frequent-pattern mining methods,11,How to Mine an FP-tree?,Step 1: form conditional pattern base Step 2: construct conditional FP-tree
9、Step 3: recursively mine conditional FP-trees,Frequent-pattern mining methods,12,Conditional Pattern Base,Let I be a frequent item A sub database which consists of the set of prefix paths in the FP-tree With item I as a co-occurring suffix pattern Example: m is a frequent item ms conditional pattern
10、 base: : support =2 : support = 1 Mine recursively on such databases,f:4,c:1,b:1,p:1,b:1,c:3,a:3,b:1,m:2,p:2,m:1,Frequent-pattern mining methods,13,Conditional Pattern Tree,Let I be a suffix item, DB|I be the conditional pattern base The frequent pattern tree TreeI is known as the conditional patter
11、n tree Example: m is a frequent item ms conditional pattern base: : support =2 : support = 1 ms conditional pattern tree,f:4,c:3,a:3,m:2,Frequent-pattern mining methods,14,Composition of patterns a and b,Let a be a frequent item in DB, B be as conditional pattern base, and b be an itemset in B. Then
12、 a + b is frequent in DB if and only if b is frequent in B. Example: Starting with a=p ps conditional pattern base (from the tree) B= (f,c,a,m): 2 (c,b): 1 Let b be c. Then a+b=p,c, with support = 3.,Frequent-pattern mining methods,15,Single path tree,Let P be a single path FP tree Let I1, I2, Ik be
13、 an itemset in the tree Let Ij have the lowest support Then the support(I1, I2, Ik)=support(Ij) Example:,f:4,c:1,b:1,p:1,b:1,c:3,a:3,b:1,m:2,p:2,m:1,Frequent-pattern mining methods,16,FP_growth Algorithm Fig 6.10,Recursive Algorithm Input: A transaction database, min_supp Output: The complete set of
14、 frequent patterns 1. FP-Tree construction 2. Mining FP-Tree by calling FP_growth(FP_tree, null) Key Idea: consider single path FP-tree and multi-path FP-tree separately Continue to split until get single-path FP-tree,Frequent-pattern mining methods,17,FP_Growth (tree, a),If tree contains a single p
15、ath P, then For each combination (denoted as b) of the nodes in the path P, then Generate pattern b+a with support = min_supp of nodes in b Else for each a in the header of tree, do Generate pattern b = a + a with support = a.support; Construct (1) bs conditional pattern base and (2) bs conditional
16、FP-tree Treeb If Treeb is not empty, then Call FP-growth(Treeb, b); ,Frequent-pattern mining methods,18,FP-Growth vs. Apriori: Scalability With the Support Threshold,Data set T25I20D10K,Frequent-pattern mining methods,19,FP-Growth vs. Tree-Projection: Scalability with the Support Threshold,Data set
17、T25I20D100K,Frequent-pattern mining methods,20,Why Is FP-Growth the Winner?,Divide-and-conquer: decompose both the mining task and DB according to the frequent patterns obtained so far leads to focused search of smaller databases Other factors no candidate generation, no candidate test compressed da
18、tabase: FP-tree structure no repeated scan of entire database basic opscounting and FP-tree building, not pattern search and matching,Frequent-pattern mining methods,21,Implications of the Methodology: Papers by Han, et al.,Mining closed frequent itemsets and max-patterns CLOSET (DMKD00) Mining sequ
19、ential patterns FreeSpan (KDD00), PrefixSpan (ICDE01) Constraint-based mining of frequent patterns Convertible constraints (KDD00, ICDE01) Computing iceberg data cubes with complex measures H-tree and H-cubing algorithm (SIGMOD01),Frequent-pattern mining methods,22,Visualization of Association Rules
20、: Pane Graph,Frequent-pattern mining methods,23,Visualization of Association Rules: Rule Graph,Frequent-pattern mining methods,24,Mining Various Kinds of Rules or Regularities,Multi-level, quantitative association rules, correlation and causality, ratio rules, sequential patterns, emerging patterns,
- 1.请仔细阅读文档,确保文档完整性,对于不预览、不比对内容而直接下载带来的问题本站不予受理。
- 2.下载的文档,不会出现我们的网址水印。
- 3、该文档所得收入(下载+内容+预览)归上传者、原创作者;如果您是本文档原作者,请点此认领!既往收益都归您。
下载文档到电脑,查找使用更方便
2000 积分 0人已下载
下载 | 加入VIP,交流精品资源 |
- 配套讲稿:
如PPT文件的首页显示word图标,表示该PPT已包含配套word讲稿。双击word图标可打开word文档。
- 特殊限制:
部分文档作品中含有的国旗、国徽等图片,仅作为作品整体效果示例展示,禁止商用。设计者仅对作品中独创性部分享有著作权。
- 关 键 词:
- ASSOCIATIONRULEMININGIIPPT
