Adaptive Query Processing for Data Aggregation: Mining, Using and Maintaining Source Statistics

M.S. Thesis Defense by Jianchun Fan
Committee Members: Dr. Subbarao Kambhampati (chair), Dr. Huan Liu, Dr. Yi Chen
April 13, 2006

Introduction

Data Aggregation: Vertical Integration
Mediator: R (A1, A2, A3, A4, A5, A6)
S1: R1 (A1, A2, _, _, A5, A6)
S2: R2 (A1, _, A3, A4, A5, A6)
S3: R3 (A1, A2, A3, A4, A5, _)

Introduction

Query Processing in Data Aggregation
- Sending every query to all sources?
  - Increases the workload on sources
  - Consumes a lot of network resources
  - Keeps users waiting
- Primary processing task: selecting the most relevant sources with respect to different user objectives, such as completeness and quality of the answers, and response time
- Several types of source statistics are needed to guide source selection
- These statistics are usually not directly available

Introduction

Challenges
- Automatically gather various types of source statistics to optimize individual goals
  - Many answers (high coverage)
  - Good answers (high density)
  - Answered quickly (short latency)
- Combine different statistics to support multi-objective query processing
- Maintain statistics dynamically

System Overview

Test beds:
- BibFinder: online bibliography mediator system, integrating DBLP, IEEE Xplore, CSB, Network Bibliography, ACM Digital Library, etc.
- Synthetic test bed: 30 synthetic data sources (based on the Yahoo! Auto database) with different coverage, density and latency characteristics.

Outline
- Introduction & Overview
- Coverage/Overlap Statistics
- Learning Density Statistics
- Learning Latency Statistics
- Multi-Objective Query Processing
- Other Contribution
- Conclusion

Coverage/Overlap Statistics
- Coverage: how many answers a source provides for a given query
- Overlap: how many common answers a set of sources share for a given query
- Based on Nie & Kambhampati, ICDE 2004

Density Statistics
- Coverage measures the "vertical completeness" of the answer set
- "Horizontal completeness" is important too: the quality of the individual answers
- Density statistics measure the horizontal completeness of the individual answer tuples

Defining Density

Density of a source w.r.t. a given query: the average of the densities of all answers

Select A1, A2, A3, A4 From S Where A1 v1
(Select clause: projection attribute set; Where clause: selection predicates)

Density = (1 + 0.5 + 0.5 + 0.75) / 4 = 0.6875

- Learning density for every possible source/query combination? Too costly
- The number of possible queries is exponential in the number of attributes

Learning Density Statistics

A more realistic solution: classify the queries and learn density statistics only w.r.t. the classes

Select A1, A2, A3, A4 From S Where A1 v1
(Select clause: projection attribute set; Where clause: selection predicates)

Assumption: if a tuple t represents a real-world entity E, then whether or not t has a missing value on attribute A is independent of E's actual value of A.

Learning Density Statistics
- Query class for density statistics: the projection attribute set
- For queries whose projection attribute set is (A1, A2, ..., Am), there are 2^m different types of answers

2^2 different density patterns (¬ marks a missing value):
dp1 = (A1, A2)
dp2 = (A1, ¬A2)
dp3 = (¬A1, A2)
dp4 = (¬A1, ¬A2)

Density(A1, A2 | S) = P(dp1 | S) * 1.0 + P(dp2 | S) * 0.5 + P(dp3 | S) * 0.5 + P(dp4 | S) * 0.0

Learning Density Statistics

R(A1, A2, ..., An)
2^n possible projection attribute sets
(A1) (A1, A2) (A1, A3) (A1, A2, ,
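
The two density definitions above can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names, the use of `None` for a missing value, and the sample pattern probabilities are all assumptions made for illustration.

```python
from collections import Counter


def tuple_density(row, attrs):
    """Fraction of the projected attributes that are present (None = missing)."""
    return sum(row.get(a) is not None for a in attrs) / len(attrs)


def query_density(rows, attrs):
    """Density of an answer set: the average density of its tuples."""
    return sum(tuple_density(r, attrs) for r in rows) / len(rows)


def pattern_probabilities(rows, attrs):
    """Estimate P(dp | S) from a sample; a pattern is a tuple of present/missing flags."""
    counts = Counter(tuple(r.get(a) is not None for a in attrs) for r in rows)
    return {dp: n / len(rows) for dp, n in counts.items()}


def density_from_patterns(pattern_probs):
    """Expected density: sum over patterns of P(dp | S) * fraction of present attributes."""
    return sum(p * sum(dp) / len(dp) for dp, p in pattern_probs.items())


# Hypothetical answer set whose tuple densities are 1, 0.5, 0.5 and 0.75.
attrs = ["A1", "A2", "A3", "A4"]
answers = [
    {"A1": 1, "A2": 2, "A3": 3, "A4": 4},
    {"A1": 1, "A2": None, "A3": 3, "A4": None},
    {"A1": 1, "A2": 2, "A3": None, "A4": None},
    {"A1": 1, "A2": 2, "A3": 3, "A4": None},
]
print(query_density(answers, attrs))

# Hypothetical pattern probabilities for the projection (A1, A2).
probs = {(True, True): 0.4, (True, False): 0.3, (False, True): 0.2, (False, False): 0.1}
print(density_from_patterns(probs))
```

The first call averages per-tuple densities as in the Defining Density slide; the second weights each density pattern by its probability, so only one set of pattern probabilities per projection attribute set has to be learned for each source.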
