How to Build a Stream Database.ppt
How to Build a Stream Database
Theodore Johnson, AT&T Labs - Research

What is a stream database?
Query data from a stream
  A data feed with a schema
  You can also query conventional relations
Examples
  Sensor data
  Stock market quotes
  Network monitoring data
Querying a stream forces some changes to the DBMS:
  Must use push-based rather than pull-based operators
  Must be able to provide partial answers
    E.g., you never finish the query
  One-pass
    E.g., you cannot (in general) rewind the stream.

Stream Databases for Network Measurements
Continuing need to measure and monitor networks
  Router configuration, debugging, detect network attacks, verify service agreements, ...
Very large amounts of data
  In principle, we'd like to query every packet flowing in the network
  And in real time
Data arrives in streams
  IP streams, NetFlow streams, SNMP streams, ...
Special queries: grouping by subsequences
  IP packets forming a flow, forming a TCP/IP session, forming a user's interactions, ...

Query Language
Typical queries:
  For each source IP address and each 5 minute interval, count the number of bytes and number of packets related to HTTP transfers
  Find the TCP/IP SYN packets with and without matching FIN packets
  Compute the NetFlows in the packet stream, using a 30-second timeout between packets
Pervasive use of time and sequence.
We would like to express these queries using a minimal change to SQL.
We will rely on the query optimizer making use of ordering properties of the data streams.

Basics
Selection, projection, join, group-by, aggregation, etc.
Mix stream with tables
Some restrictions to ensure that we can answer the query in limited space
  Join: When joining streams, the join predicate must define a window in which the join must occur
    E.g. match SYN packets on an inbound link with SYNACK on an outbound link.
  Group-by and Aggregation: We must be able to determine when all tuples for a group have been processed
    E.g., number of packets during each 30 second interval
    More on this later.

Complex Aggregation
Grouping Variables
  Analogous to table variables
  Represents the value of a correlated subquery
  Only aggregate values can be referenced
Example:

  Select SourceIP, tb, (count(*)+count(X)/2+count(Y)/4)/1.75
  From Packets
  Group By SourceIP, ts/60, ts/60+1, ts/60+2 as tb, X, Y
  Such that
    X.SourceIP=SourceIP and X.ts/60+1=tb
    Y.SourceIP=SourceIP and Y.ts/60+2=tb

X represents the query

  Select * from Packets
  where SourceIP=$SourceIP and ts/60+1 = $tb

Defining Sequences
Count the packets in connection K between the SYN packet and the FIN packet

  Select K, ts, count(Y)
  from TCPIP
  Where SYN=1
  Group by K, ts : X, Y
  Such That
    X.K = K and X.ts > ts and X.FIN = 1
    Y.K = K and Y.ts >= ts and Y.ts <= MIN(X.ts)

Ordering Properties
The query language lets us express queries that seem to require self-joins, etc.
But the queries frequently have a temporal component: timestamps as group-by variables, timestamps in the join predicates, etc.
If we can reason about timestamps, we can find a stream evaluation plan for these queries
  But not all
We want to avoid cumbersome model restrictions, e.g. sequence databases
We want precise semantics, e.g. avoid "continuous query" models.

Temporal Properties
Define ordering properties on attributes of a stream.
Allow for multiple ordering properties, e.g. multiple timestamps, start time vs. end time, timestamp vs. sequence number, etc.
Many types of ordering properties
  Increasing, nondecreasing
  Increasing within delta, banded-increasing(epsilon)
  Increasing in group G
Ordering properties are part of the data type.

  Stream TCPIP
    Ullong timestamp increasing;
    Uint SourceIP;
    Uint SequenceNbr increasing_in_group(SourceIP, ) ;

Stream Operators
Power of relational algebra: closed algebra.
  Enable the composition of complex queries
  E.g., COUNT DISTINCT is a COUNT(*) over a GROUP BY
Need stream operators which produce streams
  That is, we can deduce ordering properties of the output
We have defined ordering properties to capture semantics of the output of operators
  Increasing in group G: group-by and aggregation
  Banded-increasing: window join.
Implementation detail: special operators
  Emulate complex network protocols, e.g. IP defragmentation

Basic Operators
Selec
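
The group-by restriction stated earlier (we must be able to determine when all tuples for a group have been processed) can be illustrated with a minimal Python sketch. It is not the system described in the slides: the tuple layout `(ts, source_ip, n_bytes)`, the function name `windowed_counts`, and the 30-second default width are assumptions made for illustration only. The sketch relies on the timestamp never moving back into an earlier window, so the first tuple of a later window proves the previous window is complete and its state can be dropped.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical tuple layout (an assumption): (ts_seconds, source_ip, n_bytes).
Packet = Tuple[int, str, int]
# Output row: (window_start, source_ip, packet_count, byte_count).
Row = Tuple[int, str, int, int]


def windowed_counts(packets: Iterable[Packet], width: int = 30) -> Iterator[Row]:
    """One-pass per-window group-by over a timestamp-ordered stream."""
    current = None  # start of the window currently being filled
    groups: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for ts, src, n_bytes in packets:
        window = (ts // width) * width
        if current is not None and window < current:
            # The tuple belongs to a window that was already closed.
            raise ValueError("timestamp ordering property violated")
        if current is not None and window > current:
            # A later window has started, so the old one is complete:
            # emit its groups and free the space they used.
            for ip in sorted(groups):
                pkts, total = groups[ip]
                yield (current, ip, pkts, total)
            groups.clear()
        current = window
        groups[src][0] += 1
        groups[src][1] += n_bytes
    # A real stream never ends; this flush only matters for finite input.
    if current is not None:
        for ip in sorted(groups):
            pkts, total = groups[ip]
            yield (current, ip, pkts, total)


sample = [
    (0, "10.0.0.1", 100),
    (5, "10.0.0.2", 40),
    (29, "10.0.0.1", 60),
    (31, "10.0.0.1", 10),
    (65, "10.0.0.2", 20),
]
rows = list(windowed_counts(sample))
```

The generator is written in pull style for brevity. The same close-on-advance logic would sit inside a push-based operator that is handed one tuple at a time. Memory is bounded by the number of distinct groups in a single window, which is the "limited space" property the restriction is meant to guarantee.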
