B+-Trees (Part 1)
COMP171

Main and secondary memories
- Secondary storage devices are much, much slower than main memory (RAM)
- Pages and blocks
- Internal and external sorting
- CPU operations
- Disk access: Disk-read() and Disk-write() are much more expensive than a unit CPU operation

Contents
- Why B+ Tree?
- B+ Tree Introduction
- Searching and Insertion in B+ Tree

Motivation
- An AVL tree with N nodes is an excellent data structure for searching, indexing, etc.
- Big-Oh analysis shows that most operations finish within O(log N) time
- This theoretical conclusion holds as long as the entire structure fits into main memory
- When the data is too large and has to reside on disk, the performance of an AVL tree may deteriorate rapidly

A Practical Example
- A 500-MIPS machine with a 7200 RPM hard disk: 500 million instruction executions and approximately 120 disk accesses each second (one disk access takes about as long as 4 million instructions!)
- A database with 10,000,000 items, 256 bytes each (assume it doesn't fit in memory)
- The machine is shared by 20 users
- Let's calculate a typical search time for one user. A successful search needs about log_2 10,000,000 ≈ 24 disk accesses. With 120 disk accesses per second shared among 20 users, each user gets about 6 per second, so the search takes around 4 seconds. This is way too slow!
- We want to reduce the number of disk accesses to a very small constant

From Binary to M-ary
- Idea: allow a node in a tree to have many children
- Fewer disk accesses = smaller tree height = more branching
- As branching increases, the depth decreases
- An M-ary tree allows M-way branching: each internal node has at most M children
- A complete M-ary tree has a height of roughly log_M N instead of log_2 N
- If M = 20, then log_20 2^20 < 5
- Thus, we can speed up the search significantly

M-ary Search Tree
- A binary search tree has one key to decide which of the two branches to take
- An M-ary search tree needs M-1 keys to decide which branch to take
- An M-ary search tree should be balanced in some way too: we don't want it to degenerate into a linked list, or even a binary search tree

B+ Tree
A B+-tree of order M (M > 3) is an M-ary tree with the following properties:
- The data items are stored at the leaves
- The root is either a leaf or has between two and M children
- Node: an internal (non-leaf) node stores up to M-1 keys (redundant) to guide the search; key i represents the smallest key in subtree i+1
- All nodes (except the root) have between ⌈M/2⌉ and M children
- Leaf: a leaf has between ⌈L/2⌉ and L data items, for some L (usually L < M, but we will assume M=L in most examples)
- All leaves are at the same depth
Note that there are various definitions of B-trees, but they differ mostly in minor ways. The above definition is one of the popular forms.

Keys in Internal Nodes
- Which keys are stored at the internal nodes? There are several ways to do it, and different books adopt different conventions.
- We will adopt the following convention: key i in an internal node is the smallest key (redundant) in its (i+1)th subtree (i.e. the right subtree of key i)
- Even under this convention, there is no unique B+-tree for the same set of records.

B+ Tree Example 1 (M=L=5)
- Records are stored at the leaves (we only show the keys here)
- Since L=5, each leaf has between 3 and 5 data items
- Since M=5, each non-leaf node has between 3 and 5 children
- Requiring nodes to be half full guarantees that the B+ tree does not degenerate into a simple binary tree

B+ Tree Example 2 (M=4, L=3)
- We can still talk about left and right child pointers. E.g. the left child pointer of N is the same as the right child pointer of J
- We can also talk about the left subtree and right subtree of a key in an internal node

B+ Tree in Practical Usage
- Each internal node or leaf is designed to fit into one I/O block of data. An I/O block can usually hold quite a lot of data, so an internal node can keep a lot of keys, i.e., M is large. This implies that the tree has only a few levels, and only a few disk accesses are needed for a search, insertion, or deletion.
- The B+-tree is a popular structure used in commercial databases. To further speed up the search, the first one or two levels of the B+-tree are usually kept in main memory.
- The disadvantage of the B+-tree is that most nodes will have fewer than M-1 keys most of the time. This could lead to severe space wastage. Thus, it is not a good dictionary structure for data in main memory.
- The textbook calls the tree a B-tree instead of a B+-tree. In some other textbooks, B-tree refers to the variant where the actual records are kept at internal nodes as well as at the leaves. Such a scheme is not practical: keeping actual records at the internal nodes limits the number of keys stored there, and thus increases the number of tree levels.

Searching Example
Suppose that we want to search for the key K. The path traversed is shown in bold.

Searching Algorithm
Let x be the input search key.
- Start the search at the root
- If we encounter an internal node v, search (linear search or binary search) for x among the keys stored at v
  - If x < Kmin at v, follow the left child pointer of Kmin
  - If Ki ≤ x < Ki+1 for two consecutive keys Ki and Ki+1 at v, follow the left child pointer of Ki+1
  - If x ≥ Kmax at v, follow the right child pointer of Kmax
- If we encounter a leaf v, search (linear search or binary search) for x among the keys stored at v. If found, return the entire record; otherwise, report not found.

Insertion Procedure
We want to insert a key K.
- Search for the key K using the search procedure. This leads to a leaf x
- Insert K into x
  - If x is not full, this is trivial
  - If x is full, there is trouble: we need splitting to maintain the properties of the B+ tree (instead of the rotations used in AVL trees)

Insertion into a Leaf
A: If leaf x contains fewer than L keys, then insert K into x (at the correct position in node x)
B: If x is already full (i.e. it contains L keys), split x:
- Cut x off from its parent
- Insert K into x, pretending x has space for K. Now x has L+1 keys.
- After inserting K, split x into 2 new leaves xL and xR, with xL containing the ⌈(L+1)/2⌉ smallest keys and xR containing the remaining ⌊(L+1)/2⌋ keys. Let J be the minimum key in xR
- Make a copy of J to be the parent of xL and xR, and insert the copy together with its child pointers into the old parent of x.

Inserting into a Non-full Leaf (L=3)

Splitting a Leaf: Inserting T

Splitting Example 1
- Two disk accesses to write the two leaves, one disk access to update the parent
- For L=32, two leaves with 16 and 17 items are created. We can perform 15 more insertions without another split

Splitting Example 2 (cont'd)
=> Need to split the internal node

E: Splitting an Internal Node
To insert a key K into a full internal node x:
- Cut x off from its parent
- Insert K as usual by pretending there is space. Now x has M keys, not M-1 keys!
- Split x into 3 parts: two new internal nodes xL and xR, and a key J that goes to x's parent. xL contains the (⌈M/2⌉ - 1) smallest keys, and xR contains the ⌊M/2⌋ largest keys. Note that the ⌈M/2⌉th key J becomes a new node; it is not placed in xL or xR
- Make J the parent node of xL and xR, and insert J together with its child pointers into the old parent of x.

Example: Splitting Internal Node (M=4)
3+1 = 4 keys, and 4 is split into 1, 1 and 2. So D J L N is split into D, J, and L N. (cont'd)

Termination
- Splitting continues as long as we encounter full internal nodes
- If the split internal node x does not have a parent (i.e. x is the root), then create a new root containing the key J and its two children

Summary of B+ Tree of order M and of leaf size L
- The root is either a leaf or has 2 to M children
- Each internal node (except the root) has between ⌈M/2⌉ and M children (at most M children, so at most M-1 keys)
- Each leaf has between ⌈L/2⌉ and L keys and corresponding data items
We assume M=L in most examples.

Roadmap of insertion
Insert a key K: search for the key K to get to a leaf x, then insert K into x.
- A: Trivial (leaf is not full)
- B: Leaf is full
- C: Split a leaf
- D: Trivial (node is not full)
- E: Node is full, so split a node
Main concern: a leaf or a node might be full!
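
The search and insertion cases A to E above can be sketched in a few lines of Python. This is only a minimal in-memory illustration, not the textbook's implementation: the names `Leaf`, `Internal`, `BPlusTree` and `_insert` are made up for this sketch, only keys are stored (no records), duplicate keys are not handled, and there is no disk I/O. It assumes the convention used in these notes (key i is the smallest key in subtree i+1) and the split sizes from the slides.

```python
import bisect
from math import ceil


class Leaf:
    def __init__(self, keys=None):
        self.keys = keys or []          # sorted keys (records omitted)


class Internal:
    def __init__(self, keys, children):
        self.keys = keys                # keys[i] = smallest key in children[i + 1]
        self.children = children        # len(children) == len(keys) + 1


class BPlusTree:
    def __init__(self, M=4, L=3):
        self.M, self.L = M, L
        self.root = Leaf()

    def search(self, x):
        node = self.root
        while isinstance(node, Internal):
            # x < Kmin -> first child; Ki <= x < Ki+1 -> child between them;
            # x >= Kmax -> last child. bisect_right gives exactly that index.
            node = node.children[bisect.bisect_right(node.keys, x)]
        i = bisect.bisect_left(node.keys, x)
        return i < len(node.keys) and node.keys[i] == x

    def insert(self, k):
        split = self._insert(self.root, k)
        if split is not None:           # the root was split: create a new root
            j, right = split
            self.root = Internal([j], [self.root, right])

    def _insert(self, node, k):
        """Insert k below node; return (J, new right sibling) if node was split."""
        if isinstance(node, Leaf):
            bisect.insort(node.keys, k)             # pretend there is space
            if len(node.keys) <= self.L:            # case A: leaf not full
                return None
            cut = ceil(len(node.keys) / 2)          # cases B, C: xL keeps ceil((L+1)/2) keys
            right = Leaf(node.keys[cut:])
            node.keys = node.keys[:cut]
            return right.keys[0], right             # a copy of J = min key of xR moves up

        i = bisect.bisect_right(node.keys, k)
        split = self._insert(node.children[i], k)
        if split is None:
            return None
        j, right = split
        node.keys.insert(i, j)                      # pretend there is space
        node.children.insert(i + 1, right)
        if len(node.keys) <= self.M - 1:            # case D: node not full
            return None
        mid = ceil(self.M / 2) - 1                  # case E: index of the ceil(M/2)-th key
        up = node.keys[mid]                         # J moves up; it stays in neither half
        new_right = Internal(node.keys[mid + 1:], node.children[mid + 1:])
        node.keys = node.keys[:mid]
        node.children = node.children[:mid + 1]
        return up, new_right
```

For example, with M=4 and L=3, inserting 1, 2, 3 and then 4 overfills the single leaf, which is split into leaves 1 2 and 3 4 under a new root holding the key 3. Note that a leaf split copies J upward (J also stays in xR), while an internal split moves J upward.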