Chapter 18 - B-trees

Document last modified: 

Overview - Dealing with maps containing very large amounts of data

  1. Similar to binary-search-tree but has n-ary children.
     
  2. All leafs at same depth (very important).
     
  3. Node x with x.n keys has x.n + 1 children
     
  4. Keys are stored in non-decreasing order within a node
     
  5. Keys at root node of each subtree divide children into ranges of keys.

    For any key ki in a subtree of x:

    k ≤  x.key1 ≤  k ≤  x.key2  ≤...≤  x.keyx.n ≤  kx.n + 1 

Example: Nodes examined in search for key R.

 

B-trees useful when multi-level data storage required such as database that is too large to be in memory, some must be in some larger scale storage.

Example - memory for some working set of keys, magnetic for non-working set.

Access time:

memory about 5*10-9 seconds

magnetic about 13*10-3 seconds

due to RPM of disk and positioning time of read/write head

typically read/write large number of bytes (216 bytes) at a time

About Secondary Storage

 

B-trees design goals

  1. minimize disk accesses
     
  2. m-ary with large m for large branching factor but small tree height
     
  3. root always in memory
     
  4. children may be on disk, each level of B-tree requires 1 disk access
     
  5. Insert, delete and search operations on B-tree of height h should require at most h-1 disk accesses, implies one pass with no back-tracking.

 

Example: B-tree of height two with more than one billion keys requires at most two disk reads/writes to access any key.

Question 18.1 - See B-tree below.

  1. How many nodes are examined to search for the R key?
     
  2. What do you estimate worst-case search to be in terms of height?
     
  3. Which keys are examined to search for the Z key?
     
  4. If accessing a new level requires a disk access, how should you design a tree?
     
  5. Why is it reasonable to always keep the root in memory?

 

18.1 Two definitions of B-Tree

From Data Structures and Program Design by Chapter 11 of Kruse:

A B-tree of order m is an m-way tree in which:

  1. All leaves are on the same level.
     
  2. All internal nodes except the root have at most m nonempty children and at least  ⌈m/2⌉   nonempty children.
     
  3. The number of keys in each internal node is one less than the number of its nonempty children, and these keys partition the keys in the children in the fashion of a search tree.
     
  4. The root has at most m children but nodes may also have as few as 2 if it is not a leaf, or none if the tree consists of the root alone.
Not a B-tree

Some nodes have empty (bc) or too few children (r)

Not all leaves on same level (wxyz).

From Chapter 18.1 of Corman (we'll use his terminology but the graphics aren't as pretty):

B-Tree T is a rooted tree of nodes:

  1. Every node x has the following fields:
     
    1. x.n is an integer representing the number of keys currently stored in node x          (eg. JKL.n is 3)
       
    2. x.keyi is the ith key of node x, i = 1, ..., x.n                                                          (eg. JKL.key2 is K)
      the x.n keys are stored in non-decreasing order so that:

                  x.key1x.key2 ≤ ... ≤ x.keyx.n                                                               (eg. J≤K≤L)
       
    3. x.leaf is type boolean, is TRUE if x is a leaf, is FALSE if x is an internal node          (eg. JKL.leaf is true)
       
    4. x.ci points to child nodes one level down tree, each node x has 1 .. x.n+1 child pointers       (eg. DH.c2 is FG)
       
  2. Child Pointers:
     
    • If x.leaf = FALSE, then x.ci contain the pointers to x's children   
    • If x.leaf = TRUE, then x.ci are undefined
       
  3. Keys x.keyi separate the ranges of keys stored in each subtree: 
     
    • let ki be any key stored in the subtree with root x.ci then:

           k ≤  x.key1 ≤  k ≤  x.key2  ≤...≤  x.keyx.n ≤  kx.n + 1 

    Example: At right, for x=DH

    DH.c1=BC

    DH.c2=FG

    DH.c3=JKL

    BC ≤  D1  ≤  FG ≤  H2  ≤  JKL3 

    Question 18.2 Does this hold for QTX?

     

  4. All leaves have the same depth, which is the tree's height.   (eg. upper-right B-tree, height=2)
     
  5. Minimum Degree: t ≥ 2
     
    1. minimum number of keys/node:
      • root node: 1 (for a non-empty tree)
      • non-root node: t - 1
         
    2. maximum number of keys/node:
      • all nodes: 2t - 1
         
    3. minimum number of children/node:
      • root node (see Figure 18.6):
        • 0 children, all the keys can be in the root when the number of keys in the tree is [1..2t - 1]
        • 2 children if 2t or more keys in the tree: 1 key at root, t keys at one child, t-1 keys at the other.
      • non-root internal nodes: always a minimum of t children
      • leaf nodes: always 0 children
    4. maximum number of children:
      • node is full when it contains 2t - 1 keys
      • internal nodes including root: 2t
      • leaf nodes: always 0 children

    Question 18.3

    1. Is the figure at right a B-tree? How do you know?
       
    2. What is t, the minimum degree, of the B-tree at right?
       
    3. Can FG have an additional key? Can JKL?
       
    4. For minimum degree t=5, what is:
      1. the fewest keys other than the root?
      2. the most keys?
      3. the fewest children of an internal node other than the root?
      4. the most children?
         
    5. Why must the root be allowed to have as few as one key?
       
    6. What must happen when a node is full (2t-1 keys) before another key can be added within the same range? Consider inserting I into JKL.
       
    7. Give one factor used in determining t

     

Table of possible values for t:

Root Node Internal Nodes
t # of keys
[1..2t-1]
# of children
[0, 2 .. 2t]
2 [1..3] [0, 2..4]
3 [1..5] [0, 2..6]
4 [1..7] [0, 2..8]
5 [1..9] [0, 2..10]
...    
500 [1..999] [0, 2..1000]
t # of keys
[t-1 .. 2t-1]
# of children
[t .. 2t]
2 [1..3] [2..4]
3 [2..5] [3..6]
4 [3..7] [4..8]
5 [4..9] [5..10]
...    
500 [499..999] [500..1000]

Question 18.4

  1. Is this a B-tree?
  2. What is the minimum number of keys/node?         t-1=?
  3. What is the maximum number of keys/node?        2t-1=?
  4. What is t? Solve for t in the following:

    t-1 = Minimum number of keys/node

    2t-1 = Maximum number of keys/node
     

  5. The root cannot have only 1 child. Why?
  6. For a tree with 2t keys:
    1. How many keys must be at the root?
    2. How many children of root?
    3. How many keys at each child?



Root Node



Internal Nodes
t # keys
[1..2t-1]
# children
[0, 2 .. 2t]
3 [1..5] [0, 2..6]
t # keys
[t-1 .. 2t-1]
# children
[t .. 2t]
3 [2..5] [3..6]

Question 18.5

  1. Is this a B-tree?
  2. What are legal values of t?

    Minimum keys/node        t-1=?

    Maximum keys/node       2t-1=?
     

  3. Does the root satisfy the B-tree definition?
  4. What is the maximum number of keys at the root?
  5. What is the maximum number of keys anywhere other than the root?
  6. What is the maximum number of children anywhere?
  7. What is the minimum number of children anywhere other than the root?