Chapter 18 - B-trees |
Document last modified: |
Overview - Dealing with maps containing very large amounts of data

Basic structure
- Similar to binary-search-tree but has n-ary children.
- All leafs at same depth (very important).
- Node x with x.n keys has x.n + 1 children
- Keys are stored in non-decreasing order within a node
- Keys at root node of each subtree divide children into ranges of keys. For any key ki in a subtree of x:
k1 ≤ x.key1 ≤ k2 ≤ x.key2 ≤...≤ x.keyx.n ≤ kx.n + 1
Example: Nodes examined in search for key R.
B-trees useful when multi-level data storage required such as database that is too large to be in memory, some must be in some larger scale storage.
Example - memory for some working set of keys, magnetic for non-working set.
Access time:
memory about 5*10-9 seconds
magnetic about 13*10-3 seconds
due to RPM of disk and positioning time of read/write head
typically read/write large number of bytes (216 bytes) at a time
B-trees design goals
- minimize disk accesses
- m-ary with large m for large branching factor but small tree height
- root always in memory
- children may be on disk, each level of B-tree requires 1 disk access
- Insert, delete and search operations on B-tree of height h should require at most h-1 disk accesses
About Secondary Storage
- Currently disk drives are the most commonly used secondary storage.
- Hardware: spindle, platters, read/write heads
- Logical: track, cylinder, page
- Accessing data requires - head movement and rotational delay
- Average disk access time: 3 to 9 milliseconds
- Average maim memory access time: 100 nanoseconds
- Parts of a second:
- millisecond - 10-3, or 1000th of a second
- microsecond - 10-6, or 1,000,000th of a second
- nanosecond - 10-9, or 1,000,000,000th of a second
- Page
- because disk access takes so long, reads and writes are done in big chunks called pages
- typical page sizes range: 211 to 214 bytes
- B-Tree node sizes are fixed to be equivalent to the machine's disk page size
Example: B-tree of more than one billion keys requires at most two disk reads/writes to access any key.
Question 18.1 - See B-tree below.
- How many nodes are examined to search for the R key?
- What do you estimate worst-case search to be in terms of height?
- Which keys are examined to search for the Z key?
- If accessing a new level requires a disk access, how would you design a tree?
- Why is it reasonable to always keep the root in memory?
18.1 Two definitions of B-Tree
From Data Structures and Program Design by Chapter 11 of Kruse:
A B-tree of order m is an m-way tree in which:
- All leaves are on the same level.
- All internal nodes except the root have at most m nonempty children and at least ém/2ù nonempty children.
- The number of keys in each internal node is one less than the number of its nonempty children, and these keys partition the keys in the children in the fashion of a search tree.
- The root has at most m children but nodes may also have as few as 2 if it is not a leaf, or none if the tree consists of the root alone.
| Not a B-tree, some nodes have empty (bc) or too few children (r), not all
leaves on same level (wxyz).
|
![]() |
From Chapter 18.1 of Corman( we'll use his terminology but the graphics aren't as pretty):
A B-Tree T is a rooted tree of nodes:
- Every node x has the following fields:
- x.n is an integer representing the number of keys currently stored in node x (eg. JKL.n is 3)
- x.keyi is the ith key of node x, i = 1, ..., x.n (eg. JKL.key2 is K)
the x.n keys are stored in non-decreasing order so that:
x.key1 ≤ x.key2 ≤ ... ≤ x.keyx.n (eg. J≤K≤L)
- x.leaf is type boolean, is TRUE if x is a leaf, is FALSE if x is an internal node (eg. JKL.leaf is true)
- x.ci points to child nodes one level down in the tree, each node x has 1 .. x.n+1 child pointers (eg. DH.c2 is FG)
- Child Pointers:
- If x.leaf = FALSE, then x.ci contain the pointers to x's children
- If x.leaf = TRUE, then x.ci are undefined
- The keys x.keyi separate the ranges of keys stored in each subtree:
- let ki be any key stored in the subtree with root x.ci then:
k1 ≤ x.key1 ≤ k2 ≤ x.key2 ≤...≤ x.keyx.n ≤ kx.n + 1
Example: At right, for x=DH and DH.c1=BC, DH.c2=FG, DH.c3=JKL
BC1 ≤ D ≤ FG2 ≤ H ≤ JKL3
Question 18.2 Does this hold for QTX?
- All leaves have the same depth, which is the tree's height. (eg. upper-right B-tree, height=2)
- Minimum Degree: t ³ 2
- minimum number of keys/node:
- root node: 1 (for a non-empty tree)
- non-root node: t - 1
- maximum number of keys/node:
- all nodes: 2t - 1
- minimum number of children/node:
- root node (see Figure 18.6):
- 0 children, all the keys can be in the root
when the number of keys in the tree is [1..2t - 1]
- 2 children if 2t or more keys in the tree: 1 key at root, t keys at one child, t-1 keys at the other.
- non-root internal nodes: always a minimum of t children
- leaf nodes: always 0 children
- maximum number of children:
- node is full when it contains 2t - 1 keys
- internal nodes including root: 2t
- leaf nodes: always 0 children
Question 18.3
- Is the figure at right a B-tree? How do you know?
- What is the minimum degree of the B-tree at right?
- Could FG have an additional key? JKL?
- For minimum degree t=5, what is:
- the fewest keys other than the root?
- the most keys?
- the fewest children of an internal node other than the root?
- the most children?
- Why must the root be allowed to have as few as one key?
- What must happen when a node is full (2t-1 keys) before another key can be added within the same range? Consider inserting I into JKL.
- Give one factor used in determining t
Table of possible values for t:
| Root Node | Internal Nodes | |||||||||||||||||||||||||||||||||||||||||||
|
|
Question 18.4
- Is this a B-tree?
- What is t?
- What is the minimum number of keys/node?
- What is the maximum number of keys/node?
- The root cannot have only 1 child. Why?
- For a tree with 2t keys:
- How many keys must be at the root?
- How many children?
- How many keys at each child?
Root Node |
![]() |
Internal Nodes |
||||||||||||
|
|
Question 18.5
- Is this a B-tree?
- What are legal values of t?
- Does the root satisfy the B-tree definition?
- What is the maximum number of keys at the root?
- What is the maximum number of keys anywhere other than the root?
- What is the maximum number of children anywhere?
- What is the minimum number of children anywhere other than the root?
