1: Binary Search Trees SDP4
2: Binary Trees A binary tree is a root, a left subtree (maybe empty), and a right subtree (maybe empty). Properties to determine: max # of leaves, max # of nodes, and average depth for N nodes. Representation: see slide 3.
3: Binary Tree Representation
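One common way to represent a binary tree is a node holding a key and two child links. The sketch below assumes integer keys; the names `Node`, `countNodes`, and `height` are illustrative and do not come from the slides.

```cpp
#include <cstddef>

// Hypothetical node type: one key plus links to the two subtrees.
struct Node {
    int key;
    Node* left;
    Node* right;
    explicit Node(int k) : key(k), left(nullptr), right(nullptr) {}
};

// Number of nodes in the tree rooted at t (0 for an empty tree).
std::size_t countNodes(const Node* t) {
    return t == nullptr ? 0 : 1 + countNodes(t->left) + countNodes(t->right);
}

// Height in edges; an empty tree has height -1 by the convention used here.
int height(const Node* t) {
    if (t == nullptr) return -1;
    int hl = height(t->left);
    int hr = height(t->right);
    return 1 + (hl > hr ? hl : hr);
}
```

An empty subtree is simply a null link, which matches the "maybe empty" wording in the definition.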
4: Dictionary ADT Dictionary operations: create, destroy, insert, find, delete. Stores values associated with user-specified keys: values may be any (homogeneous) type; keys may be any (homogeneous) comparable type.
5: Dictionary ADT: Used Everywhere Arrays, sets, dictionaries, router tables, page tables, symbol tables, C structures, … Anywhere we need to find things fast based on a key.
6: Search ADT Dictionary operations: create, destroy, insert, find, delete. Stores only the keys: keys may be any (homogeneous) comparable type; quickly tests for membership. A simplified dictionary, useful for examples (e.g., CSE 326).
7: Dictionary Data Structure: Requirements Fast insertion (runtime: ?), fast searching (runtime: ?), fast deletion (runtime: ?).
8: Naïve Implementations
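As one possible illustration of a naive implementation of the search ADT, the sketch below keeps the keys in a sorted array. This particular choice, and the name `SortedArraySet`, are assumptions made for the example; the slide's own comparison is not reproduced here.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical naive search ADT: keys kept in a sorted std::vector.
struct SortedArraySet {
    std::vector<int> keys;

    // O(log n): binary search for the key.
    bool find(int key) const {
        return std::binary_search(keys.begin(), keys.end(), key);
    }

    // O(n): locating the slot is fast, but later elements must shift right.
    void insert(int key) {
        auto pos = std::lower_bound(keys.begin(), keys.end(), key);
        if (pos == keys.end() || *pos != key) keys.insert(pos, key);
    }

    // O(n): erasing shifts the elements after the removed key left.
    void remove(int key) {
        auto pos = std::lower_bound(keys.begin(), keys.end(), key);
        if (pos != keys.end() && *pos == key) keys.erase(pos);
    }
};
```

Searching is fast here, but insertion and deletion are linear, so this structure does not meet all three requirements at once.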
9: Binary Search Tree Dictionary Data Structure Binary tree property: each node has at most 2 children. Result: storage is small, operations are simple, average depth is small. Search tree property: all keys in the left subtree are smaller than the root's key; all keys in the right subtree are larger than the root's key. Result: easy to find any given key. Insert/delete by changing links.
10: Example and Counter-Example
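A sketch of how the search tree property could be checked, assuming integer keys and no duplicates. The function names are illustrative. Note that the property constrains whole subtrees, so comparing each node only with its immediate children is not enough; the check below carries an allowed key range down the tree.

```cpp
#include <climits>

struct Node { int key; Node* left; Node* right; };

// True if every key in the subtree lies strictly between lo and hi
// and the subtree itself satisfies the search tree property.
bool isBstInRange(const Node* t, long long lo, long long hi) {
    if (t == nullptr) return true;
    if (t->key <= lo || t->key >= hi) return false;
    return isBstInRange(t->left, lo, t->key) &&
           isBstInRange(t->right, t->key, hi);
}

bool isBst(const Node* root) {
    return isBstInRange(root, LLONG_MIN, LLONG_MAX);
}
```

A counter-example that a purely local check would miss: a node with key 6 placed as the right child of 1 inside the left subtree of 5.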
11: Complete Binary Search Tree Complete binary search tree (same shape as a binary heap): levels are completely filled, except possibly the bottom level, which is filled left-to-right.
12: In-Order Traversal visit left subtree visit node visit right subtree What does this guarantee with a BST?
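A minimal sketch of the traversal, assuming integer keys and collecting the visited keys in a vector. On a BST, the search tree property implies the keys come out in ascending order.

```cpp
#include <vector>

struct Node { int key; Node* left; Node* right; };

// Appends keys to out in the order: left subtree, node, right subtree.
void inOrder(const Node* t, std::vector<int>& out) {
    if (t == nullptr) return;
    inOrder(t->left, out);
    out.push_back(t->key);
    inOrder(t->right, out);
}
```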
13: Recursive Find
Node* find(Comparable key, Node* t) {
  if (t == NULL) return t;                  // empty subtree: key not found
  else if (key < t->key) return find(key, t->left);
  else if (key > t->key) return find(key, t->right);
  else return t;                            // key == t->key
}
14: Iterative Find
Node* find(Comparable key, Node* t) {
  while (t != NULL && t->key != key) {
    if (key < t->key) t = t->left;
    else t = t->right;
  }
  return t;                                 // NULL if the key is absent
}
15: Insert
void insert(Comparable x, Node*& t) {       // t is a reference so the new node gets linked into the tree
  if (t == NULL) t = new Node(x);
  else if (x < t->key) insert(x, t->left);
  else if (x > t->key) insert(x, t->right);
  else { /* duplicate: handling is app-dependent */ }
}
16: BuildTree for BSTs Suppose the data 1, 2, 3, 4, 5, 6, 7, 8, 9 is inserted into an initially empty BST: (a) in order; (b) in reverse order; (c) median first, then left median, right median, etc.
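The three insertion orders can be compared by building each tree and measuring its height. This is a sketch assuming integer keys; `medianFirst` and `heightAfterInserting` are hypothetical helpers. `medianFirst` emits the medians in pre-order rather than level by level, which yields the same tree.

```cpp
#include <algorithm>
#include <vector>

struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int k) : key(k) {}
};

// Plain BST insert; duplicates are ignored.
void insert(int x, Node*& t) {
    if (t == nullptr) t = new Node(x);
    else if (x < t->key) insert(x, t->left);
    else if (x > t->key) insert(x, t->right);
}

// Height in edges; an empty tree has height -1.
int height(const Node* t) {
    return t == nullptr ? -1 : 1 + std::max(height(t->left), height(t->right));
}

void destroy(Node* t) {
    if (t == nullptr) return;
    destroy(t->left);
    destroy(t->right);
    delete t;
}

// Emits the median of [lo, hi], then recurses on the left and right halves.
void medianFirst(int lo, int hi, std::vector<int>& order) {
    if (lo > hi) return;
    int mid = lo + (hi - lo) / 2;
    order.push_back(mid);
    medianFirst(lo, mid - 1, order);
    medianFirst(mid + 1, hi, order);
}

// Builds a BST from the given insertion order and returns its height.
int heightAfterInserting(const std::vector<int>& order) {
    Node* root = nullptr;
    for (int x : order) insert(x, root);
    int h = height(root);
    destroy(root);
    return h;
}
```

For the keys 1 through 9, sorted and reverse-sorted input both give a chain of height 8, while the median-first order gives height 3.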
17: Analysis of BuildTree Worst case is O(n^2): 1 + 2 + 3 + … + n = O(n^2). Average case, assuming all orderings are equally likely: O(n log n), averaging over all insert sequences (not over all binary trees); equivalently, the average depth of a node is O(log n). Proof: see Introduction to Algorithms, Cormen, Leiserson, & Rivest.
18: BST Bonus: FindMin, FindMax Find minimum Find maximum
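A minimal sketch of both operations, assuming integer keys: the minimum is the leftmost node and the maximum is the rightmost node, so each is found by following one kind of link until it runs out.

```cpp
struct Node { int key; Node* left; Node* right; };

// Minimum: keep following left links. Returns nullptr for an empty tree.
Node* findMin(Node* t) {
    while (t != nullptr && t->left != nullptr) t = t->left;
    return t;
}

// Maximum: keep following right links. Returns nullptr for an empty tree.
Node* findMax(Node* t) {
    while (t != nullptr && t->right != nullptr) t = t->right;
    return t;
}
```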
19: Successor Node Next larger node in this node's subtree
20: Predecessor Node
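Restricted to a node's own subtree, the successor is the smallest key in its right subtree and the predecessor is the largest key in its left subtree. The sketch below assumes integer keys and returns nullptr when the relevant subtree is empty; the function names are illustrative.

```cpp
struct Node { int key; Node* left; Node* right; };

// Next larger key within t's own subtree: leftmost node of the right subtree.
Node* successorInSubtree(Node* t) {
    if (t == nullptr || t->right == nullptr) return nullptr;
    Node* s = t->right;
    while (s->left != nullptr) s = s->left;
    return s;
}

// Next smaller key within t's own subtree: rightmost node of the left subtree.
Node* predecessorInSubtree(Node* t) {
    if (t == nullptr || t->left == nullptr) return nullptr;
    Node* p = t->left;
    while (p->right != nullptr) p = p->right;
    return p;
}
```

Both nodes found this way have at most one child, which is what makes them convenient replacements during deletion.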
21: Deletion
22: Lazy Deletion Instead of physically deleting nodes, just mark them as deleted. Pros: simpler; physical deletions can be done in batches; some adds just flip the deleted flag. Cons: extra memory for the deleted flag; many lazy deletions slow down finds; some operations may have to be modified (e.g., min and max).
23: Lazy Deletion
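A sketch of lazy deletion under the assumption of integer keys and a per-node boolean flag. The names `locate`, `contains`, and `lazyRemove` are hypothetical. Batch cleanup of marked nodes is not shown.

```cpp
struct Node {
    int key;
    bool deleted = false;   // the extra per-node flag
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int k) : key(k) {}
};

// Finds the node holding key, whether or not it is marked deleted.
Node* locate(Node* t, int key) {
    while (t != nullptr && t->key != key)
        t = (key < t->key) ? t->left : t->right;
    return t;
}

// A key counts as present only if its node exists and is not marked.
bool contains(Node* root, int key) {
    Node* n = locate(root, key);
    return n != nullptr && !n->deleted;
}

// Lazy delete: mark the node and leave the links alone.
void lazyRemove(Node* root, int key) {
    Node* n = locate(root, key);
    if (n != nullptr) n->deleted = true;
}

// Insert: re-adding a lazily deleted key just flips its flag back.
void insert(int x, Node*& t) {
    if (t == nullptr) t = new Node(x);
    else if (x < t->key) insert(x, t->left);
    else if (x > t->key) insert(x, t->right);
    else t->deleted = false;
}
```

Marked nodes still lie on search paths, which is why many lazy deletions slow down finds.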
24: Deletion - Leaf Case
25: Deletion - One Child Case
26: Deletion - Two Child Case
27: Delete Code
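One way the three deletion cases (leaf, one child, two children) could be coded, assuming integer keys, heap-allocated nodes, and replacement by the successor in the two-child case. This is a sketch, not necessarily the code the slide showed; the name `deleteKey` is hypothetical.

```cpp
struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int k) : key(k) {}
};

// Removes x from the tree rooted at t, if present.
void deleteKey(int x, Node*& t) {
    if (t == nullptr) return;                        // key not found
    if (x < t->key) {
        deleteKey(x, t->left);
    } else if (x > t->key) {
        deleteKey(x, t->right);
    } else if (t->left != nullptr && t->right != nullptr) {
        // Two children: copy the successor's key here, then delete the
        // successor, which has at most one child.
        Node* s = t->right;
        while (s->left != nullptr) s = s->left;
        t->key = s->key;
        deleteKey(s->key, t->right);
    } else {
        // Leaf or one child: splice the node out by relinking the parent.
        Node* old = t;
        t = (t->left != nullptr) ? t->left : t->right;
        delete old;
    }
}
```

Passing `t` by reference lets the function update the parent's link directly, so no separate parent pointer is needed.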
28: Thinking about Binary Search Trees Observations: each operation views two new elements at a time; elements (even siblings) may be scattered in memory; binary search trees are fast if they're shallow. Realities: for large data sets, disk accesses dominate runtime; some deep and some shallow BSTs exist for any data.
29: Beauty is Only Θ(log n) Deep Binary search trees are fast if they're shallow: perfectly complete; complete, possibly missing some fringe (leaves); any other good cases? What matters? Problems occur when one branch is much longer than another, i.e., when the tree is out of balance.
30: Dictionary Implementations BSTs look good for shallow trees, i.e., if the depth is small (log n); otherwise they are as bad as a linked list!
31: Digression: Tail Recursion Tail recursion: when the tail (final operation) of a function recursively calls the function. Why is tail recursion especially bad with a linked list? Why might it be a lot better with a tree? Why might it not?
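A sketch to make the questions concrete, assuming a singly linked list of integer keys; `ListNode`, `findRec`, and `findLoop` are illustrative names. Unless the compiler eliminates the tail call, the recursive version uses one stack frame per node visited: up to n frames on a list, about log n on a balanced tree, and up to n again on a degenerate tree.

```cpp
struct ListNode { int key; ListNode* next; };

// Tail-recursive search: the recursive call is the last thing the function does.
ListNode* findRec(int key, ListNode* t) {
    if (t == nullptr || t->key == key) return t;
    return findRec(key, t->next);
}

// The same search with the tail call rewritten as a loop: constant stack space.
ListNode* findLoop(int key, ListNode* t) {
    while (t != nullptr && t->key != key) t = t->next;
    return t;
}
```

Because the call is in tail position, the rewrite into a loop is mechanical, which is the same step that turns the recursive BST find into the iterative one.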
32: Making Trees Efficient: Possible Solutions Keep BSTs shallow by maintaining balance: AVL trees. … also exploit most-recently-used (MRU) info: splay trees. Reduce disk access by increasing branching factor: B-trees.