IT Lecture Notes by Mark Kelly, McKinnon Secondary College

Binary Trees and Indexes

Physically moving data around to get it in order is very CPU-intensive and time-consuming, so most databases store data in random order - records are stored in the order they are created. Physically sorting records is also usually a waste of time because people often want data stored in different orders, such as by ID, name, amount owed, postcode etc.

There are some traditional strategies to make both sorting and searching far less of an effort: binary trees, indexing and linked lists.

 

Binary Trees

One way of reducing search times is to use a binary tree so data is semi-sorted when it's saved. Lets assume you need to store 7 names: John, Bruce, Tom, Mary, Fred, Gina and Xeno.

A binary tree is made up of nodes and each node has these fields: a node number, the data (e.g. the person's name), a left-child and a right-child. These children are pointers to the nodes that come before and after the current node. An example may be clearer...

The first datum is "JOHN" so we create the first node:

binary tree - first node

Since there are no other nodes, both children of node 1 are zero.

.

Coding Tip

In fact, the "node" is just stored in an appropriate data structure. In VB6, you might use a random access file and a custom TYPE (vb.net calls it a STRUCTURE) like this:

TYPE nodetype
num as integer
value as string * 15
leftchild as integer
rightchild as integer
END TYPE

After declaring the type's "class", you then create an actual instance of the structure with:

DIM node AS nodetype

To create the first node in code, you could say:

Open "c:\NAMELIST.DB" For Random Shared As #1 Len = Len(node) ' create the random file
nodenumber = 1
node.num = nodenumber
node.value = "John"
node.leftchild = 0
node.rightchild = 0
Put #1, nodenumber, node

 

Adding the second node

 

Starting at the 'root node' (node 1) check whether the new data (Bruce) is less than or greater than the data in the root node (John). Since Bruce < John, its node is assigned as the left-child of John.

 

 

Adding the third node

To add TOM, we start again at the root node. TOM is alphabetically greater than JOHN so the new node is assigned as the right-child of node 1.

 

 

Adding MARY

Let's walk it through. Compare MARY to JOHN. It's greater, so follow the right-child path. We arrive at node 3 and again compare values - MARY is less than TOM and there is no existing left-child path so we add MARY as the new left-child of TOM.

 

 

What's our data structure looking like? Something like this:

Node
Num
Value Left
Child
Right
Child
1 John 2 3
2 Bruce 0 0
3 Tom 4 0
4 Mary 0 0

 

Adding Fred

Fred < John so it belongs to the left. There is a left-child (node 2) so go to node 2.
Fred > Bruce so it belongs to the right. There is no right-child, so add Fred as the right-child of Bruce.

 

 

Adding Gina

 

Gina < John so follow the left path to node 2.

Gina > Bruce so follow the right path to node 5.

Gina > Fred so go right. There is no right node, so add Gina as the right-child of Fred.

And finally, Xeno.

Xeno > John so follow the right path to node 3.

Xeno > Tom so follow the right path.

There is no right node, so add Xeno as the right-child of Tom.

Node
Num
Value Left
Child
Right
Child
1 John 2 3
2 Bruce 0 5
3 Tom 4 7
4 Mary 0 0
5 Fred 0 6
6 Gina 0 0
7 Xeno 0 0

 

 

Now that we have data structured in this way, searching can be done in far fewer steps than a brute-force sequential search. To find Mary, for example, requires 2 comparisons (against John and Tom) rather than 3 (John, Bruce, Tom) which would happen with a straightforward search through records in their order of creation.

Similarly, Zeno only takes 2 comparisons instead of 6. Imagine the savings of time and CPU cycles if there were millions of records in the database! So, for a little processing expended during record creation, we save huge effort for the rest of the life of the database.

Also see: indexing

 

Back to the IT Lecture Notes index

Back to the last page you visited

Created 15 August 2007

Last changed: April 21, 2008 12:23 PM

IT Lecture notes copyright © Mark Kelly 2001-