IT Lecture Notes by Mark Kelly, McKinnon Secondary College

Databases for Dunderheads

To avoid confusion, I'll use the term Database Management Software (DBMS) to refer to database applications such as Access, Filemaker, SQL.
The products of a DBMS (such as a music database or a client database) will be referred to as a database.

Also see my recorded tutorial on how to use Filemaker Pro to create a relational database.
Favourite databases - Access and Filemaker Pro

 

Database Management Software (DBMS) are programs that let you

  • store
  • search
  • sort
  • report on

large quantities of data. Modern business would be impossible without the data-managing power of the modern electronic database.

Popular databases used by average citizens include:

  • Microsoft Access
  • Filemaker Pro

Less popular (yet) , but free include:

More technically-minded people often use SQL (Structured Query Language) products such as the free and powerful MySQL. Other brands are Foxpro, dBase.

There are industrial-strength corporate databases with massive capacity and huge numbers of simultaneous users ; such million-dollar database manufacturers include Oracle, Lotus, Sybase.

You are already on a dozen or so databases, starting when you were born and increasingly as you deal with organisations. Let's look at some data basics...

DATABASICS

 

The concept of a database is really simple: it's a collection of data in a table.

The simplest database is a list of your friends...

First name Last name Birthday Phone number
Alice Alabaster 1 June 1990 9345 2345
Brenda Beetle 32 May 1992 9643 0948
Cherie Carrot 5 August 1991 9348 4953
Donna Doughnut 15 January 1993 0402 394 495

Each column contains a particular type of data for each person (e.g. for any row of this database, column 1 will always contain the person's first name.)

Each column corresponds to a database field. Of course, you must put the right type of data in the right field : putting a person's birthday in the first name column would make the database useless.

A full collection of fields about a single person (in other words, each row of the table) is called a record. So Alice's data is stored in record 1.

Database rule 1 - each field contains the same type of data (e.g. phone number) in every record.

Capabilities and limitations of databases

Capabilities: DBMS can

  • quickly and accurately search massive amounts of data for records that meet any desired criteria.
  • quickly and easily sort data on multiple fields
  • support many data types (including multimedia and binary files such as pictures, sounds, movies, Word documents, PDF)
  • validate data to ensure only reasonable data is entered
  • create summary statistics across many records (e.g. totals, averages)
  • produce output onscreen or on paper with great flexibility and precision
  • produce output record-by-record. Unlike spreadsheets, databases can easily produce (for example) a printed receipt or certificate for each record in a table. Spreadsheets have to be beaten around the head to attempt that, and even then they do it poorly.
  • correlate data from different sources and make connections between data sources
  • store many kinds of data, including multimedia
  • calculate new information from existing data using formulas similar to a spreadsheet's
  • use hundreds of built-in functions to do accounting, statistics, mathematics, text/date operations etc
  • convert data to and from many different formats (e.g. CSV or Excel input, PDF or email output)
  • use scripting/programming to automate tedious tasks such as producing reports
  • produce record-by-record output (e.g. certificates, information sheets, invoices) which spreadsheets are hopeless at because spreadsheets cannot cycle though individual records and produce independent output for each one
  • be used by many people simultaneously over a network or even internationally over the internet
  • use hierarchical levels of security to allow different users various levels of access depending on their needs

Limitations: DBMS have trouble with

  • getting created. Much skill, thought and planning is required to even get a database started
  • data of variable length. The maximum lengths of text fields must often be guessed in advance, potentially leading to wasted space or truncated (chopped-off) data
  • massive data stores. As databases increase in size, they work more slowly and require more storage and bandwidth
  • disastrous data loss. Since all of one's data is in one location, a single accident can result in the loss of all data
  • data theft - stealing huge quantities of data, even from remote locations, is much more likely than someone stealing several tonnes' worth of paper records
  • security - strong password security needs to be used effectively to prevent unauthorised access to data
  • flexibility - once data tables are defined, it can be quite laborious to change their structure. Unlike with a spreadsheet, doing ad hoc ('spur of the moment') calculations and data 'fiddling' in a database is not easy.
  • data corruption - it does not take much disk damage to render an entire database unreadable
  • producing charts and graphs. Filemaker has no graphing facilities.
  • others?

For those who always get rows and columns confused, remember...

COLUMNS go up and down like the COLUMNS that hold up a roof. They are database fields.

ROWS go from side to side like ROWS of seats. They are database records.


Quick Quiz - why are the people's first name and surname stored separately in different fields?
Answer: So you can search and sort by surnames. If names were stored as a single field e.g. "Sharon Shingle" is is effectively impossible for the database to find the surname.

If names were stored in a single field and sorted, you get a useless list like this, because the names are sorted by first name rather than surname...

  1. Adam Zebra
  2. Bruce Yak
  3. Charlie Xylophone

Storing the name parts separately also allows you to create reports that say things like "Dear Sharon" or "Dear Miss Shingle" (rather than nasty constructions like "Dear Sharon Shingle" using the whole name.)

Database Rule 2 - store people's first and last names in separate fields. If a title is stored (e.g. "Miss", "Doctor", "Sir") it goes into a third separate field.

Database Rule 3 - only store one piece of data in a field. Don't combine different data in one field (e.g. addresses should be stored in 3 fields - number + street, suburb, postcode.)

Street Suburb Postcode Country
4 Paradox Crt McKinnon 3204 Australia

When you create a new database, you have to define its fields before you're allowed to enter data. After all, where can the data be stored without fields?

Typically, you have to tell the DBMS the following information about its fields:

1. The field name e.g. firstname, DOB, PayRate

Field naming tips:

  • make fieldnames descriptive and not cryptic (e.g. WTPL is a bad name for "Where The Person Lives", but DOB is a common abbreviation for Date Of Birth.
  • don't make field names too long - it just invites a subtle typing error that will cause endless grief.
  • use a consistent style e.g. all global fields begin with "g_" so they're easy to identify, multiword field names use Capitals or underscores to divide them up e.g. DateOfBirth or Date_Of_Birth. Once you land on a style you're comfortable with, stick to it!
  • Don't use spaces in field names, even if the DBMS lets you. It's a bad habit to get into because just about everywhere else in the programming world, spaces are forbidden in object names and website filenames.

2. The field type: commonly text, numeric, date/time, Boolean or calculated. Specifying the field type lets the DBMS efficiently allocate storage space.

Type tips:

  • Boolean (pronounced BOOL-e-yun) stores true/false, yes/no, on/off, active/inactive sort of data with only 2 possible values. It's very efficient to store.
  • DBMSs differ in what options they allow when storing numbers. Some just allow "numeric", others let you be much more specific with numeric subtypes such as:
    • long integer - very large whole numbers (i.e. no fractional part)
    • short integer - whole numbers from about -32767 to +32767
    • byte - whole numbers from 0-255
    • single precision - numbers with up to 7 decimal places
    • double precision - numbers with up to 14 decimal places
    • currency - 2 decimal places.
  • As you might have guessed, the more complicated the number, the more storage space it requires. A byte field takes one byte to store; a double precision number might take seven bytes to store. Getting the numeric type right can dramatically reduce the size of a database.
  • Filemaker Pro offers the container field type which is really handy for storing pictures, sounds, movies etc.
  • Filemaker also offers repeating fields which creates multiple storage slots within a single field - e.g. a PhoneNumber field can be created as a field with 4 repetitions so for each person, up to 4 phone numbers can be stored.
  • Filemaker also offers global fields

3. Text field length - If a field is specified as type text, you need to specify a maximum length it can be.

Text length tips:

  • The default length is often 50 characters, but don't be lazy and always accept the default. If you know the field is always going to store a 7-character student ID, don't waste 43 characters in every record with a lazy length setting.
  • If you need a variable length text field, especially one that could be quite large, look for the memo field type.
  • When setting lengths of text fields, do it carefully: whatever field size you set will be hard-wired into the database and you will never be able to enter text that is longer than the length you give. Don't be too stingy and specify a maximum length of 15 characters for a surname field: Mrs Xanthoudakis-Papadopolous would get her name chopped in half.
  • Such decisions are worth doing research to discover what typical and atypical text lengths you will encounter as you acquire data.
  • You don't enter lengths for date, time or numeric fields because their lengths are automatically determined by their type.

4. Validation options - you can set rules about how the value is entered. These typically include setting that the field:

  • must be between value X and value Y (i.e. a range check)
  • must be numeric, date, time etc (i.e. a type check)
  • must not be left empty (i.e. an existence check)
  • must be unique ( e.g. an ID number)
  • must exist in a limited list (i.e. must be a pre-approved value)
  • must be a certain length

Different DBMSs offer luxury options such as automatically entering values for new fields in new records such as a serial number, a timestamp, the current date, the value from the previous record, fixed data (e.g. auto-setting the customer's country to "Australia" since that will be the case for 99.9% of your customers).

 

 

Planning a database

As you might have guessed already, creating a database takes some thought. It would be a big pain to create a database with a thousand people's names in one field, then decide to separate them into two fields, and have to go through 1000 names cutting and pasting.

Plan the structure first, get it right, then start entering the data. It's a bit like building a house - you don't want to have to tear down a half-built house because you got the foundations wrong.

A very handy tool for creating databases is a data dictionary. Another handy tool, for defining calculated fields, is an IPO chart. Details of these design tools are here.

 

TABLES

 

In database talk, a table like the ones shown above is (surprisingly) called a table.

A database needs at least one table. A database with a single table is called a flat-file database. Sometimes that's all you need, but databases often grow...

Let's consider a business that buys parts, builds things and sells those things to customers.

They know that storing data in an electronic database is far better than keeping paper records so they create a single database to record purchases by customers like so...

Customer Address Quantity Goods ordered Date
Goldfish Pty Ltd 3 Kitten Rd 4 16mm Woofle Valves 1 Dec 2006
Supernova Ltd 34 Industry St 300 White nylon widgets 2 Dec 2006
Goldfish Pty Ltd 3 Kitten Rd 14 16mm Woofle Valves 7 Dec 2006

Already you should be able to see a problem developing... lots of repetition. The same customer name and address is being repeated, even the name of the ordered goods is repeating. This can lead to really bad wasted space, making the database much bigger than it needs to be.

A more serious problem is - what happens if Goldfish Pty Ltd changes its address? To keep the database up to date and accurate, you would need to change every occurrence of their address in the database. This could be a massive and tiresome job, and any missed occurrence could result in orders not being found, or being sent to the wrong place.

So, what to do? Here's the way you would probably do it if you were creating a pen and paper list...

ORDERS

Customer ID Quantity Goods ordered Date
00001 4 16mm Woofle Valves 1 Dec 2006
00002 300 White nylon widgets 2 Dec 2006
00001 14 16mm Woofle Valves 7 Dec 2006

 

CUSTOMERS

ID Customer Address
00001 Goldfish Pty Ltd 3 Kitten Rd
00002 Supernova Ltd 34 Industry St

Now, when an order comes in, you look up the customer in the customers table, find their ID and enter that in the orders table. Notice that the repetition in the orders table has been largely removed. Also, each customer's details are recorded only once in one place - the customers table. If a customer's address changes, for example, it only has to be changed once - in the customers table.

But how do you know that a customer in the orders table matches a customer in the customers table? You tell the database that a field in one table is related to (or matches) a field in another table.

By telling this two-table database that a customer ID in the orders table is related to a given ID in the ID field in the customers table, the database can join them together. It knows that the ID "00002" in the orders table refers to Supernova Ltd because "00002" in the ID field of the customers table lines up with "Supernova Ltd".

Notice we are using the ID field in both tables to create the relationship. An ID in one table always exactly matches the same ID in the other table. This ID field is therefore known as a key field, and it has to be unique. That means that every customer must have an ID all to itself that it will always keep and never be used for any other customer. This is why on most bills you will see a customer number or account number - it identifies you uniquely in their database and regardless of how many "Fred Smiths" there are in the database, each can be uniquely identified by the table's key field: the unique identifying customer account number.

Database rule 4 - key fields must be unique. A value in a key field (e.g. 00002) may only appear once in the entire table.

When you have a database like this with two or more related tables, you have (drumroll please...) a relational database!

The process of removing duplication of data by creating related tables is called normalisation. Don't ask me why. Just remember it.

Is our sample database fully normalised yet? Has all repetition been removed? Not quite... look at the goods ordered field. There's repetition there too that can be removed by ... yep. Another related table.

Let's create a new table called products and every product we sell is listed once, and only once. We need another key field to match a product in the orders table with a product in the products table. We'll call it ProdID.

ORDERS

Customer ID Quantity Product ID Date
00001 4 V00001 1 Dec 2006
00002 300 W0009 2 Dec 2006
00001 14 V00001 7 Dec 2006

 

CUSTOMERS

ID Customer Address
00001 Goldfish Pty Ltd 3 Kitten Rd
00002 Supernova Ltd 34 Industry St

 

PRODUCTS

ProdID Product Description
V00001 16mm Woofle Valves
W00009 White nylon widgets

Ah! How nice it is to be fully normalised! No unnecessary repetition! Using the orders table, you can get all the information you need about an order by looking up the customer's details in the customers table (using the ID field to match customers between the tables) and you get product info by looking up the product's ProdID in the products table. Easy as pie!

Tip: Notice that the field in the orders table is called "Product ID" but the matching field in the products table is called "prodID"? This is perfectly OK, but some people like to use the same field name in both related tables just to make it clear that the fields are in fact related.

Note: Microsoft's Access database is so concerned about having unique key fields in tables that when you create a table it insists on you nominating a key field that is unique for every record in the table! If you choose not to name one, it creates one for you! Access is a demanding wife when it comes to relationships!

Still with me? Good. Let's wander onward...

TIP- relational is not always best! Creating a relational database can be quite tricky and long-winded. Sometimes for simple situations a single-table flat-file database may suffice. But flat file databases have distinct limits. When many data sources need to be accessed to create a complete item, you really need a relational database.

DEFINING A TABLE

 

When you create a table in a database, you have to tell the database some stuff so it can create the fields. These can be called field properties. Typical properties you define are

FIELD PROPERTY Examples
name DateOfBirth, surname, IsRegistered
type text, numeric, picture, yes/no, memo, date, time,
maximum length* surname - 50 characters, sex - 1 character
validation rules must be present, must be numeric, must be between 1 and 100
default value

"Australia" for country. "Female" for patient's sex in a maternity ward's database.

calculated**

Payment_Due = Date_Now + 14

Full_Name = Title & " " & First_Name & " " & Surname

 

*Field Length - some field types have automatic lengths, such as Date, Byte, Short Integer and Yes/No data types. Others, such as text, can vary. Setting a maximum length for a field's data lets the database store data more efficiently and not assign lots of disk space for tiny pieces of data.

**Calculated fields are not entered as data - they are automatically calculated by the database using existing date, much like Excel can calculate new values based on existing values. You need to enter a formula to make the calculation happen. The formulas actually look a lot like those in Excel or Visual BASIC. (The "&" operator joins pieces of text together to create new text values.) It is really calculated fields that give databases their massive power and usefulness.

 

ACCESS DBMS

Access databases contain objects called queries and reports.

  • Queries are stored search requests (e.g. find people with a shoe size of 6, sort them by surname then firstname, and calculate a new field called 'age' from their date of birth).
  • Reports define how information should be set out when printed or displayed onscreen. e.g. 'This text goes in the header (to be repeated on every printed page). This field will appear here with this size, type face, number of decimal places etc. There will be a box around these items, and in the footer there will be the page number in the middle of the page.'
  • Forms are similar to reports, but define data input screens
  • Macros are recorded programming steps to automate tasks. Macros can be recorded from the keyboard, but more powerful ones must be written in a programming language similar to Visual Basic.

FILEMAKER PRO DBMS

 

Filemaker Pro databases approach similar tasks slightly differently.

  • Finds - Filemaker Pro's data searches are 'ad hoc' - i.e. they can be done instantly on the spot without having to create and save and run a query. By entering 'Find Mode' and filling in what information should be found in which fields, the database finds matching records and hides the others. This is called the 'Found Set' and further operations apply only to the found set (e.g. deleting, printing, averaging field values). Finds can be stored in scripts for automated searching later (e.g. a script can be created to find males over 18). Buttons can easily be created to then run the script, making it easier for unskilled users.
  • Layouts are Filemaker's way of combining reports and forms. A layout can be used for laying out printed and/or on-screen data. While some layouts work equally well as printed output or onscreen output, and they also define the layout of data input forms, often layouts are designed for either printing, or on-screen outout, or data input. e.g. a printing layout might be designed to fit on an A4 page with no colour or shading. The data input screen might use drop-down lists, tabbed controls etc which would not be much use when printed.
  • Scripts - a major attraction of Filemaker for most people is the ease of automating tasks with scripts. Using simple drag and drop and filling in options creates powerful scripts. In later versions of Filemaker, one can create and use variables (local and global) and programatically access repeating fields (which are like arrays).

Back to the IT Lecture Notes

Back to the last page you visited

Created March 29, 2006

Last changed: April 30, 2008 10:20 AM

IT Lecture notes (c) Mark Kelly, McKinnon Secondary College