Let us start with a concrete example. In this example, we are storing data about employees. The data about each employee contains the usual information - name, employee #, social security #, pay rate, date of employment and birthdate.
For a given employee x, all information was available, except date of birth. The birthdate is missing information for employee x. The fact that the date of birth is missing might be filled in on a hard-copy employment form by various ad-hoc methods - 1) leaving the entry blank, 2) drawing a line through the entry, 3) entering the ubiquitous N/A meaning Not Available or Not Applicable.
Applications utilizing early database systems applied similar techniques. Fields with missing values were set to arbitrary ones - blank, zero, etc.. The database systems themselves had no way of representing missing information or even being aware of the existence of missing information. There was no systematic method of dealing with values that were missing, not present. It was incumbent on the application to deal with all aspects of missing data values.
The advent of relational database provided solutiions for processing of missing information. In the relational database model, missing information is represented by a meta-value called a Null. See How Relational solves the Missing Information Problem.
In systems where missing data values are represented by ad-hoc, default values, responses to questions about the data values are at best confusing, at worst simply incorrect. Such a system might represent a missing birthdate with a zero value. If Jerry is an employee whose date of birth is missing and the field is set to zero, then no is the answer to the question - Is Jerry's birthdate after 12/31/59?
To someone making the query and knowing that Jerry just turned 21, this would be obviously incorrect. The answer to the question when data is missing is not yes or no; it should be - I don't know. In other words, the answer to the question is unknown. Answers to queries involving potentially missing information must include yes, no and unknown.
The application had the responsibility to make the necessary tests for missing information in earlier systems. Application support was uneven and often incorrect. The problem is that neither the database system nor the application had ways to deal with answers that are not true or false but unknown.
Relational database solves the problem by providing what is known as 3 Valued Logic. In 3 Valued Logic, answers can be true, false or unknown. This is described in How Relational solves the Missing Information Problem.
Return to Contents Page: Should Nulls be considered harmful? Return to Issues Page