In Defense of Nulls
Nulls in Relational Database
Nulls represent real world situations - that of missing information.
The situation arises in filling out forms for example.
When individuals are unable or unwilling to enter a requested piece of information,
they resort to various devices - leaving the entry blank (this may also
indicate they didn't even examine the question), entering N/A for Not
Applicable, drawing a horizontal line through the space, etc..
In pre-relational systems, the non-entry of information is often represented
by a blank value or for numeric fields - 0 or -1. In a relational
system, nulls are used.
Some important uses for nulls in relational databases are:
- Nulls provide a systematic, consistent way to deal with missing
data. Nulls have the same characteristics for all domains/types.
Absence of nulls would cause users to invent ad-hoc ways of
representing missing information. This will increase complexity and decrease
usability in a relational system. See the discussion of
Default Values.
- Many important relational features are enhanced by the use of
nulls:
- Foreign Keys - nulls are used for foreign keys that do not
(currently) reference an existing primary key. They are especially
useful in manipulating self-referencing and cross-referencing
foreign keys. Default values are not effective for this use.
- Automatic Defaults - nulls provide a systematic default for
columns that don't have a default value specified, for instance in
SQL INSERT and ALTER TABLE ADD Column.
- Outer Operations - in relational outer operations like outer join
and outer union, nulls provide a standard way to represent
unfilled columns.
- Aggregate Functions - functions like SUM, AVG and MAX can have
consistent behavior in the presence of nulls. Default values and
other techniques can produce inappropriate results from aggregate
functions. For instance, results for the average function are
skewed when the default value for a column is zero. In addition,
null is a more useful result for the average of an empty list of
columns than zero.
- One of the major simplifying features of nulls is the shorthand
notation possible in expressions. The following expression :
returns a logic value of UNKNOWN when a, b or c are null. UNKNOWN
is a part of the 3-valued logic system discussed in the next
section. Without this shorthand, the expression might need to be :
a IS VALID
AND b IS VALID
AND c IS VALID
AND a + b > c
- SQL is here to stay. While the relational model provides robust
facilities for distribution of data within a database system, the
widespread support for SQL enables combining data from heterogeneous
DBMS's, transparently. It is the Lingua Franca of databases. SQL
has some glaring weaknesses especially in processing of 3-valued logic
(see SQL Problems with 3-Valued Logic),
but these problems should
be repaired through evolutionary changes rather than revolutionary
changes.
Finally, nulls have counterparts in other programming paradigms.
Functional programming systems have the concept of undefined or ground,
represented by _|_. 3-valued logic is used with _|_.
3-Valued Logic (3VL)
is discussed in the next section.
Return to Contents Page: In Defense of Nulls
Return to Issues Page
Copyright © 1998 FFE Software, Inc. All Rights Reserved WorldWide