| This article is also published (with modifications) on Database Debunkings under the title, On Jim Gray's "Call to Arms" |
This response examines a number of his points beginning with his indictment
of relational technology and attempts to separate reality from opinion.
This is a daunting task since Gray's article tries to encompass most
aspects of current computing techniques.
Rather than cover all his proposals, this article will concentrate on
several salient items while touching on a few other topics.
The major SQL DBMS vendors pay little more than lip service to SQL Standards
and, more importantly, virtually ignore the principles of the Relational Model.
SQL itself is notorious for weaknesses in its relational fidelity,
examples include poor primary key and domain support and the Exists bug.
SQL92 did make some improvements in relational compliance over SQL89 and SQL86
(but didn't fix the above problems.)
However, SQL99 and later standards made a sharp turn away from relational
concepts in introducing features that are contrary to the relational model.
The major SQL vendors have numerous incompatibilities with the SQL Standard
in their products. For example, both Oracle and SQL Server treat an empty
string as a database null.
The major open source SQL DBMS is even worse, almost defiant about
non-conformance.
Rather than "polishing the round ball" (in the words of Michael
Stonebraker), the major vendors have concentrated on marketing, differentiation,
non-database features and quirky optimizers requiring vendor-specific tuning
that lock customers into their proprietary solutions.
Improved relational compliance is the lowest priority.
In ComputerWorld in the middle '80s, Ted Codd published his 12 rules to
measure relational fidelity. Vendors have mostly ignored these guidelines and
made little movement towards complying with them.
Only one or two smaller vendors have the goal of moving their implementations
towards improved relational capabilities.
Yet full compliance or, at the least, better compliance with the relational
model offers true solutions to many of the issues that Gray raises.
He disregards this, instead treating the weakness of current implementations
as evidence of the failings of the relational model.
He offers XML and XQuery as alternatives, supposedly based on 'developer'
preferences. This is a rash judgment, not supported by current experience.
XML and XQuery are insolubly flawed. XML is just another incarnation of the
discredited Hierarchical Data Model.
XQuery brings excessive complexity.
A relational query language uses only method to access information --
by value.
XQuery has 3 -- by value, by navigating the hierarchy and through external
linkage. It is 3 times as complex.
XML is certainly appropriate for data transfer between systems, for streaming
operations. XML is weak as a medium for persistent storage of data, except
perhaps for property sets.
When used for more complex data, data that must be accessed and manipulated
in a variety of ways, the problems mentioned above in connection with
XQuery become overwhelming.
Note also the XML exhibits a strong dichotomy between data and algorithms.
Perhaps Gray is thinking of the fact that RM rejects procedural code for
most database processing and requires declarative interfaces.
There are very good reasons for this stance.
Procedural code is convoluted and obtuse.
It is difficult for a human to verify its correctness and even to understand
its purpose. Machine verification of procedural code is still not possible
and is not likely to occur in the near future.
Another major problem with procedural code is that optimization techniques
yield limited gains in efficiency, even after years of research in this area.
Functional, logic and declarative code allow much greater improvements.
An optimizer for a declarative language can yield
multiple levels of magnitude improvement.
Similarly, declarative is much easier to verify as correct.
Note: the major Object-Oriented Languages -- Java, C#, ..., are primarily
procedural in nature.
The above reasons illustrate why procedural code is generally a poor choice
for binding with the database.
Even so, most SQL DBMSs support stored procedures and triggers written
in procedural languages.
The actual impetus for adding stored procedures and triggers is weaknesses
in SQL as a full declarative, relational language.
This brings the question -- "Why not improve the relational capability
of SQL or invent a better relational query language?".
Gray ignores this choice, leaning of the assumption that we've
polished the round ball enough.
In general, he looks past the power of RM to Object-Oriented and XML query
facilities for solutions, certainly a popular view.
More on this direction later.
Even with an improved relational language, there is a place for stored
procedures and triggers, as well as functions used in expressions.
An Object-Oriented language is not a bad choice as the implementation
language for these capabilities in database systems.
The next section examines the reasons in more detail.
Embedding a procedural style language in the DBMS has a number of potential
uses. Examples include -- logging, soundex, statistical and financial formulas,
spatial and temporal calculations and other domain specific algorithms.
While the DBMS could provide these functions natively, pluggable packages,
including 3rd party ones, are a more flexible arrangement.
There are dangers to this type of embedding, however.
Inappropriate procedures/functions could be resource intensive to the point
of starving other actions in the DBMS, could open security holes, could
crash the system and could engage in improper crosstalk that violates
database principles.
It is responsibility of the DBMS and system designers/administrators to
guard against such circumstances.
One reason for this so-called mismatch is the fundamental distinction between
the concerns of an individual application and those of a shared database
system.
An individual application works within a subset of the total problem space.
The data it manipulates is a private working-set and is transient.
A database system concerns itself with the persistence of shared data and
with maintaining the security, integrity and accessibility of the data.
The database system services a variety of applications, reporting capabilities,
batch processing and ad-hoc access all using the same data.
The upshot is that different applications may use different object
representations of the same data internally.
Gray touts SQLJ (an OO style embedded SQL) as a 'nice' integration of SQL and
Java. The reality is that SQLJ is not so nice and rarely used.
Most OO applications either access the database API and use the values
directly or construct their own objects, or they use Object/Relational
(OR) wrappers like Hibernate for Java.
All this occurs on the client, so the DBMS backend is not involved.
Unfortunately, the special requirements of OR wrappers can sometimes drive
the design of the database, to the detriment of general usability of the
shared resource.
A second solution to combining objects and the database is an
Object/Relational Database Management Systems (ORDBMS).
The integration of OO and RM in such systems vary.
Gary suggests a complete inversion -- records [sic] as vectors of objects
(fields), tables as sequences of record objects and databases as collections
of tables.
This is a real problem because it dilutes/removes the power of RM, substituting
OO which has no viable data model.
A more reasonable structure for an ORDBMS is to integrate at a lower level.
The ORDBMS can use objects to provide complete support for domains, an area
where SQL is lacking.
Domains are extended, complex datatypes for column in addition to the basic
datatypes - character, numeric, date, interval, etc..
The OO language provides the implementation for the extended column datatypes.
Stored procedures, triggers and expression functions would also use the same OO
language for their implementation.
This allows them to manipulate domain objects directly.
Procedures and functions can receive them as arguments and return domain
objects.
In this way, an ORDBMS can provide improved capabilities completely within
the Relational Model.
These languages also offer greater security.
The runtime environment employs a verifier for the intermediate code to
protect against malicious code or inadvertent corruption.
The compiled code runs in a 'sandbox' that prevents interference with other
parts of the DBMS.
The FirstSQL/J ORDBMS is an example of a database system using this approach.
FirstSQL/J uses Java as the OO implementation language for domains,
stored procedures, triggers and expression functions.
For more details on the FirstSQL/J implementation, see
An ORDBMS that is Truly Relational.
There is much room for improvement here.
In-memory storage is a promising technique supported by some newer RDBMSs.
In-memory structures are direct access and can avoid conversion/serialization
to achieve high-performance processing.
This is especially important for systems with no or inefficient
random access storage.
If each participating database system is from the same vendor
(homogeneous databases), they can communicate to achieve some reasonable
optimization plan.
However, this isn't feasible for heterogeneous databases.
A standard optimization dialogue/protocol to support this is not possible
because of divergent, proprietary implementations.
An efficient federation of heterogeneous databases is a pipe dream.
Schema changes are much less common in a relational database.
Relational theory and normalization techniques create flexible
designs that can survive environment evolution.
Should structural changes do become necessary, the normalized design and
relational facilities like views can ease the transition and limit the
effects of the change.
Some applications may require no changes at all.
Is Relational becoming Irrelevant?
Are classic relational database architectures (as represented by SQL DBMSs)
really sagging at the knees? Or, are the major SQL vendors not keeping their
eye on the ball?
Dichotomy between Data and Procedures
Gray claims there is an artificial separation between data and procedures
(or later in his article, algorithms) in 'traditional' relational databases.
He asserts that this dichotomy began with the COBOL language and the
COBOL DBTG (Database Task Group).
He is distorting history and current reality.
The Relational Model (RM) has nothing to do with the COBOL and actually
predates DBTG.
RM is not pure data; it defines powerful operators for manipulating data
expressed in relational algebra and relational calculus languages.
SQL is a type of relational calculus. It is primarily declarative rather than
procedural.
Synergy between Objects and Database
Object-Orientation (OO) is the dominant programming paradigm today, and
the Relational Model (RM) is the dominant one for database systems.
Together, they power most applications in use.
However, many consider this marriage to be in trouble.
They describe the problem as an 'impedance mismatch' between OO applications
and RM or more accurately, SQL database systems.
They usually attribute this to the difference in the way the OO model
represents information -- as a physically linked lattices of objects and
the way RM represents information -- as logically linked tables.
The modern OO languages, such as Java and C#, compile to an intermediate
form which then is compiled to machine code at runtime.
This allows full portability and greater efficiency.
A runtime compiler can analyze usage patterns for improved optimization
and can recompile portions dynamically as it gathers newer statistics.
Other Topics
This section briefly looks at other areas in Gray's article.
Most, but not all, show promise in improving current database systems.
Self-Managing and Always-Up Database Systems
Like many of the concepts mentioned in Gray's article,
self-managing and always-up database systems are not contrary to the
Relational Model.
RDBMSs already exist with these characteristics,
proving again that Gray is overstating his case in claiming
relational database architectures are slowly sagging to their knees.
In fact, relational databases are uniquely able to deal with these situations
because of their logical orientation.
Additional information on this topic can be found in
Highly Available Databases.
Note that the major SQL DBMSs have limited capabilities in this area.
Improved Physical Structures
The Relational Model and SQL Standard define things at a logical level and
are silent about physical details
This allows implementations complete freedom in the physical structures used.
However, current SQL DBMSs tend to use simple structures and access paths
(primarily -- indexing, hashing and sorting).
They use random access disks as the storage medium, converting data to
octets (8 bit bytes) using serialization.
Random access and serialization can be expensive, especially for objects.
Active Databases and Publish/Subscribe Capabilities
An active database is a database system that keeps the client
up-to-date about any changes to queried data as they occur.
For example, the client sends a query to the database selecting and retrieving
a set of data. As long as the client keeps the query open,
it will receive notification of any change (Insert, Update, Delete) made to
the backend database.
A publish/subscribe facilities provides a similar capability -- retrieving
an initial data set and then receiving notifications of changes to the set.
Note: Relational implementations of active databases have existed since
the '90s.
Federated Databases
A federated database system combines 2 or more distinct servers under a
single interface.
A number of implementations have been developed.
A common example of their use is a single query that joins tables from
different databases.
The problem with federated databases is they are not efficient in the general
case, because only limited optimization is possible.
Evolving Schemas
Gray is using an OO mindset when referring to schema evolution.
Because OO lacks a coherent data model, constant restructuring of the object
design becomes almost a necessity.
This is also an essential component in the Extreme Programming (XP)
paradigm.