Database Testing: How to Regression Test a Relational Database

by admin on June 22, 2011

Relational database management systems (RDBMSs) often
persist mission-critical data
which is updated by many applications and potentially thousands if not
millions of end users.  Furthermore, they implement important
functionality in the form of database methods (stored procedures, stored
functions, and/or triggers) and database objects (e.g. Java or C#
instances).  The best way to ensure the continuing quality of these
assets, at least from a technical point of view, you should have a full
regression test suite which you can run on a regular basis. 

In
this article I argue for a fully automated, continuous regression
testing based approach to database testing.  Just as agile software
developers take this approach to their application code, we should also
do the same for our databases.

 

 

Table of Contents

  1. Why test an RDBMS?

  2. What should we test?

  3. When should we test?

  4. How should we test?

  5. Who should test?

  6. Introducing database testing into
    your organization

  7. Database testing and data
    inspection

  8. Best practices

 

1. Why Test an RDBMS?

There are several reasons why you need to develop a
comprehensive testing strategy for your RDBMS:

  1. Data is an important corporate asset
    Doesn’t it make sense to invest the effort required to validate the quality
    of data via effective testing?  My July 2006

    survey into the current state of data management
    indicates that 95.7% of
    respondents believe that data is a corporate asset.  Yet of them only
    40.3% had a database test suite in place to validate the data and of those
    without a test suite only 31.6% had even discussed the concept.

  2. Mission-critical business functionality is implemented in
    RDBMSs
    .  In the

    survey
    ,  63.7% of respondents indicated that their organizations
    did this, but of those only 46% had regression tests in place to validate
    the logic. Shouldn’t we be doing better?

  3. Current approaches aren’t sufficient.  The current state of
    the art in many organizations is for data professionals to control changes to
    the database schemas, for developers to visually inspect the database
    during construction, and to perform some form of formal testing
    during the test phase at the end of the lifecycle.  Unfortunately,
    none of these approaches prove effective.  Application developers will
    often go around their organization’s data management group because they find
    them too difficult to work with, too slow in the way they work, or sometimes
    they don’t even know they should be working together.  The end result
    is that the teams don’t follow the desired data quality procedures and as a
    result quality suffers.  Although visual inspection

    of query results is a good start it is little more than a debugging technique in
    practice that will help you to find problems but not prevent them.  Testing late in the lifecycle is better than nothing, but as Barry
    Boehm noted in the early 80s it’s
    incredibly
    expensive to fix any defects
    you find at that point.

  4. Testing provides the concrete feedback required to
    identify defects
    .  How do you know how good the quality of your
    source data actually is without an effective test suite which you can run
    whenever you need to?

  5. Support for evolutionary development.  Many

    evolutionary development
    techniques, in particular
    database
    refactoring
    , are predicated upon the idea that it must be possible to
    determine if something in the database has been broken when a change has
    been made.  The easiest way to do that is to simply run your regression
    test suite.

 
Uncomfortable Question:

Isn’t it time that we stopped talking about data
quality and actually started doing something about it?

Here’s a few interesting questions to ask someone who isn’t convinced that
you need to test the DB:

  1. If you’re implementing code in the DB in the form of stored procedures, triggers,
    … shouldn’t you test that code to the same level that you test your app code?
  2. Think of all the data quality problems you’ve run into over the years.
    Wouldn’t it have been nice if someone had originally tested and discovered those
    problems before you did?
  3. Wouldn’t it be nice to have a test suite to run so that you could determine
    how (and if) the DB actually works?
     

I think that one of the reasons that we don’t hear much about database
testing is because it is a relatively new idea within the data community. Many
traditional data professionals seem to think that testing is something that
other people do, particularly test/quality assurance professionals, do. This
reflects a penchant for

over-specialization
and a
serial approach towards
development
by traditionalists, two ideas which have also been shown to be
questionable organizational approaches at best.

 

2. What Should We Test?

Figure 1 indicates what you should
consider testing when it comes to relational databases.  The diagram is drawn
from the point of view of a single database, the dashed lines indicate
threat
boundaries
, indicating that you need to consider threats both within the
database (clear box testing) and at the interface to the database (black box
testing).  Table 1
lists the issues which you should consider testing for both internally within
the database and at the interface to it. 
For details, read the article
What To Test in an
RDBMS
.

 

Figure 1. What to test.

 

Table 1.  What to test in an RDBMS.

Black-Box Testing at the Interface White/Clear-Box Testing
Internally Within the Database
  • O/R mappings (including the meta data)
  • Incoming data values
  • Outgoing data values (from queries, stored
    functions, views …)
  • Scaffolding code (e.g. triggers or updateable
    views) which support refactorings
  • Typical unit tests for your stored procedures,
    functions, and triggers
  • Existence tests for database schema elements
    (tables, procedures, …)
  • View definitions


  • Referential integrity (RI)
    rules
  • Default values for a column
  • Data invariants for a single column
  • Data invariants involving several columns

 

 

3. When Should We Test?

Agile software developers take a test-first approach to development
where they write a test before you write just enough production code to fulfill
that test. 
The
steps of test first development (TFD) are overviewed in the UML activity diagram of
Figure 2.  The first step is to quickly add a test,
basically just enough code to fail.  Next you run your tests, often the
complete test suite although for sake of speed you may decide to run only a
subset, to ensure that the new test does in fact fail.  You then update
your functional code to make it pass the new tests.  The fourth step is to
run your tests again.  If they fail you need to update your functional code
and retest.  Once the tests pass the next step is to start over.

 

Figure 2. The Process of Test First Development (TFD).


 

Test-driven
development (TDD)
is an
evolutionary approach to development which combines test-first development and

refactoring
. 

When an agile software developer goes to implement a new feature, the first
question they ask themselves is “Is this the best design possible which enables
me to add this feature?”  If the answer is yes, then they do the work to
add the feature.  If the answer is no then they refactor the design to make
it the best possible then they continue with a TFD approach.  This strategy
is applicable to developing both your application code and your database schema,
two things that you would work on in parallel.

When you first start following a TDD approach to development you quickly
discover that to make it successful you need to automate as much of the process
as possible?  Do you really want to manually run the same build script(s)
and the same testing script(s) over and over again?  Of course not. 
So, agile developers have created OSS tools such as
ANT,

Maven, and Cruise Control
(to name a few) which enable them to automate these tasks.  More
importantly, it enables them to automate their

database testing script into the build procedure itself.

Agile developers realize that testing is so important to their success that
it is something they do every day,
not just at the end of the lifecycle.  They test as often and early as possible, and better yet
they test first.  As you can see with the
agile system development
lifecycle (SDLC)
of Figure 3 testing is in fact
something that occurs during the development and release cycles, not just during
release.  Furthermore, many agile software developers realize that you can
test more than just your code, you can in fact validate every work product
created on a software development project if you choose to.  This
philosophy is exemplified by the
Full
Lifecycle Object-Oriented Testing (FLOOT) Methodology
.

 

Figure 3. The Agile Lifecycle.


 

4. How to Test

Although you want to keep your database testing efforts as simple as
possible, at first you will discover that you have a fair bit of both learning
and set up to do.  In this section I discuss the need for various
database sandboxes in which people will test:
in short, if you want to do database testing then you’re going to need test
databases (sandboxes) to work in.  I then overview how to

write a database test and more importantly
describe setup strategies for database tests
Finally, I overview several database testing tools which
you may want to consider.

 

4.1 Database Sandboxes

A common


best practice on
agile teams is to ensure that developers have their own “sandboxes” to work in.  A sandbox is basically a technical environment whose
scope is well defined and respected.
 
Figure 4 depicts the various types of sandboxes
which your team may choose to work in.  In each sandbox you’ll have a copy
of the database.  In the development sandbox you’ll experiment, implement
new functionality, and refactor existing functionality, validate your changes
through testing, and then eventually you’ll promote your work once you’re happy
with it to the project integration sandbox.  In this sandbox you will
rebuild your system and then run all the tests to ensure you haven’t broken
anything (if so, then back to the development sandbox).  Occasionally, at
least once an iteration/cycle, you’ll deploy your work to the level (demo and
pre-production testing), and rerun your test suite (including database tests)
each time that you do so to ensure that your changes integrate with the
changes made by other developers.  Every so often (perhaps once every six to twelve
months) into production. The primary advantage of sandboxes are that they help to reduce the risk
of technical errors adversely affecting a larger group of people than is
absolutely necessary at the time. 

Figure 4. Sandboxes.

 

4.2 Writing Database Tests

There’s no magic when it comes to writing a database test, you write them
just like you would any other type of test.  Database tests are typically a three-step process:

  1. Setup the test.  You need to put your database into a known
    state before running tests against it.  There are
    several strategies for doing so.
  2. Run the test.  Using a database regression
    testing tool
    , run your database tests just like you would run your
    application tests.
  3. Check the results.  You’ll need to be able to do “table
    dumps” to obtain the current values in the database so that you can compare
    them against the results which you expected.

The article
What To Test in an
RDBMS
goes into greater detail.

 

4.3 Setting up Database Tests

To successfully your database you must first know the exact
state of the database, and the best way to do that is to simply put the database
in a known state before running your test suite.  There are two common
strategies for doing this:

  1. Fresh start.  A common practice is to
    rebuild the database, including both creation of the schema as well as
    loading of initial test data, for every major test run (e.g. testing that
    you do in your project integration or
    pre-production test sandboxes
    ). 

  2. Data reinitialization.  For testing in
    developer sandboxes, something that you should do every time you rebuild the
    system, you may want to forgo dropping and rebuilding the database in favor
    of simply reinitializing the source data. You can do this either by erasing
    all existing data and then inserting the initial data vales back into the
    database, or you can simple run updates to reset the data values.  The
    first approach is less risky and may even be faster for large amounts of
    data.

An important part of writing database tests is the creation
of test data.  You have several
strategies for doing so:

  1. Have source test data. 
    You can maintain an external definition of the test data, perhaps in flat
    files, XML files, or a secondary set of tables.  This data would be
    loaded in from the external source as needed.

  2. Test data creation scripts. 

    You develop and maintain scripts, perhaps using data manipulation language (DML)
    SQL code or simply application source code (e.g. Java or C#), which does the
    necessary deletions, insertions, and/or updates required to create the test
    data.

  3. Self-contained test cases. 
    Each individual test case puts the database into a known state required for
    the test. 

These approaches to creating test data can be used alone or
in combination.  A significant
advantage of writing creation scripts and self-contained test cases is that it
is much more likely that the developers of that code will place it under configuration
management (CM) control
.  Although
it is possible to put test data itself under CM control, worst case you generate
an export file that you check in, this isn’t a common practice and therefore
may not occur as frequently as required.  Choose
an approach that reflects the culture of your organization.

Where does test data come from?  For unit testing, I prefer to
create sample data with known values.  This way I can predict the
actual results for the tests that I do write and I know I have the
appropriate data values for those tests.  For other forms of
testing — particularly load/stress, system integration, and function
testing, I will use live data so as to better simulate real-world
conditions.

 
Beware Coupling:

One danger with database regression testing, and
with regression testing in general, is coupling between tests.
If you put the database into a known state, then run several
tests against that known state before resetting it, then those
tests are potentially coupled to one another.  Coupling
between tests occurs when one test counts on another one to
successfully run so as to put the database into a known state
for it.  Self-contained test cases do not suffer from this
problem, although may be potentially slower as a result due to
the need for additional initialization steps.

 

4.4 What Testing Tools Are Available?

I believe that there are several critical features which you need to
successfully test RDBMSs.  First, as Figure 1

implies you need two categories of database testing tools, one for interface tests
and one for internal database tests.  Second, these testing tools should support the language that you’re developing
in.  For example, for internal database testing if you’re a Microsoft SQL Server developer, your T-SQL
procedures should likely be tested using some form of T-SQL framework. 
Similarly, Oracle DBAs should have a PL-SQL-based unit testing framework. 
Third, you need tools which help you to put your database into a known state,
which implies the need not only for test data generation but also for managing
that data (like other critical development assets, test data should be under

configuration management control
).

To make a long story short, although we’re starting to see a glimmer of hope
when it comes to database testing tools, as you can see in
Table 2
, but we still have a long way to go. Luckily there are some good tools being developed by the open source software
(OSS) community and there are some commercial tools available as well. 
Having said that, IMHO there is still significant opportunity for tool
vendors to improve their database testing offerings.

Table 2. Some database testing tools.

Category Description Examples
Unit testing tools Tools which enable you to regression test your database.
Testing tools for load testing Tools simulate high usage loads on your
database, enabling you to determine whether your system’s architecture
will stand up to your true production needs.
Test Data Generator Developers need test data against which to validate
their systems.  Test data
generators can be particularly useful when you need large amounts of data,
perhaps for stress and load testing.

 

5. Who Should Test?

During development cycles, the primary people responsible for doing database
testing are
application developers
and
agile DBAs
They will typically pair together, and
because they are hopefully taking a TDD-approach to
development the implication is that they’ll be doing database unit testing on a
continuous basis.  During the release cycle your testers, if you have any,
will be responsible for the final system testing efforts and therefore they will
also be doing database testing.

The role of your
data
management (DM) group
, or IT management if your organization has no DM
group, should be to support your database testing efforts.  They should
promote the concept that database testing is important, should help people get
the requisite training that they require, and should help obtain
database testing tools for your organization.  As you
have seen, database testing is something that is done continuously by the people
on development teams, it isn’t something that is done by another group (except
of course for system testing efforts).  In short, the DM group needs to
support database testing efforts and then get out of the way of the people who
are actually doing the work.

 

6. Introducing Database Regression Testing into Your Organization

Database testing is new to many people, and as a result you are likely to
face several challenges:

  1. Insufficient testing skills. This problem can be overcome through
    training, through pairing with someone with good testing skills (pairing a DBA
    without testing skills and a tester without DBA skills still works), or simply
    through trial and error. The important thing is that you recognize that you need
    to pick up these skills.
  2. Insufficient unit tests for existing databases. Few organizations have
    yet to adopt the practice of database testing, so it is likely that you will not
    have a sufficient test suite for your existing database(s). Although this is
    unfortunate, there is no better time than the present to start writing your test
    suite.
  3. Insufficient database testing tools. As I said earlier, we still
    have a way to go with respect to tools.
  4. Reticent DM groups.  My experience is that some data
    management (DM) groups may see the introduction of database regression testing, and agile
    techniques such as test-first development (TFD) and

    refactoring
    , as a threat.  Or, as my July 2006 “state
    of data management
    ” survey shows, a large percentage of organizations
    are not only not doing any database testing at all they haven’t even
    discussed it.  For many in the data management community the idea of
    doing database testing is rather new and it’s simply going to take a while
    for them to think it through.  I’m not so sure that you should wait to
    do such obvious
    process improvement.

In general, I highly suggest that you read my article
Adopting Evolutionary/Agile Database Techniques
and consider buying the book
Fearless Change
which describes a pattern language for successfully implementing change within
organizations.

 

7. Database Testing and Data Inspection

A common quality technique s to use data inspection tools to examine existing
data within a database.  You might use something as simple as a SQL-based
query tool such as DB Inspect to select
a subset of the data within a database to visually inspect the results. 

For example, you may choose to view the unique values in a column to determine
what values are stored in it, or compare the row count of a table with the count
of the resulting rows from joining the table with another one.  If the two
counts are the same then you don’t have an RI problem across the join.

As Richard Dallaway
points out, the problem with data inspection is that it is often done manually
and on an irregular basis.  When you make changes later, sometimes months
or years later, you need to redo your inspection efforts.  This is costly,
time consuming, and error prone.

Data inspection is more of a debugging technique than it is a testing
technique.  It is clearly an important technique, but it’s not something
that will greatly contribute to your efforts to ensure data quality within your
organization.

 

8. Best Practices

I’d like to conclude this article by sharing a few database testing “best
practices
” with you:

  1. Use an in-memory database for regression testing
    You can dramatically speed up your database tests by running them, or at
    least portions of them, against an in-memory database such as
    HSQLDB.  The challenge with this
    approach is that because database methods are implemented differently across
    database vendors that any method tests will still need to run against the
    actual database server.

  2. Start fresh each major test run.  To ensure
    a clean database, a common strategy is that at the beginning of each test
    run you drop the database, then
    rebuild it from scratch taking into account all
    database
    refactorings
    and transformations to that point, then reload the test data,
    and then run your tests.  Of course, you wouldn’t do this to your
    production database.  😉

  3. Take a continuous approach to regression testing
    I can’t say this enough, a
    TDD approach to
    development is an incredibly effective way to work.

  4. Train people in testing.  Many developers
    and DBAs have not been trained in testing skills, and they almost certainly
    haven’t been trained in database testing skills.  Invest in your
    people, and give them the
    training
    and education
    they need to do their jobs.

  5. Pair with novices with people that have database testing experience
    One of the easiest ways to gain database testing skills is to pair program
    with someone who already has them.

 

#SQASolutionShare on FacebookShare on Google+Share on LinkedInTweet about this on TwitterEmail this to someone

Previous post:

Next post: