CommitDBA

Database Management Simplified

Ross Group Inc  MemberzPlus  Blogs  |  News  |  Careers  |  Contact    Twitter CommitDBA LinkedIn CommitDBA

CommitDBA Blogs


See What we are Talking About!

CommitDBA Blog


Frank Wright, CommitDBA Senior SQL Server Database Administrator

Ad Hoc Data Archiving

By: Frank Wright, CommitDBA staff Database Administrator

November 11, 2011


Situation

You’ve been requested to take an ad-hoc backup of a large table before applying a manual update on a subset of rows to fix a logical data corruption problem. The request is made in this manner (vs. just taking a backup of the database) to allow for online access to the original version of the data. The request could come from anyone of the following: User community, IT Management, or your own personal conscience. If you’re a DBA (part/full time) or of a measure twice - cut once mentality, the source is probably the last of the three.

Problem

The table is huge, relatively speaking. You could be working for a company with a large SAN infrastructure with the rows in the table number in the billions or you could be working on a standalone server and the company has no money in the budget for buying more disk(s).

Making an entire copy of the table is a quick and dirty task, but presents problems especially when disk space is at a premium. It makes it easy to look at a snapshot of what the data looked like prior to the update for comparison purposes. As long as the system is locked down it provides an easy method for restoring if a post operation review doesn’t meet with approval. However, if any considerable length of time has passed since the backup and user activity has resumed, a complete table restore is unacceptable as you will lose your most recent activity. It is possible to perform comparisons with only a partial backup as described below. Due to these reasons it doesn’t make sense to make a backup of the entire table if you are only updating a small subset of the records.

Solution & Tips

  1. Obtain the WHERE clause for the update statement you are about to perform and append it to a SELECT statement from the table you want to modify.
  2. Append that to a Create Table As Select (CTAS) statement. For the new table name, take the name of the table you are going to be updating and append a datestamp to the name of the table in the script. (e.g. _20090615). You can alternatively append a ticket number if you have change control processes in place and want to relate the table to a specific ticket. You can always query the metadata to get the creation date of this table.
    1. SQL Server: SELECT A.* INTO <TableA_20090615> FROM <TableA> as A WHERE <x>
    2. Oracle: CREATE TABLE <TableA_20090615> AS SELECT A.* FROM <TableA> as A WHERE <x>
  3. Apply Change.
    1. Run the above statement.
    2. Verify the row count in this table against the expected row count to be updated.
    3. Run the update statement against the originaltable.
  4. Perform Data Comparisons. You will want to examine your updates compared to the original data for validation purposes.
    1. Side By Side Comparison: If this table has a PK then this is a straight forward join. You don’t need any other criteria since the backup version of the table is already restricted to the set of rows that you updated. If only a few columns were updated, this is a fairly quick query to write. Otherwise look to the next example for comparing data

      SELECT A.<PK cols>, A.<ModCol1>,A2.<ModCol1> [, A.<ModCol2>,A2.<ModCol2>]…
      FROM <TableA> as A
      INNER JOIN <TableA_20090615> as A2
      ON (A.<PKcol> = A2.<PKcol> [AND A.<PKcol2> = A2.<PKcol2>]…)

    2. Stacked Comparison: This comparison is easier when you want to compare the entire row regardless of how many columns were updated. The label identifies the source as wellas consistently sorts the sources within all instances of matching keys.

      SELECT ‘1-LIVE’ as label, A.*
      FROM <TableA> as A
      WHERE EXISTS (SELECT 1 FROM <TableA_20090615> as A3
      WHERE A3.<PKcol> = A.<PKcol> [ AND A3.<PKcol2> = A.<PKcol2> ]…)
      UNION ALL
      SELECT ‘2-BKUP’ as label, A2.*
      FROM <TableA_20090615> as A2
      -- the 2-N represents the PK columns and 1 (last) is the label
      --assumes PK columns are front loaded in the column list
      ORDER BY 2[,3,4,…N],1

    3. Full Table Comparison: You may need to validate the data against a full copy of the table. For example there may be a report you need to run and use that to compare the before and after (and you forgot to run the “before” report before you updated the data :0). If you have the query handy that the report uses, you can run that query with a modification to retrieve a before image.

      Report Query excerpt:

      SELECT <cols>
      FROM <TableA> as A
      INNER JOIN <TableB> as B

      WHERE …

      Replace <TableA> in the FROM clause with the following (including parentheses) leaving the alias “as A” intact. This will merge the records from the backup into the results from the live table (minus the records you updated)

      (SELECT A2.*
      FROM <TableA_20090615> as A2
      UNION ALL
      SELECT A1.*
      FROM <TableA> as A1
      WHERE NOT EXISTS (SELECT 1 FROM <TableA_20090615> as A3
      WHERE A3.<PKcol> = A1.<PKcol> [AND A3.<PKcol2> = A1.<PKcol2>]…))

  5. Capture After Image (optional for recovery): You may need to restore the original values long after the system has resumed normal operation. If your table doesn’t have an audit timestamp column (i.e., LastUpdate) you might consider this step as it will be used to see if the rows you want to (re) update with original values have been updated since you applied this manual update. This implementation is borrowed from the concept of Optimistic Locking.
    1. Rerun an alternate version of the CTAS statement from step 2. You might not be able to use the same criteria if any of the columns in your criteria were part of the SET clause (i.e., they were updated). Use the following criteria to match on the Keys

      CREATE TABLE <TableA_20090615_AFT> AS
      SELECT A.* FROM <TableA> as A
      WHERE EXISTS (SELECT 1 FROM <TableA_20090615> as A2
      WHERE A2.<PKcol> = A.<PKcol> [ AND A2.<PKcol2> = A.<PKcol2> ]…)

    2. Later, if requested, run the following update to only revert rows that haven’t been modified since the manual update

      UPDATE A
      SET A.<ModCol1>=A2.<ModCol1>[,A.<ModCol2>=A2.<ModCol2>,…]
      FROM <TableA> as A
      INNER JOIN <TableA_20090615> as A2
      ON (A.<PKcol> = A2.<PKcol> [AND A.<PKcol2> = A2.<PKcol2>]…)
      INNER JOIN <TableA_20090615_AFT> as A3
      ON (A.<PKcol> = A3.<PKcol> [AND A.<PKcol2> = A3.<PKcol2>]…)
      WHERE A.<NonPKCol1> = A3.<NonPKCol1> [AND A.<NonPKCol2> = A3.<NonPKCol2>…]

From database assessments to complete database management, CommitDBA can assist your company today. For more information about how CommitDBA can work for you – give us a call at (800) 734-9304.



Got database support needs?
Professional and Affordable DBA and Data Services

Contact us Today