CommitDBA

Database Management Simplified

Ross Group Inc  MemberzPlus  Blogs  |  News  |  Careers  |  Contact    Twitter CommitDBA LinkedIn CommitDBA

CommitDBA Blogs


See What we are Talking About!

CommitDBA Blog


Frank Wright, CommitDBA staff Database Administrator

Strategic Data Archiving

By: Frank Wright, CommitDBA staff Database Administrator

November 24, 2011


Situation

Your company has been running an Order Entry system for close to five years. The production database contains every transaction along with every other element of data ever collected and entered since the system went live. A user could view, if they wanted to, the very first customer and/or order ever entered into the system. Either your business community stated in the past that they want to keep all data or that the topic of retention and archiving was never broached.

Problem

The database continues to grow each year (actually every day). It continues to consume premium disk space providing the highest level of service and speed of access to every bit of data. The application and reporting performance slowly degrade as queries that run tens of thousands times a day are required to access tables and indexes that are continually growing.

The database continues to grow each year (actually every day). It continues to consume premium disk space providing the highest level of service and speed of access to every bit of data. The application and reporting performance slowly degrade as queries that run tens of thousands times a day are required to access tables and indexes that are continually growing.

Every day user activity is usually focused on the most recent data. A CSR, for example, has to query to verify an order just entered or one that was placed a week ago when the customer called to check the status. Reports usually focus on the more recent past, such as a rolling three to 12 month sales report, or on the immediate future, such as forecasting and planning.

As the problem (literally) continues to grow, this not only reduces employee productivity due to reduced response time of the system but also incurs more direct costs. These come in the form of additional hardware purchased for scalability to help combat the problem.

Retention – The level of retention of data for each area/type of data must be determined. This is usually handled by the age of the data and driven by two functional areas: legal and user community. You don’t have to decide on one all inclusive date rule to apply to all areas considered for archiving. For some areas it might be ok to get at data older than one year from an alternate location whereas other areas might require the data to be accessible via the online system going back 5-7 years.

Legal requirements stem mostly from government reporting and lawsuits. I will not delve into legal requirements of data retention so you should probably engage in-house or retained legal counsel for directives in this area. One piece of advice; don’t do something that contradicts anything they stipulate. One good note to consider is that as long as you are just relocating the data and it is still in some way accessible electronically, you probably aren’t violating any legal requirements, but again confirm this with legal counsel.

You have influence in helping to decide retention levels set by business and user requirements, unlike legal requirements which are decided by factors outside your control, as long as they don’t violate legal requirements. You will need to educate your user community on what is considered reasonable retention of data. Users are found on the extremes, either being short-sighted or demanding. Some users may be lackadaisical about retention levels until the first batch of data gets aged off and they now need this data back ASAP. Also they may take an initial stance that absolutely no data is to be archived. You may need to perform significant upfront effort in a non-production environment to show the users that the archived data is still accessible via special reports or enhancing existing reports to include the archived data. Also, you need to show concrete improvements in system response time if older data is archived to get user approval and signoff.

Location – After you have decided how much data to archive, the decision of where to archive it begins. To keep the database from growing you could extract the data to external delimited text files to be kept on a fileserver or on optical storage (cd/dvd). Keeping data on optical storage only would make it extremely inconvenient for users to gain and share access. Optical storage, and other offline storage media, is useful for making a copy and placing in a secured storage area such as a vault or an offsite facility. Text files kept on a shared fileserver eliminates the accessibility problem but doesn’t eliminate other problems. Usually the data is kept in separate files one per archiving batch. This makes it cumbersome to search all these files looking for data about a single customer or any other criteria that would span multiple files. Also this storage methodology doesn’t lend itself to very effective reporting capabilities.

The most effective place to keep the data is inside the database. At minimum, just archiving to a separate table (partition) will allow for performance improvements to the online system, however, you might want to consider segregating this data to a separate physical location. RDBMSs allow for additional databases/filegroups/tablespaces that can be stored on separate disks from the other datafiles in the RDBMS. If you don’t have an infrastructure that provides different tiers of storage then don’t worry about this and just create your archive tables in the same location as your online tables. If you do have the infrastructure, look to place the archive tables on lower tier (i.e., slow and cheap) disk preferably in a RAID 5 configuration. This will free up your premium disk (i.e., fast and expensive) for your online OLTP needs and reduce disk contention when the archived content is queried. Utilizing a RAID 5 configuration will maximize usable disk space while supplying some level of fault tolerance. Once the data is archived it should be read only sothe write performance penalty of RAID 5 won’t factor into the equation when users access this data.

Techniques – If you can procure a significant budget for this data archiving project, you might want to investigate some third party tools. They facilitate configuring, implementing and scheduling the archiving. Additionally, some of these tools are relationally aware of major ERP systems such as Peoplesoft and Oracle E-Business suite. If you are like most companies, you probably are going to have to resort to doing this in house or hire a consultant to implement custom data archiving.

There are two major strategies when archiving data within the database: creating a separate table and partitioning. The major benefit of partitioning is that it implements physical structure behind a logical entity to maintain transparent availability. Current OLTP activity will be automatically isolated to the partition with the most current data. Reports and other ad-hoc inquires can access the full spectrum of data without modifications to any programs, scripts or reports. However, partitioning has its downsides. It requires a DBA with plenty of experience to handle the implementation and ongoing maintenance required by partitions. It also requires an elevated (more expensive) edition of the RDBMS.

If additional costs, such as purchasing a tool or upgrading your database, are obstacles to this effort then you are left with implementing separate tables as your strategy. The basic implementation is pretty straight forward as you will make a structural copy of each table you want to archive. Keep in mind that as you make changes to the live table, that those same changes need to be made to the archive table and the script/program that handles the archiving. Please review all reports that access any of these tables and decide, with your user community, if this report needs to show archived data. If so, then the data source behind the report will need to be modified to merge in the archived data with the live data. Also, decide if a separate report to show just archived data might be more appropriate to allow the live version to generate faster and be less of a burden on the system. Since the archive data should be read-only, don’t bother reviewing any batch jobs as they shouldn’t ever need to modify this data. It has already been decided that the online system no longer needs access to this archived data so no review is necessary. You will need to pick a schedule and batch size. Monthly, fiscal or calendar is usually the best suited frequency as to not burden the system with managing too many rows of data at one time (except the first time).

Conclusion – Implementing data archiving is a project, not a task. Perform due diligence to get legal and company signoff on retention levels. Make sure to provide concrete evidence to get users to see the benefits of paring down the live system. A separate fault tolerant disk sub-system is recommended; otherwise, keep the archived data on the same disks as the live data. Finally, if you’re not experienced, consider engaging a professional services firm in this area as you don’t want to go through this effort and not see any benefits or worse, accidentally lose data.

From database assessments to complete database management, CommitDBA can assist your company today. For more information about how CommitDBA can work for you – give us a call at (800) 734-9304.



Got database support needs?
Professional and Affordable DBA and Data Services

Contact us Today