Thinking Through Database Recovery for DBAs

Introduction

Database Recovery involves the restoration of a database, and the data, to a stable state in the event of a system crash caused by a hack or accidental damage. This typically involves restoring the data from backup files to a point at or near the time of the crash. Because there are certain steps that can help to mitigate the resulting damage from a crash, carefully consideration of database recovery techniques are a must when thinking through database recovery DBAs.

Thinking Through Recovery

It is the obligation of the database administrator to determine if a recovery is warranted when a database failure occurs. While it is not very probable, it is possible that a failure of an active database will not effect the stored data. If recovery is required, the administrator will have to figure out what resources and backups are available and how to execute the recovery. If failure should occur, there are a number of questions that must be addressed to properly ascertain the extent of the database failure. The answers to the following questions will determine how to best effect recovery:

  • Was the nature of the failure a media, transactional or a database instance?
  • What caused the failure to occur?
  • Was the database failure the result of an abort, a crash or a normal shutdown?
  • Were any operating-system errors taking place?
  • Was the server rebooted?
  • Were any errors found in the operating system, alert or database logs?
  • Did the failure result in a data dump?
  • Was there a generation of trace files?
  • How important was the data that was lost?
  • Has there yet been an attempt at recovery? If so, what steps have been taken?
  • Are there full and/or incremental backup files?
  • Does the full database require recovery, or just certain parts, such as a tablespace, single table or index?
  • Does the system’s current backup scheme make the type of required recovery possible?
  • If cold backups are available, how was the DB shut down when the backups were captured?
  • Are all of the archived database logs available for recovery?
  • Is there a recent logical backup?
  • Were any concurrent programs operating when the system failure occurred?
  • Can the DBMS instance be recalled?
  • Can the database objects be addressed?
  • How much data was lost?

Additionally, database management system version migration can have an impact on the ability of the systems to recover data. For example, migrating to a newer version of a DBMS after the last backup occurred can cause recovery issues. Occasionally, system vendors will modify data-file format or image-copy backup logs or files. This will typically render any recovery attempt using the old recovery format extremely challenging, if not outright impossible. This can happen if the new backup version significantly altered the functionality of the recovery utility enough so that the system could not interpret the older backup files. Irrelevant of how it happened, the database administrator is in a bad position without a valid backup to use for recovery. To avoid this, it is critical for DMAs to examine the particulars of a backup and recovery plan for all new DBMS versions before migration as well as capturing fresh image-copy backups after migration is completed, in the event the old backups become unusable.

Be Prepared To Tackle These Issues

These are only a few of the issues a DBA must be ready to deal with in able to recover a data. Additionally, the database administrator must understand all of the specific DBMS details. The following are the typical steps used for most database recovery operations:

  1. Outage detection typically happens when the database stops responding or the database management system displays an error message. However, some failures are more subtle, like a corrupt control file, and can be harder to detect.

  2. The administrator will have to determine the cause of the type error and scope of the failure.

  3. After a thorough analysis of the error the administrator will have to decide on a course of action and a recovery method.

  4. The administrator must next determine what database components are failing and then come up with a script suitable for the recovery of each component. Executing this script is often the most time-consuming part of the process.

  5. As the failure of one component will often impact other components in the database, such as indexes or tables, the DBA must identify dependencies between database objects that must be recovered. This is critical as restoring a database to a previous point will often effect related-database objects.

  6. Locating backups. The further the backup is from the recovery point the more time it will require to recover.

  7. Use the image-copy backups with the database recovery utility or file system recovery command to restore the database.

  8. The database logs must be processed to recover the database to a point in time after the backup was taken.

Virtually every database recovery operation will involve most of the above steps. However, as each situation is unique, certain steps may require modifying or even elimination, depending on the situation and the type of recovery.

Conclusion

Thinking through database recovery DBAs, it is imperative for DBAs to figure out what resources they will need and what steps need to be taken to recover a database after a crash. Because a crash is virtually always unexpected, the most important step in recovering from a database crash is making sure there are always adequate up-to-date backups of image copies of the database available, and this is especially crucial after performing any system updates.

Pilot the ObjectRocket Platform Free!

Try Fully-Managed Redis,
MongoDB & Elasticsearch

Get Started

OR

Try CockroachDB
in Beta

Get Started

Keep in the know!

Subscribe to our emails and we’ll let you know what’s going on at ObjectRocket. We hate spam and make it easy to unsubscribe.