“Failure is not an option”
Did you know that:
- “94% of companies suffering from a catastrophic data loss do not survive – 43% never reopen and 51% close within two years. (University of Texas)”. 1
- “7 out of 10 small firms that experience a major data loss go out of business within a year. (DTI/PricewaterhouseCoopers)”
The importance of backups cannot be overstated for the continued existence of your business.
A backup strategy exists to meet your recovery requirements, so when defining one the focus should be on recovery. Start with the following questions:
- How long do you have to perform the recovery?
- What are you trying to recover for?
- What do you need to recover?
- Where do you keep your backups?
- How long do you keep your backups available for?
All of these questions are directly linked and will be used to determine the requirements for your recovery and help you define your backup strategy.
We will look at each of these questions to show the impact on your backup strategy.
How long is available to perform the recovery?
As with any operation performed by an external vendor, service level agreements (SLAs) are defined to set expectations on delivery, and the same applies to recovery. Too often a backup strategy is defined without looking at the recovery implications. The recovery time is key to any backup strategy and will potentially have the biggest impact on the hardware and budget needed to put it in place.
For instance, performing a recovery within minutes will require different (and more expensive) hardware than a recovery within a day.
Any SLA will specify a timeframe in which a recovery will take place. It will typically not make any reference to the data volume that will need to be restored. Hence, the backup strategy should cater for both the current data volume and its projected growth over a two-to-three-year period.
To give you an idea of data volume growth, research shows that data volume is doubling every two to three years. 2
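To make the sizing concrete, here is a minimal sketch of the arithmetic. The starting volume (500 GB) and the recovery window (4 hours) are hypothetical figures chosen for illustration, and the function names are my own. It projects the volume three years out for both doubling rates and shows the sustained restore rate the hardware would need to meet the window.

```python
def projected_volume_gb(current_gb: float, years: float, doubling_years: float) -> float:
    """Volume after `years` if it doubles every `doubling_years`."""
    return current_gb * 2 ** (years / doubling_years)


def required_throughput_mb_s(volume_gb: float, recovery_window_hours: float) -> float:
    """Sustained restore rate (MB/s) needed to restore `volume_gb` within the window."""
    return volume_gb * 1024 / (recovery_window_hours * 3600)


if __name__ == "__main__":
    today_gb = 500.0      # hypothetical current data volume
    window_hours = 4.0    # hypothetical recovery window taken from the SLA
    for doubling in (2.0, 3.0):
        future = projected_volume_gb(today_gb, years=3, doubling_years=doubling)
        rate = required_throughput_mb_s(future, window_hours)
        print(f"doubling every {doubling:g} years: {future:.0f} GB in 3 years, "
              f"needs about {rate:.0f} MB/s to restore within {window_hours:g} h")
```

The point of the exercise is that the restore rate has to be sized for the projected volume, not for what you hold today.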
The recovery time should also be a function of how critical your data is to your business: the more important the data, the shorter the recovery time needs to be.
What are you recovering for?
There are two types of failures that could cause data loss:
- A physical failure
- A logical failure
In most cases the focus is on a physical failure, i.e. one where a disk component has failed and a recovery is necessary. The recovery should restore the data up to the time the disk failed.
There are things that can be done in order to mitigate disk failures, such as disk mirroring.
The second case is a logical failure. This occurs quite frequently and, unfortunately, is not as noticeable as a failed disk. A logical failure is where the data has become corrupted or has been inadvertently removed. It may be several days, weeks or even months before it is discovered, and a recovery back to a time prior to the corruption or removal may still be needed.
There is no hardware solution available to mitigate a logical failure, but daily monitoring and detailed reporting can help discover it early.
Ultimately, the decision on what to recover may even have an impact on the type of database that you use with your application. There are two types of database backup: offline and online. An offline backup is taken while the database server is offline (down), and a copy of all the database files makes up the backup. The benefit is that you have a consistent backup; the disadvantage is that your database is not available for the duration of the backup. The alternative is an online backup, where the database remains available and the database internals give you a read-consistent view of the data. Another advantage of an online backup is that you can have point-in-time recovery, where you restore the database to a consistent state as of a specified time.
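The idea behind point-in-time recovery can be illustrated with a toy model. This is a conceptual sketch only, not how any particular database implements it. It assumes a base backup held as a dictionary and a time-ordered log of changes, and the function name and sample records are invented for the example.

```python
from datetime import datetime


def restore_to_point_in_time(base_backup, change_log, target_time):
    """Rebuild the data as of `target_time` from a base backup plus a change log.

    base_backup: dict of key -> value captured when the backup was taken.
    change_log:  list of (timestamp, key, value) in time order;
                 a value of None records a deletion.
    """
    state = dict(base_backup)  # work on a copy, never on the backup itself
    for timestamp, key, value in change_log:
        if timestamp > target_time:
            break  # stop replaying once we pass the chosen point in time
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state


backup = {"invoice-1": "draft"}
log = [
    (datetime(2024, 1, 1, 9, 0), "invoice-1", "sent"),
    (datetime(2024, 1, 1, 10, 0), "invoice-2", "draft"),
    (datetime(2024, 1, 1, 11, 0), "invoice-1", None),  # accidental removal
]
# Recover to 10:30, just before the accidental removal at 11:00.
restored = restore_to_point_in_time(backup, log, datetime(2024, 1, 1, 10, 30))
print(restored)
```

Choosing a target time just before the damage is also what makes this kind of recovery useful for a logical failure.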
What do I need to recover?
On the surface this appears to be a simple question. The typical response is “your data”. There are two questions that immediately spring to mind: “what is considered your data?” and “can you read your data without your applications?”.
The data being backed up will typically include a database. However, the design of your application determines the various data sources, of which the database is only one. For instance, what if your database contained a list of files that exist in a particular folder (or directory)? Both the database and the folder would need to be backed up, and both would need to be restored as a single recovery.
Backing up data from different sources (or even from within the same database) poses another problem: read consistency. Let's imagine we back up the folder of files and then the database. If more files are added while the database is being backed up, the backup of the folder and the backup of the database will be inconsistent with each other.
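A simple check can reveal this kind of mismatch after the backups have been taken. The sketch below assumes you can obtain the file names captured in the folder backup and the file names listed in the database backup. The function name and the sample file names are hypothetical.

```python
def find_inconsistencies(files_in_folder_backup, files_listed_in_db_backup):
    """Compare the two backups and report where they disagree."""
    folder = set(files_in_folder_backup)
    listed = set(files_listed_in_db_backup)
    return {
        # The database refers to files the folder backup does not contain.
        "listed_but_missing": sorted(listed - folder),
        # The folder backup holds files the database does not know about.
        "present_but_unlisted": sorted(folder - listed),
    }


# The folder was backed up first; c.pdf arrived while the database was being backed up.
folder_backup = ["a.pdf", "b.pdf"]
database_backup = ["a.pdf", "b.pdf", "c.pdf"]
report = find_inconsistencies(folder_backup, database_backup)
print(report)
```

A check like this only detects the problem. Avoiding it means taking both backups from the same consistent point, for example by pausing writes while they run.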
In larger organizations, a subset of data in one database may be used in another. This immediately builds up dependencies between databases (and applications). Does this mean that a single failure will require all dependent databases to be recovered? The business would need to decide what should be done about any inconsistencies between the dependent databases. Designing the applications with dependency monitoring, so that the data sources and what should be backed out are easy to identify, will greatly help during the recovery process.
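If the dependencies are recorded, finding everything touched by a failure is a straightforward graph walk. The sketch below is one possible approach and is not taken from any specific tool. The `consumers` mapping and the database names are hypothetical.

```python
from collections import deque


def databases_affected_by(failed, consumers):
    """Return every database that directly or indirectly uses data from `failed`.

    consumers maps a database name to the list of databases that use its data.
    """
    affected = []
    seen = {failed}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for dependent in consumers.get(current, []):
            if dependent not in seen:  # guards against circular dependencies
                seen.add(dependent)
                affected.append(dependent)
                queue.append(dependent)
    return affected


# Hypothetical dependency map: orders and marketing both read from customers.
consumers = {
    "customers": ["orders", "marketing"],
    "orders": ["billing"],
    "billing": [],
}
print(databases_affected_by("customers", consumers))
```

The resulting list is a starting point for the business decision. It shows which databases need to be reviewed, not that each one must necessarily be restored.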
Where are you storing your backups?
I have audited many backup strategies, and one common occurrence is that the media on which the backups are taken are kept in the same data centre as the server. In most cases this is adequate; however, in the event of a fire or natural disaster your business could be wiped out.
With the advent of fast internet access, remote backups provide a means of reducing this risk by keeping your core data at two separate locations.
Even a simple strategy can work. One that I recommended to a client was to have two external hard drives that he alternates weekly, taking one home while the other stays in the office. In the worst case they would lose a week of work, which was acceptable for their business.
How long do you keep your backups available for?
If we were dealing only with data loss due to the physical crash of a hard disk, we would only ever need storage capacity for two backups: the previous backup and the current one. The reason for keeping two is that if a physical crash occurs while you are writing a backup, you could potentially be left with nothing.
However, with logical failures, where data loss may not be discovered for a considerable amount of time, it is beneficial to keep backups for a longer period. For example, imagine you are working on a document that is backed up nightly. Would it be more beneficial to keep each nightly copy available in case of a problem (for instance, where paragraphs have been accidentally removed)? How long should these copies be maintained?
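One way to reason about the retention period is to compare it with how long a logical failure might go unnoticed. The sketch below assumes nightly backups and uses hypothetical dates and retention periods. It checks whether a backup taken before the damage would still be available on the day the damage is discovered.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates, today, retention_days):
    """Return the backups that are still inside the retention window."""
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in backup_dates if d >= cutoff)


def latest_backup_before(backup_dates, damage_date):
    """Return the most recent backup taken before the damage, or None."""
    earlier = [d for d in backup_dates if d < damage_date]
    return max(earlier) if earlier else None


today = date(2024, 3, 31)                                  # hypothetical discovery date
nightly = [today - timedelta(days=n) for n in range(30)]   # 30 nightly backups
damage = today - timedelta(days=10)                        # damage went unnoticed for 10 days

for retention_days in (7, 14):
    kept = backups_to_keep(nightly, today, retention_days)
    clean = latest_backup_before(kept, damage)
    print(f"{retention_days}-day retention: clean backup available = {clean}")
```

In this example a 7-day retention period leaves nothing to recover from, while a 14-day period does. The retention period should therefore be at least as long as the longest delay you expect before a logical failure is noticed.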