Designing a backup strategy
A backup can be understood as a copy of used data pages or a copy of transaction log records. Backups should be taken regularly. They are needed not only to restore databases after a physical or logical failure, but also when, for example, we want to make a copy of a database or migrate it to another SQL Server instance. To make our backups correct and complete, we need to consider the following points:
- How to configure the database
- Which types of backups to combine
- How to use backup strategies in conjunction with database usage
Let's first look at the types of backup supported by SQL Server.
Backup types
SQL Server supports three main types of backup:
- Full backup
- Transaction log backup
- Differential backup
Each of these types has several variants and we will go through each of them.
Full backup
A full backup is simply a backup of the complete database. When performing a full backup, SQL Server stores the metadata of the database (its name, creation date, all options set on the database, paths to all files belonging to the database, and so on), the used data pages of every data file, and also the active part of the transaction log (which means all the transactions that have not been checkpointed yet and all running transactions, even if they are not finished).
At the end of the backup process, SQL Server stores the last Log Sequence Number (LSN) for possible additional backups.
Note
A full backup never clears the transaction log file!
A full backup can be performed regardless of the recovery model set on the database. However, the SIMPLE recovery model is the correct option if we do not intend to add additional backups to our backup strategy.
The command for a full database backup is as follows:
BACKUP DATABASE <database name> TO DISK = '<full path to backup>'
We can write the following as an example to back up an AdventureWorks database:
BACKUP DATABASE AdventureWorks TO DISK = 'D:\myBackups\AdventureWorks.bak'
Let's describe the preceding example in more detail. A database called AdventureWorks must exist on the server and it must be in a consistent state; in other words, we can never back up a database (with any kind of backup) that is not online or that is not working normally. The second mandatory condition is that the D:\myBackups path must exist on the filesystem. Backups are not installers; they never create folders.
The filename for the backup is arbitrary; the .bak extension is recommended. The file itself need not exist; it is created with the first backup. It is possible to store more backups in one file. When we want to have more backups in one file, we must add a new option to the backup command:
BACKUP DATABASE AdventureWorks TO DISK = 'D:\myBackups\AdventureWorks.bak'
WITH NOINIT
When the preceding example is run for the first time and the .bak file does not exist, it will be created. When the same command is executed a second time with the same path and filename, the backup file will grow because additional backups are appended to it. This scenario is not useful for big database backups because the backup file grows quickly. But since many databases are not very big, we can use the NOINIT option to store a full backup, along with the transaction log backups of a small database, in one backup file. Conversely, when we want to erase all the backups from the backup file and start a new backup cycle, we can replace the NOINIT option with the INIT option. The INIT option erases all the backups from the backup file and stores only the new backup in it.
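For illustration, restarting the backup cycle with INIT might look like this (the path is the same illustrative one used earlier):
-- erases all backups stored in the file and starts a new backup cycle
BACKUP DATABASE AdventureWorks TO DISK = 'D:\myBackups\AdventureWorks.bak'
WITH INIT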
Full backups tend to be large. This can lead to insufficient disk space, as well as the backup operation being time-consuming. That is why it is highly recommended to compress backups. There are two ways to do this. The first way is to set backup compression as the default at the server level. The command for this server setting is as follows:
EXEC sp_configure 'backup compression default', 1
GO
RECONFIGURE
GO
The sp_configure system stored procedure is used in many cases, as seen in the preceding example. The first parameter, 'backup compression default', is the name of the configuration property. It is quite hard to remember all the configuration parameters by name, so we should remember that we can execute sp_configure as is, without parameters. The procedure then returns a result set with the list of parameter names and their currently configured values.
The second parameter in the preceding script sample (the number 1) is a bit value indicating that we want to switch backup compression on at the instance level. Only a bit value makes sense for some configuration options; for others, such as the maximum degree of parallelism, an integer value indicates how many CPUs can be used for the parallel processing of one T-SQL request.
The RECONFIGURE command causes the configured property to load immediately, without the need for a service restart.
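As an illustration, the following hedged sketch lists the configuration options and then sets an integer-valued option (the value 4 for the maximum degree of parallelism is only an example, not a recommendation; because it is an advanced option, advanced options have to be shown first):
EXEC sp_configure 'show advanced options', 1
RECONFIGURE
GO
-- without parameters, sp_configure lists all options and their current values
EXEC sp_configure
GO
-- integer-valued option: allow up to 4 CPUs for one parallel request
EXEC sp_configure 'max degree of parallelism', 4
RECONFIGURE
GO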
The second way of setting compression is to simply add another option directly to the BACKUP command:
BACKUP DATABASE AdventureWorks TO DISK = 'D:\myBackups\AdventureWorks.bak'
WITH NOINIT, COMPRESSION
SQL Server compresses the backup file as a whole when it is written, not individual backups stored in the file, so we cannot have part of the file uncompressed and the rest of the same file compressed. If we do not want to compress some backups, the opposite option is NO_COMPRESSION.
Note
When we want to start compressing backups in a backup file that already contains uncompressed backups, the INIT option of the BACKUP command is not strong enough. We need to replace the INIT option with the stronger FORMAT option. The FORMAT option physically deletes the backup file and creates it anew. Use the FORMAT option carefully because it causes all the backups in the given backup file to be lost forever.
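For example, a hedged sketch of switching an existing backup file to compressed backups with FORMAT (the path is the illustrative one used earlier) could look as follows:
-- recreates the backup file; every backup previously stored in it is lost
BACKUP DATABASE AdventureWorks TO DISK = 'D:\myBackups\AdventureWorks.bak'
WITH FORMAT, COMPRESSION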
A full backup serves as a baseline for more advanced backup strategies. It is often combined with transaction log backups, and this dependency is driven by the last LSN written to every backup. When an additional transaction log backup is executed, SQL Server remembers the last LSN from the previous backup and starts the current backup operation from the next LSN. Hence, when a full backup is executed, the last remembered LSN is replaced with a new one and the backup strategy obtains a new baseline.
In some cases, this is undesired behavior; for example, in situations where we need to create a copy of a certain database with a full backup, but without breaking out of the backup sequence. For this case, one more full backup variant exists:
BACKUP DATABASE AdventureWorks TO DISK = 'D:\myBackups\tempBackupOfAdventureWorks.bak'
WITH COPY_ONLY
The COPY_ONLY option in the preceding command ensures that the LSN sequence tracked for subsequent backups is not restarted, so this exceptional full backup does not establish a new baseline for the backup strategy.
A full backup is relatively straightforward but less efficient when we need to minimize potential data loss. That is why we need a stronger mechanism, such as the transaction log backup, to keep our data safe and sheltered against physical as well as logical damage.
Transaction log backup
As mentioned in the previous section, a full backup establishes a baseline for more efficient backup strategies. In other words, one full backup must be created before we can start backing up the transaction log. A transaction log backup backs up all the transaction log records since the last LSN contained in the previous backup.
In other words, a full backup is a backup of the state of the database, while a transaction log backup is a backup of the additional changes since the last LSN stored by a previous backup. Using transaction log backups ensures that the recovery point objective (RPO), the point to which the database can be restored, is very close to the moment when the database was damaged.
Another important property of transaction log backups is that this backup type clears the inactive virtual log files (the logical parts of the transaction log file). This keeps the transaction log file at a reasonable size.
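To observe this effect, we can check the size and usage of the transaction log before and after a log backup; one simple way, shown here as a sketch, is the DBCC SQLPERF command:
-- reports the log size and the used percentage of the transaction log for every database
DBCC SQLPERF(LOGSPACE)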
To be able to use transaction log backups, the database's recovery model property must be set to the BULK_LOGGED or FULL value. Remember that, unlike the FULL recovery model, the BULK_LOGGED recovery model does not allow you to restore the database to a certain point in time.
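Before relying on transaction log backups, it is worth checking the current recovery model; a quick sketch, assuming the AdventureWorks sample database, follows:
-- shows the recovery model currently set for the database
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = 'AdventureWorks'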
When the recovery model is set correctly and a full backup is executed, we can start a backup of the transaction log on a regular basis. The basic syntax for a transaction log backup is as follows:
BACKUP LOG AdventureWorks TO DISK = 'D:\myBackups\AdventureWorksLog.bak' WITH <additional options>
As seen in the preceding code example, the BACKUP LOG syntax is very similar to the BACKUP DATABASE syntax. The database must already exist, and it must be in an online state; the path to the .bak file must exist in the filesystem as well. If the .bak file does not exist, it will be created when the BACKUP LOG statement is executed for the first time.
The additional options are basically the same as for the full backup statement:
- The INIT/NOINIT pair controls whether the content of the backup file is replaced.
- The FORMAT/NOFORMAT pair is a stronger variant of the INIT/NOINIT options.
- The COMPRESSION/NO_COMPRESSION pair controls the backup's compression.
The meaning of these options is the same for all backup types.
Now that we have enough information about the basic backup types, we can go through more complex examples. The following code sample shows a set of backup statements and their ordering. The only difference in the real world is that every statement is executed separately and that, typically, the execution is scheduled by SQL Server Agent.
Note
SQL Server Agent will be described later, in Chapter 9, Configuring Always On High-Availability Features.
Let's take a look at this assignment. The AdventureWorks database is used as a typical operational database with lots of incoming transactions. These transactions must not be lost because the clients of the database write their data through a sales web application. The backup cycle will be restarted every day. The AdventureWorks database is relatively small, so all the backups can be stored in the same file. An important business need is that the database must be recoverable to a certain point in time. How do we prepare the AdventureWorks database for proper backups and which backup statements do we use? The following recipe shows the complete process:
- We must ensure that the database is using the FULL recovery model:
-- this statement will be run once only
-- if database is in FULL recovery model already, nothing happens
ALTER DATABASE AdventureWorks SET RECOVERY FULL
GO
- Every day at, for instance, 3 a.m., we will execute a full backup:
-- following statement will reset content of the backup file
BACKUP DATABASE AdventureWorks TO DISK = 'D:\backups\AdventureWorks.bak'
WITH INIT
GO
- Every hour or maybe more often, if needed, we will repeat the transaction log backup:
-- following statement appends the backup to the backup file and clears
-- the transaction log
BACKUP LOG AdventureWorks TO DISK = 'D:\backups\AdventureWorks.bak'
WITH NOINIT
GO
As seen in the previous code sample, it is not very complicated to create a simple and robust backup strategy. The transaction log backup should be executed often to keep the transaction log file at a reasonable size. This also keeps the backup operation fast and small, so that it does not conflict with regular user requests.
So far, everything has just been routine, but what if damage occurs? A very common mistake is to think that the only kind of damage is physical damage, for example, file corruption. However, we should keep in mind that another kind of damage is logical damage, for example, accidental deletion of data or some structures. When such logical damage occurs, SQL Server does not detect the problem and the database remains online. However, for the user, the database is useless and damaged.
For either type of corruption, SQL Server provides a special transaction log backup called a tail-log backup. The tail-log backup is a variant of the transaction log backup. It backs up the transaction log records written to the transaction log file up to the moment of corruption, which is why it is called a backup of the tail of the transaction log. The tail-log backup switches the state of the database to restoring, which makes the database inaccessible to users. It is very important to use the tail-log backup in case of logical corruption. It is improbable that all the data in the database will be logically damaged at the same moment, but we still need to stop users from working with the rest of the data, because we know that the database is going to be restored and all of their changes would be lost. An example syntax to execute a tail-log backup is as follows:
BACKUP LOG AdventureWorks TO DISK = 'D:\backups\tailLog.bak' WITH NORECOVERY
The NORECOVERY keyword is the option that makes this a tail-log backup. The preceding syntax is sufficient for logical accidents such as unwanted deletes of data. But for every backup operation, the database must be in a consistent and online state. What if the database is in a suspect state?
The suspect state of a database is set by SQL Server in situations where the database is somehow corrupted physically and not accessible to users. In this case, we have two additional options that can be added to the BACKUP LOG statement:
BACKUP LOG AdventureWorks TO DISK = 'D:\backups\taillog.bak'
WITH
NORECOVERY, NO_TRUNCATE, CONTINUE_AFTER_ERROR
Let's describe these new options in more detail.
When the database is corrupted, no backup other than the preceding one can be executed. The CONTINUE_AFTER_ERROR option tells SQL Server that we know about the corruption, but we want to keep all the transaction log records captured by the transaction log up to the moment of damage. Even if some transactions are incomplete or some of the transaction log records are not readable, the rest of the transactions will be kept by the BACKUP LOG statement. If we do not use the CONTINUE_AFTER_ERROR option, SQL Server assumes that the database is in a consistent online state and the BACKUP LOG statement will fail.
The second option, NO_TRUNCATE, causes SQL Server to perform no maintenance on completion of the backup. This is the intended behavior because we know that the database is in an unstable state and it is probable that any write operation will fail. We also know that, after the tail-log backup completes, we will start the restore process of the database, so any additional maintenance would be wasteful.
Differential backup
SQL Server maintains an internal structure called a differential map. This structure keeps track of all the database extents changed by user requests since the last full backup. This is very useful in cases where just a portion of a database is updated frequently. We can use differential backups in conjunction with frequent transaction log backups to speed up the restore process later when the need occurs. A differential backup has the following characteristics:
- It is a kind of full backup (it backs up the extents changed since the last full backup and does not maintain the transaction log).
- It is cumulative (it backs up the extents changed since the last full backup, which allows you to skip more transaction log backups during a restore).
- It is faster and smaller than a full backup (it does not slow down the database for a long time and can be executed concurrently with common user workloads without a decisive impact on performance).
- It does not need any additional settings at the database or server level.
The time a differential backup takes depends on how much of the database has changed since the last full backup. We can query the sys.dm_db_file_space_usage view to see how many extents have changed:
SELECT total_page_count, modified_extent_page_count FROM sys.dm_db_file_space_usage
The preceding query returns two columns:
- The first column, total_page_count, shows the total number of data pages in the database.
- The second column, modified_extent_page_count, shows how many data pages have been modified since the last full backup was executed.
The closer the modified_extent_page_count value gets to the value of total_page_count, the more extents will be backed up by the differential backup and the slower the differential backup operation will be.
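As a rough rule of thumb, we can compute the changed share of the database; the following sketch (run in the context of the database in question) is just one possible way to express it:
-- percentage of pages modified since the last full backup
SELECT SUM(modified_extent_page_count) * 100.0 / SUM(total_page_count) AS modified_pct
FROM sys.dm_db_file_space_usage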
The syntax for a differential backup is as follows:
BACKUP DATABASE AdventureWorks TO DISK = 'D:\myBackups\AdventureWorksDiff.bak'
WITH DIFFERENTIAL
From a syntactical point of view, a differential backup is just a database backup with one more option. Other options, such as INIT/NOINIT, are also possible. When differential backups are incorporated into the strategy, the backup timeline follows the pattern shown in the following table. This table describes a daily-based strategy for smaller databases, with all the backup types stored in the same backup file:
The preceding table summarizes the flow of the database backups within a one-day cycle. The cycle starts every day at 3 a.m. with a full backup. The following transaction log backups make a chain of the changes made to the database's data. Differential backups are executed at 9 a.m., 12 p.m., and, let's say, at 3 p.m. and 6 p.m., which allows us to speed up the process of the database restore.
So far, we have explained the backup options used to maintain the recoverability of a simple database. We will now use the same options in more complicated scenarios.
Advanced backup scenarios
Now that we understand all three basic types of backup, we can summarize our needs and choose the right backup strategy. It is also very important to ensure that our backups are reliable and fast. We must also maintain backups for larger databases composed of multiple files or filegroups. SQL Server provides a set of features that cover all three of these needs.
Backup media
In the previous sections, we worked with backups stored in disk files. This is a very common destination for backups because a tape, as a backup destination, must be attached directly to the server. Because backup tape devices are usually used to back up entire company infrastructures, SQL Server does not extend its tape backup capabilities and relies on third-party backup tools instead. Another backup target could be a blob container in Azure. We will describe backups to Azure, as well as other scenarios involving Azure, in Chapter 11, Combining SQL Server 2019 with Azure. That is why all the examples in this chapter use just disk files as their backup devices.
Note
If SQL Server databases are going to be backed up by third-party backup devices, never mix the execution of these backups with SQL Server's native backup!
When we need to improve the speed of a backup, we can join several backup devices into one set, called a media set. The media set is formed of one or more devices of the same type. SQL Server spreads data evenly across all the devices in the media set. We can imagine the media set as a striped disk. The media set is created the first time the backup is executed. The following example creates a media set:
BACKUP DATABASE <database name> TO
DISK = '<path to first file>',
DISK = '<path to second file>'
WITH
MEDIANAME = '<name of the media set>'
Every backup saved to the same media set is then called a backup set. Once the media set is created, all the files in it must be used together. An attempt to use just one of the files for an additional backup without using the whole media set will fail. Using media sets makes backup operations faster, but it also increases the risk of backup loss.
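For illustration, an additional backup to the same media set must list all of its devices again; a hedged sketch using the placeholders from the previous example follows:
BACKUP LOG <database name> TO
DISK = '<path to first file>',
DISK = '<path to second file>'
WITH
NOINIT, MEDIANAME = '<name of the media set>'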
Let's look at the following example:
BACKUP DATABASE <database name> TO
DISK = '<path to first file>'
-- second file from previous example is not used
WITH
FORMAT
The FORMAT option breaks the media set, and all the backups saved in it are lost! Use media sets carefully.
Backup reliability
Everything saved on disk can become corrupted or unreadable. That is why SQL Server provides two features to improve backup reliability:
- The first option is to use a backup with the CHECKSUM option. This option is simple to use and causes a checksum value to be computed during the backup operation. This value is saved in the backup and, when we prepare for the restore process, we can use it to test the backup's readability (see the verification sketch after this list):
BACKUP DATABASE <database name> TO DISK = '<path to file>'
WITH
CHECKSUM
We can also turn on the checksum option at the instance's level using the sp_configure system stored procedure. The following script shows how to use it:
EXEC sp_configure 'backup checksum default', 1
RECONFIGURE
- Another option, used to distribute backups across more devices, is called mirrored backup. Mirrored backup is an Enterprise edition feature of SQL Server; when we use it, two identical backups are written synchronously to two backup devices. When we use backup mirroring, the syntax looks as follows:
BACKUP DATABASE <database name> TO
DISK = '<path to file>'
MIRROR TO DISK = '<path to file>'
WITH <additional options like CHECKSUM>
The preceding example adds a second path to a backup device, introduced by the MIRROR TO keyword.
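Returning to the CHECKSUM option from the first point: when we prepare for a restore, we can verify a backup's readability without actually restoring it. The following is a minimal sketch:
-- reads the backup and validates its checksums without restoring anything
RESTORE VERIFYONLY FROM DISK = '<path to file>'
WITH CHECKSUM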
File or filegroup backup
One of the reasons why a database should be distributed across more files or filegroups is better manageability. As the database grows over time, the backup size also increases, even if backup compression is used. In this case, SQL Server provides a very useful feature called file backup, or filegroup backup.
The following examples are shown for filegroup backups because file backups are almost the same, except that we use the FILE keyword instead of the FILEGROUP keyword. The file/filegroup backup syntax uses the logical file names or filegroup names in the header of the backup. First of all, let's create a database called BiggerSystem. We can use the following script (remember that the paths to the physical files should be adjusted to your existing drives and folders):
CREATE DATABASE BiggerSystem
ON
(name = 'BiggerSystem_Data', filename = 'D:\SqlData\BiggerSystem.mdf')
LOG ON
(name = 'BiggerSystem_Log', filename = 'L:\SqlLogs\BiggerSystem.ldf')
ALTER DATABASE BiggerSystem ADD FILEGROUP OPERDATA
ALTER DATABASE BiggerSystem ADD FILEGROUP ARCHIVE2016
ALTER DATABASE BiggerSystem
ADD FILE
(name = 'BiggerSystem_Data2', filename = 'D:\SqlData\BiggerSystem2.ndf')
TO FILEGROUP OPERDATA
ALTER DATABASE BiggerSystem
ADD FILE
(name = 'BiggerSystem_Data3', filename = 'D:\SqlData\BiggerSystem3.ndf')
TO FILEGROUP ARCHIVE2016
The preceding script creates the example database called BiggerSystem, along with the following filegroups:
- PRIMARY (mandatory in every database)
- OPERDATA (filegroup containing hot data instantly used for transactions)
- ARCHIVE2016 (filegroup containing cold data without any DML operations on it)
The recovery model option for this database is set to FULL.
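If the recovery model inherited from the model database differs, we can set it explicitly; a one-line sketch follows:
ALTER DATABASE BiggerSystem SET RECOVERY FULL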
The filegroup backup syntax is as follows:
BACKUP DATABASE <database name>
FILEGROUP = <filegroup name>, FILEGROUP = <another filegroup name>
TO DISK = '<file path>'
WITH
<additional options>
In the following example, we are using the BiggerSystem database and its filegroups:
-- monday 3 a. m.
BACKUP DATABASE BiggerSystem
FILEGROUP = 'PRIMARY'
TO DISK = 'L:\backups\bigsysprimary.bak'
WITH INIT, CHECKSUM, COMPRESSION
-- monday every hour
BACKUP LOG BiggerSystem TO DISK = 'L:\backups\bigsyslogs.bak'
WITH NOINIT, CHECKSUM, COMPRESSION
-- tuesday 3 a. m.
BACKUP DATABASE BiggerSystem
FILEGROUP = 'OPERDATA'
TO DISK = 'L:\backups\bigsysoper.bak'
WITH INIT, CHECKSUM, COMPRESSION
-- tuesday every hour
BACKUP LOG BiggerSystem TO DISK = 'L:\backups\bigsyslogs.bak'
WITH NOINIT, CHECKSUM, COMPRESSION
-- wednesday 3 a. m.
BACKUP DATABASE BiggerSystem
FILEGROUP = 'ARCHIVE2016'
TO DISK = 'L:\backups\bigsysarch2016.bak'
WITH INIT, CHECKSUM, COMPRESSION
-- wednesday every hour
BACKUP LOG BiggerSystem TO DISK = 'L:\backups\bigsyslogs.bak'
WITH NOINIT, CHECKSUM, COMPRESSION
-- and so on, thursday we start to backup the PRIMARY filegroup again
As seen in the preceding example, we must not leave any filegroup of the database out of the backup rotation.
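A simple way to double-check that no filegroup is forgotten is to list them; a short sketch, run in the context of the BiggerSystem database, follows:
USE BiggerSystem
GO
-- lists every filegroup in the database, including its read-only flag
SELECT name, is_read_only FROM sys.filegroups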
Let's assume that the filegroup called ARCHIVE2016 is not used for DML operations; in such cases, backing it up repeatedly becomes unnecessary. SQL Server provides an enterprise feature called partial backup. A partial backup saves the primary filegroup, all read-write filegroups, and any explicitly listed read-only filegroups. That is why it is very useful to set filegroups containing historical or other read-only data as read-only.
Let's go through one more example (for the sake of simplicity, the transaction log backups in the following code sample have been omitted):
-- run once: set the ARCHIVE2016 filegroup as read-only
ALTER DATABASE BiggerSystem MODIFY FILEGROUP ARCHIVE2016 READONLY
-- first time backup after setting the filegroup read-only
BACKUP DATABASE BiggerSystem
READ_WRITE_FILEGROUPS, FILEGROUP = 'ARCHIVE2016'
TO DISK = 'L:\backups\bigsys.bak'
WITH INIT, CHECKSUM, COMPRESSION
-- transaction backups follow for the rest of day
-- next daily backups
BACKUP DATABASE BiggerSystem
READ_WRITE_FILEGROUPS
TO DISK = 'L:\backups\bigsysadd.bak'
WITH INIT, CHECKSUM, COMPRESSION
-- transaction backups follow
The preceding SQL sample consists of three statements:
- The first statement sets the ARCHIVE2016 filegroup to the read-only state. It ensures that there is no way to modify data placed within the ARCHIVE2016 filegroup.
- The second statement adds the ARCHIVE2016 filegroup to the backup. It is the last backup of this filegroup.
- The third statement backs up all the read-write filegroups; the ARCHIVE2016 filegroup is not backed up from now on.
So far, we've discussed how to back up user databases. Now, let's focus on system databases. The following sections will explain how to back up system databases properly.
Backing up system databases
In simple words, system databases need backups just like user databases do. Backup strategies for system databases are often much more straightforward than those for user databases. In the following table, you can see the common backup strategies for every system database:
The preceding table enumerates all the visible system databases and suggests how to back up each of them. As we explained in the introductory part of this chapter, each system database plays a unique role in SQL Server. The suggested frequency of backups reflects how many data changes are made within each database.
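As a simple illustration (the paths and frequency are examples only), daily full backups of the master and msdb databases might look as follows:
-- master supports full backups only
BACKUP DATABASE master TO DISK = 'D:\backups\master.bak' WITH INIT, COMPRESSION
-- msdb holds SQL Server Agent jobs and the backup history
BACKUP DATABASE msdb TO DISK = 'D:\backups\msdb.bak' WITH INIT, COMPRESSION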
This section was quite big, wasn't it? So, let's briefly summarize the knowledge we've gained throughout this section.
Backup summary
Managing our backups properly is a very important task. As we described in this chapter, we need to decide which types of backups to use, how often to use them, how reliable they are, and where to store them. These decisions have a strong impact on the ability to restore data in minimal time with minimal loss. The following table describes several types of databases and example backup strategies:
The preceding table is a rough guide to help you decide how to plan your backup strategies.
In the next section, we will work with database backups and restore corrupted databases in many scenarios.