
6.3. Making Backups

Hard disks are mechanical devices. They are guaranteed to wear out, fail, and lose your data. The only unknown is when they will fail.

Data backup is performed to guard against drive failure. But it's also done to guard against data loss due to theft, fire, accidental deletion, bad editing, software defects, and unnoticed data corruption.

6.3.1. How Do I Do That?

Before making backups, you must decide:

  • What data needs to be backed up

  • How often the data needs to be backed up

  • How quickly you need to restore the data

  • How far back in time you need to be able to restore

Based on this information, you can develop a backup strategy, including a backup technology, schedule, and rotation.

Determining what data to back up

Any data that you want to preserve must be backed up; usually, this does not include the operating system or applications, because you can reinstall those.

Table 6-5 lists some common system roles and the directories that should be considered for backup.

Table 6-5. Directories used for critical data storage in various common system roles
System role | Standard directories | Notes
Database server (e.g., MySQL) | /var/lib/mysql | Stop the database server or use snapshots to ensure consistency between tables.
Web server | /var/www, /etc/httpd, /home/*/public_html | Also include any data directories used by web applications.
DNS nameserver | /var/named, /etc/named.conf | This information usually changes slowly.
Desktop system, or any system accessed by individual users | /home | Exclude cache directories such as /home/*/.mozilla/firefox/*/Cache.
Samba server | All directories served by Samba |
CUPS print server | /etc/cups | Configuration information only; usually changes slowly.
All systems | /etc | Configuration information for most software and hardware installed on the system.

Determining how often to back up your data

Generally, backup frequency should be decided based on how often (and when) the data changes, and how many changes you are willing to lose.
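One rough way to gauge how often data changes is to count the files modified recently in a directory tree. The following is a minimal sketch, not a definitive measurement tool; the function name count_recent_changes is my own invention, and the count assumes that filenames do not contain newlines.

```shell
#!/bin/sh
# Count regular files under a directory that were modified within the
# last N days. count_recent_changes is a hypothetical helper name.
count_recent_changes() {
    dir=$1
    days=$2
    # -mtime -N matches files modified less than N*24 hours ago
    find "$dir" -type f -mtime "-$days" | wc -l | tr -d ' '
}

# Example (commented out): files changed under /home in the last day
# count_recent_changes /home 1
```

Running this for a few different day counts gives a feel for whether nightly backups are enough for a particular directory.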

For example, printer configuration data may be changed only a few times a year, and losing the latest change won't cost much in terms of the work required to re-create that change. Word processing documents may be changed daily, and you may want to ensure that you don't lose more than one day's work (or even a half-day's work); on the other hand, orders on a busy web site may be received every few seconds, and you may decide that you can't live with the loss of more than a few minutes' worth of data.

Determine how quickly you will need to restore your data

How long can you live without your data? The answer probably depends on regulatory and operational issues.

Some types of information (such as information about cross-border shipments) must be reported to government agencies on a daily basis, and delays are penalized by fines of thousands of dollars per day. This puts a tremendous amount of pressure on the data-recovery process. On the other hand, personal music and photo collections may not need to be restored until weeks or months after the data loss.

Determine how far back in time you need to restore

Some types of data loss or corruption may not be realized until weeks, months, or years after they have occurred, while others will be immediately obvious. In some cases, when data changes quickly, it may be necessary to be able to restore data to the state it was in on a specific date, while in other cases it's sufficient to be able to restore data to the state that it was in at the end of a particular month.

Decision 1: Incremental versus full backups, and backup rotation

Files may be selected for backup on an incremental basis (only files that have been changed since the last backup are selected), or a full backup may be performed.

Incremental backups often require significantly less storage space than full backups when dealing with large sets of individual files such as word processing documents because the number of documents that are changed each day is usually fairly small. On the other hand, a small SQL update query may cause all of the files in a database to be modified, nullifying the benefits of incremental backup in that context.

An incremental backup scheme usually involves making full backups periodically and then making incremental backups until the scheduled time of the next full backup. Restoring from an incremental backup therefore requires you to restore a full backup, then restore all of the incremental backups from that point forward. Thus, the time required for a restore operation may be much longer than for a system that uses only full backups. Also, if one of the backups is unusable due to media corruption or damage, you will not be able to reliably perform a full recovery.
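The file-selection step of an incremental backup can be sketched with find and a stamp file whose modification time records when the previous backup ran. This is only an illustration under assumptions of my own: the function name list_changed_since and the idea of a stamp file are not part of any Fedora tool.

```shell
#!/bin/sh
# List regular files changed since the previous backup, using a stamp
# file whose modification time records when that backup ran.
# list_changed_since and the stamp-file convention are illustrative.
list_changed_since() {
    dir=$1
    stamp=$2
    if [ -e "$stamp" ]; then
        # Incremental: only files modified after the stamp file
        find "$dir" -type f -newer "$stamp"
    else
        # No previous backup recorded: fall back to a full file list
        find "$dir" -type f
    fi
}
```

The resulting list could be saved to a file and handed to GNU tar with its -T option, with the stamp file updated once the backup succeeds. Note that deleted files are not tracked by this approach.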

Given the choice between full and incremental backups, I recommend using full backups whenever practical.

Decision 2: Decide on the backup media

Cost, capacity, and speed usually drive the selection of backup media. There are many options available:


DVD

DVD is an attractive medium. Fedora includes software to produce compressed optical discs that are automatically decompressed by the kernel when they are read. The compression ratio will depend on the type of data being backed up; text files may compress by 75 to 90 percent, while data that is already in a compressed format (such as documents) may not compress at all. You can reasonably expect 50 percent compression for a typical mix of user files, and 75 percent for databases containing text data; that means a single-sided DVD±R, which costs only a few cents and which has a nominal capacity of 4.7 GB (usable capacity of slightly over 4.3 GB), will hold 8+ GB of regular user files or 16+ GB of database files. DVD is also a fast, random-access medium.

CD

Similar to DVD, but with a lower storage capacity and wider deployment. Because higher-capacity DVDs are similarly priced (actually, cheaper in some jurisdictions, such as Canada, due to music levies on CDs), DVDs are preferred except when backing up a device such as a laptop that has only a CD-RW drive.

Tape

Tape is by far the most economical choice for high-volume data backup (>10 GB uncompressed), but it still doesn't come cheap. Tape drives can cost more than the disk drives being backed up, and each backup tape can cost 25 to 50 percent of the price of the corresponding disk storage. Tapes are also fairly slow during search and restore operations due to their sequential nature.

Hard disk

Hard disks can be used for data backup. USB drives are particularly convenient for this purpose, but removable drive trays can also be used with ATA or SATA drives. Hard drives are fast, but expensive and fragile.

Remote storage

Copying an archive of data to a remote system periodically.

Remote mirror

Making an immediate copy of all data written to the local disk drive provides the ultimate backup, but this approach is complicated and does not by itself guard against data corruption or accidental file deletion. For one approach to remote mirroring, see " . . . mirroring to a remote drive as part of a disaster-recovery plan?" in the "What About . . ." section in Lab 6.2, "Managing RAID."
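The DVD capacity figures quoted in the list above follow from simple arithmetic: if compression saves a given percentage of space, the amount of original data that fits is the usable capacity divided by the fraction that remains. Here is a small worked sketch; the function name is hypothetical, and 4300 MB is my rounding of the "slightly over 4.3 GB" usable capacity.

```shell
#!/bin/sh
# Estimate how much original data fits on a disc, given the usable
# capacity in MB and the expected space saving in percent (0-99).
# effective_capacity_mb is a hypothetical helper name.
effective_capacity_mb() {
    usable_mb=$1
    saving_pct=$2
    echo $(( usable_mb * 100 / (100 - saving_pct) ))
}

effective_capacity_mb 4300 50   # typical mix of user files
effective_capacity_mb 4300 75   # databases containing text data
```

The two calls print 8600 and 17200, which correspond to the 8+ GB and 16+ GB estimates given for DVD.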

I'm going to focus on DVD and tape storage options in this lab.

Decision 3: Decide on media rotation and storage

When using DVDs, you have the option of selecting DVD±R media, which can only be written once. This provides an inexpensive, compact, and permanent archive through time; assuming one disc per day, a year's worth of discs will take only about 4 L of space and cost less than $100.

For tape and DVD±RW media, you'll need to decide on your media rotation strategy. This is a compromise between the number of tapes/discs and how far back in time you wish to restore.

A simple rotation scheme involves buying a set amount of media and rotating through it. For example, 20 discs or tapes used only on weekdays will enable you to restore files to the state they were in during any weekday in the preceding four weeks.

A multilevel scheme permits you to go back farther in time. A simple three-level scheme (known as Grandfather/Father/Son) is shown in Table 6-6.

Table 6-6. Grandfather/Father/Son backup scheme with 20 discs/tapes
Level | Media used | Discs or tapes required
A (Son) | Monday to Thursday | 4
B (Father) | Three out of every four Fridays | 3
C (Grandfather) | Fridays not covered by level B | 13

This scheme uses the same 20 discs or tapes, but permits you to restore to:

  • Any weekday in the preceding week

  • The end of any week in the preceding four weeks

  • The end of any four-week period in the preceding year

Note that level A media will be more frequently used than level B or C media and will therefore need to be replaced more often.
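If you want a script to tell you which level of media to load on a given day, the scheme can be sketched as a small function. This is only one possible interpretation: I am assuming that the level C Friday is the one falling in every fourth ISO week (which yields 13 per year), and the -d option requires GNU date. The function name gfs_level is hypothetical.

```shell
#!/bin/sh
# Print the Grandfather/Father/Son level (A, B, or C) for a date given
# as YYYY-MM-DD, or "none" for weekends. Requires GNU date (-d option).
# Assumption: the level C Friday is the one in every fourth ISO week.
gfs_level() {
    dow=$(date -d "$1" +%u)      # 1 = Monday ... 7 = Sunday
    week=$(date -d "$1" +%V)     # ISO week number, 01-53
    week=${week#0}               # drop a leading zero (avoids octal parsing)
    if [ "$dow" -le 4 ]; then
        echo A
    elif [ "$dow" -eq 5 ]; then
        if [ $(( week % 4 )) -eq 0 ]; then echo C; else echo B; fi
    else
        echo none
    fi
}

# Example (commented out): level for today's backup
# gfs_level "$(date +%Y-%m-%d)"
```

A backup script could use the result to print a reminder about which disc or tape to insert.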

You must also decide where and how you will store your media. Unless the media is stored offsite, a disaster such as fire or theft could result in the loss of both the original storage drives and the backup media, but storing media offsite will slow the restoration process.

Simple backup labeling

There are many ways of labeling backups, but one of the easiest is to create a file named system-<hostname> in the root directory immediately before producing the backup, and include that as the first file in the backup volume:

# touch /system-$(hostname)
# ls -l /system-*
-rw-r--r--  1 root root 0 Jul  1 01:34 /

This will identify the originating system name as well as the date and time of the backup (from the file timestamp).

Backing up to DVD

To back up data to DVD, use the growisofs command:

# growisofs -Z /dev/dvd -RJ -graft-points /etc=/etc /home=/home /system-*

This will back up the /etc and /home directories to /dev/dvd (the default DVD recorder). -Z indicates that this is the first session on the disc, and -RJ enables long filename handling compatible with Unix/Linux (Rock Ridge) and Windows (Joliet) systems. The -graft-points option permits the backed-up directories to be stored in specific directories on the disc. /etc=/etc and /home=/home specify the directories to be backed up, ensuring that each directory is placed in a directory with the same name on the disc. The argument /system-* places the system label file in the root directory of the DVD.

This command will work with DVD-R, DVD+R, DVD-RW, and DVD+RW media.
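When the list of directories grows, typing each dir=dir graft point by hand becomes error-prone. The sketch below builds those arguments from a space-separated list and prints the resulting command for review instead of running it. The helper name graft_points is my own; it is not part of dvd+rw-tools, and it assumes directory names without spaces.

```shell
#!/bin/sh
# Build "dir=dir" graft-point arguments for a space-separated directory
# list, so that each directory keeps its own name on the disc.
# graft_points is a hypothetical helper, not part of dvd+rw-tools.
graft_points() {
    args=""
    for dir in $1; do
        args="$args $dir=$dir"
    done
    # Strip the leading space
    echo "${args# }"
}

# Print (rather than run) the resulting burn command for review
echo growisofs -Z /dev/dvd -RJ -graft-points $(graft_points "/etc /home") '/system-*'
```

Once the printed command looks right, it can be run by removing the echo and the quotes around /system-*.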

To create a compressed DVD, use the mkzftree command to create a compressed copy of the origin directories:

# mkdir /tmp/zftree
# mkzftree /home /tmp/zftree/home
# mkzftree /etc  /tmp/zftree/etc

You will need sufficient disk space to hold the compressed image before it is written to the optical disc.
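A quick pre-flight check for that disk space might look like the following sketch. It compares the current size of one source directory against the free space on the filesystem that will hold the compressed tree; because the compressed copy should be smaller than the source, the check is conservative. The function name has_room is hypothetical.

```shell
#!/bin/sh
# Succeed if the filesystem holding DEST has at least as much free space
# as SRC currently occupies. has_room is a hypothetical helper name.
has_room() {
    src=$1
    dest=$2
    # Size of the source tree in KB
    need_kb=$(du -sk "$src" | awk '{print $1}')
    # df -P prints one line per filesystem; column 4 is the available KB
    free_kb=$(df -Pk "$dest" | awk 'NR==2 {print $4}')
    [ "$free_kb" -ge "$need_kb" ]
}

# Example (commented out):
# has_room /home /tmp || echo "Not enough space in /tmp for /home"
```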

Then use the -z option to growisofs:

# growisofs -Z /dev/dvd -RJz /tmp/zftree /system-*

Putting this all together into a script, and mailing the results to the email alias backup-alert, we get this:

#!/bin/bash
#
# backup-dvd :: backup selected directories to a compressed DVD

# List of the directories to be backed up
DIRLIST="/etc /home"

(

# Create timestamp file
rm -f /system-*
touch /system-$(hostname)

# Make directory for compressed backup tree
rm -rf /tmp/zftree 2>/dev/null
mkdir /tmp/zftree

# Compress each directory, accumulating any nonzero exit codes
RESULT=0
for DIR in $DIRLIST
do
    mkzftree $DIR /tmp/zftree${DIR}
    RESULT=$(( $? + $RESULT ))
done

if [ "$RESULT" -eq 0 ]
then
    # Burn the DVD
    growisofs -Z /dev/dvd -RJz /tmp/zftree /system-*

    # Eject the disc
    eject
else
    echo "Skipping burn: file compression failed."
fi

# Delete the zftree
rm -rf /tmp/zftree 2>/dev/null

) 2>&1|mail -s "Backup Log $(hostname)" backup-alert

Edit the DIRLIST line so that it contains a list of the directories to be backed up, separated by spaces.

Save this file as /usr/local/bin/backup-dvd and then make it executable:

# chmod u+rx /usr/local/bin/backup-dvd

And be sure to create an email alias for the backup-alert user in the file /etc/aliases:

backup-alert: chris frank

To produce a backup, execute this script:

# backup-dvd

But it's a better idea to configure the system to run this script automatically every night (see Lab 6.4, "Scheduling Tasks").

Backing up to tape

To back up directories to tape, use the tape archiver (tar):

# tar -cf /dev/st0 /system-* /etc /home
tar: Removing leading `/' from member names
tar: Removing leading `/' from hard link targets

In this command, /dev/st0 is the first tape drive, and /etc and /home are the directories being backed up.

To perform a compressed backup, add the z (for gzip compression) or j (for bzip2 compression) option:

# tar -czf /dev/st0 /system-* /etc /home
tar: Removing leading `/' from member names
tar: Removing leading `/' from hard link targets
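If you'd like to rehearse a tar backup before committing it to tape, the same options work against an ordinary file. The sketch below also shows one way to skip cache directories such as those mentioned in Table 6-5, using tar's --exclude option. The function name backup_to_file and the choice of pattern are my own assumptions.

```shell
#!/bin/sh
# Rehearse a compressed tar backup against an ordinary file instead of a
# tape device, skipping anything named Cache below the top level.
# backup_to_file is a hypothetical helper; the archive path is up to you.
backup_to_file() {
    archive=$1
    shift
    # The exclude pattern is given before the directories to archive
    tar -czf "$archive" --exclude='*/Cache' "$@"
}

# Example (commented out):
# backup_to_file /tmp/rehearsal.tar.gz /etc /home
```

Listing the resulting file with tar -tzf is a cheap way to confirm that the archive contains what you expect before you spend a tape on it.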

Here is a script that will perform a tape backup:

#!/bin/bash
#
# backup-tape :: backup selected directories to a compressed tape

# List of the directories to be backed up
DIRLIST="/etc /home"

(

# Create timestamp file
rm -f /system-*
touch /system-$(hostname)

# Produce the tape
tar -czf /dev/st0 /system-* $DIRLIST

# Eject the tape if possible
mt -f /dev/st0 eject

) 2>&1|mail -s "Backup Log $(hostname)" backup-alert

Save this script as /usr/local/bin/backup-tape.

Like the backup-dvd script, this script will send an email report to the email alias backup-alert. To include a list of files in the email report, add the -v option to the tar command:

tar -czvf /dev/st0 /system-* $DIRLIST

To produce a backup tape, run the script from the command line:

# backup-tape

It's best to run this script automatically every night (see Lab 6.4, "Scheduling Tasks").

Restoring files from backups

When restoring from tape, it's a good idea to restore to a location other than the original file location to ensure that critical data is not accidentally overwritten. These commands will perform a full restore of a tape to the directory /tmp/restore:

# mkdir /tmp/restore
# cd /tmp/restore
# tar xvzf /dev/st0

To restore only certain files, specify the filenames as arguments to tar:

# tar xvzf /dev/st0 home/chris/

If the file specified is a directory, all of the files and subdirectories in that directory will be restored.
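The restore steps above can be wrapped in a small function so that the scratch directory is always created and used. This is a sketch under my own naming (restore_to is hypothetical); the archive argument may be a tape device such as /dev/st0 or an ordinary .tar.gz file.

```shell
#!/bin/sh
# Restore selected members of a compressed tar archive into a scratch
# directory rather than over the original files.
# restore_to is a hypothetical helper name.
restore_to() {
    archive=$1
    target=$2
    shift 2
    mkdir -p "$target" || return 1
    # -C switches to the scratch directory before extracting; any
    # remaining arguments name the members to restore (none = everything)
    tar -xzf "$archive" -C "$target" "$@"
}

# Example (commented out):
# restore_to /dev/st0 /tmp/restore home/chris
```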

Restoring from disc is easy: just copy the files that you want to the location that you want. You can do this graphically, or you can restore all of the files on the disc:

# mkdir /tmp/restore
# cd /tmp/restore
# cp -r /media/CDROM/* .

Viewing the table of contents and verifying a backup

To verify that a tape backup is readable, use tar's t option to view a table of contents of the tape:

# tar tvzf /dev/st0
-rw-r--r-- root/root    0 2006-07-01 01:34:24
drwxr-xr-x root/root    0 2005-09-23 15:01:38 etc/gconf/
drwxr-xr-x root/root    0 2005-03-02 11:59:15 etc/gconf/gconf.xml.mandatory/
drwxr-xr-x root/root    0 2005-08-29 00:53:34 etc/gconf/1/
-rw-r--r-- root/root  840 2005-03-02 11:59:11 etc/gconf/1/path
drwxr-xr-x root/root    0 2006-03-20 01:33:22 etc/gconf/schemas/
...(Lines skipped)...

Since the label file /system-* is the first file on the tape, you can view the originating machine as well as the date and time of the backup by just viewing the first line of the table of contents:

# tar tvzf /dev/st0|head -1
-rw-r--r-- root/root    0 2006-07-01 01:34:24
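To pull just the originating hostname out of that first entry, something like the following sketch could be used. It assumes the labeling convention described earlier (a system-<hostname> file as the first member of the archive); the function name backup_origin is hypothetical.

```shell
#!/bin/sh
# Print the hostname recorded in a backup's label file, assuming the
# label (system-<hostname>) is the first member of the archive.
# backup_origin is a hypothetical helper; ARCHIVE may be /dev/st0 or a file.
backup_origin() {
    first=$(tar -tzf "$1" | head -n 1)
    case $first in
        system-*) echo "${first#system-}" ;;
        *) echo "no label found" >&2; return 1 ;;
    esac
}

# Example (commented out):
# backup_origin /dev/st0
```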

To verify that all of the files on an optical disc are readable, use find to read each file on the mounted disc:

# find /media/cdrecorder -type f -exec cp {} /dev/null \;

Only errors will be reported.
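For use in a script, it can be handy to have a pass/fail result and a count instead of raw error messages. The following is one possible sketch; verify_tree is a name I made up, and the approach simply reads every regular file and counts the ones that cannot be read.

```shell
#!/bin/sh
# Read every regular file under a mounted disc and report how many
# could not be read. verify_tree is a hypothetical helper name.
verify_tree() {
    # For each file, try to read it; print its name only on failure
    bad=$(find "$1" -type f \
        -exec sh -c 'cat "$1" >/dev/null 2>&1 || echo "$1"' sh {} \; \
        | wc -l | tr -d ' ')
    echo "unreadable files: $bad"
    [ "$bad" -eq 0 ]
}

# Example (commented out):
# verify_tree /media/cdrecorder || echo "Disc verification failed"
```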

6.3.2. How Does It Work?

The growisofs command is part of the package dvd+rw-tools, which was originally intended for use with DVD+RW media. Since the original design, it has grown to include support for all DVD media formats. It operates as a frontend to the mkisofs command, which produces a filesystem in the ISO 9660 format that is the standard for optical media, and then writes the mkisofs output to the disc burner.

ISO 9660 is unfortunately limited to eight-character filenames with a three-character extension. The Rock Ridge (RR) extension adds support for long filenames, user and group ownership, and permission mode under Linux; Joliet extensions add similar support for the Windows operating systems. Using the -RJ option to growisofs causes the created disc to be compatible with both Rock Ridge and Joliet.

mkzftree makes a recursive copy of a directory structure, compressing any files that would benefit from compression during the copy process. The resulting directory structure can be passed to mkisofs with the -z option, which will cause mkisofs to create additional Rock Ridge records with information about the data compression used. These records in turn enable the kernel's filesystem layer to decompress the files on the fly when reading them from disc.

When backing up to tape, tar converts a directory structure to a continuous stream of bytes. A short header contains the pathname, ownership, permissions modes, size, and timestamps for a file, followed by the data for that file; this is repeated for each file in the archive.
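Because the archive is just a stream of bytes, tar can write it to standard output and another tar can read it from standard input, with no tape or file in between. This small sketch illustrates the idea; the function name list_via_stream is hypothetical.

```shell
#!/bin/sh
# Show that a tar archive is a plain byte stream: write it to standard
# output (-f -) and list it from standard input in the same pipeline.
# list_via_stream is a hypothetical helper name.
list_via_stream() {
    tar -cf - -C "$1" . | tar -tvf -
}

# Example (commented out):
# list_via_stream /etc
```

The listing shows the permissions, ownership, size, timestamp, and pathname that tar recorded in each header.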

The z option to tar causes it to start gzip and process all data through it. As an alternative, the j option will process the archive stream through bzip2, which may offer better compression in some circumstances.

6.3.3. What About...

...using LVM snapshots in a backup script?

You can simply place the appropriate lvcreate and mount commands at the start of your backup script, and umount and lvremove commands at the end of the script.

Here is a slightly fancier version of the DVD backup script, which accepts a list of vg / lv pairs and creates a compressed DVD backup. Set the LVLIST and SNAPSIZE variables to whatever values you wish to use:

#!/bin/bash
#
# backup-dvd :: backup selected directories to a compressed DVD

# List of the vg/lv to be backed up
LVLIST="main/home main/var"

# Amount of space to use for snapshots (example value; adjust to suit)
SNAPSIZE="1G"

(

# Create timestamp file
rm -f /system-*
touch /system-$(hostname)

# Make directory for compressed backup tree
rm -rf /tmp/zftree
mkdir /tmp/zftree

RESULT=0
for VGLV in $LVLIST
do
    echo "========= Processing $VGLV..."

    # Get information about the vg/lv
    VG=$(echo $VGLV|cut -f1 -d/)
    LV=$(echo $VGLV|cut -f2 -d/)
    # Snapshot name and temporary mount point (illustrative choices)
    SNAPNAME="${LV}-snap"
    OLDMOUNT=$(grep "^/dev/${VGLV}" /etc/fstab|tr "\t" " "|tr -s " "|cut -f2 -d" ")
    NEWMOUNT="/mnt/snap${OLDMOUNT}"

    # Create a snapshot
    lvcreate -s $VGLV --name $SNAPNAME --size $SNAPSIZE
    RESULT=$(( $? + $RESULT ))

    # Mount the snapshot
    mkdir -p $NEWMOUNT
    mount -o ro /dev/${VG}/${SNAPNAME} ${NEWMOUNT}
    RESULT=$(( $? + $RESULT ))

    # Place it in the zftree
    mkdir -p /tmp/zftree$(dirname $OLDMOUNT)
    mkzftree ${NEWMOUNT} /tmp/zftree${OLDMOUNT}
    RESULT=$(( $? + $RESULT ))

    # Unmount the snapshot
    umount $NEWMOUNT

    # Release the snapshot
    lvremove -f ${VG}/${SNAPNAME}
done

if [ "$RESULT" -eq 0 ]
then
    # Burn the DVD
    growisofs -Z /dev/dvd -RJz /tmp/zftree /system-*

    # Eject the disc
    eject
else
    echo "Skipping burn: snapshot or file compression failed."
fi

# Delete the zftree
rm -rf /tmp/zftree 2>/dev/null

) 2>&1|mail -s "Backup Log $(hostname)" backup-alert
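The script finds each LV's mount point with a grep/tr/cut pipeline against /etc/fstab. As an alternative sketch, the same lookup can be done with awk, which splits on any run of spaces or tabs. The function name mount_point_for is hypothetical, and the optional second argument exists only so that the function can be tried on a copy of the file.

```shell
#!/bin/sh
# Look up the mount point for a device in an fstab-format file.
# mount_point_for is a hypothetical helper; the second argument
# defaults to /etc/fstab.
mount_point_for() {
    dev=$1
    fstab=${2:-/etc/fstab}
    # Print the second field of the first line whose first field is
    # exactly the device name, then stop
    awk -v dev="$dev" '$1 == dev { print $2; exit }' "$fstab"
}

# Example (commented out):
# mount_point_for /dev/main/home
```

An exact comparison on the first field avoids matching a device whose name merely begins with the same characters (for example, /dev/main/home2 when looking up /dev/main/home). Like the script, this assumes the LV is listed in /etc/fstab by its /dev/vg/lv path.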

Each LV to be backed up must have a mount point identified in /etc/fstab.

...putting more than one backup on a tape?

The device node /dev/st0 is the default (first) tape drive on the system, configured to rewind after each use. /dev/nst0 is the same device but without the automatic rewind.

In order to position the tape, Fedora provides the mt command, described in Table 6-7.

Table 6-7. mt tape control commands
mt command | Description
mt rewind | Rewinds the tape
mt fsf | Forward-skips a file
mt fsf count | Forward-skips count files
mt bsf | Backward-skips a file
mt bsf count | Backward-skips count files
mt status | Displays the drive status
mt offline or mt eject | Rewinds and ejects the tape (if possible)

The mt command uses /dev/tape as its default device; create this as a symbolic link to /dev/nst0 if it does not already exist:

# ln -s /dev/nst0 /dev/tape

You can now create a multibackup tape:

# mt rewind
# tar cvzf /dev/tape /home
# tar cvzf /dev/tape /etc
# mt rewind

To read a specific backup on a multibackup tape, rewind to the beginning (just to be sure you're at the start), and then skip any files (backups) necessary to reach the archive you want. These commands will access the table of contents for the second archive, for example:

# mt rewind
# mt fsf
# tar tvzf /dev/tape
...(Lines snipped)...

...backing up multiple systems onto a central tape archive?

Fedora Core includes amanda, a powerful client-server tape backup system that can be used for this purpose. See the amanda manpages for details.

6.3.4. Where Can I Learn More?
