Distributed Network Backups: An Alternative to Tape
ArcUser Magazine, Publication Pending
It sounds like a scare ad from an IT trade magazine:
Those backup tapes made daily, changed weekly, and moved offsite regularly, turn up blank or otherwise unusable in your moment of need. The machine wasn't writing properly, the tapes were defective, or, more likely, someone just goofed up: the tapes weren't changed, someone wasn't paying attention, something went wrong. Your data is gone. Someone has to pay. Big time. It'll probably be you.
Scenes like this don't play out every day, but they do happen. In five years, I have seen it happen twice, at two different agencies. Luckily, in both cases the problem was detected before data was irreparably lost. Making tape backups is a tedious and dull task that requires regular attention and supervision, and it only pays dividends when the rare disaster strikes. That's why it is so easy to not pay attention, to put it off, to just let it go another week or two.
Maybe there is a better way. If you're like many GIS shops, you've converted from UNIX to Windows NT, and have a whole network of workstations loaded with ArcInfo and/or ArcView reading data from a central server. If your workstations are newer models, they probably have fairly large hard drives. Since the bulk of your data is stored centrally, it's likely that the workstation disks have considerable space available. Why not use that space to make backups of the critical data from your central server?
The concept is simple, and implementation is straightforward. But before we get into the details, let's look at some of the reasons why you might want to consider using your excess hard drive capacity for backups.
- It's easy: By using Task Scheduler or other software to run the backups on a regular schedule, the process can be fully automated. Since there are no tapes to change, once the process is in place it is very low maintenance.
- It's convenient: If you ever need to recover a copy of your data, it is far easier to retrieve it from a machine on the LAN than to recover it from tape.
- It's cheap: If you already have the extra disk capacity sitting idle, the cost is near zero. Even if you decide to purchase an extra hard drive or two, these are cheaper than a tape unit, and could potentially be converted to other uses should your needs change. And, of course, there are no tapes to purchase.
- It's safe: There are fewer points of failure than with tape. If critical data is backed up in five different locations, it is unlikely that all would fail simultaneously. Hard drives are much less temperamental than tape drives, and less human intervention is required. Also, if some of the machines on your LAN are in a different physical location, such as another building, data backed up on them will be safe from a fire or other catastrophic event, even if you forget to move your tapes off-site.
Of course, if you are currently using tape as a backup medium, it would probably be wise to continue. Networked backups could still be used to provide a bit of extra insurance against the possibility that the tape system should fail, which, sooner or later, it will. Also, in the event that you needed to retrieve data from a recent backup, it would certainly be easier to recover the information from a local archive than from a tape, even when your tape backups are running smoothly.
Implementing a networked backup system is quite straightforward. The first step is to look at the amount of data you want to back up, and your available disk resources. Ideally, you want to back up each piece of data several times, on several machines, so that no single failure can take out every copy. Compressing your data will help reduce the space consumed by your backups. Vector data can typically be squeezed to half its size. Imagery is large and (depending on format) does not compress well, but is generally static, and is therefore a good candidate for a one-time backup onto CD-ROM. Imagery rarely needs to be part of your regular backup system.
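To make the first step concrete, here is a minimal sketch of how you might estimate how many compressed copies of a folder will fit on a workstation drive. It is not part of the downloadable script. The names copiesThatFit and reportCapacity are hypothetical, the paths are placeholders, and the 0.5 ratio is simply the rule of thumb for vector data mentioned above, so treat the result as a rough guide only.

```vbscript
' Hypothetical helper: how many compressed copies of a data set would fit
' in a given amount of free space?
' ratio is the assumed compressed size as a fraction of the original
' (0.5 is the article's rule of thumb for vector data).
Function copiesThatFit(srcBytes, freeBytes, ratio)
    Dim needed
    needed = srcBytes * ratio
    If needed <= 0 Then
        ' Nothing to back up (or a bad ratio): report zero rather than divide by zero.
        copiesThatFit = 0
    Else
        copiesThatFit = Int(freeBytes / needed)
    End If
End Function

' Report on a source folder and a destination drive.
Sub reportCapacity(srcFolder, destDrive)
    Dim fso, srcBytes, freeBytes
    Set fso = CreateObject("Scripting.FileSystemObject")
    If Not fso.FolderExists(srcFolder) Or Not fso.DriveExists(destDrive) Then
        WScript.Echo "Skipping " & srcFolder & ": folder or drive not found."
        Exit Sub
    End If
    srcBytes = fso.GetFolder(srcFolder).Size      ' includes all subfolders
    freeBytes = fso.GetDrive(destDrive).FreeSpace
    WScript.Echo srcFolder & " is " & Round(srcBytes / 1048576) & " MB; " & _
        "room for about " & copiesThatFit(srcBytes, freeBytes, 0.5) & _
        " compressed copies on " & destDrive
End Sub

' Placeholder paths; substitute your own server share and local drive.
reportCapacity "V:\AirForce", "E:"
```

Run it with cscript on each workstation you are considering as a backup target. If the answer is only one or two copies, that machine is probably better suited to holding a single weekly archive than several daily generations.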
The second step is to decide on a backup schedule. It may seem tempting to run backups daily, but this is not always the best answer. If your data rarely changes, it might make more sense to run backups weekly, keeping several older versions around. Additional backups could be run manually if a large amount of data changed at one time. Also, you might establish different schedules for each workstation, with one backing up the central server daily, and another doing so weekly, keeping data for a month before dumping it. Alternatively, you might choose to back up rapidly changing data daily onto three machines, and relatively static data weekly or monthly onto a fourth. Your particular mix of data and resources will determine the optimum schedule for your shop.
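One way to mix daily and weekly schedules on a single machine is to run the same script every day and let it decide which backup sets are due. The sketch below is an illustration only: the isDue function and the choice of Sunday and the first of the month are my assumptions, and the Echo lines stand in for whatever backup calls you would actually make.

```vbscript
' Hypothetical helper: is a backup set with the given frequency due on date d?
' "weekly" means Sundays and "monthly" means the 1st; adjust to taste.
Function isDue(frequency, d)
    Select Case LCase(frequency)
        Case "daily"
            isDue = True
        Case "weekly"
            isDue = (Weekday(d) = vbSunday)
        Case "monthly"
            isDue = (Day(d) = 1)
        Case Else
            ' Unknown frequency: do nothing rather than guess.
            isDue = False
    End Select
End Function

' Rapidly changing data every day, relatively static data once a week.
If isDue("daily", Date) Then
    WScript.Echo "Would back up the daily set now"
End If
If isDue("weekly", Date) Then
    WScript.Echo "Would back up the weekly set now"
End If
```

Keeping the decision inside the script means each workstation needs only one scheduled job, and changing a set from daily to weekly is a one-word edit.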
The third step is the actual implementation. To make this step easier, a Visual Basic Script has been developed to automate the process. It is excerpted below and can be downloaded in full here.
Running the script
The script itself is straightforward. Execution consists of a series of calls to the makeBackup routine. This routine takes two parameters: a source directory to back up, and a destination zip file in which to store the data. If you have specified that multiple generations of the backups be retained (with the backups variable), existing backup files are renamed, and, if needed, the oldest copy is deleted. Next, PKZip is run, creating the backup archive. All activity is logged to the file specified by the backupLog variable, and if any errors are detected, a popup message is displayed to alert you. Otherwise, the script runs silently.
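The renaming of older generations can be sketched as follows. This is not the code from the full script, whose backupFile routine is not shown in the excerpt. The rotateBackups name and the numeric-suffix naming scheme (AirForce.zip.1, AirForce.zip.2, and so on) are assumptions made for illustration.

```vbscript
' Hypothetical naming scheme: data.zip -> data.zip.1 -> data.zip.2 ...
' keep is the number of old generations to retain.
Sub rotateBackups(zipPath, keep)
    Dim fso, i, oldest, current
    Set fso = CreateObject("Scripting.FileSystemObject")

    ' Nothing to rotate if no old generations are wanted.
    If keep <= 0 Then Exit Sub

    ' Delete the oldest generation, if there is one, to free its slot.
    oldest = zipPath & "." & keep
    If fso.FileExists(oldest) Then fso.DeleteFile oldest, True

    ' Shift the remaining generations up by one, oldest first, so each
    ' move lands in a slot that has just been vacated.
    For i = keep - 1 To 1 Step -1
        current = zipPath & "." & i
        If fso.FileExists(current) Then
            fso.MoveFile current, zipPath & "." & (i + 1)
        End If
    Next

    ' Finally, the most recent archive becomes generation 1.
    If fso.FileExists(zipPath) Then fso.MoveFile zipPath, zipPath & ".1"
End Sub

' Example call (placeholder path), made just before creating a new archive:
' rotateBackups "E:\V Drive Backup\AirForce.zip", 3
```

Working from the oldest generation down avoids overwriting a file that has not yet been moved, which matters because MoveFile fails if the destination already exists.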
Customizing the script for your use is simple - you can change the name and location of the log file, set the number of old copies of your data to keep, and, of course, change the series of makeBackup calls to fit your specific situation. You will notice that the script makes copies of local data on a remote machine as well as backing up remote data locally. It is anticipated that each machine would have its own customized copy of this script, both copying local data to one or more machines on the network, and bringing in remote data from one or more sources. The script is intended to be run locally, and thus a customized schedule can be established for each machine using NT's at command scheduler.
If this script does not do everything you want it to, you can modify it or write your own. If your data is sensitive, for example, you might encrypt the zip files before storing them on a remote drive. Alternatively, you could check to see if a blank CD was loaded in a particular drive, and if so, write a copy of the data there. You could implement an incremental backup routine, storing a full copy of your data weekly or monthly, and writing only the changes on a daily basis. You could even adjust the number of backup copies dynamically according to disk space available. The possibilities are limitless.
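As a starting point for the incremental idea, the sketch below lists every file under a folder that has been modified since a given date. It is only a rough outline under my own assumptions: the changedSince and collectChanged names are hypothetical, the path is a placeholder, and how you hand the resulting list to your archiving program depends on that program, so check its documentation.

```vbscript
' Hypothetical helper: add to "found" the path of every file under
' "folder" (recursively) that was modified after the date "since".
Sub collectChanged(folder, since, found)
    Dim f, subFolder
    For Each f In folder.Files
        If f.DateLastModified > since Then
            found.Add f.Path, f.DateLastModified
        End If
    Next
    For Each subFolder In folder.SubFolders
        collectChanged subFolder, since, found
    Next
End Sub

' Returns a Dictionary keyed by file path; empty if rootPath does not exist.
Function changedSince(rootPath, since)
    Dim fso, found
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set found = CreateObject("Scripting.Dictionary")
    If fso.FolderExists(rootPath) Then
        collectChanged fso.GetFolder(rootPath), since, found
    End If
    Set changedSince = found
End Function

' Example: list files changed in the past day under a placeholder path.
Dim changedFiles, changedPath
Set changedFiles = changedSince("V:\AirForce", DateAdd("d", -1, Now))
For Each changedPath In changedFiles.Keys
    WScript.Echo changedPath
Next
```

A daily job could archive only the files in this list, with a full backup still run weekly or monthly so that a restore never depends on a long chain of daily increments.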
The backup strategies outlined above will not work in every instance. Large organizations, for example, may have too much data to implement an effective networked backup system. But smaller GIS shops, especially those with newer workstations and fat hard disks, may find that making networked backups is a good complement to, or even a substitute for, traditional backups on tape.
Christopher Eykamp is a GIS Consultant working for BTG in Okinawa, Japan. Recent projects include developing a Wartime Runway Management & Repair System using Visual Basic and MapObjects. He is currently involved with implementing a comprehensive GeoDatabase for the US Air Force. He can be reached by email or via his website at http://eykamp.com.
What follows is an excerpt from the Directory Backup Utility Visual Basic Script. The full script can be downloaded here.
' Directory backup utility
' Version 1.0
' Written by Chris Eykamp, May 2000, Okinawa, Japan
' Which archiving program to use (must be located on the system path)
zipProg = "pkzip25"
' Location of the logfile
backupLog = "c:\backup.log"
' How many backup copies of the zip file do we keep?
' 0 = disable backup facility
backups = 3
''''''''' Put customized backup statements here
''''''''' Format: makeBackup "source folder", "destination zip file"
' My backups first
' Now some server backups
makeBackup "V:\AirForce","E:\V Drive Backup\AirForce.zip"
makeBackup "V:\ArcView","E:\V Drive Backup\ArcView.zip"
makeBackup "V:\Kadena_AB","E:\V Drive Backup\Kadena_AB.zip"
''''''''' End of customized backup statements
logMsg "Backup complete on " & now & vbcrlf
. . .
sub makeBackup (src, dest)
dim WshShell, zipCmd, retcode
' Create a shell object in which to run our zip commands
Set WshShell = WScript.CreateObject("WScript.Shell")
. . .
' Change this line if you change compression programs:
zipCmd = zipProg & " -add -path=current -rec -normal " _
& chr(34) & dest & chr(34) & " " & chr(34) & _
src & "\*.*" & chr(34)
retcode = WshShell.Run(zipCmd, 1, TRUE)
' A non-zero return code from the archiver indicates a failure
if retcode <> 0 then
    logMsg "Error " & retcode & " in creation of " & _
        chr(34) & dest & chr(34) & "."
    errors = errors + 1
else
    logMsg "Files in " & chr(34) & src & chr(34) & _
        " successfully backed up to " & chr(34) & _
        dest & chr(34) & "."
end if
end sub
sub logMsg (msg)
' Write <msg> to the logfile
. . .
sub backupFile (file)
' Make backup copies of <file>
. . .
sub deleteFile (file)
' Delete <file> from the system
. . .
Entire site © 1996-2004 by Christopher Eykamp