I find myself rather surprised that this is a major issue in what is otherwise a really good enterprise-level backup tool. Synchronizing backups seems like a basic element of backups in a corporate environment. Should the building that my backup server resides in burn down, get hit by a tornado, etc., there should be a process for maintaining a synchronized backup elsewhere. And by extension, what happens when you want to have a "cluster" of BackupPC servers?
The idea of running two BackupPC servers, each making its own backups, may work in some cases, but it doubles the transfers on the machines being backed up, and that can be unacceptable. For example, one of my machines is a Linux server acting as a network drive. Backups of it can take a long time; BackupPC reports 514 minutes for its last full backup (which naturally runs after business hours). Once it's been backed up, it's been deduped and compressed. Even on a LAN, it would be better to transfer that compressed, deduped pool than to back the machine up twice in the same day. In the worst case, my network drive gets bogged down eight hours a day for backup, and it has only a small window that counts as "off hours." My backup server, on the other hand, can be bogged down 24 hours a day for all I care; no one uses its services but me.
Jeffrey, what is the latest version of your script? I have 0.1.3, circa Sept '11. Given how your script works, could it be made to simply recreate the pool structure on an external drive on the same system, rather than compressing it to a tarball? My end goal is to be able to grab the external drive at a moment's notice, plug it into a new Linux machine along with a tarball of the BackupPC config files, and stand it up long enough to restore everyone's PCs and the appropriate servers.
Greg, I would definitely have an interest in seeing the script; anything that will help me achieve a tertiary remote backup...
From: ***@kosowsky.org [mailto:***@kosowsky.org]
Sent: Thursday, February 28, 2013 9:43 PM
To: General list for user discussion, questions and support
Subject: Re: [BackupPC-users] BackupPC Pool synchronization?
Post by Mark Campbell
So I'm trying to get a BackupPC pool synced daily from a 1TB MD RAID1 array to an external fireproof drive (with plans to also sync to a remote server at our colo). I found Jeffrey's script BackupPC_CopyPcPool.pl, but the syntax and the few examples I've seen online suggest it isn't quite what I'm looking for, since it appears to write its output in a different layout. I initially tried the rsync method with -H, but my server would choke at around 350GB. Any suggestions on how to do this?
The bottom line is that other than doing a block level file system copy there is no "free lunch" that gets around the hard problem of copying over densely hard-linked files.
As many like yourself have noted, rsync bogs down with the -H (hard links) flag, in part because rsync knows nothing of the special structure of the pool and pc trees, so it has to keep full track of all possible hard links.
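To see why that bookkeeping gets expensive, here is a rough sketch (in Python, just for illustration) of the state rsync -H effectively has to hold: one entry per multiply-linked inode, plus every path that shares it. With a BackupPC pool of millions of multiply-linked files, that table alone is enormous.

```python
import os

def hardlink_groups(root):
    """Collect what rsync -H must remember: every inode with a link
    count above 1, keyed by (device, inode), with all paths sharing it."""
    groups = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if st.st_nlink > 1:
                groups.setdefault((st.st_dev, st.st_ino), []).append(path)
    return groups
```

On a pc tree with 20 million links this map holds 20 million path strings, which is roughly the memory blow-up people see when rsync -H chokes partway through.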
One solution is BackupPC_tarPCCopy which uses a tar-like perl script to track and copy over the structure.
My script BackupPC_copyPcPool tries to combine the best of both worlds. It lets you use rsync or even "cp -r" to copy over the pool while disregarding hard links entirely. The pc tree, with its links into the pool, is re-created from a flat file listing all the links, directories, and zero-size files that make up the tree. This is done with the help of a hash that caches the inode number of each pool entry. The pc tree is then rebuilt on the target by sequentially (re)creating directories, zero-size files, and links into the pool.
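The three passes described above can be sketched as follows. This is not the actual script (which is Perl); it is a simplified Python illustration, and the manifest format and function names are invented for the example:

```python
import os

def build_inode_map(pool):
    """Pass 1: map each pool file's inode number to its pool-relative path."""
    inode_to_pool = {}
    for dirpath, _dirs, files in os.walk(pool):
        for name in files:
            path = os.path.join(dirpath, name)
            inode_to_pool[os.lstat(path).st_ino] = os.path.relpath(path, pool)
    return inode_to_pool

def write_manifest(pc, inode_to_pool, out):
    """Pass 2: flat listing of the pc tree -- directories (D), zero-length
    files (Z), and hard links into the pool (L), matched purely by inode."""
    for dirpath, _dirs, files in os.walk(pc):
        out.write("D %s\n" % os.path.relpath(dirpath, pc))
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            rel = os.path.relpath(path, pc)
            if st.st_size == 0:
                out.write("Z %s\n" % rel)
            elif st.st_ino in inode_to_pool:
                out.write("L %s\t%s\n" % (inode_to_pool[st.st_ino], rel))

def replay_manifest(manifest, new_pool, new_pc):
    """Pass 3: after the pool has been copied separately (plain rsync or
    cp -r, no -H needed), recreate the pc tree from the manifest."""
    with open(manifest) as f:
        for line in f:
            kind, rest = line[0], line[2:].rstrip("\n")
            if kind == "D":
                os.makedirs(os.path.join(new_pc, rest), exist_ok=True)
            elif kind == "Z":
                open(os.path.join(new_pc, rest), "w").close()
            elif kind == "L":
                pool_rel, pc_rel = rest.split("\t")
                os.link(os.path.join(new_pool, pool_rel),
                        os.path.join(new_pc, pc_rel))
```

The key point is that the copy of the pool itself never needs hard-link tracking, and the manifest replay only ever links against the already-copied pool.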
I have substantially re-written my original script to make it orders of magnitude faster by substituting a packed in-memory hash for the file-system inode-tree I used in the previous version. Several other improvements have been added, including the ability to record full file md5sums and to fix broken/missing links.
I was able to copy over a BackupPC tree consisting of 1.3 million pool files (180 GB) and 24 million pc tree entries (4 million directories, 20 million links, 300 thousand zero-length files) in the following time:
~4 hours to copy over the pool
~5 hours to create the flat file mapping out the pc tree directories, hard links, and zero-length files
~7 hours to convert the flat file into a new pc tree on the target filesystem
These numbers are approximate since I didn't really time it. But it was all done on a low end AMD dual-core laptop with a single USB3 drive.
For this case, the flat file of links/directories/zero-length files is 660 MB compressed (about 3.5 GB uncompressed). The inode cache requires about 250 MB of RAM (mostly due to Perl overhead) for the 1.3 million pool files.
Note, before I release the revised script, I also hope to add a feature that allows copying one or more backups from the pc tree on one machine to the pc tree on another machine (with a different pool). This feature is not available in any other backup scheme... and will effectively allow "incremental-like" backups.
I also plan to add an option to pack the inode cache more tightly, saving memory at the expense of some speed. I should be able to fit 10 million pool nodes in a 300MB cache.
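One way to get into that ballpark (again a Python sketch of the idea, not the Perl implementation): rather than a per-entry hash of inode to path string, append all pool paths to a single byte blob and keep flat sorted arrays for lookup. The index below costs 20 bytes per node plus the path bytes, which is roughly the 30 bytes/node figure:

```python
import bisect
from array import array

class PackedInodeCache:
    """Sketch of the memory/speed trade-off: all pool paths live in one
    byte blob; lookups go through flat arrays (inode -> offset/length)
    via binary search instead of a per-entry hash."""

    def __init__(self):
        self._blob = bytearray()   # every pool path, concatenated
        self._pending = []         # (inode, offset, length) until freeze()

    def add(self, inode, pool_path):
        data = pool_path.encode()
        self._pending.append((inode, len(self._blob), len(data)))
        self._blob += data

    def freeze(self):
        """Sort once by inode, then pack the index into typed arrays."""
        self._pending.sort()
        self._inodes = array("Q", (i for i, _, _ in self._pending))
        self._offsets = array("Q", (o for _, o, _ in self._pending))
        self._lengths = array("I", (n for _, _, n in self._pending))
        self._pending = None

    def lookup(self, inode):
        i = bisect.bisect_left(self._inodes, inode)
        if i < len(self._inodes) and self._inodes[i] == inode:
            off, n = self._offsets[i], self._lengths[i]
            return self._blob[off:off + n].decode()
        return None
```

The binary search makes each lookup O(log n) instead of O(1), which is the speed cost mentioned above.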
I would like to benchmark my revised routine against BackupPC_tarPCCopy in terms of speed, memory requirement, and generated file size...