Discussion:
Using rsync for blockdevice-level synchronisation of BackupPC pools
Pieter Wuille
2009-09-02 10:10:54 UTC
Permalink
Hello everyone,

trying to come up with a way for efficiently synchronising a BackupPC archive
on one server with a remote and encrypted offsite backup, the following problems
arise:
* As often pointed out on this list, filesystem-level synchronisation is
extremely CPU- and memory-intensive. Not actually impossible, but depending
on the scale of your backups, it may not be a practical solution.
In our case of a 350GiB pool containing 4 million directories and 20 million
inodes, simply copying the whole pool locally using
cp/rsync/xfsdump/whatever thrashes, gets killed by the OOM killer, or at
least takes days - longer than I find reasonable for a remote
synchronisation run.
* Furthermore, we want our offsite backup to be encrypted - ideally using a
secret key that is at no moment ever known at the remote location: only
encrypted files should be sent to it and stored there. Doing this encryption
at the file level, given such a massive amount of small files, adds very
serious overhead.
* The alternative to file-level synchronisation is (block)device-level
synchronisation. Many possibilities exist here, including ZFS send/receive
(if you use ZFS), using snapshots (e.g. LVM), or temporarily stopping backups
and doing a full copy of the pool to the remote side (if you have enough
bandwidth), etc. Not everyone is willing to use these, or is prepared to
convert to such a system.
* We would like to use rsync for this, since it skips identical parts yet
guarantees that the whole file ends up byte-for-byte identical to the
original. Unfortunately, as far as I know, rsync doesn't support syncing the
data on block devices, only the device node itself. In addition, rsync needs
to read and process the whole file on the receiver side, calculate checksums,
send them all to the sender, wait for the sender to construct a delta using
those checksums, send this delta, and apply it at the receiver side. This
takes at least the sum of the times needed to read through the whole data on
both sides if it is a single file (correct me if I'm wrong, I don't know
rsync internals). Data hardly moves on-disk in the case of a BackupPC pool,
so we would like to disable, or at least limit, the range in which rsync
searches for matching data.

To overcome this issue, I wrote a Perl/FUSE filesystem that allows you to
"mount" a block device (or regular file) as a directory containing files
part0001.img, part0002.img, ..., each representing 1 GiB of data of the
original device:

https://svn.ulyssis.org/repos/sipa/backuppc-fuse/devfiles.pl

This directory can be rsynced in the normal way with an "ordinary" directory
on an offsite backup. In case a restore is necessary, doing
'ssh remote "cat /backup/part*.img" >/dev/sdXY' (or equivalent) suffices.
Although devfiles.pl has (limited) write support, rsync'ing to the resulting
directory is not yet possible - maybe I can get this working if people have
a need for it. That would allow restoration by simply rsync'ing in the
opposite direction.
Doing the synchronisation in chunks of 1 GiB prevents rsync from searching
too far, and splitting it into multiple files allows some parallelism (the
sender transmitting data to the receiver while the receiver already
checksums the next file; this is heavily limited by disk I/O, however).
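
To make the workflow concrete, a sync run could look roughly like the
following sketch (the devfiles.pl invocation, mount point and paths are
only assumptions here - check the script itself for its actual arguments):

  # expose the (snapshotted) device as a directory of 1 GiB chunk files
  ./devfiles.pl /dev/vg0/pool-snap /mnt/chunks
  # push the chunks to the offsite machine
  rsync -av /mnt/chunks/ remote:/backup/pool/
  # detach the FUSE mount again
  fusermount -u /mnt/chunks
  # restore, should it ever be needed
  ssh remote 'cat /backup/pool/part*.img' > /dev/vg0/pool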

In our case, the BackupPC pool is stored on an XFS filesystem on an LVM
volume, allowing a xfsfreeze/sync/snapshot/xfsunfreeze, and using
devfiles.pl on the snapshot. Instead of xfsfreeze+unfreeze, a backuppc
stop/umount + mount/backuppc start is also possible. If no system for making
snapshots is available, you would need to suspend backuppc during the whole
synchronisation.
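
A rough sketch of that snapshot dance, assuming a standard LVM2 setup
(volume group, volume names and snapshot size are invented):

  xfs_freeze -f /var/lib/backuppc                # quiesce the XFS filesystem
  sync
  lvcreate -s -n pool-snap -L 10G /dev/vg0/pool  # snapshot the pool volume
  xfs_freeze -u /var/lib/backuppc                # let BackupPC continue
  # ... run devfiles.pl + rsync against /dev/vg0/pool-snap ...
  lvremove -f /dev/vg0/pool-snap                 # drop the snapshot afterwards
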
In fact, the BackupPC volume is already encrypted on our backup server
itself, allowing very cheap encrypted offsite backups (simply not sending
the keyfile to the remote side is enough...)

The result: the offsite copy of our 400GiB pool, containing 350GiB of data
of which about 2GiB changes daily, is synchronised 5 times a week in 12-15
hours, requiring hardly any bandwidth. This seems mostly limited by the slow
disk I/O on the receiver side (25MiB/s).

Hope you find this interesting/useful,

--
Pieter
Daniel Berteaud
2009-09-02 11:56:13 UTC
Permalink
Le mercredi 02 septembre 2009 à 12:10 +0200, Pieter Wuille a écrit :
> [...]

Hi.

This seems to be an interesting approach to solve the offsite backups
problem. I'll try to test this when I have some time.

thanks

--
Daniel Berteaud
FIREWALL-SERVICES SARL.
Société de Services en Logiciels Libres
Technopôle Montesquieu
33650 MARTILLAC
Tel : 05 56 64 15 32
Fax : 05 56 64 15 32
Mail: ***@firewall-services.com
Web : http://www.firewall-services.com
Jon Craig
2009-09-02 12:43:11 UTC
Permalink
On Wed, Sep 2, 2009 at 7:56 AM, Daniel
Berteaud<***@firewall-services.com> wrote:
> [...]

I was having the same thought this morning regarding rsync. There is
a patch available for rsync that allows it to work directly on raw
devices, but it's slated for a future release. I found this on another
site:

Standard rsync is missing this feature, but there is a patch for it in
the rsync-patches tarball (copy-devices.diff) which can be downloaded
from http://rsync.samba.org/ftp/rsync/ After applying it and recompiling,
you can rsync devices with the --copy-devices option.
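
With that patch applied on both ends, the chunk-splitting step could be
skipped and the device synced directly - something along these lines
(device path and destination are placeholders):

  rsync -av --copy-devices /dev/vg0/pool-snap remote:/backup/pool.img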

Of more interest is the open source package zumastor. It's a full-blown
snapshot solution for Linux. It has the advantage of setting up ongoing
snapshots (like ZFS) to be replicated and applied to a remote server. The
downside is that it's a copy-on-write type solution, and this causes
reduced write performance on the source server. This can be mitigated
through the use of NVRAM to hold the filesystem journal, but the degree of
technical difficulty seems to rise quickly with this solution and it may
not be appropriate for the SOHO user or the faint of heart.

--
Jonathan Craig
Les Mikesell
2009-09-02 15:14:05 UTC
Permalink
Pieter Wuille wrote:
>
> To overcome this issue, i wrote a perl/fuse filesystem that allows you to
> "mount" a block device (or real file) as a directory containing files
> part0001.img, part0002.img, ... each representing 1 GiB of data of the
> original device:
>
> https://svn.ulyssis.org/repos/sipa/backuppc-fuse/devfiles.pl
>
> This directory can be rsynced in a normal way with an "ordinary" directory
> on an offsite backup. In case a restore is necessary, doing
> 'ssh remote "cat /backup/part*.img" >/dev/sdXY' (or equivalent) suffices.
> Although devfiles.pl has (limited) write support, rsync'ing to the resulting
> directory is not yet possible - maybe i can try to have this working if
> people have a need for it. This would allow restoration by simply rsync'ing
> in the opposite direction.
> Doing the synchronisation in groups of 1GiB prevents rsync from searching
> too far, and splitting it in multiple files allows some parallelism
> (sender transmitting data to receiver, while receiver already checksums
> the next file; this is heavily limited by disk I/O however).

Thanks for posting this. I've considered a very similar approach using
a VMware .vmdk image file using the options to pre-allocate the space and
segment into chunks as an intermediate that would be directly usable by
a vmware guest. I'm glad to hear that the rsync logistics would be
practical.

> In our case, the BackupPC pool is stored on an XFS filesystem on an LVM
> volume, allowing a xfsfreeze/sync/snapshot/xfsunfreeze, and using
> devfiles.pl on the snapshot. Instead of xfsfreeze+unfreeze, a backuppc
> stop/umount + mount/backuppc start is also possible. If no system for making
> snapshots is available, you would need to suspend backuppc during the whole
> synchronisation.
> In fact, the BackupPC volume is already encrypted on our backup server
> itself, allowing very cheap encrypted offsite backups (simply not sending
> the keyfile to the remote side is enough...)
>
> The result: offsite backups of our 400GiB pool, containing 350GiB data, of
> which about 2GiB changes daily, is synchronised 5 times a week with offsite
> backup in 12-15 hours, requiring nearly no bandwidth. This seems mostly
> limited by the slow disk I/O on the receiver side (25MiB/s).
>
> Hope you find this interesting/useful,

The one thing that would bother me about this approach is that you would
have a fairly long window of time while the remote filesystem chunks are
being updated. While rsync normally creates a copy of an individual
file and does not delete the original until the copy is complete, a
mis-matched set of filesystem chunks would likely not be usable. Since
disasters always happen at the worst possible time, I'd want to be sure
you could recover from losing the primary filesystem (site?) in the
middle of a remote copy. This might be done by keeping a 2nd copy of
the files at the remote location, keeping them on an LVM with a snapshot
taken before each update, or perhaps catting them together onto a
removable device for fast access after the chunks update.
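
If the chunk files live on an LVM volume at the remote site, that could be
as simple as the following sketch (volume names and sizes are invented):

  # on the receiving machine, before each sync run starts
  lvcreate -s -n chunks-prev -L 20G /dev/vg0/chunks
  # ... let the sender rsync into the mounted chunks filesystem ...
  lvremove -f /dev/vg0/chunks-prev   # only once the new set is known complete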

--
Les Mikesell
***@gmail.com
Pieter Wuille
2009-09-02 15:40:19 UTC
Permalink
On Wed, Sep 02, 2009 at 10:14:05AM -0500, Les Mikesell wrote:
> Pieter Wuille wrote:
> > [...]
>
> The one thing that would bother me about this approach is that you would
> have a fairly long window of time while the remote filesystem chunks are
> being updated. While rsync normally creates a copy of an individual
> file and does not delete the original until the copy is complete, a
> mis-matched set of filesystem chunks would likely not be usable. Since
> disasters always happen at the worst possible time, I'd want to be sure
> you could recover from losing the primary filesystem (site?) in the
> middle of a remote copy. This might be done by keeping a 2nd copy of
> the files at the remote location, keeping them on an LVM with a snapshot
> taken before each update, or perhaps catting them together onto a
> removable device for fast access after the chunks update.

You're very right, and I thought about it too. Instead of using RAID1 on
the offsite backup, there are two separate backup sets on the offsite
machine, and synchronisation alternates between them. This also enables the
use of rsync's --inplace option.
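
A minimal sketch of such an alternating scheme (directory names and the
day-parity trick are just one way of doing it):

  # alternate between two destination sets based on the day of the year
  DEST=set$(( 10#$(date +%j) % 2 ))
  rsync -av --inplace /mnt/chunks/ remote:/backup/$DEST/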

Keeping an LVM snapshot is a possibility, but it becomes somewhat complex to
manage: you get a snapshot of a volume containing a filesystem whose files
correspond to parts of a snapshot of a volume containing an (encrypted)
filesystem containing a directory that corresponds to a pool of backups...

Catting the part files together to a device after transmission isn't a
complete solution: what if the machine crashes during the catting...?

--
Pieter
Les Mikesell
2009-09-02 18:08:27 UTC
Permalink
Pieter Wuille wrote:
>> The one thing that would bother me about this approach is that you would
>> have a fairly long window of time while the remote filesystem chunks are
>> being updated. While rsync normally creates a copy of an individual
>> file and does not delete the original until the copy is complete, a
>> mis-matched set of filesystem chunks would likely not be usable. Since
>> disasters always happen at the worst possible time, I'd want to be sure
>> you could recover from losing the primary filesystem (site?) in the
>> middle of a remote copy. This might be done by keeping a 2nd copy of
>> the files at the remote location, keeping them on an LVM with a snapshot
>> taken before each update, or perhaps catting them together onto a
>> removable device for fast access after the chunks update.
>
> You're very right, and i thought about it too. Instead of using a RAID1 on
> the offsite backup, there are two separate backups on the offsite machine,
> and synchronisation switches between them. This also enables the use of
> rsync's --inplace option.

That should be safe enough, but doesn't that mean you xfer each set of
changes twice since the alternate would be older?

> Keeping an LVM snapshot is a possibility, but it becomes somewhat complex to
> manage: you get a snapshot of a volume containing a filesystem whose files
> correspond to parts of a snapshot of a volume containing an (encrypted)
> filesystem containing a directory that corresponds to a pool of backups...

The snapshot would just contain the same files you had before the
last xfer started. But, you'd still need space to hold the large file
changes.

> Catting the part files together to a device after transmission isn't a
> complete solution: what if the machine crashes during the catting...?

The machine crash would have to destroy the filesystem containing the
chunks to be a real problem. And then I wouldn't expect both your
primary server and the server holding the file chunks to die at the same
time, but it would mean you'd have to xfer the whole mess again.
Perhaps you could alternate the catting to 2 different devices so you'd
always have one ready to whisk off to the restore location.

--
Les Mikesell
***@gmail.com
Pieter Wuille
2009-09-03 12:59:42 UTC
Permalink
On Wed, Sep 02, 2009 at 01:08:27PM -0500, Les Mikesell wrote:
> Pieter Wuille wrote:
> > You're very right, and i thought about it too. Instead of using a RAID1 on
> > the offsite backup, there are two separate backups on the offsite machine,
> > and synchronisation switches between them. This also enables the use of
> > rsync's --inplace option.
>
> That should be safe enough, but doesn't that mean you xfer each set of
> changes twice since the alternate would be older?

That's correct, but it hardly seems to matter. Due to a problem the offsite
machine was down once for over two weeks, and the subsequent synchronisation
run still only took 14h. The limiting factor is the sequential read speed of
the device, not the network.

> > Keeping an LVM snapshot is a possibility, but it becomes somewhat complex to
> > manage: you get a snapshot of a volume containing a filesystem whose files
> > correspond to parts of a snapshot of a volume containing an (encrypted)
> > filesystem containing a directory that corresponds to a pool of backups...
>
> The snapshot would just contain the same files you had before the
> last xfer started. But, you'd still need space to hold the large file
> changes.

It's definitely possible, and if your destination machine uses RAID5+ and
sees only relatively small changes per synchronisation run, it may be
preferable to keeping two sets on non-redundant storage.

> > Catting the part files together to a device after transmission isn't a
> > complete solution: what if the machine crashes during the catting...?
>
> The machine crash would have to destroy the filesystem containing the
> chunks to be a real problem. And then I wouldn't expect both your
> primary server and the server holding the file chunks to die at the same
> time, but it would mean you'd have to xfer the whole mess again.
> Perhaps you could alternate the catting to 2 different devices so you'd
> always have one ready to whisk off to the restore location.

Yes, I was wrong. A crash during the catting would normally not hurt the
files that were already transmitted. As long as you don't start transferring
the next set during the catting of the previous one, there is no problem.

--
Pieter
Les Mikesell
2009-09-03 16:35:50 UTC
Permalink
Pieter Wuille wrote:
>
>>> You're very right, and i thought about it too. Instead of using a RAID1 on
>>> the offsite backup, there are two separate backups on the offsite machine,
>>> and synchronisation switches between them. This also enables the use of
>>> rsync's --inplace option.
>> That should be safe enough, but doesn't that mean you xfer each set of
>> changes twice since the alternate would be older?
>
> That's correct, but it hardly seems to matter. Due to a problem the offsite
> machine was down once for over two weeks, and the subsequent synchronisation
> run still only took 14h. The limiting factor is the sequential read speed of
> the device, not the network.

Your network between sites must be exceptionally fast - that's probably
not a typical situation.

>>> Catting the part files together to a device after transmission isn't a
>>> complete solution: what if the machine crashes during the catting...?
>> The machine crash would have to destroy the filesystem containing the
>> chunks to be a real problem. And then I wouldn't expect both your
>> primary server and the server holding the file chunks to die at the same
>> time, but it would mean you'd have to xfer the whole mess again.
>> Perhaps you could alternate the catting to 2 different devices so you'd
>> always have one ready to whisk off to the restore location.
>
> Yes, i was wrong. A crash during the catting would normally not hurt the
> files that already were transmitted. As long as you don't start transferring
> the next set during the catting of the previous, there is no problem.

Maybe it doesn't matter if your network is as fast as your disks, but I
like the idea of ending up with a disk you can ship overnight or toss in
a briefcase and take to your disaster recovery location and start
restoring immediately.

--
Les Mikesell
***@gmail.com
Pieter Wuille
2009-09-03 22:34:15 UTC
Permalink
On Thu, Sep 03, 2009 at 11:35:50AM -0500, Les Mikesell wrote:
> Pieter Wuille wrote:
> >
> >>> You're very right, and i thought about it too. Instead of using a RAID1 on
> >>> the offsite backup, there are two separate backups on the offsite machine,
> >>> and synchronisation switches between them. This also enables the use of
> >>> rsync's --inplace option.
> >> That should be safe enough, but doesn't that mean you xfer each set of
> >> changes twice since the alternate would be older?
> >
> > That's correct, but it hardly seems to matter. Due to a problem the offsite
> > machine was down once for over two weeks, and the subsequent synchronisation
> > run still only took 14h. The limiting factor is the sequential read speed of
> > the device, not the network.
>
> Your network between sites must be exceptionally fast - that's probably
> not a typical situation.

The network connection between them is indeed quite fast, but that doesn't
really matter. We're talking about a few gigabytes (maybe 10-20) of changes
that need to be transferred along with some checksums during a period of 14
hours. That's an average bandwidth of 3-4 megabits per second.

> >>> Catting the part files together to a device after transmission isn't a
> >>> complete solution: what if the machine crashes during the catting...?
> >> The machine crash would have to destroy the filesystem containing the
> >> chunks to be a real problem. And then I wouldn't expect both your
> >> primary server and the server holding the file chunks to die at the same
> >> time, but it would mean you'd have to xfer the whole mess again.
> >> Perhaps you could alternate the catting to 2 different devices so you'd
> >> always have one ready to whisk off to the restore location.
> >
> > Yes, i was wrong. A crash during the catting would normally not hurt the
> > files that already were transmitted. As long as you don't start transferring
> > the next set during the catting of the previous, there is no problem.
>
> Maybe it doesn't matter if your network is as fast as your disks, but I
> like the idea of ending up with a disk you can ship overnight or toss in
> a briefcase and take to your disaster recovery location and start
> restoring immediately.

That's a comforting thought. But it may be possible to add write support to
my script and use that on the remote side. That way you would build a real
remote mirror device immediately, instead of a set of files that can be used
to reconstruct it. I'll try that in the next few days. Using that patched
rsync would allow you to do the same...

--
Pieter
Jeffrey J. Kosowsky
2009-09-02 15:41:25 UTC
Permalink
Les Mikesell wrote at about 10:14:05 -0500 on Wednesday, September 2, 2009:
> [...]
> The one thing that would bother me about this approach is that you would
> have a fairly long window of time while the remote filesystem chunks are
> being updated. While rsync normally creates a copy of an individual
> file and does not delete the original until the copy is complete, a
> mis-matched set of filesystem chunks would likely not be usable. Since
> disasters always happen at the worst possible time, I'd want to be sure
> you could recover from losing the primary filesystem (site?) in the
> middle of a remote copy. This might be done by keeping a 2nd copy of
> the files at the remote location, keeping them on an LVM with a snapshot
> taken before each update, or perhaps catting them together onto a
> removable device for fast access after the chunks update.

You could also try using rsync with --link-dest, which would create a
2nd copy that hard-links the chunk files that are unchanged and only
copies in the chunks that differ. With luck some of the chunks will be
identical between runs, saving you some storage vs. a full 2nd copy.
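
A sketch of what that could look like with the 1 GiB chunk layout (paths are
invented; note that rsync's --link-dest works at whole-file granularity, so
only chunks that are completely unchanged get shared):

  # rotate the previous run out of the way, keeping it intact
  ssh remote 'rm -rf /backup/pool.old && mv /backup/pool.cur /backup/pool.old'
  # write a fresh set, hard-linking any unchanged chunks from the previous one
  rsync -a --link-dest=/backup/pool.old /mnt/chunks/ remote:/backup/pool.cur/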
Christian Völker
2009-09-02 18:20:13 UTC
Permalink
Les Mikesell wrote:

> a VMware .vmdk image file using the options to pre-allocate the space and
> segment into chunks as an intermediate that would be directly usable by
> a vmware guest.
There is a solution for VMware vSphere (ESX/VC 4.0) which would be
perfect: VMware Data Recovery claims to back up a virtual disk, but after
a first full backup it sends only the changed blocks to NFS or wherever
you like.

I haven't tested it myself, but as far as I can see this is what we all
want. The only disadvantage is having to buy the VMware vSphere and VDR
products... and your BackupPC server needs to be virtual (it is here
anyway...).


> I'm glad to hear that the rsync logistics would be practical.
I'd prefer this too, and I'm waiting for some feedback. If I can get
some time I'll try it myself soon.

Greetings

Christian
dan
2009-09-02 20:18:41 UTC
Permalink
Can I offer an alternative solution? How about using bittorrent?

If you patch the btmakemeta and download.py files as shown here:
http://osdir.com/ml/network.bit-torrent.general/2003-12/msg00356.html

(stop backuppc, unmount the filesystem)
You can create a torrent of your block device:
btmakemeta /dev/md0 tracker_url --target md0.torrent
then run it:
btdownloadcurses md0.torrent --saveas /dev/md0

You can patch the download.py on the second machine and download that
torrent with the target being the block device. Make sure backuppc is
stopped and the filesystem unmounted, then run:
btdownloadcurses md0.torrent --saveas /dev/md0

bittorrent will take some time creating the torrent file, as it has to scan
every block. The first run may take a long time because the two copies were
never block-for-block identical, which means it will have to write every
block; each run after that should be pretty quick though.

I would also suggest unmounting the filesystem, running fsck and then
remounting it once on the original machine so that you don't propagate any
errors.

You can use the seed% setting to close the downloader after one complete
seeding and then remount the filesystem and start backups. Same on the
second machine: put it to seed 0 so bittorrent closes immediately and then
remount the filesystem. With scp and ssh keys or ftp or something you can
automate the whole process if you really like.

2009/9/2 Christian Völker <***@knebb.de>

> [...]
Tino Schwarze
2009-09-03 10:15:32 UTC
Permalink
On Wed, Sep 02, 2009 at 02:18:41PM -0600, dan wrote:

> Can I offer an alternative solution? How about using bittorrent?

I don't see the benefits over using the patched rsync... What am I
missing? After all it's still read-all-blocks - compare checksums -
transfer changes, right?

Tino.

--
"What we nourish flourishes." - "Was wir nähren erblüht."

www.lichtkreis-chemnitz.de
www.craniosacralzentrum.de
dan
2009-09-05 16:01:57 UTC
Permalink
Because bittorrent stores the file list in the torrent file and bittorrent
clients keep an index of downloaded pieces on disk, while rsync stores the
file list in RAM.

Also, there is a patch out there for bittorrent (very easy to apply) that
allows you to make a torrent of a block device. rsync won't do this.

One more benefit: you can have MULTIPLE remote peers with bittorrent,
allowing you to distribute the data to 3 sites if you like and save
time/bandwidth at the same time.

On Thu, Sep 3, 2009 at 4:15 AM, Tino Schwarze <***@tisc.de>wrote:

> [...]
Les Mikesell
2009-09-05 16:25:09 UTC
Permalink
dan wrote:
> because bittorrent stores the file list in a file and bittorrent clients
> use an index for downloaded bits. rsync stores the filelist in ram.

But how good is bittorrent at finding arbitrarily small differences between the
old/new copies and resynchronizing on the matches?

--
Les Mikesell
***@gmail.com
dan
2009-09-06 00:35:16 UTC
Permalink
Tino Schwarze
2009-09-07 11:38:02 UTC
Permalink
On Sat, Sep 05, 2009 at 06:35:16PM -0600, dan wrote:

[...]

> Thinking about the logistics in the method I have thought up a few hurdles.
> The source disks must remain unchanged during the entire sync.

> You would need to either have a spare disk in a raid1 mirror that you
> could remove from the array and source from that, or you need to do
> some more hacking to bittorrent so that it could update the torrent
> file during the backup to reflect changes(lots of work i think)

You may use LVM snapshots for that purpose. Of course, your device name
might then change or something like that, and you would need to work
around it.

Tino.

--
"What we nourish flourishes." - "Was wir nähren erblüht."

www.lichtkreis-chemnitz.de
www.craniosacralzentrum.de
Christian Völker
2009-09-09 07:33:03 UTC
Permalink
Hi,

as I have the same issue with storing my BackupPC outside I tried
another way the last days:

First, my environment:
28 hosts to back up. Mostly idle machines with minor services (so no big
databases and so on). Some of them are file servers with only small daily
changes. So I expected not too many daily changes in the pool.
I want to copy the pool to a remote location after testing is done.

So I used an USB 2.0 disk as "second backup device" to store the copied
pool.
The pool itself is approx 540GB in total, and it is not growing any more -
even though not all 12 full backups are stored yet.

My first attempt was to rsync the pool. This took ages.
The second attempt was to "dd" the whole device (at a lower frequency), but
for the size of the pool that took ages too.
The third try was to use "dump", hoping it would transfer only changed
blocks after the initial dump. No way. After one day it had transferred
300GB (!). I thought dump might not be a good solution...
The last attempt was to move the pool device into a VMware virtual disk and
let rsync run over this file - thus rsync backing up the block device. The
best attempt, I thought. Result: rsync still transfers ~300GB after the
first run, too.

So what does this mean to me?

Looks like BackupPC changes approx two thirds of the whole pool daily. So
if I were to transfer to a remote site I'd need to transfer 300GB daily!

Can anyone confirm that BackupPC changes so much data inside the pool DAILY?

I can't imagine the backed up data itself is 300GB!

Any clues?

Christian
Tino Schwarze
2009-09-09 09:46:41 UTC
Permalink
Hi Christian,

On Wed, Sep 09, 2009 at 09:33:03AM +0200, Christian Völker wrote:

> First, my environment:
> 28 hosts to back up. Mostly idle machines with minor services (so no big
> databases and so on). Partially fileserver with only little daily
> changes. So I expected not too much daily changes on the pool.
> I want to copy the pool to a remote location after testing is done.
>
> So I used an USB 2.0 disk as "second backup device" to store the copied
> pool.

I'd say: Replace that USB 2.0 disk by something else like something
connected via Firewire or eSATA. USB 2.0 is very, very slow, especially
for random access.

> The pool itself is aprox 540GB in total. And is not growing any more-
> even not all 12 full backups are stored.
>
> My first attempt was rsync the pool. This tooks ages.
> Second attempt was to "dd" the whole device (with less frequency), but
> for the size of the pool, took ages, too.
> Third try was to use "dump" for this hoping it would transfer only
> changed blocks after the initial dump. No way. After one day it
> transferred 300GB (!). I thought, dump might not bee a good solution...
> Last attempt now was to move the pool device to a VMware virtual disk
> and let rsync run over this file. Thus rsync backing up the block
> device. Best attepmt, I thought. Result: rsync transfers after the first
> run ~300GB, too.

Do you want to try DRBD and see how that works? Might be a more complex
setup though..

> So what does this mean to me?
>
> Looks like on my pool BackupPC changes approx two third of the whole
> pool daily. So if I would transfer to a remote site I'd need to transfer
> 300GB daily!

That sounds far too much for a 540 GB pool. Did you try again the next
day?

But there is currently no out-of-the-box or best-practice way to transfer
a BackupPC pool to a remote location.

Tino.

--
"What we nourish flourishes." - "Was wir nähren erblüht."

www.lichtkreis-chemnitz.de
www.craniosacralzentrum.de
Christian Völker
2009-09-09 10:12:46 UTC
Permalink
Hi,

> I'd say: Replace that USB 2.0 disk by something else like something
> connected via Firewire or eSATA. USB 2.0 is very, very slow, especially
> for random access.
I know, but that's not the point here. Speed doesn't concern me- the
copy should later go over a slow link anyways. And for this link USB2.0
is fast enough.


> Do you want to try DRBD and see how that works? Might be a more complex
> setup though..
I had this in mind - but as it will have to go over a slow link, I
preferred to figure out the amount of data first.

>> Looks like on my pool BackupPC changes approx two third of the whole
>> pool daily. So if I would transfer to a remote site I'd need to transfer
>> 300GB daily!
> That sounds far too much for a 540 GB pool. Did you try again the next
> day?
It matches my experience with the dump utility: the second dump, created a
day later, came out at 300GB. And the next day again. And so on...

Is there any way to verify why this is so much data? I mounted the pool
with "noatime,data=ordered" for ext3.

> But there is currently no out-of-the-box or best practice to transfer a
> BackupPC pool to a remote location.
I know. For this I wanted to share my experience here.

Greetings


Christian
Dan Pritts
2009-09-11 17:40:02 UTC
Permalink
On Wed, Sep 09, 2009 at 11:46:41AM +0200, Tino Schwarze wrote:
> I'd say: Replace that USB 2.0 disk by something else like something
> connected via Firewire or eSATA. USB 2.0 is very, very slow, especially
> for random access.

Hi Tino,

do you have empirical results that show this?

Not having tested it myself, that is exactly the opposite of what i
would expect.

random access times are dominated primarily by disk head seek time,
which is gonna be the same no matter what the transport to the drive is.
So the slower transport won't matter nearly as much with random I/O as
it will with sequential.

SATA or SAS/SCSI with command queueing should have better random access
performance than anything without command queueing. However, I don't
believe firewire has command queueing support, which would suggest that
this isn't what you're thinking of.

danno
Les Mikesell
2009-09-11 18:18:03 UTC
Permalink
Dan Pritts wrote:
> On Wed, Sep 09, 2009 at 11:46:41AM +0200, Tino Schwarze wrote:
>> I'd say: Replace that USB 2.0 disk by something else like something
>> connected via Firewire or eSATA. USB 2.0 is very, very slow, especially
>> for random access.
>
> Hi Tino,
>
> do you have empirical results that show this?
>
> Not having tested it myself, that is exactly the opposite of what i
> would expect.
>
> random access times are dominated primarily by disk head seek time,
> which is gonna be the same no matter what the transport to the drive is.
> So the slower transport won't matter nearly as much with random I/O as
> it will with sequential.
>
> SATA or SAS/SCSI with command queueing should have better random access
> performance than anything without command queueing. However, I don't
> believe firewire has command queueing support, which would suggest that
> this isn't what you're thinking of.

I think USB has a bit more CPU overhead than firewire so firewire would
give somewhat better throughput. But eSATA or a hot-swap internal bay
for a bare SATA drive would be even better.

--
Les Mikesell
***@gmail.com
Michael Stowe
2009-09-11 18:47:45 UTC
Permalink
> On Wed, Sep 09, 2009 at 11:46:41AM +0200, Tino Schwarze wrote:
>> I'd say: Replace that USB 2.0 disk by something else like something
>> connected via Firewire or eSATA. USB 2.0 is very, very slow, especially
>> for random access.
>
> Hi Tino,
>
> do you have empirical results that show this?
>
> Not having tested it myself, that is exactly the opposite of what i
> would expect.

Errr... Do you mean that USB 2.0 would be *faster* than eSATA for random
access, or do you mean that USB 2.0 would always be slower than eSATA, but
not as much slower for random access?

In either case, you're likely to be incorrect, there's a study here:

http://www.rt.db.erau.edu/655s08/655webUSBSAT/analysis.htm

> random access times are dominated primarily by disk head seek time,
> which is gonna be the same no matter what the transport to the drive is.
> So the slower transport won't matter nearly as much with random I/O as
> it will with sequential.

This is not quite correct, because each round trip to the drive controller
experiences additional latency, and the round trip latency adds up.

> SATA or SAS/SCSI with command queueing should have better random access
> performance than anything without command queueing. However, I don't
> believe firewire has command queueing support, which would suggest that
> this isn't what you're thinking of.
>
> danno

It's probably worth keeping in mind that a USB 2.0 attached drive is
actually attached to either a SATA or an IDE controller; so you're either
comparing USB+SATA or USB+IDE to eSATA.
Dan Pritts
2009-09-11 20:33:25 UTC
Permalink
On Fri, Sep 11, 2009 at 01:47:45PM -0500, Michael Stowe wrote:
> Errr... Do you mean that USB 2.0 would be *faster* then eSATA for random
> access, or do you mean that USB 2.0 would always slower than eSATA, but
> not as much slower for random access?

Obviously, USB2 is slower than SATA, and is slower than anything else
short of USB1 or floppy disk.

I'm surprised by the "especially for random access" bit.

> In either case, you're likely to be incorrect, there's a study here:
>
> http://www.rt.db.erau.edu/655s08/655webUSBSAT/analysis.htm

I don't see anything there that talks about random access. It shows
that USB is slower than SATA when transferring a single 80MB file, and
that USB continues to be slower than SATA when transferring a single
256MB file.

> > random access times are dominated primarily by disk head seek time,
> > which is gonna be the same no matter what the transport to the drive is.
> > So the slower transport won't matter nearly as much with random I/O as
> > it will with sequential.
>
> This is not quite correct,

I never said it was completely "correct", I said "dominated primarily"
and "won't matter nearly as much".

> because each round trip to the drive controller
> experiences additional latency, and the round trip latency adds up.

I don't know what kind of additional latency USB has vs. SATA, a quick
web search didn't show me anything useful.

I'm sure there's some, but I'd be surprised to see numbers that showed
that the extra latency for USB is significant compared to the average
time-to-access inherent in the disk mechanism (avg seek + avg rotational
latency). Let's guess that it adds 10% to the total latency of a request.
Maybe i'm way off-base here, feel free to provide data.

My own real-world measurements showed, with the same target hard disk,
25 MB/sec bulk throughput via USB2 and 40MB/sec throughput via eSATA.

I was surprised at how slow the eSATA throughput was, actually, but did
not investigate further.

The test was done on a linux system, dd'ing from a fast disk array to
the target hard disk.

> It's probably worth keeping in mind that a USB 2.0 attached drive is
> actually attached to either a SATA or an IDE controller; so you're either
> comparing USB+SATA or USB+IDE to eSATA.

sure, and it's easy to remove this source of error by using the same
sata hard disk for each set of measurements.

danno
--
Dan Pritts, Sr. Systems Engineer
Internet2
office: +1-734-352-4953 | mobile: +1-734-834-7224

Fall 2009 Internet2 Member Meeting, October 5-8
Hosted by the University of Texas at San Antonio and LEARN
http://events.internet2.edu/2009/fall-mm/
Tino Schwarze
2009-09-14 11:23:57 UTC
Permalink
Hi Dan,

On Fri, Sep 11, 2009 at 01:40:02PM -0400, Dan Pritts wrote:

> > I'd say: Replace that USB 2.0 disk by something else like something
> > connected via Firewire or eSATA. USB 2.0 is very, very slow, especially
> > for random access.
>
> do you have empirical results that show this?

I did not do benchmarks. It's just my personal experience that I've yet
to see an USB-attached disk which feels fast. Remember: Disks do not
speak USB, they are adressed via IDE or SATA. So, if you use USB, you
get an additional translation layer.

Apart from that it looks like USB is not optimized for fast transfer and
low latency. SATA et al are designed for adressing hard disks, they
don't care about input devices etc. So there is less overhead.

Tino.

--
"What we nourish flourishes." - "Was wir nähren erblüht."

www.lichtkreis-chemnitz.de
www.craniosacralzentrum.de
dan
2009-09-14 19:20:51 UTC
Permalink
USB is slower because:
a) there is an additional protocol translation to/from USB.
b) USB chipsets must hand off data to the CPU for processing, which gives
each piece of data additional latency: it goes through the CPU once as raw
USB packets to be translated by the driver, and then again by whatever app
is processing that data. SATA/SAS/IDE all have DMA, so they can dump the
usable data to memory and the CPU can process it once from there.
c) because USB packets (for storage devices) are fairly simple to decode,
it's the MHz that matter, as that determines how fast the packets can be
pushed through. Improving controller design can only have a marginal impact
on performance unless a high-speed controller is used specifically for
storage devices (I don't believe there are any on the market).
d) USB devices rely on a driver to process the raw USB packets into
scsi/ide/ata packets. SATA/IDE controllers require a driver only to read
packets already in the appropriate format. More processing is done in the
driver, and software tends to have more latency than hardware.

To break that down into a single phrase: USB requires multiple levels of
data processing to get the data delivered to the OS, while specialized
storage interfaces do most of the work in a hardware chip before handing
data to the OS.

On Mon, Sep 14, 2009 at 5:23 AM, Tino Schwarze <***@tisc.de>wrote:

> Hi Dan,
>
> On Fri, Sep 11, 2009 at 01:40:02PM -0400, Dan Pritts wrote:
>
> > > I'd say: Replace that USB 2.0 disk by something else like something
> > > connected via Firewire or eSATA. USB 2.0 is very, very slow, especially
> > > for random access.
> >
> > do you have empirical results that show this?
>
> I did not do benchmarks. It's just my personal experience that I've yet
> to see an USB-attached disk which feels fast. Remember: Disks do not
> speak USB, they are adressed via IDE or SATA. So, if you use USB, you
> get an additional translation layer.
>
> Apart from that it looks like USB is not optimized for fast transfer and
> low latency. SATA et al are designed for adressing hard disks, they
> don't care about input devices etc. So there is less overhead.
>
> Tino.
>
> --
> "What we nourish flourishes." - "Was wir nähren erblüht."
>
> www.lichtkreis-chemnitz.de
> www.craniosacralzentrum.de
Jim Leonard
2009-09-16 00:17:42 UTC
Permalink
dan wrote:
> b) USB chipsets must hand off data to the CPU for processing which
> causes each piece of data to have additional latency going through the
> CPU once as raw USB packets to be translated by the driver and then
> again by whatever app is processing that data. SATA/SAS/IDE all have
> DMA so they can dump the usable data to memory and the CPU can process
> it once from there.
> c)because USB packets (for storage devices) are fairly simple packets to
> decode, its the mhz that matter as its how fast the packet can be pushed
> through. Improving controller design can only have a marginal impact on
> performance unless a high speed controller is used specifically for
> storage devices(i dont believe there are any on the market).
> d)USB devices rely on a driver to process the raw USB packets into
> scsi/ide/ata packets. SATA/IDE controllers require a driver only to
> read packets already in the appropriate format. More processing is done
> in the driver and software tends to have more latency that hardware.

All of the above is incorrect in more ways than I care to go into.

But the core message is the same: USB *is* slower than eSATA simply
because USB 2.0 has a maximum bandwidth of 480 Mbit/s. After driver and
protocol overhead, you usually get about 33 MB/s out of a USB drive.
The same drive hooked up via eSATA runs at drive speed, usually between
60 MB/s and 120 MB/s depending on how fast the drive actually is.
--
Jim Leonard (***@oldskool.org) http://www.oldskool.org/
Help our electronic games project: http://www.mobygames.com/
Or check out some trippy MindCandy at http://www.mindcandydvd.com/
A child borne of the home computer wars: http://trixter.wordpress.com/
Dan Pritts
2009-09-16 16:44:54 UTC
Permalink
Heck yes, USB is slow overall. Sorry I didn't make it clear that
I understood this.

As you may have noticed in the other message chain, I was wondering about
"especially for random access", as opposed to just slow in general.

I'd expect USB-attached drives to work relatively better for random access
than they do for bulk transfer. **In neither case would I expect them to
be fast.**

In the other sub-thread we went back and forth about this and decided
we both agreed USB sucked overall and it wasn't worth arguing about
exactly how it sucked. :)

danno

On Mon, Sep 14, 2009 at 01:23:57PM +0200, Tino Schwarze wrote:
> Hi Dan,
>
> On Fri, Sep 11, 2009 at 01:40:02PM -0400, Dan Pritts wrote:
>
> > > I'd say: Replace that USB 2.0 disk by something else like something
> > > connected via Firewire or eSATA. USB 2.0 is very, very slow, especially
> > > for random access.
> >
> > do you have empirical results that show this?
>
> I did not do benchmarks. It's just my personal experience that I've yet
> to see an USB-attached disk which feels fast. Remember: Disks do not
> speak USB, they are adressed via IDE or SATA. So, if you use USB, you
> get an additional translation layer.
>
> Apart from that it looks like USB is not optimized for fast transfer and
> low latency. SATA et al are designed for adressing hard disks, they
> don't care about input devices etc. So there is less overhead.

danno
--
Dan Pritts, Sr. Systems Engineer
Internet2
office: +1-734-352-4953 | mobile: +1-734-834-7224

Fall 2009 Internet2 Member Meeting, October 5-8
Hosted by the University of Texas at San Antonio and LEARN
http://events.internet2.edu/2009/fall-mm/
Pieter Wuille
2009-09-09 11:08:51 UTC
Permalink
On Wed, Sep 09, 2009 at 09:33:03AM +0200, Christian Völker wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi,
>
> as I have the same issue with storing my BackupPC outside I tried
> another way the last days:
>
> First, my environment:
> 28 hosts to back up. Mostly idle machines with minor services (so no big
> databases and so on). Partially fileserver with only little daily
> changes. So I expected not too much daily changes on the pool.
> I want to copy the pool to a remote location after testing is done.
>
<snip>
> So what does this mean to me?
>
> Looks like on my pool BackupPC changes approx two third of the whole
> pool daily. So if I would transfer to a remote site I'd need to transfer
> 300GB daily!
>
> Can anyone confirm BackupPC changes so many data inside the pool DAILY?
>
> I can't imagine the backed up data itself is 300GB!

I used LVM snapshots to see how much data changes. Put your BackupPC volume
on an LVM logical volume, let it come to some stable state (disk usage
more or less doesn't change anymore), stop BackupPC, umount, take an LVM
snapshot, mount, start BackupPC, and periodically check lvdisplay to see
what percentage of the snapshot is filled. It should give a good idea of
how much data changes. On our setup with >300 home directories, and about
as many mysql and pgsql databases, resulting in a pool of 334GiB, roughly
1-3GiB changes daily.
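
In commands, that looks roughly like this (volume names, mount point,
init script and snapshot size are just examples; the snapshot only needs
to be big enough to hold a few days worth of changes):

  /etc/init.d/backuppc stop
  umount /var/lib/backuppc
  lvcreate -s -L 30G -n snap_backuppc /dev/vg1/backuppc
  mount /dev/vg1/backuppc /var/lib/backuppc
  /etc/init.d/backuppc start

  # then, once in a while:
  lvdisplay /dev/vg1/snap_backuppc | grep 'Allocated to snapshot'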

--
Pieter
Les Mikesell
2009-09-09 12:59:18 UTC
Permalink
Christian Völker wrote:
>
> First, my environment:
> 28 hosts to back up. Mostly idle machines with minor services (so no big
> databases and so on). Partially fileserver with only little daily
> changes. So I expected not too much daily changes on the pool.
> I want to copy the pool to a remote location after testing is done.
>
> So I used an USB 2.0 disk as "second backup device" to store the copied
> pool.
> The pool itself is aprox 540GB in total. And is not growing any more-
> even not all 12 full backups are stored.
>
> My first attempt was rsync the pool. This tooks ages.
> Second attempt was to "dd" the whole device (with less frequency), but
> for the size of the pool, took ages, too.

With reasonably fast drives, this should take 2 or 3 hours for disks under 1TB.
USB would slow it down some compared to SATA.

> Third try was to use "dump" for this hoping it would transfer only
> changed blocks after the initial dump. No way. After one day it
> transferred 300GB (!). I thought, dump might not bee a good solution...

I suspect dump sees the link count change in the pool file's inode as you expire
a backup and add a new one as an update.

> Last attempt now was to move the pool device to a VMware virtual disk
> and let rsync run over this file. Thus rsync backing up the block
> device. Best attepmt, I thought. Result: rsync transfers after the first
> run ~300GB, too.

I haven't tried that approach but always thought it should work. Can you try
using the vmware option to split the virtual disk into files of 1 or 2 GB?
Perhaps rsync has trouble finding resync points after a change on such a large
volume. Also you might be able to use vmware's snapshots to see how much really
is changing.

> So what does this mean to me?
>
> Looks like on my pool BackupPC changes approx two third of the whole
> pool daily. So if I would transfer to a remote site I'd need to transfer
> 300GB daily!
>
> Can anyone confirm BackupPC changes so many data inside the pool DAILY?

That will depend on your data. Large files that change a little (growing logs,
unix style mailboxes, databases, vmware images, etc.) can cause a big turnover
but files that don't change at all should remain in the pool with nothing but
the link count changing (which causes a ctime change as a side effect).

--
Les Mikesell
***@gmail.com
Christian Völker
2009-09-09 15:56:25 UTC
Permalink
Hi,

>> Third try was to use "dump" for this hoping it would transfer only
>> changed blocks after the initial dump. No way. After one day it
>> transferred 300GB (!). I thought, dump might not bee a good solution...
> I suspect dump sees the link count change in the pool file's inode as you expire
> a backup and add a new one as an update.
Yeah, something like this. So dump cannot be used. I don't need to
understand, I'm just surprised.

>> Last attempt now was to move the pool device to a VMware virtual disk
>> and let rsync run over this file. Thus rsync backing up the block
>> device. Best attepmt, I thought. Result: rsync transfers after the first
>> run ~300GB, too.
> I haven't tried that approach but always thought it should work.
I thought this might be a good idea, indeed. But it isn't.

Here are the results of the "update" run, taken from the LVM snapshot. I
can't access the file directly as it's locked by VMware ESX (BackupPC is
running). The update run was started approx 12h after the first one.

=====================================================
sent 583112982865 bytes received 55 bytes 15555694.41 bytes/sec
total size is 583041810920 speedup is 1.00
=====================================================
So rsync transferred the whole file again. No speedup. :-(
I'll try to split it into 2GB chunks, but I think it's not really worth
testing. Total run time was approx 10 hours!

This is the result of the LVM snapshot taken before rsync and removed
after rsync was done:
=====================================================
Using logical volume(s) on command line
--- Logical volume ---
LV Name /dev/vg1/snap_backuppc
VG Name vg1
LV UUID CcEkAd-W8dk-EEPz-9DU3-LX5t-FMc7-xSBpqu
LV Write Access read only
LV snapshot status active destination for /dev/vg1/backup_nfs
LV Status available
# open 0
LV Size 553.75 GB
Current LE 17720
COW-table size 30.00 GB
COW-table LE 960
Allocated to snapshot 5.27%
Snapshot chunk size 64.00 KB
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:3

Wed Sep 9 16:41:49 CEST 2009
=====================================================


> Also you might be able to use vmware's snapshots to see how much really
> is changing.
Yeah, good idea. Just to see how much stuff changes during a day.


Any other clue? Looks like this is really an issue - as some special
tricks like the ones I tried don't work either...

I'll check with the VMware snapshots about the amount which changes - and
then I'll have a look at DRBD (or so ;)) to see if it's suitable over a
slow remote line.


Any further ideas?

Christian
Christian Völker
2009-09-10 09:42:04 UTC
Permalink
Hi,
> Also you might be able to use vmware's snapshots to see how much really
> is changing.
I did this.
I took a VMware snapshot yesterday evening, and it has grown by approx
5GB within 12hrs. I'll do some further monitoring, but it seems there is
indeed not as much data changing daily as dump might suggest.

And I have another idea about why rsync copies the whole .vmdk file over -
this could be related to access time. Obviously the timestamps change as
the file is under continuous access from the ESX host. I'll see if I can
skip the timestamp check and force rsync to just compare the content.
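
I could use rsync's --checksum to ignore the timestamps, but if I read
the manpage correctly the bigger issue for a disk-to-disk copy on the
same machine is that rsync then defaults to --whole-file (the delta
algorithm only saves network bandwidth), so something like this might be
worth a try (paths made up):

  rsync -av --no-whole-file --inplace \
      /mnt/lvm-snap/pool-flat.vmdk /mnt/usbdisk/pool-flat.vmdk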

For the 2GB chunks idea, it'll take quite a while to migrate the 540GB
vmdk to 2GB chunks. Oh, and I think ESX does not support this type...

Greetings

Christian
Les Mikesell
2009-09-10 16:02:15 UTC
Permalink
Christian Völker wrote:
>
>> Also you might be able to use vmware's snapshots to see how much really
>> is changing.
> I did this.
> I took a VMware snapshot yest evening. And it has been grown within
> 12hrs now aprox 5GB. I'll do some further monitoring, but it seem
> indeed, there's not so many data which changes daily as dump might assume.
>
> And I have another idea 'coz of rsync copying the whole .vdmk file over-
> this could be related to access time. Obviously the times changes as the
> file is under continously access from the ESX host. I'll see if I can
> skip the time identification and force rsyn just to check the content.

You'll almost certainly end up with an unusable copy if you let the VM
run while copying - unless you can isolate changes with a snapshot. I'm
not a filesystem expert, but the filesystem type might make a big
difference in how much rsync has to copy. I believe some filesystems
cluster the inodes and some distribute them across the disk. The ctimes
will change in a lot of the inodes each day as the link counts change,
and if the inode data is distributed, rsync may find many more locations
with changes.

> For the 2GB chunks idea it'll take quiet a while to migrate the 540GB
> vmdk to 2GB chunks. Oh, and I think ESX does not support this type...

I think vmware server can do it, but I'm having trouble finding a spot
with enough space since I took the full amount on the physical drives
for what I want to mirror.

--
Les Mikesell
***@gmail.com
Carl Wilhelm Soderstrom
2009-09-10 16:42:33 UTC
Permalink
On 09/10 11:02 , Les Mikesell wrote:
> You'll almost certainly end up with an unusable copy if you let the VM
> run while copying


To test this, some years ago I actually tried archiving and restoring a
vmware disk image while it was in use. Turns out it worked just fine.
That said, the vmware session was not in active use at the time and I
certainly wouldn't trust this to not give thoroughly corrupt data.

You may be able to get away with it occasionally; but on any system in
production the amount of disk activity will likely corrupt your backup.

--
Carl Soderstrom
Systems Administrator
Real-Time Enterprises
www.real-time.com
Timothy J Massey
2009-09-10 18:14:06 UTC
Permalink
Carl Wilhelm Soderstrom <***@real-time.com> wrote on 09/10/2009
12:42:33 PM:

> On 09/10 11:02 , Les Mikesell wrote:
> > You'll almost certainly end up with an unusable copy if you let the VM

> > run while copying
>
>
> To test this, some years ago I actually tried archiving and restoring a
> vmware disk image while it was in use. Turns out it worked just fine.
> That said, the vmware session was not in active use at the time and I
> certainly wouldn't trust this to not give thoroughly corrupt data.
>
> You may be able to get away with it occasionally; but on any system in
> production the amount of disk activity will likely corrupt your backup.

You are right: it comes completely down to a lot of uncontrollable items.

For example, if the *guest* OS has cached certain items in RAM, how will
you know? If the guest has left the filesystem in an unrecoverable state,
how will you know? Or if the *host* operating system is playing games
with something, how will you know?

Of course, the bigger problem is that the disk is basically *guaranteed*
to be inconsistent, as you copy the file(s) and the guest continues to
make changes behind you... After all, if the system wasn't doing anything
during the copy, you could have simply shut it down! :)

It's even worse than pulling the plug on a physical machine and hoping
that it comes up. At least with real hardware, you've got an
internally-consistent (assuming journalling) image of the file system. 99%
of the time, it will be OK. 1% of the time, it won't. I don't want to
find out that I've hit that 1% when I'm trying to recover a production
file server! :)

There are other issues that one should test as well. For example, have
you tried restoring a guest to a *different* host? One with a different
CPU? Depending on the combination of CPU features (such as NX, 32/64-bit,
virtualization support, etc.), it may not be possible to restore a running
guest to a different host without crashing it. It's always a good idea to
test for these things! (Even with the for-pay vMotion tools, there are
limits on what hosts you can switch between without problems.)

Or do what we do: shut down the guest before copying it somewhere. We
only do this on occasion (usually quarterly), so the couple of minutes of
downtime is manageable. We don't consider this as backup, but disaster
recovery. We can restore a, say, month-old copy of the server, then use
BackupPC to restore the data from within the guest. Still *way* faster
than actual bare-metal restores, for no additional cost.

Tim Massey
Timothy J Massey
2009-09-10 17:57:56 UTC
Permalink
Les Mikesell <***@gmail.com> wrote on 09/10/2009 12:02:15 PM:

> > And I have another idea 'coz of rsync copying the whole .vdmk file
over-
> > this could be related to access time. Obviously the times changes as
the
> > file is under continously access from the ESX host. I'll see if I can
> > skip the time identification and force rsyn just to check the content.
>
> You'll almost certainly end up with an unusable copy if you let the VM
> run while copying - unless you can isolate changes with a snapshot.

Even then, this is an unsupported state, and will fail on restore (the
worst possible time!) in most cases. The only way to do this properly is
to:

Shut down the guest, quiesce the filesystem (which may happen
automatically: e.g. ext3), take an LVM snapshot, and restart the guest,
or

Use VMware's snapshot functionality and just copy the files
normally without any kind of LVM snapshot--and plan on reverting the
snapshot if you restore the VM!

We combine these two for maximum reliability: shut down the guest, VMware
snapshot the guest, restart the guest and copy the files from the
filesystem. Once the copy has finished, we simply delete the snapshot
(which freezes the guest for a few *minutes*, so be prepared for this). In
the event of a restore, we revert the snapshot (returning the VM to the
shut down state before the snapshot) and restart the VM just as if we were
simply powering it up cleanly.

Minimal fuss and greatest reliability in the event of disaster. Not only
does this give you the cleanest state, it allows you to restore a guest VM
to a machine with different underlying hardware (there are CPUID and NX
issues if you try to do this with a powered-on guest). The biggest
hassles are having to shut down the guest for a minute or two to create a
clean snapshot, and the freezing of the guest for some small period of
time while the snapshot is committed.

It is *not* completely transparent, but that's what you get when you're
using the free tools. If you want truly transparent, VMware will happily
sell you their vMotion and backup tools. They're even relatively
reasonably priced.

> > For the 2GB chunks idea it'll take quiet a while to migrate the 540GB
> > vmdk to 2GB chunks. Oh, and I think ESX does not support this type...
>
> I think vmware server can do it, but I'm having trouble finding a spot
> with enough space since I took the full amount on the physical drives
> for what I want to mirror.

VMware Server 2.0 supports disks in single files, or split in
arbitrary-sized pieces (usually 2GB), both fully allocated or
grow-on-demand. ESX/ESXi only supports single-file images, and it is
designed to use VMFS. I don't know for sure if you can use split images
across an NFS share (the only non-block NAS-style share supported by
ESX/ESXi), but my guess is no--the tools won't let you, and part of the
process of converting a guest from Server to ESX/ESXi is to create the
appropriate image file format.

Tim Massey
Les Mikesell
2009-09-10 18:41:58 UTC
Permalink
Timothy J Massey wrote:
>
> VMware Sever 2.0 supports disks in single files, or split in
> arbitrary-sized pieces (usually 2GB), both fully allocated or
> grow-on-demand. ESX/ESXi only supports single-file images, and it is
> designed to use VMFS.

I thought there was a way to access the vmx image directly from linux,
but I don't know if it has to be mounted or if you can see a raw
partition. I was hoping I could make a virtual partition visible on
the physical backuppc server so I could sync it as a raid1, then remove it
and have something that could be used in a virtual machine or copied via
rsync, then used by a remote virtual machine.

--
Les Mikesell
***@gmail.com
Timothy J Massey
2009-09-10 20:04:15 UTC
Permalink
Les Mikesell <***@gmail.com> wrote on 09/10/2009 02:41:58 PM:

> Timothy J Massey wrote:
> >
> > VMware Sever 2.0 supports disks in single files, or split in
> > arbitrary-sized pieces (usually 2GB), both fully allocated or
> > grow-on-demand. ESX/ESXi only supports single-file images, and it is
> > designed to use VMFS.
>
> I thought there was a way to access the vmx image directly from linux,
> but I don't know if it has to be mounted or if you can see a raw
> partition.

Depends: are we talking VMware Server, or ESX/ESXi? And if we're talking
ESXi, are we talking SAN, NAS or DAS (directly attached storage)?

In the case of Server, the VMX images are right there on a Linux
filesystem: do with them what you want. Just snapshot properly. I think
I've commented enough on this in other e-mails.

In the case of ESX, you can use the Linux management console to deal with
them, almost like a native Linux filesystem, but with the same limitations
as above. Of course, with ESX, you have the proper tools for block-based
changes-only backups already, using VMware's tools.

In the case of ESXi (which has *zero* supported management on the server),
you have to do it on the storage device. (Or, pay for the VMware tools,
which is by far the simplest, most reliable choice). Therefore, it
depends on how the storage is connected to the server.

In the case of DAS, you're *very* stuck without doing weirdness
like using unsupported tricks to start an SSH server on the ESXi
machine--not recommended for production. Or using the for-pay tools. But
ESXi with DAS makes very little sense.

In the case of SAN (e.g. iSCSI or Fibre Channel), you're still
stuck, because while the block device might be accessible remotely, it's
still using VMFS, and you can't exactly mount that. Again, you're going
to have to do unsupported weirdness or use the pay tools.

In the case of NAS (e.g. NFS), you can access the files just like
you would from any other NFS share, or depending on what's providing the
NFS share, from the NFS server itself. And with all of the same types of
issues that come from accessing the files directly as you would with the
VMware Server product. And losing the advantages of VMFS (which, without
the for-pay tools, isn't much).

> I was hoping I could make a virtual partition visible on
> physical backuppc server so I could sync it as a raid1, then remove it
> and have something that could be used in a virtual machine or copied via

> rsync, then used by a remote virtual machine.

What is a "virtual partition"? Do you mean the VMFS directly? Or do you
mean that you want to *mount* the filesystem *inside* of the VMware image?

While it is theoretically possible to configure VMware guests (at least on
Server) to use physical partitions directly (and therefore eliminate a
layer of complexity), this is *not* a recommended configuration. Frankly,
the idea of doing *anything* to a VMware guest filesystem at the block
layer is pretty much a non-starter--especially when VMFS is involved.

Basically, with VMware, you have few and limited choices for image-level
backup without using the pay-for tools.

Server: You can capture images at the filesystem layer (as standard UNIX
files), assuming that you snapshot the guest properly. No extra cost, and
100% reliable backups, but a pretty manual process and slight downtime
twice. Not useful for daily backups without a lot of work (you can't
script snapshot removal in Server). Mainly useful for quarterly or so
disaster recovery images.

Of course, daily backups via BackupPC works inside of the guest just
fine... :)

ESX: You have to pay for ESX. With ESX, you have the full suite of tools
available to back up images, INCLUDING automatic block-based incremental
backups of images: Incremental backups that only back up changed blocks
automatically! *Really* straightforward. By far the best production
solution, and not *that* expensive: something like $2000 for 3 servers?

ESXi: You either have to pay for the tools (just like ESX), or you are
limited to sleight of hand on the NAS, and you lose the advantages of
VMFS. If you use DAS or SAN storage, there's no layer in which you can
get access to the image files without doing something to put your ESXi
server in an unsupported configuration (and which might get removed in the
future).

One small alternative to the "unsupported ESXi SSH server" situation:
Veeam FastSCP. http://www.veeam.com/vmware-esxi-fastscp.html I know that
VMware and Veeam are in a bit of an uneasy situation between them.
Basically, Veeam makes tools that do what the VMware tools do. I know
that VMware has tried to prevent Veeam from allowing their tools to work
with ESXi in the past.


Anyway, the idea of trying to automatically work with VMware images at the
block level without VMware's tools is not that great. You can't automate
the use of snapshots in Server (it's intentionally not scriptable). You
*could* shut down the guest, make an LVM snapshot, restart the guest and
use the LVM snapshot to perform a block-level rsync-style (or even
cp-style) copy of the images. You might be able to do something similar
with ESXi and a NAS that allows LVM-style snapshots. But outside of that,
there be dragons...

Tim Massey
Les Mikesell
2009-09-10 21:23:20 UTC
Permalink
Timothy J Massey wrote:
>
>> I thought there was a way to access the vmx image directly from linux,
>> but I don't know if it has to be mounted or if you can see a raw
>> partition.
>
> Depends: are we talking VMware Server, or ESX/ESXi? And if we're talking
> ESXi, are we talking SAN, NAS or DAS (directly attached storage)?

Server.

> In the case of Server, the VMX images are right there on a Linux
> filesystem: do with them what you want. Just snapshot properly. I think
> I've commented enough on this in other e-mails.

I know they are there in the form of a directory of files, which will be
nice for the rsync step. What I want to know is if the physical host
can see the virtual disk/partition the same way a guest can. There is a
vmware-mount that I think lets you see the virtual filesystem, but I
want access to the raw virtual partition at a block device level.

>> I was hoping I could make a virtual partition visible on
>> physical backuppc server so I could sync it as a raid1, then remove it
>> and have something that could be used in a virtual machine or copied via
>
>> rsync, then used by a remote virtual machine.
>
> What is a "virtual partition"? Do you mean the VMFS directly? Or do you
> mean that you want to *mount* the filesystem *inside* of the VMware image?

I was hoping to be able to see a block device on the host that would be
the same thing a guest would see. Then, without running a guest I could
either let the virtual partition sync as a raid member or dd an image
copy to it. When that is complete, I'd like to be able to rsync the
directory of files that hold the vmdk virtual disk off to another
machine where a virtual machine could be started to access it the usual way.

> While it is theoretically possible to configure VMware guests (at least on
> Server) to use physical partitions directly (and therefore eliminate a
> layer of complexity), this is *not* a recommended configuration. Frankly,
> the idea of doing *anything* to a VMware guest filesystem at the block
> layer is pretty much a non-starter--especially when VMFS is involved.

I want it the other way around - the physical host accessing a virtual disk.

> Basically, with VMware, you have few and limited choices for image-level
> backup without using the pay-for tools.
>
> Server: You can capture images at the filesystem layer (as standard UNIX
> files), assuming that you snapshot the guest properly. No extra cost, and
> 100% reliable backups, but a pretty manual process and slight downtime
> twice. Not useful for daily backups without a lot of work (you can't
> script snapshot removal in Server). Mainly useful for quarterly or so
> disaster recovery images.

I suppose I could do dd over ssh to image copy to a running guest if the
physical host can't do it directly, then shut the guest down for the rsync.
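
Something like this, roughly (names invented - /dev/md0 being the array
that holds the backuppc partition, /dev/sdb the guest's empty virtual
disk):

  dd if=/dev/md0 bs=4M | ssh root@guest 'dd of=/dev/sdb bs=4M'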

> Anyway, the idea of trying to automatically work with VMware images at the
> block level without VMware's tools is not that great. You can't automate
> the use of snapshots in Server (it's intentionally not scriptable).

On the local side I don't really need a snapshot since I've got the
master copy. On the remote side I'd want to keep two copies, switching
between them for updates or something similar.

> You
> *could* shut down the guest, make an LVM snapshot, restart the guest and
> use the LVM snapshot to perform a block-level rsync-style (or even
> cp-style) copy of the images. You might be able to do something similar
> with ESXi and a NAS that allows LVM-style snapshots. But outside of that,
> there be dragons...

If I have to change the parent OS, it would probably be to opensolaris
so I could work with zfs snapshots.

--
Les Mikesell
***@gmail.com
Timothy J Massey
2009-09-11 01:17:12 UTC
Permalink
Les Mikesell <***@gmail.com> wrote on 09/10/2009 05:23:20 PM:

> Timothy J Massey wrote:
> >
> >> I thought there was a way to access the vmx image directly from
linux,
> >> but I don't know if it has to be mounted or if you can see a raw
> >> partition.
> >
> > Depends: are we talking VMware Server, or ESX/ESXi? And if we're
talking
> > ESXi, are we talking SAN, NAS or DAS (directly attached storage)?
>
> Server.
>
> > In the case of Server, the VMX images are right there on a Linux
> > filesystem: do with them what you want. Just snapshot properly. I
think
> > I've commented enough on this in other e-mails.
>
> I know there are there in the form of a directory of files which will be

> nice for the rsync step. What I want to know is if the physical host
> can see the virtual disk/partition the same way a guest can. There is a
> vmware-mount that I think lets you see the virtual filesystem, but I
> want access to the raw virtual partition at a block device level.

No, you really don't! :)

What is the advantage of this? You can simply rsync the *file* just as
easily as try to work with "raw virtual partitions". I see no advantage,
only problems. Even better, rsync is already set up to handle files! Why
not just use it as-is?

> >> I was hoping I could make a virtual partition visible on
> >> physical backuppc server so I could sync it as a raid1, then remove
it
> >> and have something that could be used in a virtual machine or copied
via
> >
> >> rsync, then used by a remote virtual machine.
> >
> > What is a "virtual partition"? Do you mean the VMFS directly? Or do
you
> > mean that you want to *mount* the filesystem *inside* of the VMware
image?
>
> I was hoping to be able to see a block device on the host that would be
> the same thing a guest would see. Then, without running a guest I could

> either let the virtual partition sync as a raid member or dd an image
> copy to it. When that is complete, I'd like to be able to rsync the
> directory of files that hold the vmdk virtual disk off to another
> machine where a virtual machine could be started to access it the usual
way.

You don't *need* all of that complexity. Do *exactly* what you're
describing, but use cp instead of dd to copy the files. I think you need
to really think long and hard about the hoops you're trying to jump
through. The logic that you'd use for a physical machine is obsolete when
you're dealing with VMware. Just copy the native files as native files to
another machine with VMware server, and start the host. That simple.

There is *no* need for dealing with RAID, dd, etc. cp'ing the files is
logically *identical* to a DD or RAID1 rebuild of a physical machine. And
*way* simpler.

> > While it is theoretically possible to configure VMware guests (at
least on
> > Server) to use physical partitions directly (and therefore eliminate a

> > layer of complexity), this is *not* a recommended configuration.
Frankly,
> > the idea of doing *anything* to a VMware guest filesystem at the block

> > layer is pretty much a non-starter--especially when VMFS is involved.
>
> I want it the other way around - the physical host accessing a virtual
disk.

Again, you *really* *really* do not. There is **NO** advantage to this.
Simply treat the VMX files as if they *were* a physical copy made by
dd--pretend that step is done for you! :)

> > Basically, with VMware, you have few and limited choices for
image-level
> > backup without using the pay-for tools.
> >
> > Server: You can capture images at the filesystem layer (as standard
UNIX
> > files), assuming that you snapshot the guest properly. No extra cost,
and
> > 100% reliable backups, but a pretty manual process and slight downtime

> > twice. Not useful for daily backups without a lot of work (you can't
> > script snapshot removal in Server). Mainly useful for quarterly or so

> > disaster recovery images.
>
> I suppose I could do dd over ssh to image copy to a running guest if the

> physical host can't do it directly, then shut the guest down for the
rsync.

Again, what is your logic here? You don't move the contents of a block
device from one guest to another. YOU MOVE THE ENTIRE GUEST. Period.
This is so very much better than what you're trying to do.

It's like waving a magic wand and cloning an entire pretend-physical (i.e.
virtual) machine into two complete, identical machines. Or better yet,
imagine you've magically cloned the hard drives and slid them into an
identical machine. By simply cp'ing the contents of the guest directory
you have accomplished exactly that.

No messing with LVM, RAID, etc. Just a simple cp, and if you do the
snapshotting properly, you can do it with only a reboot worth of downtime.

> > Anyway, the idea of trying to automatically work with VMware images at
the
> > block level without VMware's tools is not that great. You can't
automate
> > the use of snapshots in Server (it's intentionally not scriptable).
>
> On the local side I don't really need a snapshot since I've got the
> master copy. On the remote side I'd want to keep two copies, switching
> between them for updates or something similar.

Again, make copies via cp.

> > You
> > *could* shut down the guest, make an LVM snapshot, restart the guest
and
> > use the LVM snapshot to perform a block-level rsync-style (or even
> > cp-style) copy of the images. You might be able to do something
similar
> > with ESXi and a NAS that allows LVM-style snapshots. But outside of
that,
> > there be dragons...
>
> If I have to change the parent OS, it would probably be to opensolaris
> so I could work with zfs snapshots.

But you can't get VMware for OpenSolaris! :) I hate to tell you, ZFS
buys you *NOTHING* in this situation. You'll be forced to use VMware
snapshotting anyway, in which case you no longer *need* filesystem
snapshotting.

Tim Massey
Les Mikesell
2009-09-11 02:38:56 UTC
Permalink
Timothy J Massey wrote:
>
>> I know there are there in the form of a directory of files which will be
>
>> nice for the rsync step. What I want to know is if the physical host
>> can see the virtual disk/partition the same way a guest can. There is a
>> vmware-mount that I think lets you see the virtual filesystem, but I
>> want access to the raw virtual partition at a block device level.
>
> No, you really don't! :)
>
> What is the advantage of this? You can simply rsync the *file* just as
> easy as try to work with "raw virtual partitions". I see no advantage,
> only problems. Even better, rsync is already set up to handle files! Why
> not just use it as-is?

Maybe I'm not making this clear. I want the physical backuppc host to be able
to mirror its current 750 GB backuppc partition onto what appears at the time to
be a virtual partition, but which results in the updating of the files that
comprise a vmdk disk. Backuppc isn't running on a VM. After this step
completes, I want to rsync those files to copies elsewhere. At the other
location, I might want to access the vmdk from a VM if it is necessary to
restore something.


>> I was hoping to be able to see a block device on the host that would be
>> the same thing a guest would see. Then, without running a guest I could
>
>> either let the virtual partition sync as a raid member or dd an image
>> copy to it. When that is complete, I'd like to be able to rsync the
>> directory of files that hold the vmdk virtual disk off to another
>> machine where a virtual machine could be started to access it the usual
> way.
>
> You don't *need* all of that complexity. Do *exactly* what you're
> describing, but use cp instead of dd to copy the files.

The files don't have any contents yet.

> I think you need
> to really think long and hard about the hoops you're trying to jump
> through. The logic that you'd use for a physical machine is obselete when
> you're dealing with VMware. Just copy the native files as native files to
> another machine with VMware server, and start the host. That simple.

I need to get the image of my working system on there in the first place.

> There is *no* need for dealing with RAID, dd, etc. cp'ing the files is
> logically *identical* to a DD or RAID1 rebuild of a physical machine. And
> *way* simpler.

But it also doesn't do anything useful when there is nothing on the partition yet.


>> I suppose I could do dd over ssh to image copy to a running guest if the
>
>> physical host can't do it directly, then shut the guest down for the
> rsync.
>
> Again, what is your logic here? You don't move the contents of a block
> device from one guest to another. YOU MOVE THE ENTIRE GUEST. Period.
> This is so, very, very way better than what you're trying to do.

The point is to get the data which is on a physical host partition into a vmdk
that can be copied as a set of smaller files - and used that way directly if needed.

> It's like waving a magic wand and cloning an entire pretend-physical (i.e.
> virtual) machine into two complete, identical machines. Or better yet,
> imagine you've magically cloned the hard drives and slid them into an
> identical machine. By simply cp'ing the contents of the guest directory
> you have accomplished exactly that.

But I do that on the physical machine now by adding a partition to the raid,
letting it sync, then removing it. The cloning part isn't a particular problem
within the machine. I just want to get it into something rsync can handle.
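
For reference, that shuffle is roughly the following (device names made
up; it assumes the usual two-disk RAID1 with a third slot used only for
the copy):

  mdadm /dev/md0 --add /dev/sdc1           # add the extra member
  mdadm --grow /dev/md0 --raid-devices=3   # make it active, start the resync
  cat /proc/mdstat                         # wait until the resync finishes

  mdadm /dev/md0 --fail /dev/sdc1          # freeze the copy
  mdadm /dev/md0 --remove /dev/sdc1
  mdadm --grow /dev/md0 --raid-devices=2   # back to the normal mirror

  # /dev/sdc1 now holds a point-in-time image of the backuppc filesystem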

>> If I have to change the parent OS, it would probably be to opensolaris
>> so I could work with zfs snapshots.
>
> But you can't get VMware for OpenSolaris! :) I hate to tell you, ZFS
> buys you *NOTHING* in this situation. You'll be forced to use VMware
> snapshotting anyway, in which case you no longer *need* filesystem
> snapshotting.

Virtualbox seems to be a reasonable match for VMware these days. It can even
use vmdk format disks directly. With zfs I'd be able to use the incremental
send/receive function which would likely be even better than rsync'ing the files
sitting on top of it.

--
Les Mikesell
***@gmail.com
Timothy J Massey
2009-09-11 06:33:33 UTC
Permalink
Les Mikesell <***@gmail.com> wrote on 09/10/2009 10:38:56 PM:

> Maybe I'm not making this clear. I want the physical backuppc host
> to be able
> to mirror its current 750 GB backuppc partition onto what appears at
> the time to
> be a virtual partition, but which results in the updating of the files
that
> comprise a vmdk disk. Backuppc isn't running on a VM. After this step
> completes, I want to rsync those files to copies elsewhere. At the
other
> location, I might want to access the vmdk from a VM if it is necessary
to
> restore something.

So you're attempting to convert a physical BackupPC server into a virtual
image? VMware has conversion tools that do this. I've only used the
Windows version of VMware Converter, but it has worked perfectly for
converting a physical host into a virtual host. There is a Linux version.

I would look into this, rather than trying to do a poor-man's version of
it. It's a very simple process, and after a somewhat long wait (but
shorter than doing a RAID1 rebuild) you will have a brand new virtual
clone of your physical box!

And if all you're doing is to try to capture a file-based version of your
block device (a physical partition) that you want to mount using some
other physical server (or even a virtual server, come to think of it), I
think you'd be *far* better off just dd'ing the partition into a file and
using a loopback mount to mount it someplace else.

In other words, the only time you should be dealing with VMDK files is if
you're trying to create a new virtual guest. And if you are doing this,
the proper way of doing this is *not* by trying to use LVM/RAID weirdness,
but using the VMware Converter tools to do this for you properly.

If you're *not* trying to create a new virtual guest, then don't mess with
VMDK files. They're an annoyance that should only be dealt with if you
actually have to.

> > I think you need
> > to really think long and hard about the hoops you're trying to jump
> > through. The logic that you'd use for a physical machine is obselete
when
> > you're dealing with VMware. Just copy the native files as native
files to
> > another machine with VMware server, and start the host. That simple.
>
> I need to get the image of my working system on there in the first
place.

OK, then what I explained above is probably correct. The way you do this
is by using the VMware Converter tools, not by trying to do it yourself!
:)

> > Again, what is your logic here? You don't move the contents of a
block
> > device from one guest to another. YOU MOVE THE ENTIRE GUEST. Period.

> > This is so, very, very way better than what you're trying to do.
>
> The point is to get the data which is on a physical host partition
> into a vmdk
> that can be copied as a set of smaller files - and used that way
> directly if needed.

Yup. Use the conversion tools.

> > But you can't get VMware for OpenSolaris! :) I hate to tell you, ZFS

> > buys you *NOTHING* in this situation. You'll be forced to use VMware
> > snapshotting anyway, in which case you no longer *need* filesystem
> > snapshotting.
>
> Virtualbox seems to be a reasonable match for VMware these days. Itcan
even
> use vmdk format disks directly. With zfs I'd be able to use the
incremental
> send/receive function which would likely be even better than
> rsync'ing the files
> sitting on top of it.

VirtualBox compares fairly with the free VMware Server, but VMware server
is about 10% of what you can do with VMware--with the paid-for tools.

When it comes to commercial tools, VMware is in a class by itself, though
Citrix is trying hard with XenServer (still too cumbersome and unpolished
compared to VMware, and requires VM hardware for Windows). When it comes
to free-as-in-beer, XenServer is the best. It's still cumbersome, but
they give you several of the items for free that VMware charges for.

VirtualBox is neither the best tool overall, nor the best tool for free.
And unfortunately, the GPL'ed code is only a fraction of what you really
need for a usable virtualization environment. If you want GPL tools, KVM
(especially in RHEL 5.4) is the best around.

The only advantage that VirtualBox has is that it runs on OpenSolaris (or
OS/2...). For me, that's a non-feature. Obviously, YM *does* V... :)


Interesting thoughts. I've never been a fan of running BackupPC inside of
a virtualized guest. Basically, my philosophy is to put as little between
my backups and the hardware as possible. I don't even use compression on
my backups! The idea of putting my backups inside of a virtual disk on
top of yet another filesystem is not overly appealing. Now I've got two
ways for EXT3 to screw me! :)

But the ability to rsync collections of VMDK files to a remote host *is*
appealing. Interesting...

However, you could achieve the same thing in other ways without having to
run BackupPC in a virtualized guest. You could simply use an rsync-like
process for copying the block device to a remote (physical) host. Years
ago someone posted a script that read a block device 64k (IIRC) at a time
and did an MD5SUM on it and compared it with a remote block device (via
netcat, again IIRC). If the blocks matched, they were skipped. If they
didn't, the block was sent over. You could do something like that to a
physical block device to achieve largely the same thing.
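
I don't have that script around any more, but the idea was roughly the
following (a sketch only: the 1 MiB block size, the use of ssh instead of
netcat and all the names are my own guesses, and one ssh round trip per
block is of course far too slow for real use):

  #!/bin/sh
  # rsync-like copy of a block device: send only the blocks that differ
  DEV=/dev/vg1/backuppc      # local block device (or a snapshot of it)
  REMOTE=user@offsite        # remote host
  RDEV=/backup/pool.img      # remote file or device of the same size
  BS=$((1024*1024))          # compare in 1 MiB blocks

  SIZE=$(blockdev --getsize64 "$DEV")
  BLOCKS=$(( (SIZE + BS - 1) / BS ))

  i=0
  while [ "$i" -lt "$BLOCKS" ]; do
    loc=$(dd if="$DEV" bs="$BS" skip="$i" count=1 2>/dev/null | md5sum)
    rem=$(ssh "$REMOTE" "dd if=$RDEV bs=$BS skip=$i count=1 2>/dev/null | md5sum")
    if [ "${loc%% *}" != "${rem%% *}" ]; then
      # only blocks that differ cross the wire
      dd if="$DEV" bs="$BS" skip="$i" count=1 2>/dev/null \
        | ssh "$REMOTE" "dd of=$RDEV bs=$BS seek=$i conv=notrunc 2>/dev/null"
    fi
    i=$((i + 1))
  done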

Or, if you wanted to use actual rsync and you wanted to avoid block
devices, you could do the same thing by using a (very large!) loopback
file for your BackupPC pool partition on a physical server and a physical
partition and rsync the file (after you stop BackupPC and unmount the
partition). In fact, you could create a partition for your pool and
create a single file that fills the entire device, and use that file as a
loopback partition. In that case, you could very nicely use LVM tools to
snapshot the outer partition and you could even restart BackupPC while the
remote sync was taking place!
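
A rough sketch of that layout (sizes, paths, ext3 and the init script are
all made up; /dev/vg1/outer is the LVM volume mounted on /outer):

  # one-time setup: a big file on the outer filesystem, used as the pool fs
  dd if=/dev/zero of=/outer/pool.img bs=1M count=512000   # ~500GB, slow
  losetup /dev/loop0 /outer/pool.img
  mkfs.ext3 /dev/loop0
  mount /dev/loop0 /var/lib/backuppc

  # per run: stop BackupPC and unmount briefly so pool.img is clean,
  # snapshot the outer volume, then restart and copy the frozen file
  /etc/init.d/backuppc stop
  umount /var/lib/backuppc
  sync
  lvcreate -s -L 30G -n outer_snap /dev/vg1/outer
  mount /dev/loop0 /var/lib/backuppc
  /etc/init.d/backuppc start

  mount -o ro /dev/vg1/outer_snap /mnt/outer_snap
  rsync -av --inplace /mnt/outer_snap/pool.img remote:/backup/pool.img
  umount /mnt/outer_snap
  lvremove -f /dev/vg1/outer_snap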

How much performance do you lose using a loopback mount? It's *gotta* be
less than the overhead of virtualization! I like that idea even better.
But all it buys you is being able to use rsync directly on a file instead
of coming up with a way to copy a block device in an rsync-like manner...
And, to me, that's the best way of all.

Of course, now we've come full circle: how do you copy a physical block
device in an rsync-like manner? :)

Tim Massey
Adam Goryachev
2009-09-11 06:47:31 UTC
Permalink
Timothy J Massey wrote:
>
> But the ability to rsync collections of VMDK files to a remote host
> *is* appealing. Interesting...
>
> However, you could achieve the same thing in other ways without
> having to run BackupPC in a virtualized guest. You could simply
> use an rsync-like process for copying the block device to a remote
> (physical) host. Years ago someone posted a script that read a
> block device 64k (IIRC) at a time and did an MD5SUM on it and
> compared it with a remote block device (via netcat, again IIRC).
> If the blocks matched, they were skipped. If they didn't, the
> block was sent over. You could do something like that to a
> physical block device to achieve largely the same thing.
>
> Or, if you wanted to use actual rsync and you wanted to avoid block
> devices, you could do the same thing by using a (very large!)
> loopback file for your BackupPC pool partition on a physical server
> and a physical partition and rsync the file (after you stop
> BackupPC and unmount the partition). In fact, you could create a
> partition for your pool and create a single file that fills the
> entire device, and use that file as a loopback partition. In that
> case, you could very nicely use LVM tools to snapshot the outer
> partition and you could even restart BackupPC while the remote sync
> was taking place!
>
> How much performance do you lose using a loopback mount? It's
> *gotta* be less than the overhead of virtualization! I like that
> idea even better. But all it buys you is being able to use rsync
> directly on a file instead of coming up with a way to copy a block
> device in an rsync-like manner... And, to me, that's the best way
> of all.
>
> Of course, now we've come full circle: how do you copy a physical
> block device in an rsync-like manner? :)
Why not just use LVM to take a snapshot, use dd to take 2G chunks (or
whatever size you want) or even cat /dev/blah | split -b 2G etc... Once
split into files of the right size, rsync them to the remote site.
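
Roughly (volume names, mount points and the chunk size are just
examples):

  lvcreate -s -L 20G -n pool_snap /dev/vg1/backuppc
  mkdir -p /space/chunks
  dd if=/dev/vg1/pool_snap bs=1M | split -b 2G -d -a 4 - /space/chunks/part.
  rsync -av /space/chunks/ remote:/backup/chunks/
  lvremove -f /dev/vg1/pool_snap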

Downsides:
1) You need LVM to create the snapshot (or else you need to stop
backuppc while creating the split files)
2) You need double the storage space to store your pool data locally
as split files
3) You need double the storage space on your remote server if you want
to actually save the split files to a device so it is ready to roll
when needed... (or you need extra storage space to do this at the time
your disaster strikes).

Of course, all the 'double storage spaces' can be single disks, they
don't need to be expensive, fast disks, and don't need to have RAID
etc....

Just my 0.02c...

Regards,
Adam
Pieter Wuille
2009-09-11 08:43:23 UTC
Permalink
On Fri, Sep 11, 2009 at 04:47:31PM +1000, Adam Goryachev wrote:
> Timothy J Massey wrote:
> > Of course, now we've come full circle: how do you copy a physical
> > block device in an rsync-like manner? :)
> Why not just use LVM to take a snapshot, use dd to take 2G chunks (or
> whatever size you want) or even cat /dev/blah | split -b 2G etc... Once
> split into files of the right size, rsync them to the remote site.
>
> Downsides:
> 1) You need LVM to create the snapshot (or else you need to stop
> backuppc while creating the split files)
> 2) You need double the storage space to store your pool data locally
> as split files
> 3) You need double the storage space on your remote server if you want
> to actually save the split files to a device so it is ready to roll
> when needed... (or you need extra storage space to do this at the time
> your disaster strikes).
>
> Of course, all the 'double storage spaces' can be single disks, they
> don't need to be expensive, fast disks, and don't need to have RAID
> etc....

Or you can just use the Perl script I gave the link to in the initial post
of this thread to avoid the double storage, and *mount* the LVM snapshot
as a directory with 1GiB files, and rsync those to the other side.

PS: the latest version has write support, so use it on both sides and you
can rsync two block devices directly (use --inplace --ignore-times).
Of course, being Perl and using FUSE, it causes some serious overhead
(especially on old machines - on a P3 800 the CPU becomes the bottleneck),
so you might be better off using that patched version of rsync that
supports block devices itself.
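
Roughly like this (I'm writing the mount commands from memory - check the
usage in the script itself for the exact arguments):

  # on the backup server: expose a snapshot of the pool device as 1GiB files
  lvcreate -s -L 20G -n pool_snap /dev/vg1/backuppc
  ./devfiles.pl /dev/vg1/pool_snap /mnt/pool-parts

  # on the remote side: expose the target device the same way (writable)
  ./devfiles.pl /dev/vg1/pool_copy /mnt/pool-parts

  # then, from the backup server:
  rsync -av --inplace --ignore-times /mnt/pool-parts/ remote:/mnt/pool-parts/

  # afterwards
  fusermount -u /mnt/pool-parts
  lvremove -f /dev/vg1/pool_snap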

--
Pieter
Les Mikesell
2009-09-11 16:14:22 UTC
Permalink
Timothy J Massey wrote:
>
> So you're attempting to convert a physical BackupPC server into a virtual
> image? VMware has conversion tools that do this. I've only used the
> Windows version of VMware Converter, but it has worked perfectly for
> converting a physical host into a virtual host. There is a Linux version.

I'm hoping to accomplish a couple of different things in one step. I
don't want to convert my existing server to VMware. I want to make a
snapshot copy of the backuppc partition with as little downtime as
possible - and sync'ing a RAID member will do that. Then I want a copy
of that offsite - and so far splitting into 2GB chunks looks like a good
way to make rsync work. Then, if the chunked remote copy just happened
to be in a form that could connect up directly to a VMware guest that
could be set up for disaster recovery restores, so much the better.

> I would look into this, rather than trying to do a poor-man's version of
> it. It's a very simple process, and after a somewhat long wait (but
> shorter than doing a RAID1 rebuild) you will have a brand new virtual
> clone of your physical box!

I'd be very surprised if the converter can do it faster than a raid
rebuild - and that's not what I want anyway. I only want the single
partition copied. The physical host has other drives that aren't related.

> And if all you're doing is to try to capture a file-based version of your
> block device (a physical partition) that you want to mount using some
> other physical server (or even a virtual server, come to think of it), I
> think you'd be *far* better off just dd'ing the partition into a file and
> using a loopback mount to mount it someplace else.
>
> In other words, the only time you should be dealing with VMDK files is if
> you're trying to create a new virtual guest. And if you are doing this,
> the proper way of doing this is *not* by trying to use LVM/RAID weirdness,
> but using the VMware Converter tools to do this for you properly.
>
> If you're *not* trying to create a new virtual guest, then don't mess with
> VMDK files. They're an annoyance that should only be dealt with if you
> actually have to.

I'd like to accomplish both at once - that is, image copy/raid sync to
get a snapshot, and have the result usable by a separate VM. However, I
haven't been able to figure out how to do it with the vmware (server
2.x) utilities. I can create a chunked disk with vmware-vdiskmanager and
I can connect it so the host sees the whole disk image in one piece with
vmware-mount and the -f option, but I can't find a way to see a raw
partition. I could mount a single partition if it had a filesystem on
it but I don't see how to access the partition in a way that mdadm will
like.

>
> VirtualBox compares fairly with the free VMware Server, but VMware server
> is about 10% of what you can do with VMware--with the paid-for tools.
>
> When it comes to commercial tools, VMware is in a class by itself, though
> Citrix is trying hard with XenServer (still too cumbersome and unpolished
> compared to VMware, and requires VM hardware for Windows). When it comes
> to free-as-in-beer, XenServer is the best. It's still cumbersome, but
> they give you several of the items for free that VMware charges for.
>
> VirtualBox is neither the best tool overall, nor the best tool for free.
> And unfortunately, the GPL'ed code is only a fraction of what you really
> need for a usable virtualization environment. If you want GPL tools, KVM
> (especially in RHEL 5.4) is the best around.

I'm not convinced that any of that matters when the real issue is moving
a physical disk head around.

> The only advantage that VirtualBox has is that it runs on OpenSolaris (or
> OS/2...). For me, that's a non-feature. Obviously, YM *does* V... :)

You left out Macs, which just happens to matter to me but not so much
for this project. There's a free virtualbox for intel based Macs and no
free vmware product. And with only a bit of tweaking you can make a
guest image created under vmware boot and run under virtualbox.

> Interesting thoughts. I've never been a fan of running BackupPC inside of
> a virtualized guest. Basically, my philosophy is to put as little between
> my backups and the hardware as possible. I don't even use compression on
> my backups! The idea of putting my backups inside of a virtual disk on
> top of yet another filesystem is not overly appealing. Now I've got two
> ways for EXT3 to screw me! :)
>
> But the ability to rsync collections of VMDK files to a remote host *is*
> appealing. Interesting...

And having a VM image prepared to do restores is also appealing since it
isolates the install from the hardware you might have available. I can
do that now from my laptop using a USB adapter to connect the disk with
the mirror of the backuppc partition but it would be nicer to have
remote copies that were completely virtual and automatically updated.

> How much performance do you lose using a loopback mount? It's *gotta* be
> less than the overhead of virtualization! I like that idea even better.

This is the effect I was hoping to get by vmware-mounting the vmdk into
the physical host.

> But all it buys you is being able to use rsync directly on a file instead
> of coming up with a way to copy a block device in an rsync-like manner...
> And, to me, that's the best way of all.
>
> Of course, now we've come full circle: how do you copy a physical block
> device in an rsync-like manner? :)

Maybe the fuse/perl driver mentioned earlier would work with one end in
the physical backuppc server and the other in the remote disaster
recovery VMware guest. But, there is a timing issue unless some sort
of local snapshot capability is added and I'd prefer to avoid LVM. I
suppose I could sync my existing disk into the raid, break it, and mount
it back separately for the rsync step to decouple the transfer time.

--
Les Mikesell
***@gmail.com
Timothy J Massey
2009-09-11 18:40:04 UTC
Permalink
Les Mikesell <***@gmail.com> wrote on 09/11/2009 12:14:22 PM:

> Timothy J Massey wrote:
> >
> > So you're attempting to convert a physical BackupPC server into a
virtual
> > image? VMware has conversion tools that do this. I've only used the
> > Windows version of VMware Converter, but it has worked perfectly for
> > converting a physical host into a virtual host. There is a Linux
version.
>
> I'm hoping to accomplish a couple of different things in one step. I
> don't want to convert my existing server to VMware. I want to make a
> snapshot copy of the backuppc partition with as little downtime as
> possible - and sync'ing a RAID member will do that. Then I want a copy
> of that offsite - and so far splitting into 2GB chunks looks like a good
> way to make rsync work. Then, if the chunked remote copy just happened
> to be in a form that could connect up directly to a VMware guest that
> could be set up for disaster recovery restores, so much the better.

Sure. I can see that you want to do that. But I think you're trying to
shoehorn too much into a single process--and requiring pieces (i.e. VMDK)
that weren't designed to do what you're asking of them. I think you're
wanting more than the tools will deliver.

But go for it--prove me wrong! :)

> > And if all you're doing is to try to capture a file-based version of your
> > block device (a physical partition) that you want to mount using some
> > other physical server (or even a virtual server, come to think of it), I
> > think you'd be *far* better off just dd'ing the partition into a file and
> > using a loopback mount to mount it someplace else.
> >
> > In other words, the only time you should be dealing with VMDK files is if
> > you're trying to create a new virtual guest. And if you are doing this,
> > the proper way of doing this is *not* by trying to use LVM/RAID weirdness,
> > but using the VMware Converter tools to do this for you properly.
> >
> > If you're *not* trying to create a new virtual guest, then don't mess with
> > VMDK files. They're an annoyance that should only be dealt with if you
> > actually have to.
>
> I'd like to accomplish both at once - that is, image copy/raid sync to
> get a snapshot, and have the result usable by a separate VM. However, I
> haven't been able to figure out how to do it with the vmware (server
> 2.x) utilities. I can create a chunked disk with vmware-vdiskmanager and
> I can connect it so the host sees the whole disk image in one piece with
> vmware-mount and the -f option, but I can't find a way to see a raw
> partition. I could mount a single partition if it had a filesystem on
> it but I don't see how to access the partition in a way that mdadm will
> like.

Like I said, you may want to do it, but wanting it won't make it so,
unfortunately.

> > VirtualBox compares fairly with the free VMware Server, but VMware server
> > is about 10% of what you can do with VMware--with the paid-for tools.
> >
> > When it comes to commercial tools, VMware is in a class by itself, though
> > Citrix is trying hard with XenServer (still too cumbersome and unpolished
> > compared to VMware, and requires VM hardware for Windows). When it comes
> > to free-as-in-beer, XenServer is the best. It's still cumbersome, but
> > they give you several of the items for free that VMware charges for.
> >
> > VirtualBox is neither the best tool overall, nor the best tool for free.
> > And unfortunately, the GPL'ed code is only a fraction of what you really
> > need for a usable virtualization environment. If you want GPL tools, KVM
> > (especially in RHEL 5.4) is the best around.
>
> I'm not convinced that any of that matters when the real issue is moving
> a physical disk head around.

If that's all you want out of life, then pick whatever. For most people,
vMotion is a killer app. The only reason I would use a solution that
*doesn't* provide this would be 100% GPL--which is why I'm keeping a very
close eye on KVM.

> > How much performance do you lose using a loopback mount? It's *gotta* be
> > less than the overhead of virtualization! I like that idea even better.
>
> This is the effect I was hoping to get by vmware-mounting the vmdk into
> the physical host.

And I think you'd be a million times better off just using a simple
loopback mount--which could be used by a physical *or* a virtual host with
zero drawbacks, outside of loopback itself. And it's rsyncable.

I go back to my original statement: unless you're *trying* to migrate
something to a virtual guest, don't saddle yourself with VMDK's. We use
them because we have to, not because we *want* to. Use the loopback. You
still haven't given me a drawback for this, other than "I want to use
VMDK's, even though they aren't designed to be generic containers for
data." VMDK files are not tar files, here...

Again, a virtual guest can use a loopback file *just* as easily as a
physical host...
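
For what it's worth, the loopback route is only a couple of commands - a
sketch with made-up paths, assuming the pool partition is /dev/sdb1 and
there is room for the image file:

  # capture the partition into an ordinary file (from a snapshot or an
  # otherwise quiesced copy, so the image is consistent)
  dd if=/dev/sdb1 of=/backup/pool.img bs=1M
  # mount it anywhere - physical or virtual host, it makes no difference
  mount -o loop,ro /backup/pool.img /mnt/pool
  # the image is a plain file, so rsync can update the remote copy in place
  rsync -av --inplace /backup/pool.img remote:/backup/pool.img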

> Maybe the fuse/perl driver mentioned earlier would work with one end in
> the physical backuppc server and the other in the remote disaster
> recovery VMware guest. But, there is a timing issue unless some sort
> of local snapshot capability is added and I'd prefer to avoid LVM. I
> suppose I could sync my existing disk into the raid, break it, and mount
> it back separately for the rsync step to decouple the transfer time.

You have *way* too many preconditions.

"I want to deal with my partitions as files"

-- Then use loopback

"But I want to use VMDK files"

-- Then use the VMware Converter Tools

"But I only want to do *some* partitions"

-- Then use LVM and snapshot them

"But I don't want to use LVM"

-- Well, then, I guess you're out of luck...

http://en.wikipedia.org/wiki/Moving_the_goalpost

Tim Massey
Les Mikesell
2009-09-11 19:58:21 UTC
Permalink
Timothy J Massey wrote:
>
>> I'm not convinced that any of that matters when the real issue is moving
>> a physical disk head around.
>
> If that's all you want out of life, then pick whatever. For most people,
> vMotion is a killer app. The only reason I would use a solution that
> *doesn't* provide this would be 100% GPL--which is why I'm keeping a very
> close eye on KVM.

Most of the services I care about need a large farm of load balanced
physical servers, so slicing one of them up into virtual machines
doesn't make a lot of sense and moving them always involves juggling
hardware anyway. There could be some exceptions, but probably not enough
to justify dealing with a different technology. An occasional VMware server/guest
instance isn't a big deal because I don't rely on it for anything I
couldn't do with bare hardware.

>> This is the effect I was hoping to get by vmware-mounting the vmdk into
>> the physical host.
>
> And I think you'd be a million times better by just using a simple
> loopback mount--which could be used by a physical *or* a virtual host with
> zero drawbacks, outside of loopback itself. And it's rsyncable.

Maybe - but I don't want it live on my main server. I want the
chunked-file copy to be a snapshot made quickly locally, then dribbled
offsite. Once I have a chance to look at the vmdk spec, it will probably
turn out to be trivial to copy directly into one, creating it as you go,
as fast as dd could copy an image. I just expected vmware-mount to provide
the option to see a partition as well as the whole disk - until I tried it.
Maybe I'm still missing something.
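
For what it's worth, the flat (preallocated) vmdk variants appear to be
little more than the raw image plus a small text descriptor; the format is
described in VMware's virtual disk format documentation. A rough sketch of
a monolithicFlat descriptor pointing at a dd'd image - the values here are
illustrative for a 4 GiB disk and would need to match the real size and
geometry, and the 2GB-split flavour just uses
createType="twoGbMaxExtentFlat" with one RW line per chunk:

  # Disk DescriptorFile
  version=1
  CID=fffffffe
  parentCID=ffffffff
  createType="monolithicFlat"

  # Extent description (size in 512-byte sectors)
  RW 8388608 FLAT "backuppc-flat.vmdk" 0

  # The Disk Data Base
  ddb.adapterType = "lsilogic"
  ddb.geometry.cylinders = "522"
  ddb.geometry.heads = "255"
  ddb.geometry.sectors = "63"
  ddb.virtualHWVersion = "4"

If that holds, creating the chunked copy "as you go" may amount to nothing
more than dd'ing into the extent file(s) and writing such a descriptor by
hand.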

>> Maybe the fuse/perl driver mentioned earlier would work with one end in
>> the physical backuppc server and the other in the remote disaster
>> recovery VMware guest. But, there is a timing issue unless some sort
>> of local snapshot capability is added and I'd prefer to avoid LVM. I
>> suppose I could sync my existing disk into the raid, break it, and mount
>> it back separately for the rsync step to decouple the transfer time.
>
> You have *way* too many preconditions.

I'm trying to change a working process as little as possible. If I have
to make big changes, zfs is probably the way to go.

> "I want to deal with my partitions as files"
>
> -- Then use loopback

I want something that works with mdadm to add to my raid so the
partition can remain mounted while the copy occurs. And I want
something that is chunked for the rsync steps.

> http://en.wikipedia.org/wiki/Moving_the_goalpost

I have a working setup where I add a physical drive to my raid, then
remove it and I can access that drive through a USB adapter from a
vmware guest on my laptop. The goal is simply to replace the step where
I throw that drive in my briefcase and take it offsite with something
that happens automatically and still gives me a copy elsewhere that a
vmware guest can access for disaster recovery. It doesn't _have_ to get
copied into a chunked vmdk immediately, but doing so solves all of the
subsequent steps.

--
Les Mikesell
***@gmail.com
Holger Parplies
2009-09-12 02:08:01 UTC
Permalink
Hi list,

I could probably quote roughly 1GB from this discussion, or top-post and
append the whole thread for those of you who want to read it again, but I
won't.

I just want to share some thoughts that seem to be missing from this
discussion so far - for whatever use anyone can make of them.

* The BackupPC pool file system is, generally speaking, made up of file system
blocks. No logical entity within the file system will be shifted forward or
backward by any amount of space that is *not* an integral multiple of the
file system block size - and probably not even that. (There may be exceptions
with things such as reiserfs tails, but I doubt they're worth taking into
account. Hmm, there are also directory entries - are they worth thinking
about?)

* rsync calculates *rolling* block *checksums* in order to re-match data at
an offset any number of bytes away. While "rolling" does not hurt (much - a
bit of performance at the most) when applied to a whole file system, it
provides no benefit. "checksums" may hurt, because there are bound to be
collisions which would, if I'm not mistaken, cause a second pass across the
"file(s)" to be done. That involves a *lot* of disk I/O if not bandwidth.
The md5sums over individual large (non-rolling) blocks approach someone
mentioned (see the sketch after this list) is bound to make much more sense
for a file system than the rsync algorithm.

* *Data within the pool* is *never* modified. Files are created, linked to,
and deleted. That's it. [Wait, that's wrong. rsync checksums may be added
later, but they're *appended*, aren't they? Only once anyway.]
A few *small* files are modified (log files, backups files; perhaps a
.bash_history file or other things outside the scope of BackupPC).
*Existing directories* are modified as new pool files are added and expired
ones removed. The same applies to pc/$host directories regarding new and
expired backups.
*New pc/$host/$num directory hierarchies* are created.
*Inode information* is modified heavily. Forget about the ctime, which may
vary between file systems. The *link count* is modified, meaning the inode
is modified. Not for every file in the pool, but for every file that was
linked to (or unlinked from because of an expiring backup).
Other metadata such as *block usage bitmaps* is modified.

To sum it up, modifications since the last "backup" of the pool FS will
consist of *new files and directories* and *changed file system metadata*.
Presumably, your backups will consist mostly of *data* (you might want to
check that), and a large part of that will be static. 300 GB of daily
changes on a 540 GB pool seems extremely unlikely, 8 GB seems more like it.

* VMDK files, from my experience, do not seem to resemble raw disk images too
closely. I only use the non-preallocated variant, and this seems to be well
optimized for storing wide-spread writes (think "mkfs") in a small amount
of data. A preallocated VMDK may be a completely different matter. But it's
a proprietary format, isn't it? Is there a public spec? Do you know what
design decisions were made and why?

In any case, how much data do you need to fully represent the changes made
to the virtual file system? Does the VMDK change more or less than the file
system it represents? By what factor? Is a VMDK also a logical block array,
or may information shift by non-blocksize distances?

* You could probably use DRBD or NBD to mirror to a partition inside a VMware
guest, presuming you really want to do that.
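
A minimal sketch of that large-fixed-block checksum idea, referenced from
the rsync point above - plain shell, with a made-up device path and chunk
size; run the same loop on both sides, diff the two lists, and transfer
only the chunks whose checksums differ:

  #!/bin/sh
  # print "<chunk number> <md5>" for every 128 MiB chunk of the device
  DEV=/dev/md0
  CHUNK_MB=128
  SIZE=$(blockdev --getsize64 $DEV)
  CHUNKS=$(( (SIZE + CHUNK_MB*1024*1024 - 1) / (CHUNK_MB*1024*1024) ))
  i=0
  while [ $i -lt $CHUNKS ]; do
      SUM=$(dd if=$DEV bs=1M count=$CHUNK_MB skip=$((i*CHUNK_MB)) 2>/dev/null \
            | md5sum | cut -d' ' -f1)
      echo "$i $SUM"
      i=$((i+1))
  done

Note this is not the rsync algorithm at all - no rolling window, no
collision handling beyond md5 itself - just the straightforward per-block
comparison described above.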


All of that said, I find the approach of incrementally copying the block
device quite appealing, presuming it proves to work well (and I'm not yet
convinced that rsync is the optimal tool to copy it with). It simply avoids
some of the problems of a file-based approach, but it also has other drawbacks,
meaning it won't work for everyone (e.g. you can't change the FS type; you
need storage for the full device size and bandwidth for the initial transfer;
you may need bandwidth for a full transfer on *restore*; you'll need enough
space for the image on restore, even if only a fraction of the FS is in use;
resizing the source FS may lead to a very long incremental transfer; you can't
backup anything other than the *complete* FS the pool is on; it won't protect
you from slowly accumulating FS corruption, as you're copying that into your
backup; ...). I'm really interested in hearing about your experiences with
this, but as, for me, <backuppc-users> is currently running in degraded
read-mostly mode due to sheer volume, don't expect me to join the discussion
on a regular basis :).

Regards,
Holger
Jeffrey J. Kosowsky
2009-09-13 00:11:10 UTC
Permalink
Holger Parplies wrote at about 04:08:01 +0200 on Saturday, September 12, 2009:
> * *Data within the pool* is *never* modified. Files are created, linked to,
> and deleted. That's it. [Wait, that's wrong. rsync checksums may be added
> later, but they're *appended*, aren't they? Only once anyway.]
A small nit here - the first byte of the file is also changed when you
append the rsync checksums.
Christian Völker
2009-09-10 21:36:27 UTC
Permalink
Hi Tim,

> Anyway, the idea of trying to automatically work with VMware images at the
> block level without VMware's tools is not that great.
True.
Thanks for the great mail. It summed up nearly everything related to all
of this.
Just one point where you seem to be wrong:
You mentioned "VMware tools" (don't get confused with the "official"
VMware Tools running inside the guest) which are only available on ESXi
for money.
Are you aware of the VI Perl Toolkit? It contains Perl and a couple of
Perl scripts you can execute remotely even without a VirtualCenter
server. And as far as I know VMware has a Linux vApp with the toolkit
already installed, ready to download. So this should solve a lot of
issues, as it offers the same functionality (for free!) to ESXi as a
standalone ESX has - just remotely. As far as I remember you can even do
some things directly against ESXi without needing VirtualCenter [not
sure, I always have a VirtualCenter running ;)].

> *could* shut down the guest, make an LVM snapshot, restart the guest and
> use the LVM snapshot to perform a block-level rsync-style (or even
> cp-style) copy of the images.
Re taking snapshots of the guest instead of shutting it down: with
journaling file systems this shouldn't be an issue. I use ext3 with
"data=ordered", which prevents any issues with the filesystem during a
snapshot/backup.
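
For reference, that amounts to an /etc/fstab line along these lines
(device and mount point are made up, and data=ordered is typically the
ext3 default anyway, so spelling it out mostly documents the intent):

  /dev/sdb1  /var/lib/backuppc  ext3  defaults,data=ordered  0  2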

I just removed my snapshot and realized I have data growth of about
8GB/day. So not too much. I now have to figure out a way to perform
backups of the pool... block-based, as all the filesystem-based
approaches have failed so far.



Greetings

Christian
Jeffrey J. Kosowsky
2009-09-11 02:43:10 UTC
Permalink
Les Mikesell wrote at about 13:41:58 -0500 on Thursday, September 10, 2009:
> Timothy J Massey wrote:
> >
> > VMware Sever 2.0 supports disks in single files, or split in
> > arbitrary-sized pieces (usually 2GB), both fully allocated or
> > grow-on-demand. ESX/ESXi only supports single-file images, and it is
> > designed to use VMFS.
>
> I thought there was a way to access the vmx image directly from linux,
> but I don't know if it has to be mounted or if you can see a raw
> partition. I was hoping I could make a virtual partition visible on
> physical backuppc server so I could sync it as a raid1, then remove it
> and have something that could be used in a virtual machine or copied via
> rsync, then used by a remote virtual machine.
>
> --
> Les Mikesell
> ***@gmail.com
>
>

I have been able to list partitions and mount them under Linux using
the following commands and syntax:
- List partitions on a vmdk disk:
  vmware-mount.pl -p <disk>
- Mount a partition on a vmdk disk:
  vmware-mount.pl <disk> <partition #> <mount point> -o <options>
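
For example (paths and partition number are hypothetical):

  vmware-mount.pl -p /vm/backuppc/backuppc.vmdk
  vmware-mount.pl /vm/backuppc/backuppc.vmdk 1 /mnt/vmdk -o ro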

I seem to recall the instructions for vmware-mount warning that it
might not be 100% stable (but it may have improved since then). Also, I
have used it only read-only and am not sure whether you can (or whether
it is advisable to) write to the partition.

My first impression was that the mounted disk was not particularly
fast but I didn't really measure it so I may be wrong.
Michael Stowe
2009-09-11 21:07:13 UTC
Permalink
>> > random access times are dominated primarily by disk head seek time,
>> > which is gonna be the same no matter what the transport to the drive is.
>> > So the slower transport won't matter nearly as much with random I/O
>> > as it will with sequential.
>>
>> This is not quite correct,
>
> i never said it was completely "correct", i said "dominated primarily"
> and "won't matter nearly as much".

I meant, of course, that your statements aren't quite correct, but I had
misunderstood what you meant. Disk head seek time does dominate over
transfer time at the disk level for random reads, but it doesn't
necessarily dominate over the latency and bandwidth constraints imposed by
USB (some of which will naturally be hardware- and software-dependent, and
which also depend, by way of comparison, on the seek speed of the drive).

I should probably explain a bit better, which I think I can do using a
thought experiment, wherein the computer tells a hard drive to seek to the
middle, then the edge of the disk, once using SATA, and once adding USB to
the commands in each direction -- the drive head won't actually start
moving to the edge of the disk until it receives the second command. So,
for simplicity's sake, if we say the drive seeks middle-to-edge in 10 ms,
and there's a 1 ms latency in USB in each direction, the SATA seek will
take 10 ms, and the USB seek will take 10 + 1 + 1 = 12 ms before the data
can be transferred.

My point is that the drive can't take advantage of the fact that it's not
receiving data from the USB bus to position the head, because it doesn't
know where the head's supposed to be until it receives the USB command --
so slow seek times tend to be magnified by slower transport methods for
random access.

THAT being said, since the bandwidth constraint is less relevant due to
waiting for data, you're right in the sense that if drive heads need to
seek to the extent that they can't saturate the USB bus, then the speed of
the bus doesn't matter as much.

> I don't know what kind of additional latency USB has vs. SATA, a quick
> web search didn't show me anything useful.

I haven't found anything particularly useful either, I'm afraid, and some
of the numbers out there are hard to believe.

> I'm sure there's some, but I'd be surprised to see numbers that showed
> that the extra latency for USB is significant compared to the average
> time-to-access inherent in the disk mechanism (avg seek + avg rotational
> latency). Let's guess that it adds 10% to the total latency of a
> request. Maybe i'm way off-base here, feel free to provide data.

My point is that any latency tends to magnify the seek times, while drive
seeking tends to desaturate the USB bus. (This probably isn't much
different from what you're saying.)

> My own real-world measurements showed, with the same target hard disk,
> 25 MB/sec bulk throughput via USB2 and 40MB/sec throughput via eSATA.
>
> I was surprised at how slow the eSATA throughput was, actually, but did
> not investigate further.
>
> The test was done on a linux system, dd'ing from a fast disk array to
> the target hard disk.
>
>> It's probably worth keeping in mind that a USB 2.0 attached drive is
>> actually attached to either a SATA or an IDE controller; so you're either
>> comparing USB+SATA or USB+IDE to eSATA.
>
> sure, and it's easy to remove this source of error by using the same
> sata hard disk for each set of measurements.

If you're using the same drive, then it would seem you're only measuring
the additional bandwidth and latency constraints of USB. What I'd expect
from a curve is that as the amount of raw data transferred *from* the
drive gets smaller, the speed difference between USB and SATA would
tend to decrease. As data is *written*, however, I'd expect the curve to
be gentler if the data needs to be flushed, and sharper if it doesn't need
to wait and can simply saturate the bus.

My interpretation of what you meant by "random I/O" is that it included
both reads and writes, in which case I'd expect the slowness to be
magnified -- which is what I've seen in the quite limited and
unscientific tests I've done with BackupPC.