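rsync is a superb tool for synchronizing and backing up data between hosts. Compared with tools such as ftp and scp, it is more powerful and synchronizes and transfers data more efficiently. Every server should have it.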

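Recently I ran into a problem with rsync. When I synced a PC with an external hard drive, any large binary file I had modified was retransmitted in full, which was very inefficient. Isn't rsync supposed to transfer only the differences? Or is the problem specific to binary files? Yet the rsync man page clearly says: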

Rsync is a fast and extraordinarily versatile file copying tool. It can copy locally, to/from another host over any remote shell, or to/from a remote rsync daemon. It offers a large number of options that control every aspect of its behavior and permit very flexible specification of the set of files to be copied. It is famous for its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination. Rsync is widely used for backups and mirroring and as an improved copy command for everyday use.

I searched online and found someone who was just as puzzled as I was: Smarter filetransfers than rsync?

Fortunately, someone answered the question perfectly:

Rsync will not use deltas but will transmit the full file in its entirety if it – as a single process – is responsible for the source and destination files. It can transmit deltas when there is a separate client and server process running on the source and destination machines.

The reason that rsync will not send deltas when it is the only process is that in order to determine whether it needs to send a delta it needs to read the source and destination files. By the time it’s done that it might as well have just copied the file directly.

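In other words: when two hosts sync files over a network, each host runs its own rsync process. Each process hashes the files on its own side, and only the parts that differ travel over the network. A sync within a single host involves only one process. In that case rsync reasons that comparing the files first and then copying the differences would be slower than simply copying them, so it sends the whole file.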

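On reflection, rsync's behavior makes sense. Between hosts, network bandwidth is the bottleneck, so it pays to compute the differences first and then send them. Within one host, the copy goes from disk to disk, which is typically about ten times faster than the network. Copying directly is therefore usually faster than comparing first, and copying the whole file is a sound choice.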

I wrote a script to test rsync's behavior:

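```bash
#!/bin/bash
# Compare rsync's behavior for a local copy and for a copy over SSH.
# The SSH part assumes sshd is running and `ssh localhost` works.
set -e

echo "make test file"
dd if=/dev/zero of=testfile bs=1024k count=512
echo "cp test file"
cp testfile syncfile
echo "make changes to test file"
echo '1234567890' >> testfile
echo "rsync file in local..."
rsync -avh -P testfile syncfile

echo ""
echo "restore sync file"
dd if=/dev/zero of=syncfile bs=1024k count=512
echo "rsync file via network"
# Use an absolute path: a relative path after "localhost:" is resolved against
# the remote user's home directory, not the current directory.
rsync -avh -P testfile "localhost:$PWD/syncfile"
```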

The test script produced the following output:

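The results match expectations. Within the same host, rsync copies the whole file. Over SSH, it sends only the differences, which is significantly more efficient.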

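rsync's approach is sound. Still, for a large file with only a small change, a full copy within the same host is painful. The workaround is the one used in the test script: simulate a network transfer. Most Linux hosts already have sshd, so you only need to add localhost and the colon that marks a remote path to the command. On Windows 10 version 1809 and later, OpenSSH ships as a built-in system component (the server is an optional feature), so it is easy to install and use. Other options include Cygwin and Bitvise SSH Server. Once an SSH server is installed, syncing large files is no longer a problem.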
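The localhost workaround can be wrapped in a small script. The sketch below is hypothetical: the script name sync_via_localhost.sh and the function localhost_target are made up for illustration. It assumes that sshd is running and that `ssh localhost` can log in, ideally with a key. The script converts the destination to an absolute path, because a relative path after `localhost:` is resolved against the remote user's home directory, not the current directory.

```bash
#!/bin/bash
# sync_via_localhost.sh (hypothetical name): route a local rsync through the
# local SSH server so that rsync uses its delta-transfer algorithm.
# Assumes sshd is running and `ssh localhost` works (ideally with key-based login).
# Usage: sync_via_localhost.sh SRC DEST
set -euo pipefail

# Print "localhost:/absolute/path/to/DEST". The parent directory of DEST must exist.
# Note: a trailing slash on DEST is dropped by basename.
localhost_target() {
    local dest=$1
    local dir
    dir=$(cd "$(dirname "$dest")" && pwd)
    printf 'localhost:%s/%s\n' "$dir" "$(basename "$dest")"
}

# Only run rsync when called with exactly two arguments.
if [[ $# -eq 2 ]]; then
    rsync -avh -P "$1" "$(localhost_target "$2")"
fi
```

For example, `./sync_via_localhost.sh bigfile.img /mnt/backup/bigfile.img`, where both paths are placeholders.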

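Another thing to watch when syncing across partitions or devices is that the file systems should be compatible with each other; otherwise problems can arise. For example, if you sync from NTFS to an (ex)FAT USB flash drive with the common -avhP options, every run copies all the files again. The cause is that the two file systems support different features. FAT cannot store symlinks (-l) or permissions (-p), among other things, and its timestamps are coarser (2-second granularity on FAT). As a result, rsync treats the files as different and copies them again. In this case, use -cvrhP instead. Here -c compares files by checksum rather than by size and modification time, and -r recurses without the attribute-preserving options that -a bundles in.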

References

  1. Smarter filetransfers than rsync?
  2. OpenSSH in Windows
  3. Installing CYGWIN + SSHD for remote access through SSH on windows
  4. Installing SFTP/SSH Server on Windows using OpenSSH
  5. Bitvise SSH Server
  6. rsync not working between NTFS/FAT and EXT