I have a disk full of music and I want to make a new backup.
I copied the directory structure from src to dest:
cp -r src/* dest/
All the files are at the destination. Now I want to do incremental backups using rsync. However, when I try rsync, it wants to copy everything again! What a bore!
Let's try to find a solution.
From the rsync man page: "Rsync finds files that need to be transferred using a "quick check" algorithm (by default) that looks for files that have changed in size or in last-modified time. Any changes in the other preserved attributes (as requested by options) are made on the destination file directly when the quick check indicates that the file's data does not need to be updated."
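To make that rule concrete, here is a rough shell approximation of the quick check. Two files count as unchanged when their size and whole-second modification time match. This is only a sketch and not rsync's own code. It assumes GNU stat (for `-c`), and `quick_check_same` is a made-up helper name.

```shell
#!/bin/bash
# Hypothetical helper approximating rsync's default "quick check":
# succeed (exit 0) if size and mtime (whole seconds) match, else fail.
# Assumes GNU stat; this is not rsync's implementation.
quick_check_same() {
    [ "$(stat -c '%s %Y' -- "$1")" = "$(stat -c '%s %Y' -- "$2")" ]
}

tmp=$(mktemp -d)

# Same content, different mtimes: the quick check says "changed".
printf 'abc\n' > "$tmp/a"
printf 'abc\n' > "$tmp/b"
touch -d '2000-01-01 00:00:00' "$tmp/a"
touch -d '2001-01-01 00:00:00' "$tmp/b"

# Different content, same size and mtime: the quick check says "unchanged".
printf 'xyz\n' > "$tmp/c"
touch -r "$tmp/a" "$tmp/c"

quick_check_same "$tmp/a" "$tmp/b" && echo "a vs b: skip" || echo "a vs b: transfer"
quick_check_same "$tmp/a" "$tmp/c" && echo "a vs c: skip" || echo "a vs c: transfer"
```

The second case shows the blind spot. Matching size and mtime are taken on trust, and rsync's -c/--checksum option exists for exactly that situation.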
I imagine one of two things might be the cause:
- rsync creates a database. The database doesn't exist yet, so it starts all over again. Maybe I can get rsync to build the database without doing the copying?
- some metadata was not copied but was created anew by cp, which makes the files look different to rsync. The solution would be to make the files look the same, then run rsync.
Investigating.
1. This seems not to be true. Rsync doesn't create a database; it uses an efficient checking algorithm to see whether anything has changed.
2. Possible differences: the file contents differ (but I know they don't!), something to do with permissions or ownership, or something to do with modification times.
Let's check:
- md5sum src/file1 = md5sum dest/file1 ... yep, the files are the same
- the files have different metadata (owner and timestamp): this could be it!
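Both checks can be done together for a single pair. This sketch assumes GNU coreutils (md5sum, plus stat -c for owner and mtime). `compare_pair` is a hypothetical helper name, and the files live in a scratch directory.

```shell
#!/bin/bash
# Hypothetical helper: report whether two files have the same content
# (via md5sum), then print owner, mtime and name of each file.
compare_pair() {
    if [ "$(md5sum < "$1")" = "$(md5sum < "$2")" ]; then
        echo "content: same"
    else
        echo "content: differs"
    fi
    stat -c '%U %y %n' -- "$1" "$2"
}

tmp=$(mktemp -d)
printf 'music\n' > "$tmp/file1"
touch -d '2000-01-01 00:00:00' "$tmp/file1"
cp "$tmp/file1" "$tmp/file1.copy"   # plain cp: same bytes, new mtime

compare_pair "$tmp/file1" "$tmp/file1.copy"
```

Run as one user, the owners will match here. Only the mtimes differ.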
A little experimentation reveals that it's the timestamp data that is the problem.
If I create a new src and dest, make a little file in src, and copy it to dest, rsync will attempt to copy it again. If I delete the copy in dest and copy the file again using cp --preserve, rsync doesn't try to copy it.
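The same experiment can be reproduced without rsync by reading the modification times directly with stat (GNU coreutils assumed). The source is backdated with touch -d so the difference is visible. --preserve=timestamps is spelled out here; a bare --preserve in GNU cp preserves mode, ownership and timestamps.

```shell
#!/bin/bash
tmp=$(mktemp -d)
mkdir "$tmp/src" "$tmp/dest"
printf 'hello\n' > "$tmp/src/test.file"
touch -d '2000-01-01 00:00:00' "$tmp/src/test.file"   # backdate the source

cp "$tmp/src/test.file" "$tmp/dest/plain.file"                        # mtime = now
cp --preserve=timestamps "$tmp/src/test.file" "$tmp/dest/kept.file"   # mtime copied

src_mtime=$(stat -c %Y "$tmp/src/test.file")
plain_mtime=$(stat -c %Y "$tmp/dest/plain.file")
kept_mtime=$(stat -c %Y "$tmp/dest/kept.file")
echo "src=$src_mtime plain=$plain_mtime kept=$kept_mtime"
```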
So I need to find a way to change the timestamps on all the files in the destination to be the same as in the source.
matt@blueAcer:~/rsync$ rm dest/*
matt@blueAcer:~/rsync$ cp src/* dest/
matt@blueAcer:~/rsync$ rsync -avzn src/ dest/
sending incremental file list
./
test.file
sent 110 bytes received 34 bytes 288.00 bytes/sec
total size is 6 speedup is 0.04
But if I freshen the timestamps...
matt@blueAcer:~/rsync$ touch src/* dest/*
matt@blueAcer:~/rsync$ rsync -avzn src/ dest/
sending incremental file list
sent 61 bytes received 12 bytes 146.00 bytes/sec
total size is 6 speedup is 0.08
Aha! Problem solved. But I guess what I *really* want is to make the timestamps in dest the same as they were in src.
touch has a -r [reference file] option:
-r, --reference=FILE
use this file's times instead of current time
touch -r src/test.file dest/test.file
Yes! An rsync after this doesn't try to copy the file, and stat shows exactly the same time info for both files. So I could walk my dest directory, touching each file with reference to its original in src.
Hmm... OK. This is going to be much faster than rsyncing all those hundreds of GB of FLAC files!
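Here is a minimal sketch of the touch -r fix on a single file, verified with stat afterwards. GNU touch and stat are assumed, and the scratch directory and file names are illustrative.

```shell
#!/bin/bash
tmp=$(mktemp -d)
mkdir "$tmp/src" "$tmp/dest"
printf 'hello\n' > "$tmp/src/test.file"
touch -d '2000-01-01 00:00:00' "$tmp/src/test.file"
cp "$tmp/src/test.file" "$tmp/dest/test.file"          # plain cp: fresh mtime

# Copy the access and modification times from the source onto the copy.
touch -r "$tmp/src/test.file" "$tmp/dest/test.file"

stat -c '%y %n' "$tmp/src/test.file" "$tmp/dest/test.file"
```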
It would be nice to do this with standard command-line tools. Maybe find?
Find all FLAC files in dest, and for each one, chop $dest from the start of the path, replace it with $src, then do my touch -r trick using the resulting source file as the reference.
A more elegant approach would find pairs of files and then operate on them. I'm not sure that's possible; I think it would mean running a second find for each result of the first. Bad.
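The path munging on its own is a one-liner with bash parameter expansion, as in this sketch. `to_src_path` is a hypothetical helper, and the example path is made up.

```shell
#!/bin/bash
# Hypothetical helper: turn a path under $dest into the matching path
# under $src by swapping the leading root directory.
dest=dest
src=src

to_src_path() {
    # ${1#"$dest/"} removes "$dest/" from the front only; the quotes make
    # the match literal even if $dest contains * or ?.
    printf '%s\n' "$src/${1#"$dest/"}"
}

to_src_path "dest/Some Artist/Some Album/01.flac"   # src/Some Artist/Some Album/01.flac
```

Since every path find prints starts with the root you give it, stripping that prefix is enough.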
dest=dest
src=src
extension=file
find $dest/ -name "*.$extension" -exec touch -r $src/`basename {}` \{\} \;
This doesn't work. The problem lies in the order in which substitution happens. The shell runs the backtick substitution basename {} before find even starts, so basename is called with a literal {} and returns {}. The shell also expands $src at the same time, so find receives src/{} as an argument and replaces the {} with each full path it finds (e.g. dest/test.file), giving src/dest/test.file. In other words,
basename has no effect.
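Both halves of that explanation can be seen directly. The first line shows basename receiving a literal {}. The rest is an alternative workaround, not the script approach used here. It swaps in an inline sh -c script, so the munging happens once per file, after find has substituted the real path. GNU find is assumed, and the tree is a scratch directory.

```shell
#!/bin/bash
# 1) Command substitution happens first, so basename only ever sees "{}".
echo src/`basename {}`      # prints src/{}

# 2) Workaround: defer the munging to a per-file inline shell script.
tmp=$(mktemp -d)
mkdir -p "$tmp/src/Album" "$tmp/dest/Album"
printf 'x\n' > "$tmp/src/Album/test.file"
touch -d '2000-01-01 00:00:00' "$tmp/src/Album/test.file"
cp "$tmp/src/Album/test.file" "$tmp/dest/Album/test.file"

dest=$tmp/dest
src=$tmp/src
extension=file
# Inside sh -c: $1 = path found under dest, $2 = dest root, $3 = src root.
find "$dest"/ -name "*.$extension" -exec sh -c '
    touch -r "$3${1#"$2"}" "$1"
' sh {} "$dest" "$src" \;
```

The first argument after the script ("sh") fills $0, which is why the real arguments start at $1.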
The same problem happens with | xargs -n1 -0 ...
A big pain in the arse.
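xargs can be rescued the same way. Instead of munging the path on the command line, hand each path to a small sh -c script. This sketch assumes GNU find and xargs, where -print0 and -0 keep names with spaces intact. It is an alternative to the separate-script approach, and the scratch tree is illustrative.

```shell
#!/bin/bash
tmp=$(mktemp -d)
mkdir -p "$tmp/src/Some Artist" "$tmp/dest/Some Artist"
printf 'x\n' > "$tmp/src/Some Artist/01 track.file"
touch -d '2000-01-01 00:00:00' "$tmp/src/Some Artist/01 track.file"
cp "$tmp/src/Some Artist/01 track.file" "$tmp/dest/Some Artist/01 track.file"

dest=$tmp/dest
src=$tmp/src
extension=file
# sh -c SCRIPT NAME ARGS...: NAME becomes $0, so $1 = dest root,
# $2 = src root, and xargs -n1 appends one found path as $3.
find "$dest"/ -name "*.$extension" -print0 |
    xargs -0 -n1 sh -c 'touch -r "$2${3#"$1"}" "$3"' sh "$dest" "$src"
```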
Here's the answer, and it's really simple. We just need to put the filename munging into a script that also does the touch, and call that script with -exec using parameters from find. It gets called once for each file find turns up, and we pass in the file's name as {}, along with the root directories of the destination tree and the source tree.
dest=dest; src=src; extension=file; find "$dest"/ -name "*.$extension" -exec /home/matt/scripts/update.sh {} "$dest" "$src" \;
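#!/bin/bash
# update.sh: give a file in the dest tree the timestamps of its src counterpart
# usage: update.sh DEST_FILE DEST_ROOT SRC_ROOT
srcFile="$3${1#"$2"}"   # swap the leading DEST_ROOT for SRC_ROOT
touch -r "$srcFile" "$1"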
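To check the find + update.sh combination end to end, here is a self-contained sketch. It builds a small nested tree in a scratch directory and writes an update.sh that swaps the leading destination root for the source root (the ${1#"$2"} form). It then runs the find command and compares mtimes. The script is invoked via bash so the temporary file needn't be executable. GNU find, touch and stat are assumed.

```shell
#!/bin/bash
tmp=$(mktemp -d)
mkdir -p "$tmp/src/Artist/Album" "$tmp/dest/Artist/Album"
for n in 01 02; do
    printf 'track %s\n' "$n" > "$tmp/src/Artist/Album/$n.file"
    touch -d '2000-01-01 00:00:00' "$tmp/src/Artist/Album/$n.file"
    cp "$tmp/src/Artist/Album/$n.file" "$tmp/dest/Artist/Album/$n.file"   # plain cp
done

# update.sh: $1 = file under dest, $2 = dest root, $3 = src root.
cat > "$tmp/update.sh" <<'EOF'
#!/bin/bash
srcFile="$3${1#"$2"}"
touch -r "$srcFile" "$1"
EOF

dest=$tmp/dest; src=$tmp/src; extension=file
find "$dest"/ -name "*.$extension" -exec bash "$tmp/update.sh" {} "$dest" "$src" \;

for n in 01 02; do
    stat -c '%Y %n' "$src/Artist/Album/$n.file" "$dest/Artist/Album/$n.file"
done
```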