Just last night I had a few large files I wanted to bring to a friend’s house. They were HD videos that I took a long time ago, about 7 GB in total. I figured I would just plop them on a USB drive and in a few minutes I’d be ready to go.
It didn’t work out quite so easily. Fifteen minutes after I tried to drag the files to the drive in Nautilus, it wasn’t apparent that anything was happening. The progress bar had frozen, the flash drive was still doing something, and some of the files looked like they were there. I tried again on the command line using “cp” and got the same results (with no progress bar). I knew there had to be a better way to copy these files and see what was going on, but first I had to figure out what was really happening.
The first discovery was that when “cp” copies a file to a vfat (FAT32) formatted flash drive, enough space for the entire file is allocated immediately. This is smart because you can’t get stuck partway through copying a file to a drive where it won’t fit. It’s also a pain because I can’t open another terminal window and watch the file grow as it is written.
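My next discovery was that disk caching was making the process difficult to understand. Nautilus appeared to copy the file quickly at first, and then it looked like it had hung completely. The fast part was the data going into memory; the apparent hang was the system slowly flushing that data out to the flash drive.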
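On Linux, one way to see the cache at work is to watch the “Dirty” and “Writeback” counters in /proc/meminfo while a copy is running. The following is a minimal Python sketch of that idea, not something from prcp.pl; the function names parse_meminfo and read_dirty_kb are made up for illustration, and it assumes a Linux system where /proc/meminfo exists.

```python
import os


def parse_meminfo(text, fields=("Dirty", "Writeback")):
    """Pull the requested counters (in kB) out of /proc/meminfo-style text."""
    values = {}
    for line in text.splitlines():
        # Each line looks like "Dirty:        5120 kB".
        name, _, rest = line.partition(":")
        if name in fields:
            values[name] = int(rest.split()[0])
    return values


def read_dirty_kb(path="/proc/meminfo"):
    """Read the live counters; assumes a Linux-style /proc filesystem."""
    with open(path) as f:
        return parse_meminfo(f.read())


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    # Run this repeatedly during a copy to watch the numbers rise and fall.
    print(read_dirty_kb())
```

A large “Dirty” value that drains slowly while the copy looks stuck would suggest that the data is sitting in memory waiting to be written to the drive.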
So I set out to write a script to fix these problems. I knew I wanted all of the following things:
A command-line utility – I wanted to be able to use this program without a GUI
A progress bar – I wanted to get visual feedback without babysitting the script
Files that “grew” while they were being copied – I wanted to be able to check it remotely in a different terminal session if necessary
Predictable behavior with regard to the disk cache – I couldn’t have the disk cache making it seem like the devices were writing very fast, then stalling, then writing very fast again
After an hour I came up with prcp.pl. It is a Perl script that mainly uses Term::ProgressBar and File::Sync to do what I needed. It’s not finished but it’s very functional. What it does is copy the file using sysread and syswrite while fsyncing the output file after each write. This keeps what you see on the disk consistent with what the program has copied, since each chunk is flushed to the device before the next one is written and the disk cache can no longer run far ahead of the drive.
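To make the approach concrete, here is a rough Python sketch of the same idea: read a chunk, write it, fsync, report progress, repeat. This is not prcp.pl itself. The names copy_with_fsync and print_progress, the 1 MiB chunk size, and the hand-rolled progress bar are all assumptions made for this example.

```python
import os
import sys

CHUNK_SIZE = 1024 * 1024  # 1 MiB per chunk; an arbitrary choice for this sketch


def copy_with_fsync(src_path, dst_path, chunk_size=CHUNK_SIZE, report=None):
    """Copy src_path to dst_path, calling fsync after every chunk."""
    total = os.path.getsize(src_path)
    copied = 0
    # buffering=0 gives unbuffered reads and writes, roughly like
    # Perl's sysread and syswrite.
    with open(src_path, "rb", buffering=0) as src, \
            open(dst_path, "wb", buffering=0) as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            # An unbuffered write may be short, so loop until the chunk is out.
            view = memoryview(chunk)
            while len(view) > 0:
                written = dst.write(view)
                view = view[written:]
            # Ask the kernel to push this chunk to the device before continuing.
            os.fsync(dst.fileno())
            copied += len(chunk)
            if report is not None:
                report(copied, total)
    return copied


def print_progress(copied, total):
    """Draw a simple 40-character progress bar on stderr."""
    percent = 100 if total == 0 else copied * 100 // total
    filled = percent * 40 // 100
    bar = "#" * filled + " " * (40 - filled)
    sys.stderr.write("\r[{}] {:3d}%".format(bar, percent))
    sys.stderr.flush()


if __name__ == "__main__" and len(sys.argv) == 3:
    copy_with_fsync(sys.argv[1], sys.argv[2], report=print_progress)
    sys.stderr.write("\n")
```

Because the output file is only extended one chunk at a time and synced after each one, its size as seen from another terminal should track the progress bar. The trade-off is speed: syncing after every chunk gives up most of the benefit of write caching.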
If you’re brave, give it a try and let me know what you think. Only use it on data that you have backed up, since it has undergone minimal testing so far.