Remote server synchronization using lftp

While moving old websites to the new web servers, we frequently need to update files on the new server. Here is the story: the new server is prepared, the site is pulled from Git, and the assets (images, uploads) are copied from the “old” server using FTP. Testing takes a few hours, sometimes even a few days, so after the testing phase we need to synchronize the assets again, because in the meantime clients have added new images and uploaded new files. To do this quickly, we use lftp.

Lftp is a feature-rich FTP client that makes it easy to write synchronization scripts. Here is a sample one:

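#!/bin/bash
#
# Configuration
#
HOST='old.server.address'
FTP_USER='old_server_username'   # not USER, to avoid overriding the shell's $USER
FTP_PASS='old_server_password'
# If TARGET ends with a slash, lftp's mirror appends the source's base name to it
TARGET='/new/server/directory/'
SOURCE='/old/server/directory'

# lftp -f expects the name of a script file, so the commands are passed on
# standard input with a here document. This also keeps the password off the
# command line. Uncomment exactly one of the mirror lines below.
lftp <<EOF
open $HOST
user "$FTP_USER" "$FTP_PASS"
set file:charset utf8
set ftp:charset utf8
#
# Dry run only
#
# mirror --only-missing --parallel=8 --dry-run --verbose "$SOURCE" "$TARGET"
#
#
# Actual copy
#
# mirror --only-missing --parallel=8 --verbose "$SOURCE" "$TARGET"
#
#
# Copy with delete
#
# mirror --only-missing --parallel=8 --delete --verbose "$SOURCE" "$TARGET"
#
bye
EOF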

All of the configuration is at the beginning of the file. You have to set the address of the old server (I assume that the script runs on the new one), the login credentials, and the source and target directories. Please note that the password is saved in the file. This is not good practice, so it should be used only temporarily. If you need such a script for a longer period or as a regular means of synchronization, keep the password in an environment variable or use rsync over SSH with key-based authentication. We use FTP because we cannot use rsync on some of our source servers.
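If the script stays around for longer, the credentials can come from environment variables as suggested above. The sketch below is one way to do it, assuming the variable names FTP_HOST, FTP_USER and FTP_PASS (chosen for this example, not required by lftp). The function prints the lftp commands, and bash's ${VAR:?message} expansion makes it stop with an error if any variable is missing or empty:

```shell
#!/bin/bash
# Sketch: build the lftp commands from environment variables instead of
# hard-coding the credentials. FTP_HOST, FTP_USER and FTP_PASS are
# hypothetical names picked for this example.

# lftp_script SOURCE_DIR TARGET_DIR
# Prints the command script on stdout. ${VAR:?message} prints the message
# to stderr and aborts (the subshell, or the whole non-interactive script)
# when the variable is unset or empty, so a missing password fails early.
lftp_script() {
    local source_dir="$1" target_dir="$2"
    cat <<EOF
open ${FTP_HOST:?FTP_HOST is not set}
user "${FTP_USER:?FTP_USER is not set}" "${FTP_PASS:?FTP_PASS is not set}"
mirror --only-missing --parallel=8 --verbose "$source_dir" "$target_dir"
bye
EOF
}
```

The output can be piped into lftp, for example lftp_script "$SOURCE" "$TARGET" | lftp, with the variables exported from a file that only the sync user can read. Feeding the commands on standard input keeps the password out of the command line, where other users could see it in the process list.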

Once the configuration is done, you also need to uncomment the proper line in the lftp part of the script. In the example above, there are three of them. The first performs a dry run only: it displays what would be copied without actually copying any files. It is good practice to check the script this way before letting it copy a large number of files.
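Instead of uncommenting lines, the variant could also be chosen with a command-line argument. A small sketch of that idea: the mode names (dry-run, copy, delete, plus delete-dry-run to preview deletions) are invented for this example, while the mirror options are the ones from the script above:

```shell
#!/bin/bash
# Sketch: map a mode name to the lftp mirror options of each variant.
# The mode names are hypothetical; the options match the sample script.

# mirror_options MODE
# Prints the options for MODE, or prints an error and returns 1.
mirror_options() {
    case "$1" in
        dry-run)        printf '%s\n' "--only-missing --parallel=8 --dry-run --verbose" ;;
        copy)           printf '%s\n' "--only-missing --parallel=8 --verbose" ;;
        delete)         printf '%s\n' "--only-missing --parallel=8 --delete --verbose" ;;
        delete-dry-run) printf '%s\n' "--only-missing --parallel=8 --delete --dry-run --verbose" ;;
        *)
            printf 'unknown mode: %s (use dry-run, copy, delete or delete-dry-run)\n' "$1" >&2
            return 1
            ;;
    esac
}
```

In the sync script, OPTS="$(mirror_options "${1:-dry-run}")" || exit 1, followed by mirror $OPTS "$SOURCE" "$TARGET" among the lftp commands, would make a plain ./sync.sh a dry run, so forgetting the argument never copies or deletes anything.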

The second line (Actual copy) copies the files. Please note that it downloads only missing files (files that already exist on the target are skipped, even if they have changed on the source) and transfers at most eight files in parallel.

The third line (Copy with delete) is almost the same, but it also deletes files on the target that no longer exist on the source. So if you pulled an image during the initial synchronization and the user later removed it, this variant removes it from the new server as well. Sometimes it is better to pull only the new files and keep the ones removed from the source (the Actual copy variant), and sometimes it is better to keep the target in step with the source, with removed files deleted there too (the Copy with delete variant).
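To make the difference between the two variants concrete, here is an illustration only, not how lftp works internally: for two local directories, it lists the files that exist in the target but not in the source, which are the files a mirror --delete run would remove. It relies on find, sort and comm; the function name is hypothetical. Against the real remote server, the equivalent preview is a dry run with --delete added.

```shell
#!/bin/bash
# Illustration: which files would "mirror --delete" remove from the target?
# Compares two LOCAL directory trees; lftp does its own comparison over FTP.

# files_only_in_target SOURCE_DIR TARGET_DIR
# Prints paths (relative, prefixed with ./) present only under TARGET_DIR.
files_only_in_target() {
    local source_dir="$1" target_dir="$2"
    # comm needs both inputs sorted in the same collation, hence LC_ALL=C.
    # -1 hides lines only in the source listing, -3 hides lines in both,
    # leaving the lines that appear only in the target listing.
    LC_ALL=C comm -13 \
        <(cd "$source_dir" && find . -type f | LC_ALL=C sort) \
        <(cd "$target_dir" && find . -type f | LC_ALL=C sort)
}
```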

The script can be run as a cron job if you want to automate the synchronization. This is useful when changes are frequent and you don’t want to pull a large number of files at once.
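When the script runs from cron, one run may still be going when the next one starts, for example after a large batch of uploads. A minimal sketch of a guard against overlapping runs, assuming flock(1) from util-linux is installed; the function name and the paths in the usage note are placeholders:

```shell
#!/bin/bash
# Sketch: run a command only if no other run currently holds the lock.

# run_locked LOCK_FILE COMMAND [ARGS...]
# Opens LOCK_FILE on file descriptor 9 inside a subshell and tries to take
# an exclusive lock without waiting (-n). If another run holds it, the
# subshell prints a message and exits 0; otherwise the command runs while
# the lock is held, and the lock is released when the subshell ends.
run_locked() {
    local lock_file="$1"
    shift
    (
        flock -n 9 || { echo "previous sync still running, skipping" >&2; exit 0; }
        "$@"
    ) 9>"$lock_file"
}
```

A cron wrapper could define this function and end with run_locked /tmp/sync-assets.lock /usr/local/bin/sync-assets.sh; a crontab entry such as */30 * * * * /usr/local/bin/sync-assets-cron.sh would then start it every 30 minutes, and a run that finds the lock taken simply skips its turn.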