Published in Open Source For You Sep 2015 issue.
It was just another day for me. I turned on the computer and started listing the day's tasks in my mind while it booted. But to my surprise, the monitor showed the message 'Hard disk not found.' At first I thought it was a loose connection, which is nothing to worry about. But when the previous day's programming experiments came to mind, I was shocked to realize that I had damaged the filesystem of my hard disk!
Thanks to TestDisk (cgsecurity.org), I was able to recover the whole disk with all partitions and files in it. Not a single file was lost. Even the bootloader was restored.
What does this mean to a novice? He may think that any disk damage can be undone and everything can be recovered. But I knew that it was only luck that helped me this time. From then on, I started to take frequent backups of important data.
Before we look at some important backup utilities for desktop and workstation use, let's discuss the importance of data backup.
Where recovery fails and backup wins
Data recovery and backup are the two options you have when your important files are lost (the third one is starting everything from scratch, which will drive you crazy). You perform recovery when you delete some files accidentally, or when your disk is partially damaged. While data backup is a process of creating duplicates of your data frequently and restoring them when needed, data recovery starts only after you have lost something. That is, you have nothing to do until something is lost. Backup may now look like a chore, much like babysitting. But be aware that data backup is always the right and safe method.
As mentioned, there are tools like TestDisk which can recover your whole hard disk. But recovering lost data from a disk always has some issues. It depends on a single storage device, and any problem with that device can ruin the entire process. Sometimes the disk is physically damaged. Sometimes the files are overwritten, which makes them unrecoverable. Also, you cannot be sure that every disk and filesystem is supported.
Backup is not something you can start thinking about after your data is lost. You should take backups of your files and folders whenever you make important changes. Many backup utilities can automate this process, making things easier. Copies of your data (or only the changes) can be stored in a remote location, or just on another local disk, making your data practically immortal. Restoring from backups is a simple process. You now know why backup is the right choice.
Why simple copy-paste isn't enough
Does 'backup' sound mystical to you? To many, it seems to be a process that can only be done by experts with the help of sophisticated tools. But let me put it plainly: simply keeping a copy of your files on a CD or something similar is really a backup. If you are an experienced user, you might think it silly of me to present this as a great point. But believe me, there are many people to whom data backup is still alien.
However, simple copy-paste isn't always enough, especially when you are working on a long-term project. Imagine shooting a movie. Every day you capture some video clips and add them to your library (say, a folder titled CLIPS). What should your backup strategy be here? Leaving backup software aside for now, you have two choices. The first is to copy the folder CLIPS each time you make a change, and name the copy with the current date (e.g., CLIPS_28_07_2015). The second is to copy only the changes.
In the first case, you waste a lot of space, while in the second, it is very difficult to keep track of each change. In either case, you have to do everything manually, which wastes a lot of time and leaves room for confusion. Now you know why automated backup utilities are necessary.
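To see why the first choice wastes space, here is a minimal shell sketch of it. The function name snapshot_copy and the idea of collecting the dated copies in a separate folder are assumptions made for illustration; they are not part of any backup tool.

```shell
#!/bin/sh
# Copy a folder to <destination>/<folder>_DD_MM_YYYY (e.g. CLIPS_28_07_2015).
# snapshot_copy is an illustrative name, not a command from any backup tool.
snapshot_copy() {
    src="${1%/}"                 # source folder, trailing slash removed
    dest_parent="$2"             # folder in which the dated copies are collected
    stamp=$(date +%d_%m_%Y)      # current date as DD_MM_YYYY
    dest="$dest_parent/$(basename "$src")_$stamp"
    if [ -e "$dest" ]; then
        # Never overwrite a copy that was already taken today.
        echo "snapshot_copy: $dest already exists" >&2
        return 1
    fi
    # cp -a copies recursively and keeps permissions and timestamps.
    mkdir -p "$dest_parent" && cp -a "$src" "$dest" && echo "$dest"
}
```

Calling snapshot_copy CLIPS backups once a day would produce one complete copy of CLIPS per day, each as large as the folder itself, which is exactly the waste of space described above.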
Online or offline?
We have just seen the importance of automated backup utilities, and we are going to check two of them. But we should discuss another important thing before that: whether automatic or manual, which medium is best for keeping backups? CD, DVD, hard disk, or online? And how should they be preserved?
The answer depends on the longevity and reliability of these media. No medium has an infinite lifespan. The actual lifespan of each is debated and depends on a lot of parameters. But studies suggest that you cannot rely on CDs and DVDs for more than one or two decades, while hard disks have a surprisingly short lifespan of four to five years. It is interesting to know that there is M-Disc, which is claimed to have a lifespan of 1000 years. But it doesn't seem to be very flexible right now. And I don't think tapes will be of any interest to a desktop user.
This means that if you rely on local storage devices, you have to constantly monitor them and duplicate them. There is another reason to do this: present data storage standards will become obsolete as technology advances.
This is where cloud storage becomes important. Rely on an online storage provider, or buy some space from a hosting company, so that they take care of your backup. That is, you take a backup of your local files and upload it to a server, and your service provider is expected to do what it takes to keep it safe. With remote storage, you may need encryption to keep your data private. Thanks to the tools we are going to learn, all these things are automated.
Déjà Dup: the beginner's tool
Don't assume Déjà Dup is only a novice's tool just because of the title. This highly automated program is very user-friendly, which makes it suitable for a beginner. In fact, it is a clever graphical wrapper around Duplicity, which is a sophisticated command-line utility. We'll have a look at Duplicity later.
Since it is a front-end for Duplicity, it also follows the 'chain method.' That is, a full backup is taken only the first time. The following ones record only the changes, which is a clever way to save space. However, it is recommended to take full backups periodically, so that a bug or damage somewhere in a long chain cannot affect all your backups.
Déjà Dup is shipped with Ubuntu, while the basic editions of other distros may not include it. You can install it using your default package manager. In Debian-like systems, the following command will do (open a terminal, type it, and press Enter):
sudo apt-get install deja-dup
The latest version of Déjà Dup appears as 'Backup' in menus. In Ubuntu, where it is preinstalled, it can also be opened from the System Settings window. Figure 1 shows the start screen of Déjà Dup in Debian and Figure 2 shows the same in Ubuntu.
Taking backups with Déjà Dup
It is very simple. These are the basic steps:
- Select the folders to be preserved in the 'Folders' tab. Also choose which sub-folders are to be ignored.
- Select the destination (i.e., where the backups are kept) in the 'Storage' tab.
- You can set a schedule in the 'Schedule' tab, which takes effect only if you enable Automatic backups in the 'Overview' tab.
- Click the 'Back Up Now' button in the 'Overview' tab. You may now be prompted to set a password.
The 'Folders to ignore' facility is highly useful when you have to back up a master directory while excluding some junk or bulky content inside it. The classic example is backing up your home folder, excluding Trash and Downloads.
To try out Déjà Dup, choose a lightweight folder in Step 1, select Local Folder (e.g., /tmp) in Step 2, and go on.
If you are taking a serious backup, you may choose an external disk as the backup location. We'll see an example of remote storage right after checking how to restore backups using Déjà Dup.
Restoring backups with Déjà Dup
The general process is something like this:
- Click 'Restore...' in the 'Overview' tab.
- Choose where to restore from in the dialog that appears, and click 'Forward.' If you are storing backups in a single location, you have to change nothing here.
- Déjà Dup asks for a password if your backups are stored in a password-protected location (especially an FTP server). Enter it and go on.
- Now Déjà Dup asks for a restore point, that is, the point in time to restore from. Choose a date/timestamp and go forward.
- The next screen asks for a location to restore the backup to. You can choose the original location or an alternative folder. The first option is best if you accidentally deleted a folder. Go on with the wizard.
- Now it asks for the encryption password (if any). This is the password you used to encrypt your files while taking the backup. Enter it and the restoration begins.
Déjà Dup online backup
Déjà Dup supports many remote storage services. The new version supports Amazon S3 and Rackspace Cloud Files in addition to the standard remote storage methods. Here we are going to experiment with classic FTP storage. For this, you need an FTP account. In my case, I am using the FTP account of a website that I own.
Here are the steps:
- Choose the folders in the Folders tab. Select FTP from the Storage location combo box in the Storage tab and enter your FTP account details (Figure 3). Details like server and port can be obtained from your service provider or web hosting control panel. In my case, as I own nandakumar.co.in, the server is ftp.nandakumar.co.in and the port is 21. You also have to give the FTP account username. 'Folder' means the path of the folder where the backups should be kept. The path should be relative to the FTP directory allocated to you.
- Click 'Back Up Now' in the Overview tab. In the coming steps, give your FTP account password and the encryption password for the backup (Figures 4 and 5). It's done!
I don't think there is a need to explain the restoration process. Just click the 'Restore...' button in the Overview tab and follow the instructions.
If you'd like to use a web host account for FTP backup storage, it is recommended to create a dedicated account for this purpose. This ensures better privacy, isolation, and security.
The simplicity of Duplicity
Although servers rely almost entirely on the command line, most desktop users rarely use it. But there is still a small community of people who like to explore its power and beauty. Let us get introduced to Duplicity, the command-line backend that powers Déjà Dup.
Duplicity supports GnuPG encryption and allows us to use local or remote storage methods (such as FTP, SFTP/SCP, WebDAV, WebDAVs, Google Docs, HSi and Amazon S3). It builds on librsync, a library implementing the algorithm of 'rsync', the fast and reliable file copying tool, which makes data transfer efficient. Duplicity handles deleted files, full Unix permissions, and symbolic links, but not hard links.
And the good news is, it is not really hard to use.
Duplicity may crash while copying /proc if you are backing up the root directory /. Use '--exclude /proc' to avoid this.
Open a terminal, type 'duplicity' and press the Enter key. If you see some output (probably an error message about missing arguments), Duplicity is installed on your system. If not, use a package manager or give the following commands:
sudo apt-get install duplicity
sudo apt-get install ncftp
The second package is to get FTP support. Yes, it is optional.
The following example illustrates how we can store the backup of '/home/nandakumar/test' in '/home/nandakumar/backup'. Remember that the latter should be replaced with a location on external storage; otherwise there is little point in taking the backup. We don't do that right now for the sake of simplicity.
Here is the command:
duplicity test file:///home/nandakumar/backup
We skipped '/home/nandakumar' from the path of 'test' since the terminal is in the home folder by default (otherwise use a full path or 'cd'). However, Duplicity insists that one of the paths must be a URI. That's why we left the second one intact.
Once this command is executed, the program asks for an encryption passphrase. Choose a strong one and remember it, because it is needed to restore the backup.
If this command is run repeatedly, Duplicity copies only the changes made since the last backup. If you want to take a full backup, use the 'full' action:
duplicity full test file:///home/nandakumar/backup
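Instead of remembering to run 'full' by hand, Duplicity can be asked to start a new full backup once the last one reaches a given age, using its '--full-if-older-than' option. The sketch below wraps this in a small shell function; the function name and the one-month interval are assumptions chosen for illustration.

```shell
#!/bin/sh
# backup_with_periodic_full is an illustrative name, not a Duplicity command.
# Usage: backup_with_periodic_full <source folder> <destination URL>
backup_with_periodic_full() {
    src="$1"
    dest_url="$2"
    # Take a full backup if the latest full one is older than one month (1M);
    # otherwise store only the changes since the last backup.
    duplicity --full-if-older-than 1M "$src" "$dest_url"
}
```

For the example in this article, the call would be: backup_with_periodic_full test file:///home/nandakumar/backup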
Restoration is also simple:
duplicity file:///home/nandakumar/backup test
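Before restoring, it often helps to look at what a backup location contains. The following sketch wraps three Duplicity actions ('collection-status', 'list-current-files' and 'restore' with '--time') in shell functions. The function names are made up for this example, and the exact output of these actions varies between Duplicity versions.

```shell
#!/bin/sh
# Helper names are illustrative; each takes the backup URL as its first argument.

# Show the backup chains (full and incremental sets) stored at the URL.
show_chains() {
    duplicity collection-status "$1"
}

# List the files contained in the latest backup.
list_backed_up_files() {
    duplicity list-current-files "$1"
}

# Restore the data as it was some time ago (e.g. 3D means three days ago).
# Usage: restore_as_of <backup URL> <age> <target folder>
restore_as_of() {
    duplicity restore --time "$2" "$1" "$3"
}
```

For instance, restore_as_of file:///home/nandakumar/backup 3D test_old would bring back the state of three days ago into a folder named 'test_old' (a hypothetical name). Duplicity normally refuses to restore into an existing, non-empty folder, so the target should be a new one.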
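If you'd like to exclude some files or subfolders from being backed up, use the '--exclude' option. The path given to it should begin with the source folder (here 'test'), since Duplicity rejects a pattern that cannot match anything inside the folder being backed up:
duplicity --exclude test/subfolder1 test file:///home/nandakumar/backup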
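The same option covers the case mentioned earlier, where the root directory / is backed up and /proc has to be left out. Here is a minimal sketch; the function name is an assumption, and the destination URL is whatever location you pass in.

```shell
#!/bin/sh
# backup_root is an illustrative name, not a Duplicity command.
# Usage: backup_root <destination URL>
backup_root() {
    dest_url="$1"
    # /proc is a virtual filesystem generated by the kernel, so it is skipped.
    duplicity --exclude /proc / "$dest_url"
}
```

Reading every file under / usually requires root privileges, so a function like this would typically be run from a root shell. The destination should not be a folder inside the tree being backed up, unless that folder is excluded as well.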
The non-interactive way
From the above examples, we've seen that Duplicity prompts for a passphrase or password almost every time it runs. This is inconvenient when writing a shell script, where there should not be any prompt at all. That's why Duplicity accepts such input via environment variables. See the example below (the passphrase, the password and the user name are placeholders to be replaced with your own):
export PASSPHRASE=your_encryption_passphrase
export FTP_PASSWORD=your_ftp_password
duplicity test ftp://username@ftp.domain.com/backup
In the above example, the folder 'test' is backed up into another folder 'backup', which resides on ftp.domain.com. If you use local storage, there is no need to set FTP_PASSWORD.
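In a script that runs unattended, you may prefer not to type the passphrase into the script itself. The sketch below reads it from a file and hands it to Duplicity through the PASSPHRASE environment variable. The function name and the idea of a separate passphrase file (readable only by you, for example after chmod 600) are assumptions made for illustration.

```shell
#!/bin/sh
# run_backup is an illustrative name, not a Duplicity command.
# Usage: run_backup <passphrase file> <source folder> <destination URL>
run_backup() {
    passphrase_file="$1"
    src="$2"
    dest_url="$3"
    if [ ! -r "$passphrase_file" ]; then
        echo "run_backup: cannot read $passphrase_file" >&2
        return 1
    fi
    (
        # The subshell keeps PASSPHRASE out of the calling shell's environment.
        PASSPHRASE=$(cat "$passphrase_file")
        export PASSPHRASE
        duplicity "$src" "$dest_url"
    )
}
```

A call such as run_backup "$HOME/.backup_passphrase" test file:///home/nandakumar/backup (the file name is hypothetical) then runs without any prompt, so it can be placed in a script that is started by a scheduler such as cron.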
We now know the basics of two backup utilities. However, I recommend that you read more about Duplicity to discover its hidden power (run the command 'man duplicity').
If you work on long-term projects, make use of these tools. But it is better to stick to the classic copy-paste method once you have the final product (e.g., you have finished writing a novel). This helps to avoid software dependencies.