Using the NTFS journal for backups

This post (in draft for almost 18 months) describes my amateur understanding of an interesting and useful NTFS feature, the USN Journal, and shows how I’m using it as part of a simple backup program in Python. It also gives some examples of how low-level Windows system calls can be made in Python, using the pywin32 modules.

One of the many features of the NTFS file system is its optional journalling. Journalling in the context of file systems usually means writing data to a transaction log, which can be replayed in the event of a crash. (NTFS does do that, but it’s referred to as the NTFS log feature.) This post addresses a different kind of journalling: that performed by the USN Journal.

The USN Journal is a log of all updates to files and directories on the volume. Its purpose appears to be to give applications, such as backup tools, an efficient way to find out what changes have occurred within a given time. A USN is an Update Sequence Number: an incrementing label for each entry in the journal. A period of file system activity recorded in the journal is therefore a range between two USNs.

The journal does have some limitations, though:

  1. It only records metadata changes. It will tell you if a file was opened for writing, but not whether anything was actually written to it (let alone the before and after contents of the file).
  2. It doesn’t record external data changes, such as those caused by cosmic rays, hardware failure, or meddling in the filesystem by other operating systems.
  3. It only has limited space allocated to it. When the space is exhausted, older journal entries are lost. So, an application can use it as an optimisation, but should always provide a fallback for when the journal does not have a complete set of change data.

Journal API

The journal can be manipulated via the DeviceIoControl system call. The USN Journal is reasonably well documented on MSDN, as are the DeviceIoControl control codes relating to it.

It’s also possible to perform the manipulations in Python using the Python for Windows extensions. That’s what I’ve used for my backup program. For instance, the code for using DeviceIoControl to query the basic journal information in Python is:

import struct
import win32file
import winioctlcon

def open_volume(drive):
    # Open the raw volume (e.g. 'C:') through its \\.\ device path.
    volh = win32file.CreateFile('\\\\.\\' + drive, win32file.GENERIC_READ,
            win32file.FILE_SHARE_READ | win32file.FILE_SHARE_WRITE, None,
            win32file.OPEN_EXISTING, win32file.FILE_ATTRIBUTE_NORMAL, None)
    return volh

def close_volume(volh):
    win32file.CloseHandle(volh)

def query_journal(volh):
    # Seven 64-bit fields, matching USN_JOURNAL_DATA_V0.
    fmt = 'QQQQQQQ'
    # Called "size" so that the built-in len() is not shadowed.
    size = struct.calcsize(fmt)
    buf = win32file.DeviceIoControl(volh, winioctlcon.FSCTL_QUERY_USN_JOURNAL, None, size)
    tup = struct.unpack(fmt, buf)
    return tup

volh = open_volume('C:')
UsnJournalID, FirstUsn, NextUsn, LowestValidUsn, MaxUsn, MaximumSize, AllocationDelta = query_journal(volh)
close_volume(volh)
# Parenthesised so the line works as both a Python 2 statement and a Python 3 call.
print('Journal id is 0x%016x' % UsnJournalID)
...

The format QQQQQQQ corresponds to the USN_JOURNAL_DATA_V0 structure defined for use with FSCTL_QUERY_USN_JOURNAL. Most journal commands have defined structures for input and output, which can be created and parsed using Python’s struct module. The wrapper function win32file.DeviceIoControl is slightly simpler than the underlying C function, and takes arguments for the volume handle, control code, input buffer (None if not required), and maximum size of the output buffer. The return value is the output buffer.
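As a rough illustration of the struct round trip, the sketch below unpacks a journal-data buffer into a named tuple. It runs against a synthetic buffer rather than a real volume, and the names JournalData and parse_journal_data, along with the sample values, are made up for the example. It uses q for the four USN fields because the Windows headers declare USN as a signed 64-bit integer; the buffer size is the same as with QQQQQQQ.

```python
import struct
from collections import namedtuple

# Field order follows USN_JOURNAL_DATA_V0. "JournalData" is a name invented
# for this sketch; it is not part of pywin32.
JournalData = namedtuple(
    'JournalData',
    'UsnJournalID FirstUsn NextUsn LowestValidUsn MaxUsn MaximumSize AllocationDelta')

# Q = unsigned 64-bit, q = signed 64-bit (USN is a signed type in the headers).
JOURNAL_DATA_FMT = '<QqqqqQQ'

def parse_journal_data(buf):
    """Unpack the first 56 bytes of buf into a JournalData tuple."""
    size = struct.calcsize(JOURNAL_DATA_FMT)
    return JournalData(*struct.unpack(JOURNAL_DATA_FMT, buf[:size]))

# Synthetic buffer standing in for the DeviceIoControl output (made-up values).
sample = struct.pack(JOURNAL_DATA_FMT,
                     0x01D0ABCDEF012345, 0, 4096, 0, 2**63 - 1,
                     32 * 2**20, 8 * 2**20)
info = parse_journal_data(sample)
print('Journal id is 0x%016x' % info.UsnJournalID)
print('Next USN is %d' % info.NextUsn)
```

Naming the fields makes later code such as info.FirstUsn easier to read than positional tuple indexes.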

Robustness of journal data

The journal mechanism is designed to be robust, particularly with regard to the third limitation mentioned above.

USNs always increment, and journals have unique identifiers that change if the USN range overflows or the journal is recreated. Applications can use the journal id and the last processed USN to determine whether it is safe to use the journal as a complete record of changes, or whether they must process the entire file system, so that the journal can be used in future.

As an example, let’s say C:\Docs is backed up every week. When the backup tool finishes a backup, it makes a note of the last USN, X, in the journal at the time the data was backed up. The next time it runs, it replays the journal from X onward, receiving a list of files in C:\Docs that have changed since that time. It can then efficiently back up only these files, knowing that no regular file system activity has altered the contents of any other files.

It’s possible that so much activity has occurred in the last week that some of it is no longer recorded in the journal. In this case, the earliest recorded USN in the journal will be Y > X. The backup tool detects this and enumerates the volume’s complete USN data, running a full backup to be sure it has processed all potentially changed data. (It can still make other optimisations, such as checking whether a file’s contents are the same as in the last backup, though.)

Enumerating USN data is done using the FSCTL_ENUM_USN_DATA control code. Each call to DeviceIoControl with this code will return a buffer of user-defined size, containing as many USN records as will fit. Repeated calls are made with the last received record’s FRN, until all data has been received. USN records are in the form of the variable-length USN_RECORD_V2 structure.
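To make the buffer layout concrete, here is a minimal sketch of walking one output buffer. It assumes the layout documented for these control codes: the first eight bytes hold the value to continue from, followed by USN_RECORD_V2 records whose lengths are padded to 8 bytes. The names UsnRecord, parse_usn_buffer and make_record are invented for the example, and make_record only builds synthetic test data. This is not the code from journalcmd.py.

```python
import struct
from collections import namedtuple

# Fixed-size part of USN_RECORD_V2, up to (not including) the file name:
# RecordLength, MajorVersion, MinorVersion, FileReferenceNumber,
# ParentFileReferenceNumber, Usn, TimeStamp, Reason, SourceInfo, SecurityId,
# FileAttributes, FileNameLength, FileNameOffset.
RECORD_HEADER_FMT = '<IHHQQqqIIIIHH'
RECORD_HEADER_SIZE = struct.calcsize(RECORD_HEADER_FMT)  # 60 bytes

# Hypothetical container holding the fields this post cares about.
UsnRecord = namedtuple('UsnRecord', 'frn parent_frn usn reason attributes name')

def parse_usn_buffer(buf):
    """Return (next_start, records) for one DeviceIoControl output buffer."""
    next_start = struct.unpack_from('<Q', buf, 0)[0]
    records = []
    pos = 8
    while pos + RECORD_HEADER_SIZE <= len(buf):
        (rec_len, _major, _minor, frn, parent_frn, usn, _timestamp, reason,
         _source, _security, attributes, name_len, name_off) = \
            struct.unpack_from(RECORD_HEADER_FMT, buf, pos)
        if rec_len == 0:
            break  # guard against a malformed buffer causing an endless loop
        raw_name = buf[pos + name_off:pos + name_off + name_len]
        records.append(UsnRecord(frn, parent_frn, usn, reason, attributes,
                                 raw_name.decode('utf-16-le')))
        pos += rec_len  # records are variable-length
    return next_start, records

def make_record(frn, parent_frn, usn, name, reason=0, attributes=0):
    """Build a synthetic record for the demo below (test data only)."""
    raw_name = name.encode('utf-16-le')
    rec_len = (RECORD_HEADER_SIZE + len(raw_name) + 7) & ~7  # pad to 8 bytes
    header = struct.pack(RECORD_HEADER_FMT, rec_len, 2, 0, frn, parent_frn,
                         usn, 0, reason, 0, 0, attributes,
                         len(raw_name), RECORD_HEADER_SIZE)
    return (header + raw_name).ljust(rec_len, b'\0')

buf = (struct.pack('<Q', 1013)
       + make_record(1012, 987, 5000, 'Projects.txt')
       + make_record(987, 901, 5100, 'Work'))
next_start, records = parse_usn_buffer(buf)
for rec in records:
    print(rec.frn, rec.parent_frn, rec.name)
```

The name is read using the offset and length stored in each record rather than a fixed position, since the structure is variable-length.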

Reading the journal normally is a similar process, but uses the FSCTL_READ_USN_JOURNAL code. The USN to replay from is provided by the caller, and records are returned as with FSCTL_ENUM_USN_DATA, using the last received record’s USN in subsequent calls.
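The repeated-call pattern might look something like the sketch below. It assumes the READ_USN_JOURNAL_DATA_V0 input layout (StartUsn, ReasonMask, ReturnOnlyOnClose, Timeout, BytesToWaitFor, UsnJournalID) and that the first eight bytes of each output buffer hold the USN to continue from. The ioctl parameter is a hypothetical callable that hides the actual system call, and fake_ioctl is a stand-in so the loop can run without a volume handle.

```python
import struct

# StartUsn, ReasonMask, ReturnOnlyOnClose, Timeout, BytesToWaitFor, UsnJournalID
READ_REQUEST_FMT = '<qIIQQQ'

def pack_read_request(start_usn, journal_id, reason_mask=0xFFFFFFFF):
    """Pack the input buffer for FSCTL_READ_USN_JOURNAL (all reasons, no waiting)."""
    return struct.pack(READ_REQUEST_FMT, start_usn, reason_mask, 0, 0, 0, journal_id)

def read_journal_buffers(ioctl, start_usn, journal_id):
    """Yield the record bytes from each call until a call returns no records.

    ioctl is a hypothetical callable: it takes the packed request and
    returns the raw output buffer.
    """
    usn = start_usn
    while True:
        buf = ioctl(pack_read_request(usn, journal_id))
        if len(buf) <= 8:
            break  # only the continuation USN came back: nothing left to read
        yield buf[8:]
        usn = struct.unpack_from('<q', buf, 0)[0]  # continue from here next time

def fake_ioctl(request):
    """Stand-in for the real call: pretends there are two buffers of records."""
    start = struct.unpack_from('<q', request, 0)[0]
    if start < 300:
        return struct.pack('<q', start + 100) + b'R' * 16
    return struct.pack('<q', start)

chunks = list(read_journal_buffers(fake_ioctl, 100, 0x1234))
print(len(chunks))
```

On a real volume, ioctl would presumably wrap win32file.DeviceIoControl with winioctlcon.FSCTL_READ_USN_JOURNAL, the packed request as the input buffer, and an output size of your choosing.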

The Python file journalcmd.py contains code for making these calls and iterating over the results.

Other potential disruptions are when so much activity has occurred that the USN values have wrapped around, or the USN Journal has been deleted and recreated on that volume. In each case, NTFS will assign a new distinct identifier to the journal. The backup tool records this value too, and if it has changed since the last backup, then the tool knows to run a full backup.
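Taken together, the checks described above can be sketched as a single function. The name journal_is_usable and its parameters are my own for illustration; the saved values are whatever the tool persisted after the last backup, and the current values would come from FSCTL_QUERY_USN_JOURNAL.

```python
def journal_is_usable(saved_id, saved_usn, journal_id, first_usn, next_usn):
    """Decide whether the journal is a complete record of changes since last time.

    Returns False when a full backup is needed instead.
    """
    if saved_id is None or saved_usn is None:
        return False  # first run: no saved state to replay from
    if saved_id != journal_id:
        return False  # journal was recreated or the USNs wrapped around
    if saved_usn < first_usn:
        return False  # entries after X have been discarded (Y > X)
    if saved_usn > next_usn:
        return False  # saved state does not match this journal
    return True

# Made-up values for illustration.
print(journal_is_usable(7, 50, 7, 0, 100))   # same journal, X still present
print(journal_is_usable(7, 50, 8, 0, 100))   # journal id changed
print(journal_is_usable(7, 50, 7, 60, 100))  # X has fallen off the start
```

Keeping the decision in one place means the rest of the tool only has to choose between "replay from X" and "full backup".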

Maintaining the FRN map

Directories pose a complication to the journal data. Each file exists in a directory, but not all actions on the file are recorded as actions on the directory (let alone its parent directories). Similarly, actions on a directory are not generally recorded as actions on all the files within it.

Directories and files are assigned unique FRNs (File Reference Numbers). An entry in the journal records the FRN and name of the affected file, as well as the FRN of the parent directory in which it occurred.

When a journal is used for the first time (as in the full backup described above), the set of directories and files on the volume is provided as a set of journal entries. But each item is described only relative to its parent. For example, the file C:\Docs\Work\Projects.txt will appear as a record containing FRN, parent FRN, and name values similar to:

1012 987 Projects.txt

Where 1012 is the FRN of the file, and 987 is the FRN of its parent. To determine the full path of the file, the program must also have received records such as:

987 901 Work
901 554 Docs
554 219 (root directory)

It can then trace back through the FRNs to find the full path for the item: /Docs/Work/Projects.txt. The advantage of separating files from their full paths is that if a directory is renamed or moved, all the files within it remain “unchanged”: only the directory itself (and possibly its old and new parents, if it is moved) will require journal records.
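The trace-back can be sketched in a few lines, using the example records above. The dictionary layout (FRN mapped to a parent FRN and name) and the function name full_path are assumptions for this illustration, and the root directory’s FRN is passed in explicitly rather than discovered.

```python
def full_path(frn_map, frn, root_frn):
    """Follow parent FRNs up to the root and join the names into a path.

    Returns None if an ancestor is missing from the map (or the map loops).
    """
    parts = []
    seen = set()
    while frn != root_frn:
        if frn in seen or frn not in frn_map:
            return None
        seen.add(frn)
        parent_frn, name = frn_map[frn]
        parts.append(name)
        frn = parent_frn
    return '/' + '/'.join(reversed(parts))

# FRN -> (parent FRN, name), from the example records; 554 is the root.
frn_map = {
    1012: (987, 'Projects.txt'),
    987: (901, 'Work'),
    901: (554, 'Docs'),
}
print(full_path(frn_map, 1012, 554))
```

Returning None for an incomplete chain lets the caller treat "path unknown" as a reason to fall back to a full scan.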

If the program needs to robustly identify the full paths for items in the journal, it needs to maintain a map from FRNs to parent FRNs and names. The map should be persisted alongside the journal id and last USN. Building and maintaining the map can be done by monitoring changes to directory items.
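One way of keeping such a map current is sketched below. The reason flag values are the USN_REASON_* constants from the Windows headers; the function apply_record and the map layout are hypothetical and are not taken from journal.py. For simplicity the sketch tracks every record it is given, whereas a real tool might restrict itself to directories.

```python
# Reason flags from the Windows headers (winioctl.h).
USN_REASON_FILE_CREATE = 0x00000100
USN_REASON_FILE_DELETE = 0x00000200
USN_REASON_RENAME_OLD_NAME = 0x00001000
USN_REASON_RENAME_NEW_NAME = 0x00002000

def apply_record(frn_map, frn, parent_frn, name, reason):
    """Update frn_map (FRN -> (parent FRN, name)) from one journal record."""
    if reason & USN_REASON_FILE_DELETE:
        frn_map.pop(frn, None)  # item is gone
    elif (reason & USN_REASON_RENAME_OLD_NAME) and not (reason & USN_REASON_RENAME_NEW_NAME):
        pass  # carries the old name; the new-name record follows
    else:
        # Creates, new names, and ordinary changes all carry the current
        # parent and name, so recording them keeps the map up to date.
        frn_map[frn] = (parent_frn, name)

frn_map = {901: (554, 'Docs')}
apply_record(frn_map, 987, 901, 'Work', USN_REASON_FILE_CREATE)
apply_record(frn_map, 987, 901, 'Work', USN_REASON_RENAME_OLD_NAME)
apply_record(frn_map, 987, 901, 'Business', USN_REASON_RENAME_NEW_NAME)
print(frn_map[987])
```

Because a rename only touches the renamed item’s own entry, every path beneath it resolves to the new location without any further updates.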

The Python file journal.py implements a Journal class, which encapsulates the logic for reading journals and maintaining the FRN map and journal state. It is used by the backup program for journal functionality, but can also be used as a command-line program to print paths that have changed since an earlier invocation.

Files on NTFS volumes can reside in more than one place. For instance, a hard link to an existing file can be created in the same or another directory. Currently, the backup program does not treat multiple links to the same file specially; each instance is treated as a separate file. The system call GetFileInformationByHandle could, in principle, be used to determine whether two directory entries were the same file, and hence optimise the backup by copying only one of them.
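As a sketch of how that optimisation might start, the code below groups paths that refer to the same underlying file. Rather than calling GetFileInformationByHandle directly, it relies on os.stat, which on Python 3 under Windows reports, as far as I know, the volume serial number and file index in st_dev and st_ino. The function group_hard_links is hypothetical, and the demo creates throwaway files in a temporary directory.

```python
import os
import shutil
import tempfile
from collections import defaultdict

def group_hard_links(paths):
    """Return lists of paths that are links to the same file."""
    groups = defaultdict(list)
    for path in paths:
        st = os.stat(path)
        # Same device and same file id means the same underlying file.
        groups[(st.st_dev, st.st_ino)].append(path)
    return [sorted(group) for group in groups.values() if len(group) > 1]

# Demo with throwaway files: b.txt is a hard link to a.txt, c.txt is separate.
demo_dir = tempfile.mkdtemp()
try:
    a = os.path.join(demo_dir, 'a.txt')
    b = os.path.join(demo_dir, 'b.txt')
    c = os.path.join(demo_dir, 'c.txt')
    for path in (a, c):
        with open(path, 'w') as f:
            f.write('same text')
    os.link(a, b)
    linked = group_hard_links([a, b, c])
    linked_names = [[os.path.basename(p) for p in group] for group in linked]
finally:
    shutil.rmtree(demo_dir)
print(linked_names)
```

A backup tool could copy the first path in each group and recreate the rest as hard links in the destination.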

Detecting affected files and directories

The journal gives the program a list of changes that have occurred in the volume since it last ran. But how can the program find the actual affected files? Affected can have a different meaning from changed. For a backup tool, affected indicates whether the tool needs to fully process the item. The causal link between the two is subtle:

  • If a directory has changed, all its child files are affected.
  • If a file has changed, all its ancestor directories are affected. (If a directory is unaffected, then it does not need to be recursively processed.)

For example, the first rule says that if /Docs/Work is renamed to /Docs/Business, then, even though there is no change to /Docs/Business/Projects.txt, it should still be backed up.

As an example of the second rule, if /Docs/Work/Projects.txt is changed, then the backup tool must recursively process /Docs and /Docs/Work to eventually back it up. If none of the files under /Docs were changed, then the backup tool could back up /Docs in a more efficient way, such as by making a symlink to the previous backup.

The symmetry of these two rules, combined with the fact that it’s easier to look up a path’s ancestors than its children, suggests an implementation where changed files and affected directories are maintained when the journal is processed. Then a directory of the volume can be scanned for affected files and directories relatively efficiently. This is the approach taken in journal.py.
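A minimal sketch of the two rules, assuming an FRN map of the form FRN mapped to (parent FRN, name) and a set of FRNs that appeared in the journal. The function names and data layout are mine for illustration and do not reproduce journal.py.

```python
def ancestors(frn_map, frn, root_frn):
    """Yield the FRNs of the directories above frn, nearest first."""
    seen = set()
    while frn != root_frn and frn in frn_map and frn not in seen:
        seen.add(frn)
        frn = frn_map[frn][0]
        yield frn

def affected_directories(frn_map, changed, root_frn):
    """Second rule: every ancestor of a changed item is affected."""
    affected = set()
    for frn in changed:
        for anc in ancestors(frn_map, frn, root_frn):
            if anc in affected:
                break  # everything above it was marked on an earlier pass
            affected.add(anc)
    return affected

def is_affected(frn_map, frn, changed, affected_dirs, root_frn):
    """An item is affected if it changed, lies above a change, or lies below one."""
    if frn in changed or frn in affected_dirs:
        return True
    # First rule: a changed directory affects everything inside it.
    return any(anc in changed for anc in ancestors(frn_map, frn, root_frn))

frn_map = {
    1012: (987, 'Projects.txt'), 987: (901, 'Work'), 901: (554, 'Docs'),
    1013: (901, 'Notes.txt'), 2000: (554, 'Music'), 2001: (2000, 'song.mp3'),
}
changed = {1012}  # Projects.txt was modified
dirs = affected_directories(frn_map, changed, 554)
print(sorted(dirs))
print(is_affected(frn_map, 2000, changed, dirs, 554))
```

Marking ancestors while the journal is processed means the later directory scan only needs set lookups, plus a short walk up the tree for the first rule.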

The basic algorithm for backing up a source directory is then:

  1. Open the journal, creating a Journal class.
  2. Update the journal from NTFS, populating its affected file data.
  3. Back up the source directory.
  4. Close the journal, persisting its new state to disk for next time.

To back up a directory:

  1. Is the directory affected according to the journal? (A new journal marks everything as affected.)
  2. If no, then create a symlink (or junction point, in Windows) to the previous backed up version of the directory.
  3. Otherwise, back up each item in it recursively.

To back up a file:

  1. Is the file affected according to the journal?
  2. If no, create a hard link to the previous backed up version of the file.
  3. Otherwise, copy it.
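The directory and file steps above can be sketched as a recursive planner. To keep the example runnable anywhere, it works on an in-memory tree (a nested dict, with None standing for a file) and returns a list of intended actions instead of touching the file system. The function plan_backup, the action names, and the is_affected callback are all hypothetical.

```python
def plan_backup(path, node, is_affected):
    """Return a list of (action, path) pairs for backing up node.

    node is a dict for a directory (name -> child node) or None for a file.
    is_affected is a callable taking a path and returning True or False.
    """
    is_dir = isinstance(node, dict)
    if not is_affected(path):
        # Unaffected items are linked to the previous backup, not copied.
        return [('symlink' if is_dir else 'hardlink', path)]
    if not is_dir:
        return [('copy', path)]
    actions = [('mkdir', path)]
    for name in sorted(node):
        child_path = path.rstrip('/') + '/' + name
        actions.extend(plan_backup(child_path, node[name], is_affected))
    return actions

tree = {
    'Docs': {'Work': {'Projects.txt': None}, 'Notes.txt': None},
    'Music': {'song.mp3': None},
}
# Pretend the journal reported a change to Projects.txt.
affected = {'/', '/Docs', '/Docs/Work', '/Docs/Work/Projects.txt'}
plan = plan_backup('/', tree, affected.__contains__)
for action, path in plan:
    print(action, path)
```

Note that /Music is linked as a whole and never descended into, which is where most of the time saving comes from.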

(The script also tries to reuse similar files, by keeping a manifest of all files with their checksums. But this is separate from its journal use.) In a future post I’ll show how to create Windows hard links, symlinks, and reparse points with Python.


5 Responses to Using the NTFS journal for backups

  1. Pingback: Simple but efficient backups | EJRH

  2. Simon says:

    Hi EJRH,
    I’d love to see this source code but it seems to be a broken link. I’m programming an open-source DropBox clone that uses FTP:
    https://code.google.com/p/iqbox-ftp/
    I think using the USN journal could be a huge help in making it perfectly efficient for Windows.
    Let me know. Great article

  3. ejrh says:

    Hi Simon, I’ve been slowly migrating some ones to Github and have been even slower about updating the broken links on my blog! Here’s the new location.

    It’s not a user-friendly project yet but by all means take a look. I’d be interested to know if it helps.

  4. Pingback: Windows / Google Drive file sync utility

  5. Pingback: Handle USN journal size full case | Solutions for enthusiast and professional programmers
