Instantaneous File-System Snapshots — How They Work
©2009 Darel Rex Finley. This complete article, unmodified, may be freely distributed for educational purposes.
A file-system snapshot is a sort of read-only window into the past, whereby you can see exactly what files and folders were in a volume (presumably a hard drive) at some time in the past — the time when the snapshot was taken. You can see everything in those files: their full, actual content at the time the snapshot was taken. And this can be done without massively replicating everything on the volume. Even better: When you ask the system to create a new snapshot, it happens virtually instantly, no matter how huge and complicated your set of files and folders. How can a snapshot like that be created in an instant?
In the following two diagrams, we will explore precisely how instant snapshots work.
Figure 1
Figure 1 shows what happened in a hypothetical file system during the first twelve days of January. Each color bar represents a file (or folder, but we’ll assume files for the sake of this example). The left end of the bar shows when the file was created, and the right end of the bar shows when the file was destroyed. If the bar ends in a big dot on the 12th, then it still exists today (the 12th). And when a bar changes color, that means that the file’s contents were changed.
It is important to understand that the file system knows nothing of the activity history depicted in Figure 1. All the system knows today (the 12th) is that there are three files (light-blue, orange, and magenta), and it knows the last-modified date-and-time of each (late on the 4th for the light-blue file, late on the 2nd for the orange file, and late on the 9th for the magenta file). The system may have other data about these files, such as their creation date, but that is unimportant to this discussion — all we need to know is that the system knows it has three files, and it knows when each was last modified.
Figure 2
Figure 2 shows the exact same activity, but this time two snapshots were taken: one at midday on the 3rd, and another at midday on the 7th. Now, whenever a file is about to be changed or deleted, the system checks its last-modified date: if that date predates the most recent snapshot, the old version is retained instead of being overwritten. But the system doesn’t show users all of these retained versions, only the most current version of each file. So on the 12th, if we peruse our file system, it looks the same as it did in Figure 1.
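Here is a minimal Python sketch of the bookkeeping just described. It rests on a few assumptions. The class `FileSystem` and its methods `take_snapshot` and `must_version` are hypothetical names. Times are plain numbers of days, so 3.5 means midday on the 3rd. Taking a snapshot only appends a time to a list, and that list is consulted later, just before a file is changed or deleted.

```python
class FileSystem:
    """Hypothetical snapshot bookkeeping: snapshot times and nothing else."""

    def __init__(self):
        self.snapshots = []  # snapshot times, oldest first

    def take_snapshot(self, now):
        # This is the entire snapshot: record the time. No file is read,
        # copied, or marked, so the cost does not depend on the volume's size.
        self.snapshots.append(now)

    def must_version(self, mtime):
        # Called just before a file is changed or deleted. If the file's
        # last-modified time is not newer than the most recent snapshot, that
        # snapshot can see the file, so its old version must be kept.
        return bool(self.snapshots) and mtime <= self.snapshots[-1]


fs = FileSystem()
fs.take_snapshot(3.5)          # midday on the 3rd
print(fs.must_version(1.2))    # True: a file last modified on the 1st
print(fs.must_version(3.8))    # False: a file last modified late on the 3rd
```

This sketch treats a last-modified time exactly equal to the snapshot time as visible to that snapshot. That is its own choice, and it errs toward keeping data.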
Now, let’s step through the twelve days and see what happened on each day:
1st — Users create three files today: red, purple, and gray.
2nd — Users create one file today: orange.
3rd — Today at about midday, a user requests the creation of a snapshot. The system simply adds the date-and-time of the snapshot to its list of snapshots. That’s it! The system doesn’t do anything else, so no matter how many files there are in the system, the snapshot is “instantaneous.” Writing the current date-and-time to a list of snapshot date-and-times is truly about as instantaneous as a file system operation can be.
Later today, a user creates a file: green.
4th — Today users modify two existing files: the red file is changed to light-blue, and the gray file is changed to yellow. Since the red and gray files have modification date-and-times older than the most recent snapshot (on the 3rd), they are not changed in place but versioned: the red and gray files are retained, and new light-blue and yellow files are created with the same names as the red and gray files, respectively.
5th — Nothing happens today.
6th — Today a user deletes the green file. Since its modification date-and-time is newer than the date-and-time of the most recent snapshot, the system simply deletes the file, and it is lost forever.
7th — Today at about midday, a user requests the creation of a snapshot. The system simply adds the current date-and-time to its list of snapshots and does nothing else. Instantaneous!
Later today, a user creates a file: aqua.
8th — Today, users modify two files: the purple file is changed to black, and the yellow file is changed to dark blue. Since the purple and yellow files have modification date-and-times older than the most recent snapshot (on the 7th), they are versioned.
9th — Today a user changes the dark-blue file to magenta. Since its modification date-and-time is more recent than the most recent snapshot (7th), it is simply changed, and the dark-blue version of the file is lost forever.
Also today, a user deletes the aqua file. Since its modification date-and-time is newer than the most recent snapshot, the file is really deleted, and is lost forever.
One more activity for today: a user deletes the black file. Since its modification date-and-time is newer than the most recent snapshot, the file is truly deleted. However, the black file is itself a new version of the older purple file, which is still retained, so a new, contentless file with an “I’m deleted” tag is created in its place.
10th and 11th — Nothing happens on these days.
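The following minimal Python sketch replays the twelve days through the write rules of the walkthrough. It assumes these rules:

- Version a file if its last-modified time is not newer than the most recent snapshot.
- Otherwise overwrite or delete it outright.
- Leave an “I’m deleted” marker when a deletion would otherwise let a retained older version reappear.

The single-letter file names (A for red/light-blue, B for purple/black, C for gray/yellow/dark-blue/magenta, and so on) and the fractional-day times are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False: versions are compared by identity
class Version:
    name: str               # hypothetical file name shared by all versions of one file
    color: str              # stands in for the file's contents
    mtime: float            # last-modified time in days (3.5 = midday on the 3rd)
    deleted: bool = False   # True for a contentless "I'm deleted" marker


class FileSystem:
    def __init__(self):
        self.versions = []   # every stored version, current and retained
        self.snapshots = []  # snapshot times, oldest first

    def take_snapshot(self, now):
        self.snapshots.append(now)

    def _current(self, name):
        # The newest stored version of a file is the one users normally see.
        return max((v for v in self.versions if v.name == name), key=lambda v: v.mtime)

    def _must_version(self, v):
        return bool(self.snapshots) and v.mtime <= self.snapshots[-1]

    def create(self, name, color, now):
        self.versions.append(Version(name, color, now))

    def modify(self, name, color, now):
        cur = self._current(name)
        if self._must_version(cur):
            # A snapshot can still see `cur`: keep it and add a new version.
            self.versions.append(Version(name, color, now))
        else:
            # No snapshot can see `cur`: overwrite it; the old contents are lost.
            cur.color, cur.mtime = color, now

    def delete(self, name, now):
        cur = self._current(name)
        if not self._must_version(cur):
            self.versions.remove(cur)  # really deleted, lost forever
            if not any(v.name == name for v in self.versions):
                return                 # nothing older remains to be hidden
        # `cur` or an older version is still stored: add a marker so that
        # the stored versions stay hidden from now on.
        self.versions.append(Version(name, "I'm deleted", now, deleted=True))


fs = FileSystem()
for name, color in [("A", "red"), ("B", "purple"), ("C", "gray")]:
    fs.create(name, color, 1.2)        # 1st
fs.create("D", "orange", 2.8)          # 2nd
fs.take_snapshot(3.5)                  # 3rd, midday
fs.create("E", "green", 3.8)
fs.modify("A", "light-blue", 4.8)      # 4th: versioned
fs.modify("C", "yellow", 4.8)          # 4th: versioned
fs.delete("E", 6.5)                    # 6th: really deleted
fs.take_snapshot(7.5)                  # 7th, midday
fs.create("F", "aqua", 7.8)
fs.modify("B", "black", 8.5)           # 8th: versioned
fs.modify("C", "dark-blue", 8.5)       # 8th: versioned
fs.modify("C", "magenta", 9.7)         # 9th: overwritten in place
fs.delete("F", 9.8)                    # 9th: really deleted
fs.delete("B", 9.9)                    # 9th: really deleted, marker added

print(len(fs.versions))  # 8
```

The green, aqua, dark-blue, and black contents are gone, and eight stored files remain, matching Figure 2.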
So here we are on the 12th. Normal users of the system will see the light-blue file, but not the red one, since it has the same name as the light-blue file but an older modification date. They will also see the orange file and the magenta file. That’s it: they will not see the purple file, since it has an older modification date than the contentless “I’m deleted” black file. And they won’t see the “I’m deleted” file either, because that kind of file is never shown.
If a user asks to see the file system as it was on the 7th (midday), then the system simply ignores all files that have a newer modification date than that, and so the user will see the light-blue, purple, orange, and yellow files.
If a user asks to see the file system as it was on the 3rd (midday), then the system ignores all files that have a newer modification date than that, and so the user will see the red, purple, orange, and gray files.
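The read side needs no snapshot bookkeeping at all. To show the volume as of time t, ignore every stored version modified after t, keep the newest remaining version of each name, and hide “I’m deleted” markers. Below is a Python sketch of that over the eight files stored on the 12th. It assumes hypothetical names A to D for the groups of same-named files and times in days. The function name `view` is also made up.

```python
from dataclasses import dataclass


@dataclass
class Version:
    name: str               # hypothetical file name shared by all versions of a file
    color: str              # stands in for the file's contents
    mtime: float            # last-modified time in days (3.5 = midday on the 3rd)
    deleted: bool = False   # True for a contentless "I'm deleted" marker


# The eight stored files left on the 12th in Figure 2.
versions = [
    Version("A", "red", 1.2), Version("A", "light-blue", 4.8),
    Version("B", "purple", 1.2), Version("B", "I'm deleted", 9.9, deleted=True),
    Version("C", "gray", 1.2), Version("C", "yellow", 4.8), Version("C", "magenta", 9.7),
    Version("D", "orange", 2.8),
]


def view(versions, as_of=float("inf")):
    """Files visible at time `as_of`; the default means "now"."""
    newest = {}
    for v in versions:
        if v.mtime > as_of:
            continue  # ignore anything modified after the requested moment
        if v.name not in newest or v.mtime > newest[v.name].mtime:
            newest[v.name] = v  # keep only the newest version of each name
    # A marker hides its file, and the marker itself is never shown.
    return sorted(v.color for v in newest.values() if not v.deleted)


print(view(versions))       # ['light-blue', 'magenta', 'orange']
print(view(versions, 7.5))  # ['light-blue', 'orange', 'purple', 'yellow']
print(view(versions, 3.5))  # ['gray', 'orange', 'purple', 'red']
```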
Now, some Q&A about the snapshot system described above:
Can I create snapshots in the past?
No. Between snapshots, data can be permanently changed or deleted, and there is no way for the system to get that data back. For example, in Figure 2 the green, black, dark-blue, and aqua files once existed, but now do not. If today (the 12th) you tried to create a snapshot of the 5th, the system would have no way to know that the green file existed on that day, much less reacquire its contents. It’s gone!
I’ve heard that once I create any snapshots, my file system will grow and grow without limit, and never become smaller. Is that true?
No. Your file system will never be smaller than it was at the time of your most recent snapshot, but it can still go up and down from there. For example, in Figure 2 the black and aqua files were really deleted, and the dark-blue file was really replaced with the magenta one (which might be smaller).
However, some systems are configured to automatically generate a snapshot every ten seconds, or something like that. With that configuration, yes, your file system will grow and grow and almost never get smaller. A file version is truly deleted or replaced only if it is written and then changed or deleted again within the same ten-second window, before the next snapshot is taken.
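Under these rules, a version written at one moment and changed or deleted later stays on disk only if some snapshot was taken in between. Here is a small Python sketch of that test. The function name `survives` and the schedule of one snapshot every 10 seconds (times in seconds) are hypothetical.

```python
def survives(written, replaced, snapshots):
    """True if a version written at time `written` and then changed or deleted
    at time `replaced` is still stored: some snapshot was taken while it was current."""
    return any(written <= s < replaced for s in snapshots)


# Hypothetical schedule: a snapshot every 10 seconds for one minute.
every_ten_seconds = range(0, 60, 10)  # 0, 10, 20, 30, 40, 50

print(survives(12, 15, every_ten_seconds))  # False: written and replaced between 10 and 20
print(survives(12, 25, every_ten_seconds))  # True: the snapshot at 20 saw it
```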
I’ve heard that once I create any snapshots, then a user or process that makes lots of small modifications to really big files will quickly fill up my file system. Is that true?
No. After your most recent snapshot, a large, frequently modified file like that will be versioned just once, then truly changed without versioning from then on (or until you create another snapshot).
However, if you have the snapshot-every-ten-seconds configuration, then yes, frequent modifications of large files may bring your system to its knees. This might be a good reason not to create snapshots so frequently. Or it might be a good reason not to make lots of little changes to very big files. Doing so in effect uses a large file as a file system within a file, when you could have used the real file system to store many smaller files and modify only the ones that needed modification.
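To put numbers on the last two answers, here is a small Python sketch that counts how many copies of a single file remain after a run of rewrites. The rule is the one described above: a replaced version survives only if a snapshot was taken while it was current. The workload (a rewrite every 5 seconds for an hour), the snapshot schedules, and the function name `stored_versions` are invented for illustration.

```python
def stored_versions(write_times, snapshots):
    """How many copies of one file stay on disk after it is written at each
    time in `write_times` (ascending), given the times of the snapshots."""
    kept = 1  # the current version is always stored
    for written, replaced in zip(write_times, write_times[1:]):
        # A replaced version survives only if a snapshot was taken while it was current.
        if any(written <= s < replaced for s in snapshots):
            kept += 1
    return kept


# Hypothetical workload: a large file rewritten every 5 seconds for an hour (seconds).
writes = list(range(0, 3600, 5))              # 720 writes: t = 0, 5, ..., 3595
one_snapshot = [2]                            # a single snapshot just after the first write
every_ten_seconds = list(range(2, 3600, 10))  # t = 2, 12, ..., 3592

print(stored_versions(writes, one_snapshot))       # 2
print(stored_versions(writes, every_ten_seconds))  # 361
```

With one snapshot, the file costs two copies no matter how often it is rewritten. With a snapshot every 10 seconds, about half of the 720 versions in this workload stay on disk.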
How quickly can the system process a request to delete a snapshot? Is that instantaneous?
No, far from it. To delete a snapshot, the system must examine every file on the volume to find the files that are needed by only that snapshot, and delete those files as it finds them.
In Figure 2: If today (the 12th) you asked the system to delete the snapshot that was taken on the 7th, then the system would have to examine all eight existing files to discover that the yellow file can be deleted.
Or if you kept the snapshot from the 7th, but asked the system to delete the snapshot from the 3rd, then it would have to examine all eight existing files to discover that it can delete the red and gray files. (But not the purple file — it is needed for the 7th’s snapshot.)
If you have a system with large numbers of snapshots, and you want to delete some old ones to free up space in your file system, it is much more efficient to delete a set of consecutive snapshots, e.g. all snapshots generated in August 2001. The system can then run through all the files just once, checking whether any snapshot outside August 2001 (or the current file system) still needs each one. It thus cleans things up about as efficiently as if it were deleting only one snapshot.
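Here is a Python sketch of snapshot deletion over the eight files stored on the 12th. It uses the same hypothetical names A to D and day-based times as the earlier illustrations. A stored file is garbage when neither the remaining snapshots nor the current view would show it. Passing a set of snapshots lets one outer scan handle a whole consecutive run. The function names `shown_at` and `delete_snapshots` are made up, and the sketch favors clarity over speed.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # compare versions by identity
class Version:
    name: str
    color: str              # stands in for the file's contents
    mtime: float            # last-modified time in days (3.5 = midday on the 3rd)
    deleted: bool = False   # "I'm deleted" marker


# The eight stored files left on the 12th in Figure 2 (names A-D are hypothetical).
versions = [
    Version("A", "red", 1.2), Version("A", "light-blue", 4.8),
    Version("B", "purple", 1.2), Version("B", "I'm deleted", 9.9, deleted=True),
    Version("C", "gray", 1.2), Version("C", "yellow", 4.8), Version("C", "magenta", 9.7),
    Version("D", "orange", 2.8),
]
snapshots = [3.5, 7.5]


def shown_at(v, versions, t):
    """True if a view of the volume at time t would use version v."""
    return v.mtime <= t and not any(
        o.name == v.name and v.mtime < o.mtime <= t for o in versions)


def delete_snapshots(versions, snapshots, doomed):
    """Forget the snapshots in `doomed` and return the versions nobody needs."""
    views = [s for s in snapshots if s not in doomed] + [float("inf")]  # inf = now
    # One outer scan over every stored version, whether `doomed` holds one
    # snapshot or a whole run of consecutive ones.
    garbage = [v for v in versions
               if not any(shown_at(v, versions, t) for t in views)]
    # An "I'm deleted" marker whose older versions are all gone hides nothing.
    kept = [v for v in versions if v not in garbage]
    garbage += [v for v in kept if v.deleted
                and not any(o.name == v.name and not o.deleted for o in kept)]
    return garbage


print([v.color for v in delete_snapshots(versions, snapshots, {7.5})])  # ['yellow']
print([v.color for v in delete_snapshots(versions, snapshots, {3.5})])  # ['red', 'gray']
run = {s for s in snapshots if 1 <= s < 8}  # every snapshot from the 1st to the 8th
print([v.color for v in delete_snapshots(versions, snapshots, run)])
```

The first two results match the yellow file, and the red and gray files, found above. Deleting both snapshots also frees the purple file, and then the black file’s “I’m deleted” marker, which no longer hides anything.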
If my system is configured to generate a snapshot every ten seconds, won’t the snapshot list get rather long and unwieldy?
If you really want a system like that, you should probably have no snapshot list at all, and simply configure your system to version files every time they are changed or deleted. Then you can select any date-and-time from the past and see what the file system contained at that moment. And if your system gets too full, just set (or move forward) a limit on how far back in time users can look, and delete all files that are no longer needed under that new limit.
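With every change versioned, there is no snapshot list to consult. A view at any time t just picks the newest version of each file that is not newer than t. Moving the look-back limit forward frees every version that was superseded at or before the new limit. Here is a minimal Python sketch of that idea. The class `VersioningFileSystem`, its methods `write`, `view`, and `set_horizon`, and the day-based times are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Version:
    name: str
    color: str              # stands in for the file's contents
    mtime: float            # time of the change, in days
    deleted: bool = False   # True for an "I'm deleted" marker


class VersioningFileSystem:
    """Hypothetical variant with no snapshot list: every change is kept."""

    def __init__(self):
        self.versions = []
        self.horizon = float("-inf")  # earliest time users may look back to

    def write(self, name, color, now, deleted=False):
        # Every change or deletion becomes a new version; nothing is overwritten.
        self.versions.append(Version(name, color, now, deleted))

    def view(self, as_of):
        if as_of < self.horizon:
            raise ValueError("that moment is older than the look-back limit")
        newest = {}
        for v in self.versions:
            if v.mtime <= as_of and (v.name not in newest or v.mtime > newest[v.name].mtime):
                newest[v.name] = v
        return sorted(v.color for v in newest.values() if not v.deleted)

    def set_horizon(self, horizon):
        """Move the look-back limit forward and delete versions nobody can see."""
        self.horizon = h = max(self.horizon, horizon)
        self.versions = [
            v for v in self.versions
            # Superseded at or before the limit: no allowed view can show it.
            if not any(o.name == v.name and v.mtime < o.mtime <= h for o in self.versions)
            # A marker at or before the limit has nothing left to hide.
            and not (v.deleted and v.mtime <= h)
        ]


fs = VersioningFileSystem()
for color, t in [("gray", 1.2), ("yellow", 4.8), ("dark-blue", 8.5), ("magenta", 9.7)]:
    fs.write("C", color, t)
fs.write("E", "green", 3.8)
fs.write("E", "I'm deleted", 6.5, deleted=True)

print(fs.view(8.6))      # ['dark-blue']: kept this time, unlike in Figure 2
fs.set_horizon(7.0)
print(len(fs.versions))  # 3
print(fs.view(7.0))      # ['yellow']
```

Moving the limit to the 7th frees the gray version, the green file, and its marker. The yellow version stays because a view of the 7th still shows it.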