
March 17, 2008

From BFS to ZFS: past, present, and future of file systems

Why we care about file systems

Linus Torvalds

Computer platform advocacy can bubble up in the strangest places. In a recent interview at a conference in Australia, Linux creator Linus Torvalds got the Macintosh community in an uproar when he described Mac OS X's file system as "complete and utter crap, which is scary."

What did he mean? What is a "file system" anyway, and why would we care why one is better than another? At first glance, it might seem that file systems are boring technical widgetry that would never impact our lives directly, but in fact, the humble file system has a huge influence on how we use and interact with computers.

This article will start off by defining what a file system is and what it does. Then we'll take a look back at the history of how various file systems evolved and why new ones were introduced, operating system by operating system. Finally, we'll take a brief glance into our temporal vortex and see how file systems might change in the future.

What is a file system?

Briefly put, a file system is a clearly-defined method that the computer's operating system uses to store, catalog, and retrieve files. Files are central to everything we use a computer for: all applications, images, movies, and documents are files, and they all need to be stored somewhere. For most computers, this place is the hard disk drive, but files can exist on all sorts of media: flash drives, CD and DVD discs, or even tape backup systems.

File systems need to keep track of not only the bits that make up the file itself and where they are logically placed on the hard drive, but also information about the file. The most important thing the file system has to store is the file's name; without a name, it would be nearly impossible for the humans to find the file again. The file system also has to know how to organize files in a hierarchy, again for the benefit of those pesky humans. Each container in this hierarchy is usually called a directory. The last thing the file system has to worry about is metadata.

Metadata

Metadata literally means "data about data" and that's exactly what it is. While metadata may sound relatively recent and modern, all file systems right from the very beginning had to store at least some metadata along with the file and file name. One important bit of metadata is the file's modification date—not always necessary for the computer, but again important for those humans to know so that they can be sure they are working on the latest version of a file. A bit of metadata that is unimportant to people—but crucial to the computer—is the exact physical location (or locations) of the file on the storage device.

Other examples of metadata include attributes, such as hidden or read-only, that the operating system uses to decide how to display the file and who gets to modify it. Multiuser operating systems store file permissions as metadata. Modern file systems go absolutely nuts with metadata, adding all sorts of crazy attributes that can be tailored for individual types of files: artist and album names for music files, or tags for photos that make them easier to sort later.
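As a concrete illustration, here is how a program can read back some of this per-file metadata on a modern system. This is a Python sketch; the file name and contents are invented for the example:

```python
import os
import tempfile
import datetime

# Create a small file so we have something to inspect.
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "w") as f:
    f.write("hello, metadata")

# os.stat exposes the metadata the file system keeps alongside the data.
info = os.stat(path)
print("size in bytes:", info.st_size)                                    # length of the data itself
print("last modified:", datetime.datetime.fromtimestamp(info.st_mtime))  # the all-important date
print("permissions:  ", oct(info.st_mode & 0o777))                       # who gets to modify it
```

The exact fields available vary by file system, which is precisely the point: every one of them chooses what metadata it considers worth keeping.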

Advanced file system features

As operating systems have matured, more and more features have been added to their file systems. More metadata options are one such improvement, but there have been others, such as the ability to index files for faster searches, new storage designs that reduce file fragmentation, and more robust error-correction abilities. One of the biggest advances in file systems has been the addition of journaling, which keeps a log of changes that the computer is about to make to each file. This means that if the computer crashes or the power goes out halfway through the file operation, it will be able to check the log and either finish or abandon the operation quickly without corrupting the file. This makes restarting the computer much faster, as the operating system doesn't have to scan the entire file system to find out if anything is out of sync.
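To make the journaling idea concrete, here is a toy sketch of intent logging in Python. It looks nothing like a real journal's on-disk format (real file systems log block-level changes and batch them into transactions); it only shows the write-ahead principle: record the intent durably first, do the work, then clear the record.

```python
import json
import os
import tempfile

class TinyJournal:
    """A toy write-ahead journal: after a crash, only the journal needs
    scanning to finish or abandon a pending operation, not the whole disk."""

    def __init__(self, path):
        self.path = path

    def log_intent(self, op):
        with open(self.path, "w") as f:
            f.write(json.dumps(op))
            f.flush()
            os.fsync(f.fileno())    # the log entry must hit the disk first

    def commit(self):
        os.remove(self.path)        # operation finished; forget the intent

    def recover(self):
        # On reboot: return the pending operation if one exists, else None.
        if os.path.exists(self.path):
            with open(self.path) as f:
                return json.loads(f.read())
        return None

d = tempfile.mkdtemp()
j = TinyJournal(os.path.join(d, "journal"))
j.log_intent({"op": "rename", "src": "a.txt", "dst": "b.txt"})
# ...imagine a crash here, before commit()...
print(j.recover())   # the pending rename is recoverable
j.commit()
print(j.recover())   # None: nothing left to replay
```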

Jurassic file systems

DEC PDP-11 minicomputer with DECTape

In the early days of computers, when dinosaurs roamed freely and IBM stood above the earth like a Colossus, file systems were born. Back then, they rarely had names, because they were simply considered part of the operating system that ran the computer, and in those days operating systems themselves were rather new and fancy. One of the first file systems to have a name was DECTape, named after the company that made it (Digital Equipment Corporation) and the physical system the files were stored on (those giant, whirring reel-to-reel tape recorders that you often saw in movies made in the 1960s). The tapes acted like very slow disk drives, and you could even run the whole operating system on them if you were desperate enough.

DECTape stored an astoundingly small 184 kilobytes (kilo, not mega) of data per tape on the PDP-8, DEC's popular early minicomputer. It was called a minicomputer only because, while the size of a refrigerator, it was still smaller than IBM's behemoth mainframes that took up entire rooms. Of course, the invention of the transistor and integrated circuit (or silicon chip) allowed another whole round of miniaturization. DEC slowly became extinct while the rest of the world moved to microcomputers, or as most of us call them, computers. (IBM, miraculously, survived, but nobody is entirely sure how.)

CP/M

Gary Kildall working at his computer

Gary Kildall invented CP/M in 1973 because he was lazy. He had a job programming some big dinosaur computers, but he didn't want to have to drive into work every day. So he wrote a program called "Control Program for Microcomputers" that would allow him to store files and run programs from an 8-inch Shugart floppy drive on his little computer at home.

CP/M had a file system, but it didn't have a name; if it was called anything, it was just the "CP/M file system." It was very simple, mostly because Gary's needs were simple. It stored files in a completely flat hierarchy with no directories. File names were limited to eight characters plus a three-character "extension" that determined the file's type. This was perfectly sensible because it was exactly the same limitation as the big computer Kildall was working with.

Gary Kildall and the company he founded to sell CP/M, Intergalactic Digital Research, soon became very wealthy. It turned out that a lot of microcomputer companies needed an operating system, and Gary had designed it in a way that separated all the computer-specific bits (called the BIOS) from the rest of the OS. Kildall also did this out of laziness because he didn't want to keep rewriting the whole of CP/M for every new computer. The moral of this story is that being lazy can sometimes be a spectacularly smart thing.

Unfortunately for Kildall, other people soon got the same idea he had. A programmer named Tim Patterson wrote his own OS called "QDOS" (for Quick and Dirty Operating System) that was a quick and dirty clone of everything CP/M did, because he needed to run an OS on a fancy new 16-bit computer, and Gary hadn't bothered to write a 16-bit version of CP/M yet. (Here's where being lazy can work against you!) QDOS had a slightly different file system than CP/M, although it did basically the same thing and didn't have directories either. Patterson's file system was based on a 1977 Microsoft program called Microsoft Disk Basic, which was basically a version of Basic that could write its files to floppy disks. It used an organization method called the File Allocation Table, so the file system itself was given the incredibly imaginative name of FAT.

Bill Gates then applied the Theory of Laziness in an even more spectacular way, bought Tim Patterson's QDOS lock, stock and barrel for $50,000, and renamed it MS-DOS. He now was able to sell it to IBM and every company making an IBM clone, and poor Gary found himself quickly escorted from the personal computing stage. From now on the world would have to deal with FAT in all its gory. Did I say gory? I meant glory.

FAT times at Ridgemont High

Given that it was originally a quick and dirty clone of a file system designed for 8-bit microcomputers in the 1970s, which was itself a quick-and-dirty hack that mimicked the minicomputers of a decade earlier, FAT was not really up to very much. It retained CP/M's "8 and 3" file name limit, and the way it stored files was designed around the physical structure of the floppy disk drive, the primary storage device of the day.

The File Allocation Table described which areas of the disk were allocated to files, which were free space, and which were damaged and unusable (called "bad sectors"). Because each floppy disk had very little space (the first single-sided disks in the IBM PC could store only 160 kilobytes) the table itself needed to be very small. To keep it small, the disk was divided into clusters—groups of sectors stored next to each other on the disk. The first version of FAT was called FAT-12, because it used a 12-bit number to count the clusters (2 to the power of 12 is 4096, and each cluster was 8KB, so the maximum volume size was 32MB).
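The arithmetic generalizes nicely: the maximum volume size is simply the number of addressable clusters times the cluster size. A quick Python check of the figures above:

```python
# Maximum volume size = addressable clusters x cluster size.
def max_volume_bytes(table_bits, cluster_bytes):
    return (2 ** table_bits) * cluster_bytes

KB, MB, GB = 1024, 1024 ** 2, 1024 ** 3

# FAT-12 with 8KB clusters, as described above:
print(max_volume_bytes(12, 8 * KB) // MB, "MB")   # 32 MB
# FAT-16 (coming up shortly) with 32KB clusters:
print(max_volume_bytes(16, 32 * KB) // GB, "GB")  # 2 GB
```

The same trade-off haunts every file system in this article: bigger clusters stretch a small table over a big disk, at the price of wasted space in every partly-filled cluster.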

The massive storage of the 5.25 inch floppy
The original IBM PC 5.25-inch floppy disk, with superimposed tracks/sectors

FAT had no clue about directories or optimizing file storage, and just threw all the bits of files on the first part of the disk where it found any space. For a floppy disk, this didn't matter much, because you typically only stored a few files on it anyway. However, IBM was getting ready to release their PC-XT with an optional 20MB hard disk. This (for the time) enormous space meant that FAT would need a way to store files in a proper hierarchy. For MS-DOS 2.0, Microsoft added nested directories, labeled with the backslash (\) as the separator, so a file might be stored in, say, C:\MYFILES\NOTES. Why a backslash and not a forward slash, as God (and Unix) intended? Well, the forward slash was already used by some MS-DOS 1.0 programs as a modifier argument (say, FORMAT A: /S to add system files after formatting) and it was just too much work to change it now. Why C:? Well, A: and B: were reserved for the first and second floppy disk drives that everyone had to have before hard disks became standard.

Obsolescence

The introduction of hard disks soon made FAT-12 obsolete, so Microsoft came up with a 16-bit version in DOS 3.31, released in 1987. It had 32KB clusters and could access 2 to the power of 16 (65,536) of them, for an astounding 2GB maximum disk size.

And what about the problem of storage optimization? With a hard drive, a user was always copying and deleting files, and this left holes that new files were crammed into (since FAT just looked for the first space it could store stuff) which led to real problems with fragmentation. This made hard drives work harder than ever, jumping around willy-nilly trying to find all the bits of a file that were strewn about the drive. Did Microsoft fix this problem with FAT? Not at all. They let other companies make programs like Norton Utilities and PC-Tools that would defragment the whole disk in one go, giving users happy evenings to remember forever, sitting watching the screen move little rectangles around.

By 1995, hard disks were getting larger than the 2GB limit, and the 8.3 file name limit seemed even more archaic than it had 20 years earlier. To solve both these problems, Microsoft introduced FAT-32, with an 8TB (terabyte) limit and a special magic ability called VFAT that gave FAT the ability to have long file names without really having them.

VFAT took a long file name (up to 255 characters) and created a short version that fit into the 8.3 straitjacket. For example, "Super-long file name.txt" would be stored as "SUPER-~1.TXT". The remaining letters were thrown into a very strange kind of nonexistence, like living on an astral plane. They were sliced up into 13-letter chunks and stored as phantom directories that were marked with the metadata attributes of Volume Label, System, Hidden, and Read-Only, a combination that confused older versions of MS-DOS so much that they flat-out ignored them. However, if a long file name was deleted from DOS, it considered the associated phantom directory full of Volume Label entries to be empty and deleted it as well. Even stranger tricks were added to check whether the long file name matched the 8.3 one.
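The short-name mangling can be sketched roughly like this. Note that this is a simplified approximation: the real VFAT algorithm has more character-substitution rules and switches to a hash-based form after several collisions.

```python
import re

def short_name(long_name, taken=(), max_base=8, max_ext=3):
    """Rough sketch of VFAT-style 8.3 short-name generation."""
    base, _, ext = long_name.rpartition(".")
    if not base:                      # no dot in the name at all
        base, ext = ext, ""
    # Strip spaces and other disallowed characters, then uppercase.
    clean = re.sub(r"[^A-Za-z0-9_-]", "", base).upper()
    ext = re.sub(r"[^A-Za-z0-9]", "", ext).upper()[:max_ext]
    for n in range(1, 10):            # SUPER-~1, SUPER-~2, ...
        tail = "~%d" % n
        candidate = clean[: max_base - len(tail)] + tail
        if ext:
            candidate += "." + ext
        if candidate not in taken:
            return candidate
    raise ValueError("too many collisions for this simple sketch")

print(short_name("Super-long file name.txt"))   # SUPER-~1.TXT
```

Feed it the article's example and out pops the same ugly result the real thing produces.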

Some of these tricks still haunt us today, thanks to shaky third-party implementations of VFAT. For example, while writing this article I had the fun experience of watching a directory on my USB flash drive spontaneously change from "Williams files" to "WILLIA~1" when transferring a file to my iBook.

So did Microsoft finally address the horrible fragmentation problems of FAT in this version? The answer is: not so much. But it did at least include a home-grown defragmenting program with Windows 95. Thanks to the power of multitasking, you could even do other things while it was defragging! Except you couldn't, because it would complain that "files had changed" and keep starting over.

This strange VFAT solution was a ghastly hack, but it did allow seamless upgrades of DOS and Windows 3.1 systems to Windows 95 and long file names, while still letting poor DOS users access these files. It also caused Macintosh users to laugh and make "MICROS~1" jokes that nobody ever understood. They probably shouldn't have been laughing, however, because the Macintosh file system had even stranger limitations.

Hello, HFS!

Most people remember the 1984 Macintosh with a kind of romantic haze, forgetting the huge limitations of the original unit. The Mac came with a single floppy drive back when PC users were starting to get used to hard disks. The original file system was called the Macintosh File System, or MFS, and had a limit of 20MB and 4,096 files. It both had directories and didn't have them. The user could create graphical "folders" and drag files into them, but they would not show up in the Open/Save dialog boxes of applications. Instead, all the file and directory information was stored in a single "Empty Folder" that would disappear if modified in any way, only to be replaced by a new "Empty Folder." This worked well for floppy disks, but really slowed down performance with hard drives. File names could be 63 characters long.

The 1984 Macintosh
The original 128K Macintosh, shown with optional second floppy drive

MFS was replaced in 1985 by a system with proper hierarchical directories. Because of this, it was called the Hierarchical File System, or HFS. For some reason, file names were now limited to 31 characters, which was just short enough to be annoying. The file, directory, and free space information was stored in a B-Tree, a self-balancing tree structure that allows for fast sorting and retrieval of information. HFS used 512KB clusters with a 16-bit pointer, so the maximum size of a drive was 32GB. Later versions upped the pointer to 32 bits and could thus access 2TB at once.

MFS and HFS introduced an innovative way of handling files, called "forks." Instead of storing metadata in a separate place (such as the place directories are stored), HFS made each file into two files: the file itself (the "data fork") and an invisible "resource fork" that contained structured data, including information about the file, such as its icon. Resource forks were used for far more than metadata, though—for example, they held details of an application's interface and executable code in pre-PowerPC Macs. Like prongs on a fork, the data and resource traveled around together all the time, until the file was sent to another type of computer that didn't know about forks. Fortunately, back then computers were very snobby and never talked to each other, so this was rarely a problem.

Instead of using a puny three-letter file extension to determine the file type, HFS used a massively huge four-letter "type code" and another creator code, which were stored in the file system's metadata, treated as a peer to information such as the file's creation date.

HFS didn't mess around with slashes or backslashes to separate directory names. Instead, it used a colon (:) and then made sure that the humans would never get to see this character anywhere in the system, until they tried to include one in a file name.

All kidding aside, HFS was the first instance in history where a file system was designed specifically to adapt to the needs of the then-new graphical user interface. The whole philosophy of the Macintosh's GUI design was to hide unimportant details from the user. This "human-centric" design was intended to help people focus more on their work than the technical details of the file system.

Of course, nothing is perfect, and all systems that try to abstract away the nasty technical bits occasionally run afoul of what Joel Spolsky calls the Law of Leaky Abstractions. When something broke, such as the loss of the resource fork when a file was sent to another computer and back to the Macintosh, it was not always clear what to do to fix the problem.

HFS had some other technical limitations that could occasionally leak out. All the records for files and directories were stored in a single location called the Catalog File, and only one program could access this file at once. There were some benefits of this approach, like very fast file searches. The first Macintoshes did not have multitasking, so this was not a problem, but when multitasking was added later, it caused problems with certain programs "hogging the system." The file could also become corrupt, which could potentially render the entire file system unusable. Other file systems stored file and directory information in separate places, so even if one directory became corrupt the rest of the system was still accessible.

As was the case for other file systems covered so far, the number of clusters (known as "blocks") was fixed to a 16-bit number, so there could only be 65,535 blocks on a single partition, no matter what size it was. Because HFS (like most file systems) could only store files in individual blocks, the larger block size meant a lot of wasted space: even a tiny 1KB file would take up a full 16K on a 1GB drive. For an 8GB drive the problem got eight times worse, and so on.
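The waste is easy to quantify: with the block count capped, block size scales with the volume, and every file is rounded up to a whole number of blocks. A small Python check of the figures above:

```python
import math

BLOCKS = 2 ** 16          # HFS: at most 65,536 allocation blocks per volume

def block_size(volume_bytes):
    # Bigger volume, same number of blocks -> bigger blocks.
    return volume_bytes // BLOCKS

def space_used(file_bytes, volume_bytes):
    b = block_size(volume_bytes)
    return math.ceil(file_bytes / b) * b   # files occupy whole blocks

GB, KB = 1024 ** 3, 1024
print(block_size(GB) // KB, "KB blocks on a 1GB drive")            # 16
print(space_used(KB, GB) // KB, "KB consumed by a 1KB file")       # 16
print(space_used(KB, 8 * GB) // KB, "KB for the same file at 8GB") # 128
```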

Apple fixed this last problem in 1998 with HFS+, which came bundled with the release of Mac OS 8.1. HFS+ used a 32-bit number for numbering blocks and also allowed 255-character file names, although versions of "classic" Mac OS (9.2.2 and earlier) only supported 31-character file names. Oddly, Microsoft Office for the Mac would not support more than 31-character names until 2004.

Amiga file systems

The Amiga, released a year after the Macintosh, had advanced multimedia capabilities that seemed to come from a decade in the future. However, due to intense time pressures to get the original AmigaOS out the door, the file system was one of its weakest parts. It came from TripOS, an operating system developed by MetaComCo. It used 512-byte blocks, but reserved 24 bytes of each block for metadata. The file system was dubbed OFS, for "Old File System," by the Amiga engineers, who replaced it as quickly as they could.

The Amiga 1000

FFS, or Fast File System, was released in 1987 along with the Amiga 500, the Amiga 2000, and AmigaOS 1.3. The major difference was that the metadata was moved from each individual block into a separate space. Like HFS, it was limited to short file names.

To support both OFS and FFS, the Amiga operating system was redesigned so that it could accept multiple file systems in a plug-in format, and this format was documented so that anyone could write his or her own file system if desired. Many people did that, and some of the results, such as the Professional File System (PFS) and Smart File System (SFS), are still used by Amiga fans to this day. The Amiga OS4 operating system for PowerPC Amigas also supports FFS2, a minor rewrite that added the ability to support file names 255 characters long.

Unix and Linux file systems

UFS (FFS)

Unix started out its life as a pun on MULTICS, a very serious multiuser time-sharing system that didn't like being made fun of, but then went on to leave its serious rival in the dustbin of history. Unix almost completely dominated the market for scientific workstations and servers before being neatly replaced by a work-alike clone called Linux, which started out as a pun on Unix. The moral of the story? Puns can be powerful things.

Along the way, Unix set all sorts of standards for how users would store their files. The Unix File System (UFS), also known as the Berkeley Fast File System, became the standard when researchers at the University of California at Berkeley developed a much improved version of the original Unix file system.

Unix grew out of the hacker culture rather than the commercial sector, and its file system reflects those roots. For one, the system is case-sensitive, meaning that README.TXT is a completely different file from readme.txt, which is also different from Readme.Txt. Most other file systems preserve the case of a file name, but don't care how the user accesses it. This rather arcane and computer-centered view of file names has survived to this day because changing it now would break software that relies on it. Occasionally, these two worlds can collide violently: moving a web site from a Windows server to a Linux one can sometimes result in broken links when the server asks for some_file.htm and the last person who edited it saved it as Some_File.htm instead.
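The difference is easy to model: a case-sensitive lookup compares names exactly, while a case-insensitive (but case-preserving) one folds case first. A small illustration using the file names from the examples above:

```python
# Three names that coexist happily on a case-sensitive file system:
names_on_disk = ["README.TXT", "readme.txt", "Readme.Txt"]

def case_sensitive_lookup(wanted, names):
    # Unix-style: the name must match exactly.
    return [n for n in names if n == wanted]

def case_insensitive_lookup(wanted, names):
    # FAT/HFS+/NTFS-style: case is preserved on disk but ignored at lookup.
    return [n for n in names if n.lower() == wanted.lower()]

print(case_sensitive_lookup("readme.txt", names_on_disk))    # one match
print(case_insensitive_lookup("readme.txt", names_on_disk))  # all three!

# The migrated-web-site scenario: the link asks for one case,
# but the file was saved in another.
print(case_sensitive_lookup("some_file.htm", ["Some_File.htm"]))    # [] -- broken link
print(case_insensitive_lookup("some_file.htm", ["Some_File.htm"]))  # found
```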

Aside from being picky about case, the file system exposes its hacker roots in other ways. The pointers that UFS uses to locate files on a disk are called inodes, and unlike other operating systems, Unix is quite happy to show this inode data to the end user. UFS did not start out in life having journaling, so a crash or power outage could leave corrupted inodes and lost files. The solution to this was to run a utility called fsck (for File System Check) and watch as the OS told you all its dirty inode secrets.
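You don't need fsck to see inodes; any scripting language will show them to you. The sketch below (Python, Unix-only since it uses hard links) demonstrates that two directory entries can point at one and the same inode:

```python
import os
import tempfile

# A hard link is simply a second directory entry pointing at the same inode.
d = tempfile.mkdtemp()
a = os.path.join(d, "a.txt")
b = os.path.join(d, "b.txt")
with open(a, "w") as f:
    f.write("hello")
os.link(a, b)            # two names, one inode

print(os.stat(a).st_ino)                       # the file's inode number
print(os.stat(a).st_ino == os.stat(b).st_ino)  # True: same underlying file
print(os.stat(a).st_nlink)                     # 2: the inode's link count
```

The file's data isn't freed until the link count drops to zero, which is why "deleting" a file on Unix is really called unlinking.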

As in other file systems, data is stored in discrete components called blocks. The standard block size kept increasing as disks became larger, but was eventually standardized at 8KB. Because this large a block size would waste a lot of disk space, BSD implemented a clever algorithm called block suballocation where partially-filled blocks from several files could be stored in a single block.

UFS also tried to minimize excess movement of the hard drive heads by storing file data and metadata close to each other in groups called cylinder groups, and it attempted to keep all the files from a single directory in adjacent cylinder groups.

Many Unix vendors implemented their own versions of UFS, often making them incompatible with other Unices. Sun added journaling to its version in Solaris 7, and NeXT created its own version of UFS for NeXTstep.

ext2

ext2 was "inspired" by the UFS design and is essentially a work-alike clone of UFS in the same way as Linux is a work-alike clone of Unix. This allowed the easy porting of many Unix utilities such as fsck, which sounds a bit like profanity if you say it fast enough. Like UFS, ext2 lacked journaling, but it also eschewed some other safety checks that UFS had in order to run faster. Because of this, saying "fsck" over and over again was sometimes necessary.

Linux, which started as a hobby project in 1991 to replace Andrew Tanenbaum's teaching OS Minix, originally had a clone of the Minix file system, but it was limited to 64MB and was quickly replaced by ext. ext2 was developed in 1993 to address some of the limitations of the original ext and survived for many years afterwards. ext2 has the same "cylinder" system as UFS but calls them block groups instead.

ReiserFS

Because ext2 lacked journaling and Linux was built on the spirit of open source and contributions from anyone, there were plenty of people who wanted to Build A Better Linux File System. One person who actually succeeded was Hans Reiser, who modestly titled his work ReiserFS.

ReiserFS added not only journaling, but attempted to improve on many other aspects of ext2. B-Tree indexing and advanced block suballocation routines sped up the file system significantly, particularly when dealing with small files on the order of 1KB.

ReiserFS garnered much praise and even major industry support from Linux distributions such as SuSE, until the wheels started to come off for reasons that were primarily nontechnical.

First, Hans Reiser decided that he was no longer going to support or update ReiserFS, preferring to work on its successor, dubbed Reiser4. The new version performed well, but it was not a clean update from ReiserFS and required users to reformat. There were some questions about the reliability and stability of Reiser4, but these could have been dealt with in time.

What really threw the community for a loop was something nobody could have foreseen. In 2006, Hans Reiser's wife, Nina, was declared missing. Blood matching her DNA profile was found on a sleeping bag in Hans' car. Hans pleaded not guilty, and his criminal trial is currently under way.

ext3

The popularity of ReiserFS had begun to wane, and in the confusion over its future, many of those still using it switched to the safer ext3, which was essentially ext2 with journaling support added on. While not likely to win any speed derbies, ext3 retains its predecessors' legacy of time-tested reliability.

JFS

JFS was IBM's entry into the Unix file system game, adding journaling to a UFS-style design. It is an open-source reimplementation of the older proprietary file system used by AIX, IBM's peculiar version of Unix. JFS used B-Trees to speed up file access, introduced the concept of extents (contiguous runs of blocks tracked as a single unit), and allocated inodes dynamically. This prevents the file system from having to dedicate fixed amounts of space for inodes the way ext2 and ext3 do.

XFS

XFS came from Silicon Graphics' version of Unix, dubbed Irix. First introduced in 1994 with Irix 5.3, XFS has been optimized for speed and reliability, winning many speed comparison tests. It is a 64-bit file system with a maximum volume size of 8 exabytes. It uses extents and has many advanced features such as being optimized for multithreading—multiple threads can operate on the same file system simultaneously.

In 2000 SGI released the XFS source code under the GNU General Public License, and in the following years many Linux distributions have added support for this file system.

IBM and Microsoft duke it out

OS/2 and HPFS

Most kids won't remember it today, but IBM once briefly toyed with the idea of competing directly with Microsoft for the prize of personal computer operating system dominance. Even more unusual was the fact that this competition was originally a partnership.

IBM's OS/2 Warp (Version 3.0)

Even IBM, with its 10,000 layers of management and more bureaucracy than the Soviet Union, realized that DOS was badly in need of a replacement. IBM decided that it was going to design a successor—brilliantly named OS/2—which it would then fully own, but which Microsoft would do all the work of actually writing. Steve Ballmer, back before he was known for jumping up and down and throwing chairs, once described how IBM was viewed by the computing industry back then. "They were the bear, and you could either ride the bear, or you could be under the bear!" So Microsoft went along with this crazy plan.

OS/2 was to be a multitasking operating system, with a fancy GUI that was to be bolted on later. It took forever to arrive, had difficulty running DOS applications, and required more RAM than most computer users could afford in their lifetimes, so it went over about as well as New Coke. For version 1.2, which was released in 1987, IBM wanted a new file system to replace the awful FAT. Thus was born HPFS, for High Performance File System, written by a small team led by Microsoft employee Gordon Letwin.

HPFS used B-Trees, supported 255-character file names, and used extents. The root directory was stored in the middle of the disk rather than the beginning, for faster average access times. It did not support journaling, but it did support forks and had extensive metadata abilities. This new metadata, called Extended Attributes, could even be stored on FAT partitions by saving it in a file called EA_DATA.SF. Extended attributes were also supported in HFS+, but were not exposed in an Apple OS until Mac OS X 10.4.

Microsoft and IBM then went through a rather messy divorce right around the time Windows 3 was ready to be released. (IBM wanted to own that, too, and Microsoft really really didn't want that.) Microsoft refocused its efforts on Windows after version 3 became a smash success, while IBM kept the code it had and added a bunch of extra user interface code they had lying around from various dalliances with Apple and NeXT. Thus was born OS/2 2.0 and the object-oriented Workplace Shell, which had a brief day in the sun (and was even advertised, bizarrely, at the Fiesta Bowl) before Windows 95 arrived and crushed it into the ground. IBM later ported JFS to OS/2, much to the delight of the three people who still used it.

NTFS

Windows NT 3.1 install CD. Note supported platforms!

Microsoft also knew that DOS needed a replacement, but was soured on its experience with IBM. In Bill Gates' second spectacular application of the Theory of Laziness, he hired Dave Cutler, the architect of DEC's rock-solid VMS operating system, just as DEC was going into a downward spiral from which it would never recover.

Dave Cutler took his team with him and, despite lawsuits from the dying DEC, produced a clean-room implementation of a brand-new operating system. Along with the brand-new OS came a brand-new file system, which initially didn't have a name but was later dubbed NTFS when the OS itself was named Windows NT. NT was a marketing name that stood for New Technology, but it was still an amusing coincidence that WNT was VMS with each letter replaced by the next one.

NTFS was an all-out, balls-to-the-wall implementation of all the best ideas in file systems that Cutler's team could think of. It was a 64-bit file system with a maximum file and volume size of 2 to the power of 64 bytes (16 exabytes) that stored all file names in Unicode so that any language could be supported. Even the file date attributes were stretched to ridiculous limits: Renaissance time-travelers can happily set their file dates as early as 1601 AD, and dates as late as 60056 AD are supported as well, although if humanity is still using NTFS by that time, it will indicate something is seriously wrong with our civilization. It was first unveiled to the public with the very first release of Windows NT (called version 3.1 for perverse marketing reasons), which came out in 1993.
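Those odd-looking limits fall out of the timestamp format: NTFS counts 100-nanosecond intervals ("ticks") from January 1, 1601 in a 64-bit field. A small Python sketch of the arithmetic (not Microsoft's code, just the conversion):

```python
import datetime

# NTFS timestamps: 100-nanosecond ticks since January 1, 1601.
NTFS_EPOCH = datetime.datetime(1601, 1, 1)

def filetime_to_datetime(ticks):
    # One tick = 100ns = 1/10 of a microsecond.
    return NTFS_EPOCH + datetime.timedelta(microseconds=ticks // 10)

def datetime_to_filetime(dt):
    # Integer arithmetic throughout, to avoid float rounding on big counts.
    d = dt - NTFS_EPOCH
    return ((d.days * 86400 + d.seconds) * 10 ** 6 + d.microseconds) * 10

print(filetime_to_datetime(0))                         # 1601-01-01 00:00:00
ticks = datetime_to_filetime(datetime.datetime(2008, 3, 17))
print(ticks)                                           # a very large tick count
print(filetime_to_datetime(ticks))                     # 2008-03-17 00:00:00
```

Zero ticks is the earliest representable date, which is exactly why those Renaissance time-travelers are limited to 1601 and later.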

NTFS used B+Trees (an enhanced and faster version of B-Trees also supported in HFS+), supported journaling from day one, had built-in transparent compression abilities, and had extremely fine-grained security settings by using Access Control Lists (ACLs were added with NT 3.5, released in 1994). It was designed so that Microsoft could add extra metadata support until the cows came home. Indeed, NTFS's support for metadata was so extensive that sometimes it took Microsoft's operating system team a while to catch up with features that were already there.

For example, super-fast indexed searching of both files and metadata was available in NTFS since the release of Windows 2000, but it took until 2005 before Microsoft released a graphical interface that supported this, and it didn't become a part of the operating system itself until Windows Vista in 2006. Hey, sometimes things get forgotten. Anyone up for redesigning the Add New Font dialog box?

Additional features were added to NTFS in later versions of Windows NT. Version 3.0, released with Windows 2000, added the aforementioned indexed metadata searching, along with encryption and disk quotas so that students could no longer fill up the file server with pirated MP3s.

NTFS stores all of its metadata information in separate files, hidden from the regular operations of the OS by having filenames that start with the $ character. By storing everything as a file, NTFS allows file system structures to grow dynamically. The file system also resists fragmentation by attempting to store files where there is enough contiguous space, not just in the first available space on the drive.

To ease the transition from FAT-based operating systems, Microsoft shipped a handy conversion utility with NT-based versions of Windows that would safely convert FAT16 and FAT32 partitions to NTFS in place. It wouldn't go the other way around, but honestly, would you want to?

The only thing anyone could really find to complain about NTFS was that its design was a proprietary secret owned by Microsoft. Despite this challenge, open-source coders were able to reverse-engineer read support, and much later write support, for NTFS partitions from other operating systems. The NTFS-3G project allows any operating system to read and write to NTFS partitions.

Dead ends and thoroughfares

BeOS and BFS

In 1990, Jean-Louis Gassée, the ebullient former head of Apple France, had an epiphany. He decided that the problem with current desktop operating systems like Mac OS and Windows was that they were weighed down with too much baggage from supporting legacy versions of themselves. His solution was to start from scratch, creating a brand new hardware and operating system platform using all the best ideas that were available. He was inspired by the Amiga team—which had done the same thing back in 1982—and even got "AMIGA96" vanity license plates for his car.

The hardware team ran into troubles almost immediately when their choice of processor—the AT&T Hobbit—was discontinued by its parent company. Switching to the PowerPC, the team delivered a geek's dream: a fast, affordable, multiprocessor computer with tons of ports and slick Cylon-esque blinking lights. Unfortunately, the BeBox, which shipped in late 1995, sold fewer than 500 units in total. As Steve Jobs discovered with NeXT, the reality of the desktop market was that there was no room for a new hardware platform any more. Be, Inc. quickly ported its operating system to the Power Macintosh platform, and then to the much larger realm of x86-compatible PCs.

BeBox and BeOS
The PPC BeBox, running BeOS

The BeOS needed a file system, and its initial goals were grand indeed. The original hierarchical file system on the BeBox (dubbed OFS) linked directly to a relational database, allowing for all kinds of flexibility and power. Unfortunately, the database-driven design was too slow and there were problems keeping the database and file system in sync. With the move to PowerPC Macs the BeOS now had to support HFS as well, so the file system infrastructure needed to change. Thus was born the Be File System, or BFS.

BFS, written by Dominic Giampaolo and Cyril Meurillon, was a 64-bit file system that supported journaling and used B+Trees, just like NTFS. It also needed to support the database features that the original BeBox's OS had used, which allowed users to store any file information they wanted in a series of records and fields. This information was put into metadata, and users were allowed to add as many fields as they liked to each file.

This extensible metadata idea seemed novel at the time, but it's important to recognize that NTFS already supported basically the same thing. The major difference was in the user interface: the BeOS directory windows supported adding, editing, sorting, and indexed searching by any metadata field. This was extremely handy for organizing one's MP3 collection.
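The effect of those live, indexed metadata queries can be sketched with an in-memory attribute index. This is a toy model of the concept, not the BFS on-disk format:

```python
from collections import defaultdict

class AttrIndex:
    """Toy model of BFS-style extensible metadata: arbitrary named
    attributes per file, with an index so queries avoid scanning every file."""

    def __init__(self):
        self.attrs = defaultdict(dict)   # file -> {field: value}
        self.index = defaultdict(set)    # (field, value) -> set of files

    def set_attr(self, filename, field, value):
        old = self.attrs[filename].get(field)
        if old is not None:
            self.index[(field, old)].discard(filename)  # drop stale entry
        self.attrs[filename][field] = value
        self.index[(field, value)].add(filename)

    def query(self, field, value):
        """Return every file whose attribute matches, via the index."""
        return sorted(self.index[(field, value)])

ix = AttrIndex()
ix.set_attr("track01.mp3", "Artist", "Kraftwerk")
ix.set_attr("track02.mp3", "Artist", "Kraftwerk")
ix.set_attr("memo.txt", "Artist", "n/a")
print(ix.query("Artist", "Kraftwerk"))  # ['track01.mp3', 'track02.mp3']
```

In BeOS, the same kind of query ran against on-disk attribute indexes, which is what made sorting a directory window by "Artist" instantaneous.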

Back in 1996, few people were using Windows NT and NTFS, so BFS's 64-bit support, journaling, and extensible metadata all added to the impression of BeOS as being an advanced operating system. Unfortunately for Be, Inc. the company could not find a sustainable business model for selling the OS, and after a disastrous "focus shift" to dedicated Internet Appliances like the crazily-named Sony eVilla, the company ran out of money and was sold to Palm in 2001.

Mac OS X and HFS+

One of the reasons that Be, Inc. went out of business is that Jean-Louis Gassée was banking on his old company, Apple, bailing him out by buying the BeOS to use for their next operating system. Apple's traditional Mac OS had not aged very well. Its lack of memory protection, automatic memory allocation, and preemptive multitasking was starting to hurt the company very badly, and when Apple's internal replacement, Copland, fell apart in a pile of sticky entrails, the company went shopping for a replacement. Solaris, BeOS, and even the hated Windows NT were all in contention for the prize, and BeOS was the leading candidate. The ever-colorful Gassée said he "had Apple by the balls and was going to squeeze until it hurt."

He squeezed a little too hard. Someone inside Apple made a phone call to Steve Jobs at NeXT, and the rest is history. Steve came back to Apple riding on a white charger and brought his NeXTstep team with him. After some confusion over whether Apple was going to offer both the traditional Mac OS and a high-end version called Rhapsody, Steve took full control and decided that he was going to merge the two. Thus was born Mac OS X.

Mac OS X version 10.0
Macintosh OS X 10.0, released in 2001.

The merging was not entirely pretty. NeXT's core was based on Mach, which itself had become messily entangled with BSD Unix back in its academic past. Unix lived for the command line, whereas the Mac OS had shunned the CLI as being unfriendly. Unix used traditional file extensions to identify file types, whereas the Mac OS tucked this information away in each file's hidden type and creator codes. And finally, Unix's file system was UFS, whereas the Mac OS ran on HFS+. Deciding which side to support was always a battle.

For some of these battles, the NeXTies won. File extensions were championed as the "new way" to identify file types. For the first release of OS X, the user was asked to choose between UFS and HFS+ when installing the OS on a new hard drive partition. For compatibility reasons, however, HFS+ was chosen as the default. Choosing UFS was also not recommended because it was case-sensitive, and would therefore break some third-party applications that were sloppy about file name capitalization.
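The compatibility hazard is easy to illustrate: an application that saves `Readme.txt` but later asks for `README.TXT` works on a case-insensitive file system and breaks on a case-sensitive one. A sketch of case-insensitive name resolution (a simplification—HFS+ actually uses Unicode-aware case folding, not a plain lowercase comparison):

```python
def resolve(directory, requested):
    """Case-insensitive lookup: return the stored name matching `requested`,
    or None. A case-sensitive file system would only accept an exact match."""
    wanted = requested.lower()
    for name in directory:
        if name.lower() == wanted:
            return name
    return None

names = ["Readme.txt", "app.cfg"]
print(resolve(names, "README.TXT"))  # 'Readme.txt' — the sloppy app survives
print("README.TXT" in names)         # False — the same request fails on UFS
```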

Over time, the influence of the NeXT people waned, and some of their hard-line decisions were revisited. UFS support was finally dropped in Leopard, the latest version of OS X. File extensions, while still supported, were no longer mandatory.

There was still the issue of bringing HFS+ up to more modern standards. In OS X 10.2 (Jaguar), journaling was added to the file system, although it was turned off by default and could only be enabled via the command line. In 10.3 (Panther) it was enabled by default. Apple also hired Dominic Giampaolo, co-creator of BFS, and he worked to add extensible metadata, journaling, the initial implementation of Spotlight, and FSEvents. Lastly, NTFS-style fine-grained file permissions were added in Mac OS X Server 10.4.

ZFS and the future of file systems

Many people wondered why Apple didn't just ditch HFS+ and replace it with something newer and sexier, like Sun's ZFS. The Zettabyte File System, announced in 2004, is a 128-bit file system that supports files of ridiculous size (16 exabytes) and an absolutely ludicrous limit of 256 zettabytes (2^78 bytes) for the total accessible size of the file system. Project leader Jeff Bonwick said that "Populating 128-bit file systems would exceed the quantum limits of earth-based storage. You couldn't fill a 128-bit storage pool without boiling the oceans." His point was thermodynamic: the minimum energy required merely to flip enough bits to fill a 128-bit storage pool would exceed the energy needed to bring the entire world's oceans to a boil. It seems unlikely that anyone is going to build a 256-bit file system any time soon.

ZFS also has a novel way of dealing with the old bugbear of multiple partitions. ZFS is built on top of virtual storage pools called zpools, so all connected drives appear to be part of a single gigantic partition. Drives can be seamlessly strung together in various virtual RAID configurations, which can then automatically "self-heal" if data on one mirror is damaged. ZFS can also take cheap copy-on-write snapshots of the file system, saving only the blocks that change afterward, so earlier states of your data can always be recovered.
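The snapshot trick rests on copy-on-write: a snapshot is just a frozen reference to the blocks as they stood, and only blocks written afterward consume new space. A toy sketch of the idea (nothing like ZFS's actual on-disk structures):

```python
class CowStore:
    """Toy copy-on-write store: snapshots preserve old block contents,
    so each snapshot costs only the blocks rewritten after it was taken."""

    def __init__(self):
        self.live = {}        # block number -> current data
        self.snapshots = {}   # name -> frozen {block: data} view

    def write(self, block, data):
        self.live[block] = data  # overwrites only affect the live view

    def snapshot(self, name):
        # A real CoW system just records a root pointer; copying the dict
        # here stands in for sharing the same immutable block tree.
        self.snapshots[name] = dict(self.live)

    def read(self, block, snapshot=None):
        view = self.snapshots[snapshot] if snapshot else self.live
        return view.get(block)

s = CowStore()
s.write(0, "draft")
s.snapshot("monday")
s.write(0, "final")
print(s.read(0))            # 'final' — the live file system
print(s.read(0, "monday"))  # 'draft' — the snapshot still sees the old block
```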

ZFS Storage Pools
ZFS Storage Pools. Come on in, the water's fine! It's not even boiling yet!

There are other fancy ZFS features too numerous to list here. It will basically do everything but cook you dinner, so why doesn't Apple just put it into Mac OS X?

Part of the problem is that ZFS is still maturing, and Sun is still working out the kinks. However, the greater issue is that even if all the bugs are fixed, moving a whole user base over to a new file system is an uphill task.

File systems are expected to be completely reliable, and users often keep old data lying around on drives formatted with traditional file systems. Microsoft managed to move some FAT users over to NTFS with the conversion utility built into Windows, but primarily the shift came about through attrition, as old Windows 98-era computers were thrown away and replaced by new machines with XP pre-installed. FAT still haunts us to this day, as most flash drives are formatted with FAT32. Why? Because as one of the oldest file systems available, it is also the best understood and easiest to implement.

Often it is easier to stick with well-established file systems, even when the cost of switching is ostensibly "free." The example of Linux, which is completely open source and allows anyone to write a new file system for it, is a useful one. Despite valiant attempts to establish ReiserFS as a new standard, and the measurable superiority of systems like XFS, most Linux users are still using ext3. ext3 is not new. It's not super fast. It's not sexy. It won't cook your dinner. But it is tried and true, and for many people, that is more important.

Microsoft recently tried to resurrect the original BeOS database-driven file system idea with WinFS, which was originally scheduled to be included with Windows Vista. However, delays in releasing the operating system caused Microsoft to take WinFS out of the operating system and instead move it as an optional part of its SQL database product. The future of WinFS remains murky, but Microsoft may try to resurrect it for a future release of Windows.

NTFS is likely to stick around for many years in the future, simply out of sheer inertia. HFS+ may kick around for a few years longer as well. Even FAT may still be on our thumb drives, haunting us with the ghost of CP/M long after everyone has forgotten what that even was.

While file systems may not, by themselves, seem exciting, their history tells us the story of how computers and operating systems have evolved over the years. "By his works, shall ye know him" is true for both humans and file systems. By knowing how the OS stores a humble file, one is provided a glimpse into the limitations and aspirations of its designers.

Source and credits.
