random thoughts

16 December 2020

Revamping the file system layout

Intro

A traditional Linux file system tree in the root file system has quite a number of directories with a special purpose, documented in the Filesystem Hierarchy Standard (FHS). By default, an operating system installation populates most of those directories with different kinds of files, e.g. by means of a package manager. This article analyzes the situation and proposes a radical simplification.

Traditional file system tree

The following diagram shows such a traditional file system tree as used by a default openSUSE installation on a classical filesystem such as ext4 in 2020:

tree1

The colors visualize different kinds of files while gradients indicate a mix of types on the same file system.

The file types according to color are the following:

An example of a hierarchy with mixed file types is /etc. The operating system ships files there that are not actually meant to be modified, like /etc/profile, or worse, /etc/ld.so.cache, which isn’t even a config file at all. A typical workstation mounts /home from a separate partition. This is visualized by the puzzle piece. Also, the EFI boot partition is mounted onto /boot/efi.

BTRFS, snapshots and transactional mode

With the introduction of BTRFS, the operating system gained the ability to take snapshots and roll back the system in case of troublesome updates. That made it necessary to define what should be part of a snapshot and what should not. User documents, for example, must not be rolled back. Furthermore, databases can’t really be snapshotted or rolled back by the OS, as their structure is application-specific and follows its own transaction mechanism. The usual configuration files in /etc need to be rolled back though, as some are tied to the software version installed.

That led to the separation of the filesystem into five subvolumes, namely /root, /var, /srv, /opt and /usr/local. /tmp now usually even resides on tmpfs. That means the operating system’s package manager can no longer install files in these directories. Otherwise, a rollback would lead to an inconsistent package database. So the number of locations with mixed OS files and data was reduced at the cost of more subvolumes.

The following diagram shows a typical file system tree of a default openSUSE installation on BTRFS in 2020:

tree2

In transactional mode (e.g. MicroOS), there’s a further complication. The green parts of the tree are read-only at runtime, so the writable part of /etc is actually located below /var. An overlayfs makes those files appear in /etc.

Grouping and separating by data type

The data on the purple partitions is actually of a similar kind. Is there any gain in having separate partitions or subvolumes for these by default? Probably not. So, in order to reduce the number of subvolumes again, they could be moved into the /var volume, for example /var/home, /var/srv, and so on, with the original directory as a symlink. If the workload is known exactly, an admin could still make an educated decision to have separate partitions.
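As a rough illustration of the symlink idea, the following Python sketch builds such a layout inside a temporary directory instead of the real root. The function name `build_layout` and the list `MOVED_DIRS` are made up for this example; only /var/home and /var/srv are taken from the text.

```python
import os
import tempfile
from pathlib import Path

# Directories moved below var/ in this sketch. Only "home" and "srv"
# are named in the article; the list is an illustrative assumption.
MOVED_DIRS = ["home", "srv"]


def build_layout(root: Path, names=MOVED_DIRS) -> None:
    """Create <root>/var/<name> and a relative symlink <root>/<name> -> var/<name>."""
    for name in names:
        (root / "var" / name).mkdir(parents=True, exist_ok=True)
        link = root / name
        if not link.is_symlink():
            # A relative target keeps the link valid wherever the tree is mounted.
            link.symlink_to(Path("var") / name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        build_layout(root)
        for name in MOVED_DIRS:
            print(f"/{name} -> {os.readlink(root / name)}")
```

The relative link target is a design choice for this sketch, so that the links still resolve when the tree is mounted somewhere other than /.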

tree3

With this simplification, only /boot and /etc still mix OS files with other types.

Boot files

The initramfs is generated by a script, triggered by package installation. The boot loader files (usually grub2) are managed via scripts. By moving the kernel image out of /boot and into the operating system space, e.g. /usr/lib/linux, /boot would become entirely managed by scripts. Since modern systems have an EFI boot partition anyway, that boot partition can be mounted right onto /boot.

tree9

Config files

There are already ongoing efforts to move all files that are not actually meant to be edited, or only serve as default, from /etc to /usr, for example /usr/etc.
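How software might find its configuration after such a move can be sketched in a few lines of Python. This is only an assumption about the lookup order (admin file in /etc first, shipped default in /usr/etc second); the helper `find_config` is hypothetical, and real implementations may merge both files rather than pick one.

```python
from pathlib import Path
from typing import Optional


def find_config(name: str, root: Path = Path("/")) -> Optional[Path]:
    """Return the admin's copy below etc/ if it exists, else the default below usr/etc/.

    The precedence order is an assumption made for this sketch.
    """
    for base in (root / "etc", root / "usr" / "etc"):
        candidate = base / name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Prints the first match on the running system, or None if neither exists.
    print(find_config("profile"))
```

The `root` parameter exists only so the lookup can be tried against a scratch directory.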

So then, /etc would actually only contain locally generated configuration (e.g. by the admin). Therefore, it could actually be separated from the rest of the operating system and the OS tree can become read-only.

For MicroOS, that would mean that, when there are no longer OS files in /etc, the lower directory (of the overlayfs) would basically be empty, therefore the need for an overlay vanishes and /etc can become its own (sub)volume without further tricks.
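The argument can be made concrete with a toy model of the overlay, where each layer is just a mapping from file name to content. This is a simplification for illustration only: the real overlayfs also handles whiteouts and directory merging, which are left out here, and `merged_view` is a made-up name.

```python
def merged_view(lower: dict, upper: dict) -> dict:
    """Union of both layers; an upper entry hides a lower entry of the same name."""
    view = dict(lower)
    view.update(upper)
    return view


if __name__ == "__main__":
    upper = {"hostname": "set by the admin"}
    # Today: the read-only OS tree still ships files in /etc (lower layer).
    print(merged_view({"profile": "shipped by the OS"}, upper))
    # Once no OS files are left in /etc, the lower layer is empty and the
    # merged view is identical to the upper layer, so the overlay adds nothing.
    print(merged_view({}, upper) == upper)
```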

tree10

This new tree no longer has any directories with mixed file types.

UsrMerge

What’s left is an operating system that owns the root file system and /usr. The split between / and /usr is actually a legacy concept that no longer applies. The initramfs can mount all partitions just fine, so there’s no need to have operating system files directly in / anymore. Other vendors already merged all operating system files into /usr to further simplify things:

tree7

/ without OS

Now there’s basically nothing left directly within / that needs to be shipped by the OS, just a bunch of mount points and symlinks that point to /var or /usr. Since the operating system’s package manager has no other business outside of /usr, the actual root directory could be assigned to the /etc/ volume, which is to say, / and /etc are located on one subvolume (same stat -c %d value).
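The `stat -c %d` comparison mentioned above has a direct equivalent in Python, since `os.stat()` exposes the same device number as `st_dev`. A minimal sketch, with `same_volume` as a made-up helper name:

```python
import os


def same_volume(path_a: str, path_b: str) -> bool:
    """True if both paths report the same device number (what `stat -c %d` prints)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


if __name__ == "__main__":
    # The result depends on how the running system is partitioned.
    print(same_volume("/", "/etc"))
```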

tree6

Potential

With that clear separation of file types and an OS tree limited to /usr, there’s potential to use the system in new ways.

Consequences and TODO

Summary

A traditional Linux file system layout already carries the idea of separating the OS, config and data. Actually storing those different types of files in separate locations and limiting the scope of package management was never carried through to its full consequences. Doing so would be a major task, but still an evolutionary step that makes small systems simpler while retaining all the flexibility of today’s systems. With the new layout, it’s possible to use the same technology and (binary) packages for building traditional Linux systems as well as new, compact systems for Edge/IoT, set-top boxes or routers, and container hosts and runtimes, with very little effort.

Appendix

Volume properties

With the files separated and stored on BTRFS as outlined in this article, the subvolumes or partitions holding the content can have different properties:

Category  RW   CoW  Snapshot
config    no   yes  yes
data      yes  no   no
boot      no   no   no
OS        no   yes  yes

The OS and config volumes can leverage copy-on-write benefits as well as snapshotting. The boot volume might be a simple file system such as FAT (EFI boot partition for example), so advanced file system features cannot be expected there. Only the data volume needs to be permanently writeable.
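For tooling that needs to reason about these properties, the table could be written down as data. The following Python sketch transcribes the table as is; the class `VolumeProps`, the field names and the lower-case key "os" are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VolumeProps:
    writable: bool   # "RW" column
    cow: bool        # "CoW" column
    snapshot: bool   # "Snapshot" column


# Values copied from the table; the key names are illustrative.
VOLUMES = {
    "config": VolumeProps(writable=False, cow=True, snapshot=True),
    "data": VolumeProps(writable=True, cow=False, snapshot=False),
    "boot": VolumeProps(writable=False, cow=False, snapshot=False),
    "os": VolumeProps(writable=False, cow=True, snapshot=True),
}


def select(prop: str, volumes=VOLUMES) -> list:
    """Return the sorted names of all volumes for which the given property is set."""
    return sorted(name for name, props in volumes.items() if getattr(props, prop))


if __name__ == "__main__":
    print("snapshotted:", select("snapshot"))
    print("writable:   ", select("writable"))
```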
