by
A traditional Linux file system tree in the root file system has quite a number of directories with special purpose, documented in the Filesystem Hierarchy Standard (FHS). An operating system installation by default populates most of those directories with differerent kinds of files, e.g. by means of a package manager. This article analyzes the situation and proposes a radical simplification.
The following diagram shows such a traditional file system tree as used by a default openSUSE installation on a classical filesystem such as ext4 in 2020:

The colors visualize different kinds of files while gradients indicate a mix of types on the same file system.
The file types according to color are the following:
These are files that are usually written once by the admin or a
config tool to configure operating system features or services and,
once that’s done, are only read. Such files are traditionally placed in /etc.
Objects created by
users or services, such as documents, images, videos or personal
files of users. For services, these are e.g. databases or caches. The
operating system may also produce data, such as log files. Data is
both read and written frequently. Files are located below
/var, for example /var/log or
/var/cache. /var/lib is an application-specific hierarchy
where anything can be stored.
Boot
loader files, kernels and initial ramdisks.
The OS is shipped as packages or image by the operating system vendor. It is
only written to during installation or upgrade of the system, and otherwise
considered read-only. These files would be overwritten on updates
of OS components. In a traditional system, these are installed in
root filesystem / as well as /usr, but also in /etc,
/srv, /opt and /var, mixing with user or service created
files.
For
example, anything on tmpfs, but also virtual file systems that are
kernel interfaces, e.g. /proc, /sys, /dev etc. The content is
lost after reboot, or managed by the kernel anyways.An example for a mixed file type hierarchy is /etc. The operating system ships files that
are not actually meant to be modified, like /etc/profile, or worse,
/etc/ld.so.cache, which isn’t even a config file at all.
A typical workstation mounts /home from a separate partition. This is
visualized by the puzzle piece. Also, the EFI boot partition is
mounted onto /boot/efi.
With the introduction of BTRFS, the operating system gained the
ability to take snapshots and roll back the system in case of
troublesome updates. So it was required to define what should be
part of a snapshot and what not. User documents for example must not be rolled
back. Furthermore, databases can’t really be snapshotted nor rolled back by the OS as
the structure is application-specific and follows its own transaction
mechanism. The usual configuration files in /etc need to be rolled
back though, as some are tied to the software version installed.
That lead to the separation of the filesystem into five subvolumes, namely
/root, /var, /srv, /opt and /usr/local. /tmp now
usually even resides on tmpfs. That means the operating system’s
package manager cannot really install files in these directories
anymore. Otherwise, a rollback would lead to an inconsistent package database.
So the number of locations with mixed OS files and data got reduced at the cost
of more subvolumes.
The following diagram shows a typical file system tree a default openSUSE installation on BTRFS in 2020:

In transactional mode (e.g. MicroOS), there’s a further complication. The green
parts of the tree are read-only at runtime. So the writable parts of /etc is
actually located below /var. An overlayfs makes those files appear in
/etc/.
The data on the purple partitions is actually of a similar kind.
Is there any gain in having separate partitions or subvolumes for
these by default? Probably not. So, in order to reduce the amount of subvolumes
again, they could be moved into the /var volume, for example /var/home,
/var/srv, and so on, with the original directory as symlink. If the workload is
known exactly, an admin could still make an educated decision to have separate
partitions.

With this simplification, only /boot and /etc still mix OS files with other
types.
The initramfs is generated by a script, triggered by package installation. The
boot loader files (usually grub2) are managed via scripts. By moving the kernel
image out of /boot and into the operating system space, e.g. /usr/lib/linux,
/boot would become entirely managed by scripts. Since modern system have an
EFI boot partition anyway, that boot partition can be mounted right onto
/boot.

There are already ongoing efforts to move all files that
are not actually meant to be edited, or only serve as default, from /etc to
/usr, for example
/usr/etc.
So then, /etc would actually only contain locally generated configuration (e.g. by the admin). Therefore, it could actually be separated from the rest of the operating system and the OS tree can become read-only.
For MicroOS, that would mean that, when there are no longer OS files in /etc,
the lower directory (of the overlayfs) would basically be empty, therefore the need for an overlay
vanishes and /etc can become its own (sub)volume without further tricks.

This new tree no longer has any directories with mixed file types.
What’s left is an operating system that owns the root file
system and /usr. The split between / and /usr is actually
a legacy concept that no longer applies. The initramfs can mount
all partitions just fine, so there’s no need to have operating
system files directly in / anymore. Other vendors already merged
all operating system files into
/usr to further
simplify things:

/ without OSNow there’s basically nothing left directly within / that needs to be shipped by the OS,
just a bunch of mount points and symlinks that point to /var or /usr. Since
the operating system’s package manager has no other business outside of /usr,
the actual root directory could be assigned to the /etc/ volume, which is to
say, / and /etc are located on one subvolume (same stat -c %d value).

With that clear separation of file types an OS tree limited to /usr
there’s potential to use the system in new ways
/usr is actually mounted from a
static image, network or locally installed and updated by a package manager,
i.e. both the transactional update mechanism as of today, as well as an A/B
scheme would be possible, even in parallel./usr tree as a whole could be replaced atomically at runtime
without reboot./etc is no longer needed in transactional mode/etc and /var just based on
operating system defaults coming from /usr. Resetting a system
can be done by erasing those directories./, including /etc, as tmpfs. That may include /var,
or could mount it from disk to still have data/containers on disk./usr are read-only, independent of any
configuration, such a tree created from packages could be
used as shared runtime for containers or apps (e.g. flatpak)./boot on a trivial file system can remove the need for very
complex boot loaders like grub. It also means the boot loader does
not have to decrypt the disk to load the kernel in case of full
disk encryption./usr. Anything else would be out of scope, i.e. the UsrMerge proposal has to
be implemented./usr,
including /etc. A concept for config and data migration scripts would be
needed, i.e. similar to how web applications migrate SQL databases back and
forth on upgrades or downgrades./boot, ideally according to the
systemd boot loader specification.
Depending on whether the boot loader in use can read the OS partition, the
kernel may have to be copied there. The application needs to be aware of
snapshots and know which kernel/initrd combination boots what snapshot./etc
does not work anymore as the /usr tree is not functional without
it. A replacement needs stay within /usr boundaries./usr tree, in case a system boots
up with empty config, the partitioning scheme has to adhere to the
discoverable partition specification.
A similar spec would be required for the case where the OS tree is actually
BTRFS subvolumes.A traditional Linux file system layout already carries the idea to separate the OS, config and data. Actually storing those different types of files in separate locations and limiting the scope of package management was never implemented with all consequences. Doing so would be a major task but still evolutionary step that makes small systems simpler while retaining all flexiblity of today’s systems. With the new layout, it’s possible to use the same technology and (binary) packages for building traditional Linux systems as well as new, compact systems for Edge/IoT, set-top boxes or routers and container hosts and runtimes with very little effort.
With the files separated and stored on BTRFS as outlined in this article, the subvolumes resp partitions holding the content can have different properties
| Category | RW | CoW | Snapshot |
|---|---|---|---|
| config | no | yes | yes |
| data | yes | no | no |
| boot | no | no | no |
| OS | no | yes | yes |
The OS and config volumes can leverage copy-on-write benefits as well as snapshotting. The boot volume might be a simple file system such as FAT (EFI boot partition for example), so advanced file system features cannot be expected there. Only the data volume needs to be permanently writeable.
tags: