A traditional Linux file system tree in the root file system has quite a number of directories with special purposes, documented in the Filesystem Hierarchy Standard (FHS). An operating system installation by default populates most of those directories with different kinds of files, e.g. by means of a package manager. This article analyzes the situation and proposes a radical simplification.
The following diagram shows such a traditional file system tree as used by a default openSUSE installation on a classical filesystem such as ext4 in 2020:
The colors visualize different kinds of files while gradients indicate a mix of types on the same file system.
The file types according to color are the following:
- Configuration: /etc.
- Data: /var, for example /var/log or /var/cache. /var/lib is an application-specific hierarchy where anything can be stored.
- Operating system files: in / as well as /usr, but also in /etc, /srv, /opt and /var, mixing with user or service created files.
- Volatile or kernel-managed files: /proc, /sys, /dev etc. The content is lost after reboot, or managed by the kernel anyway.

An example of a mixed file type hierarchy is /etc. The operating system ships files that are not actually meant to be modified, like /etc/profile, or worse, /etc/ld.so.cache, which isn’t even a config file at all.
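The following Python sketch makes the idea of a mixed hierarchy concrete. It tags a few files with the kinds of content listed above and reports which top-level directories contain more than one kind. The sample paths and the category names (`os`, `config`, `data`, `generated`, `volatile`) are illustrative assumptions. They are not an inventory of an actual openSUSE installation.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical sample of files from a traditional tree, each tagged with the
# kind of content it holds. Paths and tags are illustrative only.
SAMPLE_FILES = {
    "/usr/bin/bash": "os",
    "/etc/profile": "os",             # shipped by the OS, not meant to be edited
    "/etc/ld.so.cache": "generated",  # written by ldconfig, not a config file
    "/etc/hostname": "config",        # set locally by the admin
    "/var/log/messages": "data",
    "/proc/cpuinfo": "volatile",
}


def categories_per_top_dir(files):
    """Map each top-level directory to the set of content kinds found below it."""
    result = defaultdict(set)
    for path, kind in files.items():
        parts = PurePosixPath(path).parts  # e.g. ('/', 'etc', 'profile')
        top = "/" + parts[1] if len(parts) > 1 else "/"
        result[top].add(kind)
    return dict(result)


def mixed_dirs(files):
    """Return the top-level directories that hold more than one kind of file."""
    return sorted(d for d, kinds in categories_per_top_dir(files).items() if len(kinds) > 1)


print(mixed_dirs(SAMPLE_FILES))  # ['/etc']
print(sorted(categories_per_top_dir(SAMPLE_FILES)["/etc"]))  # ['config', 'generated', 'os']
```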
A typical workstation mounts /home from a separate partition. This is visualized by the puzzle piece. Also, the EFI boot partition is mounted onto /boot/efi.
With the introduction of BTRFS, the operating system gained the ability to take snapshots and roll back the system in case of troublesome updates. So it became necessary to define what should be part of a snapshot and what should not. User documents, for example, must not be rolled back. Furthermore, databases can’t reliably be snapshotted or rolled back by the OS, as their structure is application-specific and follows its own transaction mechanism. The usual configuration files in /etc need to be rolled back though, as some are tied to the installed software version. That led to the separation of the filesystem into five subvolumes, namely /root, /var, /srv, /opt and /usr/local. /tmp now usually even resides on tmpfs. That means the operating system’s package manager cannot really install files in these directories anymore; otherwise, a rollback would lead to an inconsistent package database. So the number of locations mixing OS files and data was reduced at the cost of more subvolumes.
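The rule that the package manager must stay out of the separate subvolumes can be expressed as a simple path check. The sketch below assumes that exactly the locations named above are outside the root snapshot: the five subvolumes plus /tmp on tmpfs. A real installation may define a different set. The check compares whole path components, so a hypothetical /variable is not mistaken for /var.

```python
from pathlib import PurePosixPath

# Locations that are not part of a root file system snapshot, taken from the
# text: the separate subvolumes plus /tmp on tmpfs. Real setups may differ.
NOT_IN_ROOT_SNAPSHOT = [
    PurePosixPath(p) for p in ("/root", "/var", "/srv", "/opt", "/usr/local", "/tmp")
]


def covered_by_root_snapshot(path: str) -> bool:
    """True if a rollback of the root snapshot would also roll back this path."""
    p = PurePosixPath(path)
    for excluded in NOT_IN_ROOT_SNAPSHOT:
        # Compare whole path components, so /variable is not mistaken for /var.
        if p == excluded or excluded in p.parents:
            return False
    return True


for candidate in ("/usr/bin/bash", "/etc/passwd", "/usr/local/bin/tool", "/var/lib/app/db"):
    print(candidate, covered_by_root_snapshot(candidate))
```

A package manager following this rule would have to refuse to install any file for which `covered_by_root_snapshot` returns `False`. Otherwise a rollback would leave that file out of sync with the package database.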
The following diagram shows a typical file system tree of a default openSUSE installation on BTRFS in 2020:
In transactional mode (e.g. MicroOS), there’s a further complication. The green parts of the tree are read-only at runtime. So the writable parts of /etc are actually located below /var. An overlayfs makes those files appear in /etc/.
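The effect of the overlay can be modelled in a few lines. This sketch is a simplified model, not how the kernel implements overlayfs. `lower` stands for the read-only /etc in the OS snapshot, and `upper` stands for the writable layer below /var. An entry in the upper layer hides the lower one, and a whiteout hides a deleted file. The actual mount is configured with the overlayfs `lowerdir=`, `upperdir=` and `workdir=` options. The concrete directories MicroOS uses are not covered here.

```python
# Sentinel standing in for an overlayfs whiteout, i.e. a file deleted in the
# upper layer that must hide the lower-layer copy.
WHITEOUT = object()


def overlay_lookup(name, upper, lower):
    """Resolve one entry the way the merged overlay view presents it."""
    if name in upper:
        value = upper[name]
        return None if value is WHITEOUT else value
    return lower.get(name)


def overlay_listing(upper, lower):
    """Return the merged directory contents: upper entries win over lower ones."""
    merged = dict(lower)
    for name, value in upper.items():
        if value is WHITEOUT:
            merged.pop(name, None)
        else:
            merged[name] = value
    return merged


# lower: read-only /etc from the OS snapshot; upper: writable layer below /var.
lower = {"profile": "shipped by OS", "hostname": "localhost", "motd": "welcome"}
upper = {"hostname": "edited by admin", "motd": WHITEOUT}
print(overlay_listing(upper, lower))
# {'profile': 'shipped by OS', 'hostname': 'edited by admin'}
```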
The data on the purple partitions is actually of a similar kind. Is there any gain in having separate partitions or subvolumes for these by default? Probably not. So, in order to reduce the number of subvolumes again, they could be moved into the /var volume, for example /var/home, /var/srv, and so on, with the original directory becoming a symlink. If the workload is known exactly, an admin could still make an educated decision to have separate partitions.
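Moving such a directory into /var while keeping the old path working could look like the following sketch. To keep it safe to try, it operates on a scratch directory standing in for /. `relocate_into_var` is a hypothetical helper. On a real system the move would happen at installation time, not on a running system with users logged in.

```python
import os
import shutil
import tempfile
from pathlib import Path


def relocate_into_var(root: Path, name: str) -> Path:
    """Move root/<name> to root/var/<name> and leave a relative symlink behind."""
    src = root / name
    dst = root / "var" / name
    if src.is_symlink():
        return dst  # already relocated, nothing to do
    if dst.exists():
        raise FileExistsError(dst)  # refuse to merge into an existing directory
    dst.parent.mkdir(parents=True, exist_ok=True)
    if src.exists():
        shutil.move(str(src), str(dst))  # keep existing contents
    else:
        dst.mkdir()
    src.symlink_to(Path("var") / name)  # e.g. home -> var/home
    return dst


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)  # scratch stand-in for /
    (root / "home" / "alice").mkdir(parents=True)
    relocate_into_var(root, "home")
    print(os.readlink(root / "home"))          # var/home
    print((root / "home" / "alice").is_dir())  # True
```

The symlink is relative (`home -> var/home`), so it resolves correctly both in the running system and when the tree is mounted elsewhere, e.g. in a rescue environment.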
With this simplification, only /boot and /etc still mix OS files with other types.
The initramfs is generated by a script, triggered by package installation. The boot loader files (usually grub2) are managed via scripts as well. By moving the kernel image out of /boot and into the operating system space, e.g. /usr/lib/linux, /boot would become entirely managed by scripts. Since modern systems have an EFI boot partition anyway, that boot partition can be mounted right onto /boot.
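If /boot only holds script-generated files, each kernel installed below /usr/lib/linux results in a loader entry. The sketch below renders an entry in the style of the Boot Loader Specification: Type #1 `.conf` files with `title`, `version`, `machine-id`, `linux`, `initrd` and `options` keys. Two details are assumptions for illustration. One is the `/<machine-id>/<version>/` directory for kernel and initrd. The other is the openSUSE-style snapshot path passed via `rootflags=subvol=`. `bls_entry` is a hypothetical helper, not part of any existing tool.

```python
def bls_entry(title, version, machine_id, options, snapshot_id=None):
    """Render a Boot Loader Specification style entry; returns (path, contents).

    Hypothetical helper: paths are relative to the root of the boot partition.
    """
    base = f"/{machine_id}/{version}"  # assumed location of kernel and initrd
    name = f"{machine_id}-{version}"
    if snapshot_id is not None:
        # Assumption: openSUSE-style snapshot layout on the BTRFS root file system.
        options = f"{options} rootflags=subvol=@/.snapshots/{snapshot_id}/snapshot"
        name += f"-snapshot{snapshot_id}"
        title = f"{title} (snapshot {snapshot_id})"
    fields = [
        ("title", title),
        ("version", version),
        ("machine-id", machine_id),
        ("linux", f"{base}/linux"),
        ("initrd", f"{base}/initrd"),
        ("options", options),
    ]
    contents = "".join(f"{key} {value}\n" for key, value in fields)
    return f"loader/entries/{name}.conf", contents


path, text = bls_entry("openSUSE MicroOS", "5.8.0-1-default",
                       "0123456789abcdef0123456789abcdef", "quiet", snapshot_id=42)
print(path)
print(text, end="")
```

A boot management tool could generate one entry per bootable snapshot. That is one way for it to know which kernel/initrd combination boots which snapshot.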
There are already ongoing efforts to move all files that are not actually meant to be edited, or only serve as defaults, from /etc to /usr, for example to /usr/etc.
So /etc would then only contain locally generated configuration (e.g. written by the admin). Therefore, it could be separated from the rest of the operating system, and the OS tree can become read-only.
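With vendor defaults in /usr/etc and local changes in /etc, a program has to look in two places. Below is a minimal sketch of that lookup. It assumes a simple rule where a file in /etc completely replaces the vendor default. Libraries such as libeconf implement more elaborate rules. `resolve_config` is a hypothetical helper.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_config(relpath: str, etc_dir: Path = Path("/etc"),
                   vendor_dir: Path = Path("/usr/etc")) -> Optional[Path]:
    """Prefer the local file in /etc; fall back to the vendor default in /usr/etc."""
    for base in (etc_dir, vendor_dir):
        candidate = base / relpath
        if candidate.is_file():
            return candidate
    return None  # neither a local override nor a vendor default exists


with tempfile.TemporaryDirectory() as tmp:
    etc, vendor = Path(tmp, "etc"), Path(tmp, "usr", "etc")
    etc.mkdir()
    vendor.mkdir(parents=True)
    (vendor / "profile").write_text("vendor default\n")
    print(resolve_config("profile", etc, vendor).read_text(), end="")  # vendor default
    (etc / "profile").write_text("admin override\n")
    print(resolve_config("profile", etc, vendor).read_text(), end="")  # admin override
```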
For MicroOS, that would mean that, once there are no more OS files in /etc, the lower directory of the overlayfs would be basically empty. The need for an overlay therefore vanishes, and /etc can become its own (sub)volume without further tricks.
This new tree no longer has any directories with mixed file types.
What’s left is an operating system that owns the root file system and /usr. The split between / and /usr is actually a legacy concept that no longer applies. The initramfs can mount all partitions just fine, so there’s no need to have operating system files directly in / anymore. Other vendors already merged all operating system files into /usr to further simplify things:
[Diagram: / without OS]

Now there’s basically nothing left directly within / that needs to be shipped by the OS, just a bunch of mount points and symlinks that point to /var or /usr. Since the operating system’s package manager has no other business outside of /usr, the actual root directory could be assigned to the /etc/ volume. That is to say, / and /etc are located on one subvolume (same stat -c %d value).
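The resulting root directory can be sketched as follows. It consists of a few relative symlinks into /usr and /var plus empty mount points, with /etc as a plain directory on the same volume. The exact list of symlinks and mount points is an assumption for illustration. `same_volume` compares `st_dev` from `os.stat()`, which is the number `stat -c %d` prints. On BTRFS each subvolume reports its own device number, so the check distinguishes subvolumes as well.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical contents of / in the proposed layout: symlinks into /usr
# (UsrMerge) and /var, plus empty directories serving as mount points.
SYMLINKS = {
    "bin": "usr/bin",
    "sbin": "usr/sbin",
    "lib": "usr/lib",
    "home": "var/home",
    "srv": "var/srv",
}
MOUNT_POINTS = ["usr", "var", "boot", "proc", "sys", "dev", "run", "tmp"]


def build_skeleton(root: Path) -> None:
    """Create the skeleton root; /etc is a plain directory on the same volume as /."""
    for name in MOUNT_POINTS + ["etc"]:
        (root / name).mkdir(exist_ok=True)
    for name, target in SYMLINKS.items():
        (root / name).symlink_to(target)  # relative, e.g. bin -> usr/bin


def same_volume(a, b) -> bool:
    """Python counterpart of comparing `stat -c %d a` with `stat -c %d b`."""
    return os.stat(a).st_dev == os.stat(b).st_dev


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)  # scratch stand-in for /
    build_skeleton(root)
    print(sorted(p.name for p in root.iterdir() if p.is_symlink()))
    # ['bin', 'home', 'lib', 'sbin', 'srv']
    print(same_volume(root, root / "etc"))  # True: both live on one file system
```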
With that clear separation of file types and an OS tree limited to /usr, there’s potential to use the system in new ways:

- /usr can be mounted from a static image or from the network, or be installed locally and updated by a package manager. Both the transactional update mechanism as of today and an A/B scheme would be possible, even in parallel.
- The /usr tree as a whole could be replaced atomically at runtime without a reboot.
- An overlay for /etc is no longer needed in transactional mode.
- /etc and /var can be populated based only on operating system defaults coming from /usr. Resetting a system can be done by erasing those directories.
- /, including /etc, could be mounted as tmpfs. That may include /var, or /var could be mounted from disk to still keep data/containers on disk.
- As the files in /usr are read-only and independent of any configuration, such a tree created from packages could be used as a shared runtime for containers or apps (e.g. flatpak).
- Keeping /boot on a trivial file system can remove the need for very complex boot loaders like grub. It also means the boot loader does not have to decrypt the disk to load the kernel in case of full disk encryption.

Such a layout also comes with requirements:

- The package manager only installs files into /usr. Anything else would be out of scope, i.e. the UsrMerge proposal has to be implemented.
- Package scripts must not touch anything outside /usr, including /etc. A concept for config and data migration scripts would be needed, similar to how web applications migrate SQL databases back and forth on upgrades or downgrades.
- An application is needed to manage /boot, ideally according to the systemd boot loader specification. Depending on whether the boot loader in use can read the OS partition, the kernel may have to be copied there. The application needs to be aware of snapshots and know which kernel/initrd combination boots which snapshot.
- Having the OS rely on files in /etc does not work anymore, as the /usr tree would not be functional without it. A replacement needs to stay within /usr boundaries.
- To find the /usr tree in case a system boots up with an empty config, the partitioning scheme has to adhere to the discoverable partitions specification. A similar spec would be required for the case where the OS tree is actually a set of BTRFS subvolumes.

A traditional Linux file system layout already carries the idea of separating the OS, config and data. However, actually storing those different types of files in separate locations and limiting the scope of package management was never carried through to its full consequences. Doing so would be a major task, but still an evolutionary step that makes small systems simpler while retaining all the flexibility of today’s systems. With the new layout, it’s possible to use the same technology and (binary) packages to build traditional Linux systems as well as new, compact systems for Edge/IoT, set-top boxes, routers, and container hosts and runtimes with very little effort.
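The factory-reset idea from the list above could work along the lines of this sketch: /etc and /var are populated from defaults shipped in /usr, and a reset erases them. The defaults location and both helpers are hypothetical. systemd-tmpfiles already offers a comparable mechanism with its `C` line type, which can copy missing files from /usr/share/factory/.

```python
import shutil
import tempfile
from pathlib import Path


def populate(defaults: Path, target: Path) -> list:
    """Copy each default file that is missing in target; keep existing local files."""
    copied = []
    for src in sorted(defaults.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(defaults)
        dst = target / rel
        if not dst.exists():
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
            copied.append(rel.as_posix())
    return copied


def factory_reset(defaults: Path, target: Path) -> list:
    """Erase all local state in target, then recreate it from the OS defaults."""
    if target.exists():
        shutil.rmtree(target)
    target.mkdir(parents=True)
    return populate(defaults, target)


with tempfile.TemporaryDirectory() as tmp:
    defaults = Path(tmp, "usr", "share", "defaults", "etc")  # hypothetical location
    etc = Path(tmp, "etc")
    (defaults / "ssh").mkdir(parents=True)
    (defaults / "hostname").write_text("localhost\n")
    (defaults / "ssh" / "sshd_config").write_text("PermitRootLogin no\n")
    etc.mkdir()
    (etc / "hostname").write_text("myhost\n")  # local change survives populate()
    print(populate(defaults, etc))       # ['ssh/sshd_config']
    print(factory_reset(defaults, etc))  # ['hostname', 'ssh/sshd_config']
    print((etc / "hostname").read_text(), end="")  # localhost
```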
With the files separated and stored on BTRFS as outlined in this article, the subvolumes or partitions holding the content can have different properties:
Category | RW | CoW | Snapshot |
---|---|---|---|
config | no | yes | yes |
data | yes | no | no |
boot | no | no | no |
OS | no | yes | yes |
The OS and config volumes can leverage copy-on-write benefits as well as snapshotting. The boot volume might be a simple file system such as FAT (EFI boot partition for example), so advanced file system features cannot be expected there. Only the data volume needs to be permanently writeable.
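The property table can be encoded as data, so that tooling could derive mount modes and sanity-check the layout. In this sketch, mapping the RW column to an `rw`/`ro` default mount mode is an assumption. The consistency check only encodes the fact that BTRFS snapshots depend on copy-on-write.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VolumeProps:
    writable: bool  # "RW" column
    cow: bool       # "CoW" column
    snapshot: bool  # "Snapshot" column


# The property table above, encoded as data.
VOLUMES = {
    "config": VolumeProps(writable=False, cow=True, snapshot=True),
    "data": VolumeProps(writable=True, cow=False, snapshot=False),
    "boot": VolumeProps(writable=False, cow=False, snapshot=False),
    "os": VolumeProps(writable=False, cow=True, snapshot=True),
}


def mount_mode(props: VolumeProps) -> str:
    """Assumed default mount mode: only permanently writable volumes get "rw"."""
    return "rw" if props.writable else "ro"


def check_consistency(volumes) -> list:
    """Return volumes that want snapshots without CoW (BTRFS snapshots rely on CoW)."""
    return sorted(name for name, p in volumes.items() if p.snapshot and not p.cow)


for name, props in VOLUMES.items():
    print(f"{name:7} mount={mount_mode(props)} snapshot={props.snapshot}")
print("inconsistent:", check_consistency(VOLUMES))  # inconsistent: []
```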