Multidimensional Filing System

Caveat

If, from reading these notes, you conclude that I am off my rocker, you won’t be the first, and you may even be right.

No doubt there are a dozen and one reasons why none of this would ever work, but perhaps somewhere deep down there is a tiny fragment that could be used for something.

Overview

This was one of Jenni’s notions, although neither of us ever formed a clear idea of what it represented.

(Jenni’s name was “multi-dimensional file system” but since these are my notes, it’s going to be “filing system” as a nod to Acorn, since my first computer was a Beeb.)

Described below are some ideas of how it might be used.

Dimensions

Dimensions allow directories and files to appear in more than one place. This can be done to allow work material or entertainment media to be classified in multiple simultaneous ways and it allows items from self-contained directories to be projected out into other locations. The 2019 draft of this page was based primarily on tags, while in the 2022 version dimensions are overlaid onto an otherwise hierarchical file system. Both approaches have pros and cons.

Tags allow individual items to be grouped together with related items. They are ideal for social media, but could prove unwieldy for project work where directory trees are a better fit. A dimension could be akin to a tag but able to be applied to both directories and files. Each dimension would appear as the union of all items tagged as being in that dimension. This would allow any file or directory to appear in multiple places as required. A directory tagged as being in a dimension would appear within that dimension in its entirety including all subdirectories and files.
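The "union of all tagged items" idea can be sketched roughly as follows. This is only an illustration in Python: the Node class, its fields and the dimension_view function are hypothetical names, not part of any design described here. The one rule it tries to capture is that a tagged directory appears in its entirety, so the search does not descend into it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A file or directory (hypothetical model); directories carry children."""
    name: str
    children: list["Node"] = field(default_factory=list)
    dimensions: set[str] = field(default_factory=set)


def dimension_view(root: Node, dimension: str) -> list[Node]:
    """Return the top-level entries that make up one dimension.

    A node tagged with the dimension appears whole (with everything
    beneath it), so the walk stops there. Untagged directories are
    searched for tagged descendants.
    """
    found = []
    stack = [root]
    while stack:
        node = stack.pop()
        if dimension in node.dimensions:
            found.append(node)
        else:
            stack.extend(node.children)
    return found


# Example tree: one file tagged individually, one directory tagged as a whole.
tree = Node("root", children=[
    Node("projects", children=[Node("report.txt", dimensions={"Work"})]),
    Node("music", children=[Node("song.ogg")], dimensions={"Media"}),
])
media_entries = [n.name for n in dimension_view(tree, "Media")]
```

Note that nothing here stops two entries with the same display name landing in the same dimension, which is the conflict the next paragraph is concerned with.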

A tag-based process would require a separation between presentation name and internal name of files, so that multiple files with the same name could appear unhindered within the same dimension when the tag is applied item-by-item. The idea of files having internal numeric IDs independent of the display name already exists in the Macintosh HFS and HFS+ file systems, and is not unlike the POSIX notion of separating directory entries from inodes. POSIX still requires that files are accessed by name, however, and Mac OS file record tuples still used the name of a file rather than its ID.

Applying dimensions solely at the directory level would be safer but not immune to conflicts.

Tags

Within the 2019 all-tags design, tags are broadly divided into three classifications: system, state and user. System tags may only be set by the operating system, or at the operating system’s discretion. State tags provide additional state of a file, but do not necessarily affect its filtering. The details below refer to the 2019 all-tags design, but some principles equally apply to dimensions.

Each tag associated with a file or directory also bears the timestamp when the tag was applied.

Each tag has a security descriptor which controls who can see the contents of the tag, and who can apply and who can remove it from files. This would also apply to the 2022 dimension system, for example preventing the Application dimensions being arbitrarily applied in an insecure manner.

State tags

State tags provide non-grouping characteristics, allowing files to be marked without placing them into a separate sub-group. Files can be hidden, or shown differently, according to these tags, but are not treated differently from their peers in terms of grouping.

State tags include:

Deleted
This forms the MDFS equivalent to the Recycle Bin or Trash. “Deleted” files are still exactly where they were before, but are simply hidden from view. The tag-applied timestamp provides deletion order, to allow users to find files that were recently deleted.
Hidden
Hide the file by default. May not be authorised as hidden files serve too few legitimate purposes.
Favourite
These are files that the user wishes to have ready access to. Tagging a file as a favourite makes this process generic instead of application specific.
Recent
This allows the OS to locate recently-accessed files. This also relies on the tag-applied timestamp to function.
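A rough Python sketch of how the Deleted tag and its tag-applied timestamp might behave. The FileEntry class and the two helper functions are hypothetical, and a plain dictionary stands in for whatever structure would really hold the tags.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class FileEntry:
    name: str
    # Maps state tag name -> time the tag was applied (hypothetical layout).
    state_tags: dict[str, datetime] = field(default_factory=dict)

    def apply_tag(self, tag: str, when: datetime) -> None:
        self.state_tags[tag] = when


def visible(entries):
    """Normal listing: deleted files stay where they are but are filtered out."""
    return [e for e in entries if "Deleted" not in e.state_tags]


def recently_deleted(entries):
    """Recycle Bin style view: deleted files, most recently deleted first."""
    deleted = [e for e in entries if "Deleted" in e.state_tags]
    return sorted(deleted, key=lambda e: e.state_tags["Deleted"], reverse=True)


a, b, c = FileEntry("a.txt"), FileEntry("b.txt"), FileEntry("c.txt")
a.apply_tag("Deleted", datetime(2022, 3, 1, tzinfo=timezone.utc))
c.apply_tag("Deleted", datetime(2022, 3, 5, tzinfo=timezone.utc))
```

The same timestamp lookup would serve the Recent tag; only the tag name changes.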

It may also be possible to colour-code files as on the Mac using a set of state tags. Moreover, classic Mac–style labels may be possible, using a set of state tags that each have a colour associated. Being state tags, they do not affect file classification and grouping.

Applications

POSIX systems store the various files from packages in different locations, with no visible indication of where each file comes from and how they relate. Windows applications are more self-contained, while macOS and RISC OS applications are typically fully self-contained. The Multidimensional Filing System could provide a means to achieve both approaches simultaneously. For an example application Foo Writer, the application dimension “Foo Writer” would be a chroot-like space laid out as follows:

The application’s primary executable is tagged at the file level so that the Applications::Exec dimension is a simple list of executables akin to /usr/bin or the Macintosh’s Applications directory. Supporting applications not intended to be run separately would be stored inside the application’s directory at the discretion of the developer, not tagged as any system dimension.

This application would be added during installation to the system’s Applications dimension. This is similar to a union system except that no folder contents are merged. The Applications dimensions would then look something like the following:

That is to say, each subfolder of the Foo Writer directory would automatically appear in the corresponding Applications dimension directory as a distinct entry.

Considering the notes below on the similarity to tags, it may be that each subdirectory of Applications above would be its own dimension, with each subdirectory of the application being mapped into the corresponding dimension.

Where an application provides both a user-facing application and a daemon, the daemon executable will be placed into Daemons::Exec and its corresponding manifest in Daemons::Manifests.

The directory contents above are all that is needed to be a complete package, thus the Packages dimension is also applied. Directories in the Applications dimension are user-facing applications, while those in the Packages dimension are packages. For example, a driver package would be in the dimensions Packages and Drivers, while a daemon would be in the dimensions Packages and Daemons. Some additional thought is needed to refine the tagging process to allow all the proper identification to take place, e.g. a single package that contains a daemon and its management applications. The dimension assignment allows the installed programs UI to separate out application packages from driver packages: the user is able to identify what each package is installed for. Packages that only provide library functionality will be added to Packages and Libraries. Dependency information in the package manifests will ensure that the user cannot mistakenly remove functionality, and will allow dependencies to be cleaned up when they become redundant.

IDs

Every file is identified to the operating system by a unique numeric ID. Ideally these would be GUIDs, so that they can be tracked across volumes: a file could be successfully located no matter which volume it now resides on (so long as the catalogue of that volume is available for querying). GUIDs require 16 bytes of storage, and if this is considered too great, at a bare minimum they will be a sequential ID on their enclosing volume, in which case they would be paired with a volume ID for successful identification. A 32-bit (“mere” 4 bytes) sequential ID would allow an implausible 4 billion files per volume, but a 64-bit integer, while being more space hungry, would be quicker to read and write.
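To make the storage sizes concrete, here is a small Python sketch of the two options. The VolumeLocalId type and the packing helpers are hypothetical; no on-disk layout has been decided, and the little-endian packing is purely an assumption for illustration.

```python
import struct
import uuid
from typing import NamedTuple


class VolumeLocalId(NamedTuple):
    """Fallback identifier: a per-volume sequence number plus a volume ID."""
    volume_id: uuid.UUID  # identifies the enclosing volume
    sequence: int         # sequential ID within that volume


def pack_sequence_32(sequence: int) -> bytes:
    """Pack a sequential ID into 4 bytes (assumed little-endian)."""
    return struct.pack("<I", sequence)


def pack_sequence_64(sequence: int) -> bytes:
    """Pack a sequential ID into 8 bytes (assumed little-endian)."""
    return struct.pack("<Q", sequence)


# Option 1: a GUID per file, unique across volumes, 16 bytes each.
guid_size = len(uuid.uuid4().bytes)

# Option 2: a sequential ID, only meaningful together with its volume.
file_ref = VolumeLocalId(volume_id=uuid.uuid4(), sequence=42)

# Number of distinct 32-bit IDs available on one volume.
ids_per_volume_32 = 2 ** 32
```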

File IDs will be accessible to users, but users will generally have no reason to see or be shown IDs; IDs exist to ensure that the OS can unambiguously locate and reference files no matter what changes occur to their names or tags. This permits open files to be renamed and re-tagged at will, just as renaming open files was permitted on the classic Macintosh. Native Event Model applications will be notified if any open files are renamed or re-tagged so that they can update their user interface. Legacy Model (I/O stream–based) applications such as classic command-line utilities will not receive such notifications, and may fail under these conditions as they are likely to be using file paths, which will become invalidated under such conditions.

Paths

Since full command-line access is required, it’s likely that MDFS will be a tag overlay onto an otherwise hierarchical file system. Using a pure tag-based approach would make it impossible to reference files in a meaningful way on the command line and it would prevent working directories from existing.

One problem that remains is determining how the system would build a canonical representation of the disk structure. File system exploration and any kind of recursive processing (e.g. space totalling, permissions application) must not be trapped into processing any file more than once. Microsoft’s idea of creating “Documents and Settings” and then renaming it “Users” ultimately led them to make the appallingly grievous error of a cyclic graph file system with redundant subtrees and this must be prevented. User project spaces will be the hardest to produce canonical trees for, unlike system dimensions that are well-defined.
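One way to keep recursive processing from visiting anything twice, even if an item appears in several places or the structure contains a cycle, is to track file IDs already seen. The Python sketch below totals space this way. The Item class is a hypothetical stand-in, and this sidesteps rather than solves the harder question of what the canonical tree should be.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    file_id: int
    size: int = 0
    children: list["Item"] = field(default_factory=list)


def total_size(root: Item) -> int:
    """Sum sizes over everything reachable from root, counting each ID once."""
    seen = set()
    total = 0
    stack = [root]
    while stack:
        item = stack.pop()
        if item.file_id in seen:
            continue  # already processed via another location
        seen.add(item.file_id)
        total += item.size
        stack.extend(item.children)
    return total


# A file that appears in two directories, plus a deliberate cycle.
shared = Item(file_id=3, size=100)
dir_a = Item(file_id=1, children=[shared])
dir_b = Item(file_id=2, children=[shared, dir_a])
dir_a.children.append(dir_b)  # dir_a -> dir_b -> dir_a
space_used = total_size(dir_a)
```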

In the 2019 version of this page, dimensions were tags that were going to completely replace paths. Each file would be identified via a canonical tag list, with paths taking the following form:

[volume](tags)filename

The volume name is enclosed inside […] and the tags list inside (…), with either one being otherwise implicit.
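A minimal Python sketch of splitting such a path into its parts. The function name and the regular expression are hypothetical. The tag list is returned as raw text because these notes do not settle how tags are delimited inside the parentheses.

```python
import re
from typing import NamedTuple, Optional


class TagPath(NamedTuple):
    volume: Optional[str]  # None when the volume is implicit
    tags: Optional[str]    # raw text between ( and ), None when implicit
    filename: str


_TAG_PATH = re.compile(
    r"^(?:\[(?P<volume>[^\]]*)\])?"  # optional [volume]
    r"(?:\((?P<tags>[^)]*)\))?"      # optional (tags)
    r"(?P<filename>.+)$"             # the filename itself
)


def parse_tag_path(path: str) -> TagPath:
    match = _TAG_PATH.match(path)
    if match is None:
        raise ValueError(f"not a valid path: {path!r}")
    filename = match.group("filename")
    # A leftover leading bracket means a volume or tag section was not closed.
    if filename[0] in "([":
        raise ValueError(f"unterminated volume or tag list: {path!r}")
    return TagPath(match.group("volume"), match.group("tags"), filename)


full = parse_tag_path("[Data](Work)report.txt")
bare = parse_tag_path("report.txt")
```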

In this 2022 version, dimensions are suggested to sit alongside directory hierarchies and allow files and folders to appear in multiple locations according to need. Paths are likely to resemble those of UNIX systems, although paths are an inherently wrong notion for graphical systems as they prevent one or more very useful and commonplace characters being used in filenames. Although there is nothing to stop any character being used in a filename (applications would have to be discouraged from blind concatenation and think of paths in a similar vein to SQL prepared statements), presenting a path to a user with either escape characters or ambiguous characters would be unwise.

Hierarchical paths can essentially take two forms: UNIX and non-UNIX. UNIX paths place a magic location outside of all volumes (removable volumes are mounted inside the namespace owned by another volume), while other systems treat every volume separately, with fixed and removable volumes treated equally.

The multidimensional principle means that the root could comprise any number of directories, made from a list of dimensions tagged as root, e.g. Applications, Volumes, Users. Under this principle, nothing would be allowed to exist directly within the root, and creating new directories within the root would be illegal.

Examples might include:

As noted, the root of each volume would itself be tagged into the Volumes dimension causing it to appear under /Volumes.

Volumes

Volume identification is as-of-yet undecided. There are pros and cons for each option:

Filenames

Each file will have a name. Under the canonical-tags-as-paths design, names were required to be unique within each tag combination so that paths remain unique.

Names are UTF-8 and may contain any printable character including space. Attempts to use control characters (ASCII 0x00 to 0x1F) will be rejected to avoid programming errors arising from failure to handle confusing names, but files with corrupt filenames will still be recoverable via their IDs.
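The control-character rule could be checked along these lines. This is a Python sketch with a hypothetical function name. It enforces only the 0x00 to 0x1F range stated above, plus a check for empty names, which is an added assumption.

```python
def validate_name(name: str) -> str:
    """Return name unchanged if acceptable, otherwise raise ValueError."""
    if not name:
        # Assumption: an empty name is never valid.
        raise ValueError("a name must not be empty")
    for position, char in enumerate(name):
        if ord(char) <= 0x1F:
            raise ValueError(
                f"control character 0x{ord(char):02X} at position {position}"
            )
    return name


# Spaces and non-ASCII printable characters are fine.
ok_name = validate_name("Budget 2022 • final")
```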

In the canonical-tags-as-paths design, filenames could not begin with an open parenthesis “(”, as this is the start of the tag sequence, or with an open bracket “[”, as this is the start of the volume identifier. Under the conventional path basis, choosing a suitable delimiter is trickier. The UNIX and Mac choices of “/” and “:” are common human syntactic characters that should be allowed in filenames. There are a huge number of Unicode characters that could be used instead, such as bullet “•” or arrow “→”, that are impossible to type on a basic keyboard and unlikely to be put into filenames even by users with access to advanced character entry. On a universally Unicode operating system, such characters could easily be added to keyboard input, causing a conflict with paths.

The use of a slash is the most appealing choice to more technical folks, but a slash is also used to write dates in many countries. While it’s also true that such dates will not currently sort correctly in filenames (only Asian big-endian slash-delimited dates will work this way) Windows has since XP sorted numbers in filenames correctly and there is no reason why the OS could not also recognise dates and sort those properly also (see also the notes on sorting, below).

For now, slash is being used here.

Tags

In the 2019 tag-based design, files are grouped and located using tags. Tags are divided up into various classifications, and within a path, tags are presented in the canonical order of dimension, username, application, user tags in alphabetical order, then finally state tags. State tags are not part of the standard path and are normally omitted. State tags are separated from the other tags with a slash.

Permissions

Background

Anyone who has spent time around NTFS knows that NTFS permissions are fundamentally broken by design. Permissions are cached on each file and directory, and these permissions can (and do) get out of sync with the parent permissions. A file or directory will show that it is inheriting permissions from its parent, even though the permissions displayed are manifestly different to the parent. This is a perfect example of data normalisation failure. In addition, changing the permissions on a directory involves rewriting the permissions on every descendant, which can be very time consuming (far more so than the comparable operation on Linux).

NTFS also resets explicit (non-inherited) permissions when moving a directory: the explicit permissions are replaced by those of the new container. For example, moving a network share in Windows from one volume on a server to another (to manage storage capacity) not only deletes the share (instead of relocating it) but it also erases all the custom permissions set on that directory.

Traditional (octal) UNIX permissions are based on the user and not the containing directory, which can be frustrating and awkward as setting directory permissions does not enforce the same on newly-created files. The plus side of UNIX octal permissions is that they are simple and self-evident, and able to be shown in full detail in a simple command-line listing, while Windows permissions are unlimited in length and cannot be concisely presented.

Implementation

MDFS permissions are not yet defined. However, they will be fully normalised: inherited permissions will be read from the container dynamically. With modern RAM capacity and solid-state storage, this is an acceptable trade-off against dealing with denormalised data.
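As a sketch of what "read from the container dynamically" might mean, the Python below resolves effective permissions by walking up to the nearest container that has explicit permissions. The permission model itself (a simple set of names that is either explicit or inherited wholesale) is invented purely for illustration, since the real one is undefined.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    name: str
    parent: Optional["Entry"] = None
    # None means "inherit from the container"; nothing is cached on the entry.
    explicit_acl: Optional[frozenset] = None


def effective_acl(entry: Entry) -> frozenset:
    """Resolve permissions at access time by walking up the containers."""
    node = entry
    while node is not None:
        if node.explicit_acl is not None:
            return node.explicit_acl
        node = node.parent
    return frozenset()  # nothing set anywhere: no access


share = Entry("Share", explicit_acl=frozenset({"alice"}))
report = Entry("report.txt", parent=Entry("Reports", parent=share))

before = effective_acl(report)
# Changing the container touches one record; descendants are never rewritten.
share.explicit_acl = frozenset({"alice", "bob"})
after = effective_acl(report)
```

Because nothing is copied onto descendants, there is no cached copy to drift out of sync, at the cost of a walk up the tree on each check.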

Whether file and directory permissions should be carried over with a move operation is unfortunately situation-dependent. Possibly, the ability to carry over directory permissions during a move would be an optional part of a move operation. File permissions are seldom set specifically, so where this has been done it may be worth issuing a warning on any copy or move request.

Sorting

Consideration needs to be given to sort order. Windows since XP has sorted numbers in filenames by human standards (i.e. 1 < 2 < 12 < 20) but likely only within the shell, and direct access to directory listings seems to bypass this enhancement. (The kernel/shell split in Windows is a monumental error in general: the whole system should behave as one. Windows has a ridiculous number of separate APIs, and no good ones.) Sorting will always function identically for any user. The system may choose to operate in a neutral locale or split-neutral locale where human sorting is turned off, or directory enumeration could employ a flag to disable human sorting or disable all sorting where speed is critical (e.g. backup software that has no interest in file ordering). Disabling of all sorting is necessary due to the complexity of sorting in a Unicode world.
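Human-style numeric ordering, with a flag to turn it off, might look something like this in Python. The function names and the order parameter are hypothetical, and case folding stands in for proper locale-aware collation, which is a much bigger problem.

```python
import re


def human_key(name: str) -> list:
    """Split a name into text and number runs so that 2 sorts before 12."""
    parts = re.split(r"([0-9]+)", name)
    # With a capturing group, odd positions are always the digit runs.
    return [int(p) if i % 2 else p.casefold() for i, p in enumerate(parts)]


def enumerate_directory(names, order="human"):
    """order: 'human' (numeric-aware), 'plain' (code point) or 'none'."""
    if order == "human":
        return sorted(names, key=human_key)
    if order == "plain":
        return sorted(names)
    if order == "none":
        return list(names)  # fastest: whatever order the catalogue yields
    raise ValueError(f"unknown order: {order!r}")


listing = ["file20", "file2", "file12", "file1"]
human = enumerate_directory(listing)
plain = enumerate_directory(listing, order="plain")
```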

If slashes are to be allowed in names, then strings like “5/3/22” could be interpreted as dates and sorted accordingly, according to the user’s locale rules (as this is either the 5th of March or 3rd of May in 2022). There is no means to universally sort names unless the originating locale is stored with each file!
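The ambiguity is easy to demonstrate. In this Python sketch, the day_first flag is a hypothetical stand-in for the user's locale rules:

```python
from datetime import date, datetime


def parse_slash_date(text: str, day_first: bool) -> date:
    """Interpret a slash-delimited date under one of two locale conventions."""
    fmt = "%d/%m/%y" if day_first else "%m/%d/%y"
    return datetime.strptime(text, fmt).date()


as_day_first = parse_slash_date("5/3/22", day_first=True)     # 5th of March
as_month_first = parse_slash_date("5/3/22", day_first=False)  # 3rd of May
```

The same name therefore sorts into two different positions depending on who is looking, which is exactly the problem.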

Storage

It is clear from the details above that MDFS volumes will require an indexed database to hold the volume contents. It is not going to be practical to walk the directory tree looking for items matching a particular tag or dimension. The HFS and HFS+ file systems for Mac OS used simple balanced tree structures to hold details on all files on a volume, paired with a separate “desktop database” file to contain file comments and file type details. This approach was however criticised for its lack of concurrency although details were not made clear as to why, and whether this limitation was more of a byproduct of the lack of threading and pre-emptive multitasking on the Mac when HFS was conceived, rather than a limitation of database engines in general.

Relational databases implement various levels of locking, and if the volume data were to be held in a database, appropriate lock granularity would be required to limit the extent to which a program could be starved of disk access.

Attributes

Template/stationery pad

Just as with the Macintosh, templates will be a system concept rather than an application one. This raises a few questions.

Firstly, how are icons to be specified? The Macintosh principle is that stationery-aware programs supply additional icons where the first character of the four-character file type is replaced with “s”, e.g. a file type of “TEXT” is changed to “sEXT” when resolving the file icon (per Resources at the White Files). Since filename extensions are now universal, either marking a file as a template requires a filename extension change that follows a particular pattern, or the filename extension is to remain unaffected as the file format itself has not changed. Some means of generating automatic template icons would be useful for where program authors did not anticipate users creating templates from their documents.

Secondly, should this characteristic be defined using a filesystem attribute (as on the Mac), a dimension or tag, or a change in filename extension? The use of a dimension allows templates to be automatically added to a New menu, but users will want templates that are not added to that menu. The contents of any New menu would have to originate from a directory or dimension where items could exist without the relevant attribute set.

Finally, some means is needed to instruct the application to handle the template. The Macintosh approach of stationery-aware applications may suffice, unless there is a way to ensure that all applications incorporate the relevant logic. Office applications add an Edit context menu command to templates (so long as shift is held), and this behaviour needs to be made system-wide, to allow users to edit templates without needing to fiddle with the file status. This requires applications to be made aware of whether the file is to be cloned as a new untitled document, or to be edited directly.

User metadata

Some degree of user metadata is always healthy. Examples include:

User metadata should also apply equally to directories. Ideally the implementation would be identical; see below under hidden files for further comments. Each metadata item would have a type key that references a binary stream, akin to file system forks or NTFS alternate data streams. There is a caveat that this data would not be transferable to all other file systems, and either these features would simply not function on volumes without adequate metadata support, or some kind of hidden files would be needed to store it (as seen on macOS).

There is a possibility for applications to be allowed to store additional metadata for purposes not defined by the operating system. This would be an accountability violation however as it would allow applications to have secretive implementations that users would have no way to understand and troubleshoot. System-defined metadata types would be guaranteed a means to observe their existence and alter or remove the information, while application-defined metadata would have no meaning to the operating system, which in turn would have no means to display the data.

Dynamic maps

Dynamic maps are process-specific aliases between canonical paths and individual packages. When an application is launched, each reference to a package generates a dynamic map to the selected version. For library packages, the path in /Library to the package is aliased to the most recent version acceptable to the application.

When an application is updated by one user while it’s still open by another (e.g. on a terminal server), the application process’s dynamic maps to the application package and all library packages are repointed to the now deprecated packages in use within that process. Once the application is closed and the dynamic maps are deleted, further requests to map the application package will access the most recent package version.
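The pinning behaviour described above could be sketched like this in Python. PackageStore, Process and the /Library/FooLib path are hypothetical names, and the check for which versions are "acceptable to the application" is left out for brevity: the sketch simply takes the newest installed version at launch.

```python
class PackageStore:
    """Hypothetical catalogue of installed package versions per canonical path."""

    def __init__(self):
        self._versions = {}  # canonical path -> list of version tuples

    def install(self, path, version):
        self._versions.setdefault(path, []).append(version)

    def latest(self, path):
        return max(self._versions[path])


class Process:
    """A running application holding its own dynamic maps."""

    def __init__(self, store, packages):
        # Maps are created at launch and stay fixed while the process lives.
        self.dynamic_maps = {path: store.latest(path) for path in packages}

    def resolve(self, path):
        return self.dynamic_maps[path]


store = PackageStore()
store.install("/Library/FooLib", (1, 0))
first_user = Process(store, ["/Library/FooLib"])

# Another user updates the package while the first process is still open.
store.install("/Library/FooLib", (2, 0))
second_user = Process(store, ["/Library/FooLib"])
```

The first process keeps resolving to the version it launched with, while anything launched after the update sees the new one.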

Hidden files

Hidden files and directories are a common choice on various operating systems. Examples include:

Hidden files and directories are unfriendly, and allow the accountability principle to be violated by creating invisible implementation details. Ideally, hidden files would not exist at all.

Configuration data on UNIX and Linux systems uses hidden files and directories due to a lack of proper system organisation. There will not be configuration files of this nature under Layer Config as per-user configuration will be held in its own dimension out of sight. Application runtime files and long-term storage will have a dimension equivalent to AppData in Windows or ~/Library in macOS.

Folder configuration will not need a hidden file either. Folder characteristics will be set by dimensions or file system metadata.

Windows has deprecated thumbnail databases, although even the most recent version of Windows 10 (22H2 at the time of writing) retains the bug where File Explorer cannot delete a folder with a Thumbs.db file in it, due to being too stupid to realise that File Explorer itself is holding the file open. There is no current proposal for how thumbnail caching is best handled. Windows historically used NTFS alternate data streams where possible, but this is impossible with FAT-based media. Whether thumbnails should be a property of the images (stored within or alongside the images themselves) or a property of the user (stored within the user’s local profile and duplicated for each user viewing shared material) needs to be assessed.

Custom icons would hopefully be achieved using rich metadata associated with the directories themselves, rather than stored in a special hidden file. The custom icon system in classic Mac OS is a great idea, but it relies on resource forks and has the limitation that setting a custom icon on a file that already had its icon set that way (as was required to get 32-bit icons on applications) would permanently erase the initial custom icon. To get a custom icon on a directory, a hidden file is created so as to associate a resource fork with the directory. Resource forks are unlikely to be introduced into MDFS, but rich metadata—somewhat akin to unlimited forks or NTFS alternate data streams—will ideally apply equally to directories as it does files, and thus any use of this for custom icons will not require special files be created.

Another use for the Hidden flag in Windows is to bury irritations. For example, some applications continually recreate their desktop shortcuts with every update, even if the user originally stated that no such icons should be created. The only way to fight this disease of stupidity in Windows is to hide the icons, as many installers will fortunately neither recreate the shortcut nor remove the Hidden flag. This problem is solved by declarative software packages where the installation process is controlled by the operating system. Another such irritation is folders created for no good purpose or created in the wrong place, such as software (including Microsoft’s own Outlook and Feedback Hub of all things) that puts application data under Documents instead of AppData, or creates worthless folders in Documents or the profile root such as “3D Objects”. Since these folders are either mandatory or forcibly recreated, the only way to fight this ineptitude is to hide the folders. There is no way to enforce that software vendors uphold correct practice, so some consideration is needed to ascertain how to encourage tidiness to the extent that the user does not need to be at odds with the computer.

Hopefully there will be so little call for hidden files and directories that they can be omitted entirely, at least on MDFS volumes. The VFS will still need to understand the concept in order to correctly handle other file systems.