File types

Caveats

Most if not all pages are just rough notes, and these pages as a whole are far from complete. More notes will be added in time, eventually, maybe.

If, from reading these notes, you conclude that I am off my rocker, you won’t be the first, and you may even be right.

No doubt there are a dozen and one reasons why none of this would ever work, but perhaps somewhere deep down there is a tiny fragment that could be used for something.

Hierarchy and grouping

There are multiple ways that file types can be grouped. Consequently there is a need for cross-application ownership of types, where the various verbs associated with a file type are aggregated from the different applications involved with each type.

File types with the same internal format can be grouped for the purposes of editing; this applies primarily to text-based file formats such as plain text, HTML, XML, CSS, JavaScript, JSON, Perl, C and so forth. Some types may have only a single editor (e.g. source code) while others may have separate viewers and editors, most notably HTML.

File types with the same high-level behaviour can also be grouped for the purposes of management, e.g. image files collectively can be thumbnailed, rotated etc. This concept introduces the need for type handlers, e.g. image decoders, audio decoders, archive decompressors and source code formatters that are available system-wide as modules and not constrained to a single application. In a broad sense, file types should be decoupled from application software where possible.

In the 16-bit Windows days (and maybe beyond) there was an image filter concept: a DLL that could render a specific type of image for import into an application. While these tended to be shipped with applications for their exclusive use, they were fairly interchangeable between applications and it appears that it was possible to register them system-wide so that any application that supported image filters could use them.
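To make the type handler idea a little more concrete, here is a minimal sketch of a system-wide handler registry that any application could query. Everything in it is hypothetical: the `HandlerRegistry` class, the use of MIME-style strings such as `"image/png"` as type identifiers, and the "try each installed module until one accepts the data" policy are assumptions for illustration. The toy `png_size` decoder reads only the image dimensions from the PNG `IHDR` chunk, standing in for a real image decoder module.

```python
import struct
from typing import Callable, Dict, List

# Hypothetical: a decoder takes raw file bytes and returns decoded metadata.
Decoder = Callable[[bytes], dict]


class HandlerRegistry:
    """A system-wide table of type handlers that any application may query."""

    def __init__(self) -> None:
        self._decoders: Dict[str, List[Decoder]] = {}

    def register(self, file_type: str, decoder: Decoder) -> None:
        # Several modules, possibly from different vendors, may handle one type.
        self._decoders.setdefault(file_type, []).append(decoder)

    def decode(self, file_type: str, data: bytes) -> dict:
        # Try each installed module in registration order until one succeeds.
        for decoder in self._decoders.get(file_type, []):
            try:
                return decoder(data)
            except ValueError:
                continue  # this module rejected the data; try the next one
        raise LookupError(f"no decoder installed for {file_type!r}")


PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> dict:
    """A toy PNG 'decoder' that reads only the image dimensions."""
    # IHDR is always the first chunk; its width and height are big-endian
    # 32-bit integers at byte offsets 16 and 20 of the file.
    if len(data) < 24 or not data.startswith(PNG_SIGNATURE) or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])
    return {"width": width, "height": height}


registry = HandlerRegistry()
registry.register("image/png", png_size)

# A synthetic PNG header: signature, IHDR length (13), chunk name, width, height.
header = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 640, 480)
print(registry.decode("image/png", header))  # {'width': 640, 'height': 480}
```

The point of the design is that the decoder is registered once, outside any particular application, so an image viewer, a file manager's thumbnailer and a word processor's import dialog would all get the same result from the same module.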

File types with the same functional level can also be grouped, e.g. files classified as a document can be printed.

File types belonging to the same application will want to share common verbs, e.g. enqueuing a music file. It should not be necessary to define the same verbs directly against every file type. Note however that applications may support more than one broad class of file, so further groups may arise.
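One way to avoid defining the same verbs against every file type is to attach verbs to groups and merge them at lookup time. The sketch below is a hypothetical model, not any existing system's API: `FileType`, `aggregate_verbs`, the group names and the application names (`Browser`, `WebDesigner`, `TextEditor`, `MusicPlayer`, `System`) are all made up. Each verb maps to a list of applications, so a type can be co-owned by several of them.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FileType:
    name: str
    groups: List[str]                                     # groups this type belongs to
    verbs: Dict[str, str] = field(default_factory=dict)  # verb -> providing app


def aggregate_verbs(ft: FileType, group_verbs: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Merge a type's own verbs with those inherited from each of its groups.

    Each verb maps to the list of applications offering it, so one file type
    can be co-owned by several applications.
    """
    merged: Dict[str, List[str]] = {}
    for verbs in [ft.verbs] + [group_verbs.get(g, {}) for g in ft.groups]:
        for verb, app in verbs.items():
            apps = merged.setdefault(verb, [])
            if app not in apps:
                apps.append(app)
    return merged


# Verbs are defined once per group rather than against every individual type.
group_verbs = {
    "image": {"Thumbnail": "System", "Rotate": "System"},
    "document": {"Print": "System"},
    "text": {"Edit": "TextEditor"},
    "music": {"Enqueue": "MusicPlayer"},
}

html = FileType("HTML", groups=["text", "document"],
                verbs={"View": "Browser", "Edit": "WebDesigner"})
mp3 = FileType("MP3", groups=["music"], verbs={"Play": "MusicPlayer"})

print(aggregate_verbs(html, group_verbs))
# {'View': ['Browser'], 'Edit': ['WebDesigner', 'TextEditor'], 'Print': ['System']}
```

Here HTML picks up Edit from the text group and Print from the document group, while its own Edit verb from a second application sits alongside the text editor's rather than replacing it.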

Much of this is already implemented in Microsoft Windows to varying degrees, but there is no proper co-ownership. Custom shell extensions are required for many of the features, and a shell extension that provides thumbnailing of specialist image types is not a proper image decoder that other applications can use to load images. End users can no longer add custom verbs to the main context menu: up to Windows 10 this was possible via the Registry, but in Windows 11 Registry-defined verbs are relegated to the legacy menu behind “Show more options”, and adding even a simple item to the new menu requires developing custom software, with the attendant performance loss of many DLL calls to render a file’s context menu.

Careful consideration is required to formulate a file type system that is inclusive, flexible, friendly to the end user and robust.

Split responsibility

As described on the applications page, an application could mark the accepted file types listed in its manifest as “primary”/“native” and “secondary”/“accepted”, or “for viewing” and “for editing”. For example, HTML files would be set as “primary”/“native” or “for viewing” in the file associations section of a browser’s manifest, while a text editor would list HTML files as “secondary”/“accepted” or “for editing”.

This would allow the file type’s View and Edit verbs to belong to different applications, giving rise to shared responsibility over the file type. The icon would be set by the primary/viewing association to reflect the outcome of opening the file normally.
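A rough sketch of how the system might resolve split responsibility from manifest entries. The `Association` record, its `role` values and the `resolve` function are hypothetical, as are the icon filenames. One extra assumption not in the description above: if no application claims a type as secondary, the primary application also takes the Edit verb.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Association:
    """One file-association entry from a hypothetical application manifest."""
    app: str
    file_type: str
    role: str                    # "primary" (for viewing) or "secondary" (for editing)
    icon: Optional[str] = None


def resolve(file_type: str, manifests: List[Association]) -> Dict[str, Optional[str]]:
    """Decide which application owns View and Edit, and which icon to show."""
    entries = [a for a in manifests if a.file_type == file_type]
    primary = next((a for a in entries if a.role == "primary"), None)
    secondary = next((a for a in entries if a.role == "secondary"), None)
    # Assumption: with no secondary claimant, the primary app also edits.
    editor = secondary or primary
    return {
        "View": primary.app if primary else None,
        "Edit": editor.app if editor else None,
        # The icon comes from the primary association, reflecting what
        # happens when the file is opened normally.
        "icon": primary.icon if primary else None,
    }


manifests = [
    Association("Browser", "html", "primary", icon="web-page.png"),
    Association("TextEditor", "html", "secondary"),
    Association("TextEditor", "txt", "primary", icon="text.png"),
]
print(resolve("html", manifests))
# {'View': 'Browser', 'Edit': 'TextEditor', 'icon': 'web-page.png'}
```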

Type assignment

The computer industry has broadly settled on filename extensions as the only way to indicate the type of a file, despite the scheme’s flaws. The most obvious problem is clashes between different programs that claim the same extension. Mac OS never resolved this matter, but it was mitigated by a combination of type registration (the expectation that developers register file types and use only type codes not listed as being used by anyone else), a larger namespace (four case-sensitive characters) and the fact that file types were also associated with the program that created them. RISC OS used generic 12-bit hexadecimal codes, while Psion went with a much larger namespace using 32-bit UIDs.
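For a sense of how these type codes are stored, here is a small sketch. As I understand it, a classic Mac OS type or creator code is four characters packed into a 32-bit value, and a RISC OS typed file keeps its 12-bit filetype inside the file’s load address (top 12 bits all set, filetype in bits 8 to 19). The helper names `ostype` and `riscos_filetype` are made up for illustration; they are not functions from either system.

```python
import struct
from typing import Optional


def ostype(code: str) -> int:
    """Pack a four-character classic Mac OS type or creator code into 32 bits."""
    raw = code.encode("mac_roman")
    if len(raw) != 4:
        raise ValueError("type codes are exactly four characters")
    return struct.unpack(">I", raw)[0]


def riscos_filetype(load_addr: int) -> Optional[int]:
    """Extract the 12-bit RISC OS filetype from a file's load address."""
    # A typed file has 0xFFF in the top 12 bits of its 32-bit load address;
    # the filetype sits in bits 8 to 19 (the low byte holds part of the date stamp).
    if ((load_addr >> 20) & 0xFFF) != 0xFFF:
        return None  # untyped file: the load address is a real memory address
    return (load_addr >> 8) & 0xFFF


print(hex(ostype("TEXT")))               # 0x54455854
print(ostype("TEXT") == ostype("text"))  # False: codes are case-sensitive
print(hex(riscos_filetype(0xFFFFAF00)))  # 0xfaf (HTML)
print(2 ** 12, 2 ** 32)                  # 4096 4294967296
```

The last line shows the namespace sizes involved: 4,096 possible RISC OS filetypes against roughly 4.3 billion 32-bit codes, which is why a 12-bit scheme depends far more on central allocation to avoid clashes.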