Cache Manager
Caveats
Most if not all pages are just rough notes, and these pages as a whole are far from complete. More notes will be added in time, eventually, maybe.
If, from reading these notes, you conclude that I am off my rocker, you won’t be the first, and you may even be right.
No doubt there are a dozen and one reasons why none of this would ever work, but perhaps somewhere deep down there is a tiny fragment that could be used for something.
Overview
Cache Manager is the system component that provides in-RAM and on-disk data and file caching for the operating system and application software. As such, it is a “synergiser™” for temporary storage to fairly distribute a computer’s available capacity between processes and users. The Cache Manager will evict material from the cache when system-defined hard limits or application-defined soft limits are reached or when cache pressure is unevenly distributed.
Cache management for in-RAM data is required in order to allow processes competing for memory to be fairly served. This is most important on server nodes and remote desktop session host servers where multiple services or multiple users may be in competition for RAM.
Cache management is also required for longer-term cache storage where multiple users on the same computer may be storing large amounts of cached data. Examples in the common office environment include local mail storage and offline content belonging to cloud-based file storage solutions (for example Dropbox or Microsoft OneDrive).
The simplistic approach to caching is to set some kind of storage limit. This creates a risk of over-commitment, where multiple users or multiple services in conjunction fill the RAM or the disk, potentially without ever hitting their individual limits. The Cache Manager avoids this error by fairly sharing the available resources between applications.
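One way to picture fair sharing under pressure is a "water-filling" calculation: when total demand exceeds the budget, every cache is trimmed down to a common cap, so small caches keep their space and large caches absorb the squeeze. The following Python sketch is purely illustrative; the function name and the greedy cap search are assumptions, not part of any existing implementation.

```python
# Sketch: fair trimming of over-committed caches ("water-filling").
# Given per-cache usage and a global budget, find a common cap so the
# total fits the budget; caches already below the cap are untouched.
def fair_caps(usages, budget):
    """Return per-cache target sizes after fair trimming (assumed API)."""
    if sum(usages) <= budget:
        return list(usages)          # no pressure: nothing is evicted
    caps = sorted(usages)
    # Raise the candidate cap through the sorted usages until the
    # implied total would overshoot the budget, then solve exactly.
    for i, cap in enumerate(caps):
        total = sum(caps[:i]) + cap * (len(caps) - i)
        if total > budget:
            fixed = sum(caps[:i])    # caches left untouched
            cap = (budget - fixed) / (len(caps) - i)
            break
    return [min(u, cap) for u in usages]

# Two small caches keep their space; the large one is trimmed.
print(fair_caps([100, 300, 900], 800))  # → [100, 300, 400.0]
```

Note that no cache ever needs to know about the others: a central arbiter computes the caps and each cache evicts down to its own target.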
It may be necessary for each user account to have its own Cache Manager worker service to avoid the risk of data theft from a centralised cache service. Equalisation of cache pressure would be handled by a central service that has no access to the cached data itself. The only caveat to this approach is the need to evict cache pages from signed-out sessions, for example on computers shared by multiple users, only one of whom may be signed in at any given time. A granular permissions model would allow the central Cache Manager service to remove specific data without being able to read it back; the use of a Cache dimension in MDFS would allow suitably restricted management of files.
Cache types
In-RAM caches
In-RAM cache entries are opaque memory pages with application-defined look-up keys.
Examples of in-RAM caching:
- File storage: recently-used file segments, prefetched file segments
- Database engines: query plans, query results, table segments
- Mail server: mailbox pages
In-RAM caches would be held only in memory and would be discarded upon process exit. Each cache may have a hard quota (set by the operating system) or a soft quota (set by the application, no greater than any hard limit) as well as a storage priority. Where total demand exceeds available RAM, cache pages will be evicted in reverse order of storage priority and balanced between applications.
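The eviction rules above could be sketched as a single in-RAM cache with a soft quota and per-entry priorities. This is a minimal Python illustration under stated assumptions: the class name, counting usage by page length, and least-recently-used tie-breaking within a priority level are all choices made for the sketch, not part of the design.

```python
from collections import OrderedDict

# Sketch: one in-RAM cache. Entries are opaque pages under
# application-defined keys, with a soft quota and a storage priority;
# under pressure, eviction runs in reverse priority order, LRU first
# within a priority level (an assumed tie-break).
class RamCache:
    def __init__(self, soft_quota):
        self.soft_quota = soft_quota      # bytes
        self.entries = OrderedDict()      # key -> (priority, page); order = recency
        self.used = 0

    def put(self, key, page, priority=0):
        self.discard(key)
        self.entries[key] = (priority, page)
        self.used += len(page)
        # Evict lowest-priority, least-recently-used entries first.
        while self.used > self.soft_quota:
            victim = min(self.entries, key=lambda k: self.entries[k][0])
            self.discard(victim)

    def get(self, key):
        if key not in self.entries:
            return None                   # callers must tolerate misses
        self.entries.move_to_end(key)     # mark as recently used
        return self.entries[key][1]

    def discard(self, key):
        entry = self.entries.pop(key, None)
        if entry is not None:
            self.used -= len(entry[1])
```

Because `min` scans the `OrderedDict` in recency order, the first entry found at the lowest priority is also the least recently used at that level, which keeps the eviction pass a single linear scan.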
On-disk caches
On-disk cache entries represent complete files.
Examples of on-disk caching:
- Offline and working storage for cloud file systems
- Browser cache
- Possibly also temporary files
Just as with in-RAM caching, each entry would be referenced using an application-defined key.
Temporary files are an edge case. Like true caches, they represent competing growth in storage demand. However, it may prove wiser to implement temporary files in the storage system such that they are automatically deleted when released. There is no reason to retain a temporary file once the owning application has closed the file handle on it, and the greatest hazard they present is when they are allowed to accumulate indefinitely. Further, all temporary files will be purged during boot (on the basis of their existence in a Temporary dimension in MDFS) and potentially even on signing out, meaning that a simple reboot would unclog a system where temporary files have overrun the disk, something that is not possible with semi-permanent storage such as a browser cache.
There is no reason why the Cache Manager couldn’t create any cache in a temporary dimension, so long as it retained a handle to every open temporary file or disabled immediate purge on release (relying on sign-out purge instead). In all other cases, the cache contents would be preserved when the owning process exits. The need to evict entries even when the owning application is not running means that applications must expect any cache entry to have been removed without notice.
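The contract implied here, that any on-disk entry may vanish without notice, suggests a look-up pattern where every fetch is fallible and regenerates the data on a miss. A Python sketch of that pattern follows; the directory layout and the key-to-filename mapping are assumptions made for illustration.

```python
import os
import tempfile

# Sketch: a miss-tolerant on-disk cache look-up. The Cache Manager may
# evict any entry at any time, so the application treats a missing file
# as an ordinary cache miss and rebuilds it. CACHE_DIR stands in for
# whatever per-application cache location the system would assign.
CACHE_DIR = tempfile.mkdtemp()

def cached_fetch(key, regenerate):
    """Return cached bytes for `key`, rebuilding them if evicted."""
    path = os.path.join(CACHE_DIR, key)
    try:
        with open(path, "rb") as f:
            return f.read()              # cache hit
    except FileNotFoundError:
        data = regenerate()              # miss: evicted or never cached
        with open(path, "wb") as f:
            f.write(data)                # repopulate the cache
        return data
```

The same shape works for in-RAM entries: the application never assumes a previous `put` is still present, which is what allows the Cache Manager to evict freely even while the owning application is not running.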
Optional soft and hard cache size limits would apply to on-disk caching as well.