
Erin Meta-Content Management System

Revision: 0.4-public 27th November 2005

Revision history

  1. 0.4 (2005/11/27): Briefly described and illustrated an AJAX object editing UI; elaborated on support for and implications of the Last-Modified header field
  2. 0.3 (2005/11/05): Mentioned User-friendlier hyperlinks under document preprocessing.
  3. 0.2 (2005/09/06): Added a Link elements and site navigation section and cleaned up the enclosing HTML and HTTP content section a bit (formerly “Main HTML and HTTP content”).
  4. 0.1 (2005/09/03): First public release

Overview

First and foremost, this document represents merely a rough set of ideas as to how I envision my CMS design. It is a collection of related ideas that I think would work well together. No concrete specifications have been set and no implementation of the system exists. The document is subject to change, and any part of it could be radically revised at any point. This document is made publicly available in the hope of finding potential contributors and developers to help finish the design and implement the system. It is, however, written mostly as though it were documentation for a live system; pay no attention to this!

Loosely based on the ColdFusion-based Allaire/Macromedia Spectra CMS, the basic principle of Erin is to provide convenient HTML re-use and global updating across a website from HTML templates and a database of site content. In the same way that CSS exists to provide presentation for pure semantic HTML, a CMS such as Erin exists to generate HTML pages from raw data. Given that a lot of semantics and structure are repeated across a site, Erin pages are constructed from simple classes that help to re-use the HTML content.

The design of Erin is a strange blend of ideas; the predominant design is based on ColdFusion, with some LISP and PHP thrown in for good measure. In a sense, Erin could be considered a meta-CMS, as it is closer to a CMS construction kit than to a CMS. Unlike systems such as phpNuke, Erin does not contain software for running a website; it contains mechanisms used to build software for running a website. Erin is first and foremost an object store: it abstracts away the task of designing database tables and writing complex SQL queries, so that conceptually simple data retrievals are equally simple to request. For example, displaying the title, link, and summary of the five most recent articles would require little more than a block of HTML containing some class fields and a single line of code to retrieve up to five objects.

Unlike Spectra, Erin is very light on code. The focus of the system is productive site design, backed up by a suite of service classes that handle a variety of common page formatting techniques such as lists, tables of data, and tables of contents. Together with the simple object model, these mechanisms allow a site developer to build navigation lists, summary tables, and paged search results tables into a site easily, with a very small number of simple object method invocations. An HTML preprocessor is available to increase the flexibility and construction simplicity of content HTML. With the HTML preprocessor you can syntax-highlight blocks of code, generate tables of contents from headings, abstract link targets away from the HTML itself, and perform other content-related HTML tasks. HTML generation via a wiki-code-like symbol set is another possibility to ease the workload of generating large passages of text for site content.

Finally, I have some trouble structuring documents, especially when I cannot decide which aspect of a larger whole needs to be explained first. I trust that the structure of this page is not too difficult to follow.

And a nod to Sander Tekelenburg.


Object Orientation

Overview to Erin objects

Erin uses classes and objects to handle all aspects of page display. There are two types of class in Erin: service classes and content classes. Content classes, as the name suggests, are the outward representation of data in the database, used to store and retrieve pieces of data. Content classes may be created to represent news items, articles, images, and any other data element of a site, at all levels of the site, from page structure down to individual elements on a page.

Service classes on the other hand are used to build up the visual structure of the site: sections, lists of links and summaries and so forth. They act as templates for generating various views on the content. Each class has a virtual default object used for the most rudimentary rendering, and you can create subclasses for more custom rendering.

Iteration

Iteration is the main mechanism used to drive page display using Erin. Iteration refers to automatic generation of lists of objects on a page. For example, a site’s front page may wish to show the summary of the most recent five news items. If news items are objects of class NewsItem and have a summary method, you could reveal the most recent five news summaries thusly:

<obj id="Iter." params="class: NewsItem; call: summary; limit: 5; sort: date,desc">

This tag inserts an object of class Iter into the page output. The use of id="Iter." (with a dot suffix) requests the default object of the class. Class Iter goes to the database and calls the requested method (summary in this case) on each object it finds. The call above also performs a descending sort on the date field and limits the output to five objects. The method call on the Iter default object (call="main") is omitted, as the default method is being requested. The colon/semicolon syntax inside the params property is there to make it visually distinct from ordinary attribute syntax when reading code back.
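The effect of the Iter call above can be modelled in a few lines of Python. This sketch uses a hypothetical in-memory list (STORE) in place of the database and a hypothetical function iter_main in place of the Iter default object. It performs the same four steps the tag describes: select by class, sort, limit, then call the named method on each object.

```python
import datetime
from dataclasses import dataclass


@dataclass
class NewsItem:
    """Hypothetical content class with a date field and a summary method."""
    title: str
    date: datetime.date

    def summary(self) -> str:
        return f"<p>{self.title}</p>"


# Hypothetical in-memory stand-in for the Erin object store.
STORE = [
    NewsItem("Launch", datetime.date(2005, 9, 3)),
    NewsItem("Editor", datetime.date(2005, 11, 27)),
    NewsItem("Update", datetime.date(2005, 11, 5)),
]


def iter_main(cls, call, limit=None, sort=None, store=STORE):
    """Sketch of <obj id="Iter." params="class: ...; call: ...; limit: ...; sort: ...">."""
    objs = [o for o in store if isinstance(o, cls)]          # class: NewsItem
    if sort:
        field_name, direction = sort.split(",")              # sort: date,desc
        objs.sort(key=lambda o: getattr(o, field_name), reverse=(direction == "desc"))
    if limit is not None:
        objs = objs[:limit]                                  # limit: 5
    return "".join(getattr(o, call)() for o in objs)         # call: summary
```

For example, iter_main(NewsItem, "summary", limit=2, sort="date,desc") yields the two newest summaries, newest first.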

Another possible and simpler syntax for this task would be the following:

<obj class="NewsItem" call="summary" iter="limit: 5; sort: date,desc">

This method displays multiple objects by stating the intended class instead of the intended object, implicitly selecting all objects of the class at once. This has the advantage of slightly clearer syntax, but the advantage of using class Iter is that you can subclass Iter to provide external control of output. For example, if the intended output is to be a table, hooking abstract methods in an Iter subclass lets you fill in the table row and cell tags and provide any other kinds of wrapper content around the bare output of the method called on the content objects. Iter is, of course, a good example of a service object: it does not represent data but rather a view of the data at any given instant.
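The following Python sketch shows how the subclassing idea could work. The cell hook name comes from this document. The before and after hooks are hypothetical names for the wrapper hooks. Each hook in the base Iter emits no outer HTML, and a subclass overrides only the hooks it needs.

```python
class Iter:
    """Base iterator: every hook defaults to emitting no wrapper HTML."""

    def before(self) -> str:          # hypothetical hook: HTML before the first item
        return ""

    def cell(self, obj, call: str) -> str:
        # Default: just call the requested method, with no outer HTML.
        return getattr(obj, call)()

    def after(self) -> str:           # hypothetical hook: HTML after the last item
        return ""

    def main(self, objs, call: str) -> str:
        return self.before() + "".join(self.cell(o, call) for o in objs) + self.after()


class ListIter(Iter):
    """Hypothetical subclass that wraps the bare output in an unordered list."""

    def before(self) -> str:
        return "<ul>"

    def cell(self, obj, call: str) -> str:
        return "<li>" + super().cell(obj, call) + "</li>"

    def after(self) -> str:
        return "</ul>"


class Link:
    """Hypothetical content object with a single output method."""

    def __init__(self, title: str):
        self.title = title

    def anchor(self) -> str:
        return f"<a>{self.title}</a>"
```

The content objects never learn whether they appear in a list, a table or a plain run of text. That choice belongs entirely to the iterator.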

The most obvious and most important example of iteration is for navigation. By selecting a suitable method on a Page subclass, a list of all available pages of that type can be inserted onto any page. More on site structure is given under the Page objects and sectioning section.

Using Erin objects

Classes are defined using plain text code. For example, the following class is used to represent an image link box (a box whose contents bear a link containing an image and a caption):

#class StandardImageLinkBox
property objectURL (string)
property linkURL (string)
property width (integer)
property alt (string)
property title (string)
property height (integer)
<!main>
<div class="imagebox">
<a href="$(= linkURL)"><img src="$(= objectURL)" height="$(= height)" width="$(= width)" alt="$(= alt)"><br>$(= title)</a>
</div>
</!main>

(Note: The line break between a method start tag (e.g. <!main>) and the first line of HTML is not output; nor is the final line break before the closing method tag output.)

The above code shows how to define properties, define methods (using <!method-name> tags; <!main> is the default method) and write property values into the HTML. At this stage of the design, methods exist solely to generate HTML output from template code, but Turing-complete language support will be required to build proper Web applications. Following Erin’s tradition of copying the design of ColdFusion, this will be achieved with <codeblk>, the contents of which are pure code. Methods might also be marked as content="code" if they are to contain mostly code.

At present, Erin follows a LISP-like design of functional operator syntax: operators and functions bear identical, LISP-style syntax. The special operator = writes out its operand(s) to the HTML (as in MacASP). This is definitely subject to change!

Class methods can also register to receive parameters. For example, the following method wishes to be able to receive two integer parameters:

<!cell inparams="row (integer) col (integer)"> ... </!cell>

Because parameters are passed and received by name, methods can be passed more parameters than they are interested in receiving; I think this is how AppleScript already implements event handling. Unlike AppleScript, Erin parameters may be passed and registered in any order. Hook method handling is simplified by this technique, and no method calls or definitions ever contain unused parameters (which in C could present themselves as a row of meaningless and confusing zeroes).
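Python keyword arguments offer a close analogue of passing parameters by name. In this sketch, call_with_registered (a hypothetical helper) passes a method only the named parameters that its signature declares and silently drops the rest, so callers may pass extras in any order. Python identifiers cannot contain hyphens, so is_empty stands in for an Erin name such as is-empty.

```python
import inspect


def call_with_registered(method, **passed):
    """Call method with only the named parameters it registers.

    Parameters are matched by name, so order is irrelevant and extras are dropped.
    A registered parameter that is not passed still raises TypeError here; Erin's
    ? operator would instead let the method test for it.
    """
    registered = inspect.signature(method).parameters
    accepted = {name: value for name, value in passed.items() if name in registered}
    return method(**accepted)


def cell(row: int, col: int) -> str:
    # Hypothetical hook registering only row and col, like
    # <!cell inparams="row (integer) col (integer)">.
    return f"<td>{row},{col}</td>"


def caption() -> str:
    # A hook that registers no parameters at all.
    return "<caption>Results</caption>"
```

For example, call_with_registered(cell, col=2, row=1, is_empty=False) returns "<td>1,2</td>". The unused is_empty never reaches the method.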

The ? operator queries whether a parameter has been passed. The following code returns an error message and exits if row and col were not both specified:

(or (and (? row) (? col)) (exit 'row and col not passed'))

Calling objects

Objects are inserted with the <obj> tag. The id property selects an object, and the call property selects a method name to call on the object. The params property provides a mapped list of parameters to the method call. An example call to an object of the above StandardImageLinkBox class might look like:

<obj id="a-big-fish">

Following on from the previous example, here is a call to display just the summary of the most recent news item:

<obj class="NewsItem" call="summary" select="date,desc,1">

This example shows selecting an object based on a selection criterion and class. "date,desc,1" selects the first item (1) after a descending sort (desc) on the date property of the object. The call property has been used to call the summary method instead of the default method.

As shown above, objects are referenced directly by name. A consequence of this is that all objects live in a global namespace; to reduce conflicts, it may be necessary always to specify the class as well. Where an object ID is to be hidden from user view entirely, UUIDs or simple object serial numbers (such as an auto-increment column on the object table) would make life easier to manage; in practice, many objects may not have human-readable IDs.

Object properties are hidden by default, unless marked as EXPORT, thus:

property title (string EXPORT)

When params is used to pass arbitrary parameters to a method, a special syntax applies. Parameters are addressed by name in any order. Strings are implicit (unquoted) by default; only where leading or trailing spaces are needed are they enclosed in single quotation marks. This syntax exists for differentiation purposes. For example, this call to a StandardImageLinkBox object includes two extra parameters, both strings:

<obj id="a-big-fish" params="useID: main-image; cust-caption: This just in">
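Reading the params syntax takes only a small parser. This Python sketch (parse_params is a hypothetical name) strips surrounding spaces from unquoted values and keeps single-quoted values verbatim. It also assumes that a quoted value may contain a semicolon, which the document does not specify.

```python
def parse_params(text: str) -> dict[str, str]:
    """Parse Erin-style params: 'name: value; name: value'.

    Unquoted values are stripped of surrounding spaces; values in single
    quotes keep their leading and trailing spaces (and may contain ';').
    """
    params = {}
    i, n = 0, len(text)
    while i < n:
        colon = text.find(":", i)
        if colon == -1:
            break                                    # trailing whitespace or junk
        name = text[i:colon].strip()
        j = colon + 1
        while j < n and text[j] == " ":
            j += 1
        if j < n and text[j] == "'":                 # quoted: take text up to closing quote
            end = text.index("'", j + 1)
            value = text[j + 1:end]
            semi = text.find(";", end + 1)
        else:                                        # unquoted: take text up to next ';'
            semi = text.find(";", j)
            value = text[j:semi if semi != -1 else n].strip()
        params[name] = value
        i = n if semi == -1 else semi + 1
    return params
```

Applied to the call above, this gives {"useID": "main-image", "cust-caption": "This just in"}.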

Coding Erin

Until a less perverse syntax is determined, coding will use an HTML-embedded LISP-like language. A dollar sign indicates dropping into code for output purposes, with the $ followed directly by a LISP-style expression complete with parentheses. For example, to output the value of the title property you might write:

$(= title)

This would be embedded in HTML as follows:

<h2>$(= title)</h2>

Because the equals sign is an operator, a space must be present between the = and the operand. A similar syntax can be used for mathematical operations:

$(+ sectionNumber 1)

Here, the presence of another operator means that output to the page is implicit. The above code fragment does not add 1 to sectionNumber, but rather it outputs the result of sectionNumber + 1. The same syntax is used for function calls:

<li>
<h3 class="h-inline">$(= title)</h3>
<p>$(firstpara article-body)</p>
</li>

The final example above includes a function that returns just the first paragraph of a block of text. You can nest function calls of course; the following line makes sure only the first 255 characters of a block of text are used:

$(trimstring (firstpara article-body) 255)

As in LISP, hyphens are permitted in identifiers because the syntax uses prefix notation: a subtraction places the minus first, as in $(- foo 1), so a name such as article-body can never be mistaken for a subtraction.
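To make the $( ... ) rules concrete, here is a minimal Python sketch of a template evaluator. The functions render, parse and evaluate are hypothetical, and functions such as firstpara are passed in through the env dictionary. = joins its operands as text, + and - apply in prefix order, any other head is a function call, and single-quoted tokens are strings. Because the tokenizer splits on whitespace, $(=title) becomes a single unknown symbol, which is why the space after = matters.

```python
import operator
import re

# Tokens: parentheses, 'single-quoted strings', or runs of anything else.
TOKEN = re.compile(r"\s*(\(|\)|'[^']*'|[^\s()']+)")
OPS = {"+": operator.add, "-": operator.sub}


def parse(src, pos):
    """Parse one s-expression starting at src[pos]; return (expr, new_pos)."""
    m = TOKEN.match(src, pos)
    if m is None:
        raise SyntaxError(f"unexpected end of expression at {pos}")
    tok, pos = m.group(1), m.end()
    if tok == "(":
        items = []
        while True:
            m = TOKEN.match(src, pos)
            if m is None:
                raise SyntaxError("unclosed parenthesis")
            if m.group(1) == ")":
                return items, m.end()
            item, pos = parse(src, pos)
            items.append(item)
    if tok == ")":
        raise SyntaxError("unexpected )")
    return tok, pos


def evaluate(expr, env):
    if isinstance(expr, list):
        head, *args = expr
        values = [evaluate(a, env) for a in args]
        if head == "=":                      # write operands out as text
            return "".join(str(v) for v in values)
        if head in OPS:                      # prefix arithmetic: (- foo 1)
            result = values[0]
            for v in values[1:]:
                result = OPS[head](result, v)
            return result
        return env[head](*values)            # function call
    if expr.startswith("'"):
        return expr[1:-1]                    # string literal
    if expr.lstrip("-").isdigit():
        return int(expr)                     # integer literal
    return env[expr]                         # property or variable lookup


def render(template, env):
    """Replace every $( ... ) in template with the string value of its expression."""
    out, i = [], 0
    while True:
        start = template.find("$(", i)
        if start == -1:
            out.append(template[i:])
            return "".join(out)
        out.append(template[i:start])
        expr, i = parse(template, start + 1)
        out.append(str(evaluate(expr, env)))
```

Nested calls need no extra machinery. With firstpara and trimstring supplied in env, $(trimstring (firstpara article-body) 255) evaluates the inner call first.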

Code blocks

Blocks of code can be included using the <codeblk> tag. This allows output-free expressions, and coherent use of branching and loops. Within a code block it is mandatory to use the = operator to generate output. I shall leave the exact nature of the syntax undefined for now.

Using Erin

Working with sites

The primary user interface to Erin is the Web. Classes and objects are created and edited via the Web, although provision for editing code in a text editor using plain text files will also be made. Some sort of automatic synchronisation (or manual if necessary) will be provided to update database tables and parsed bytecode when the text file source code of classes is updated.

Erin will come with a built-in Web-based object editor. This will take the form of a simple form generator, which takes the property list of a class and builds a form from it, with one input field per property. This can then be used to create and edit objects directly. This may well suffice for working with a simple site, although for more secure and more integrated sites it will be necessary to extend this system using custom validation and event-handling code. The default Erin Web UI will itself be written in Erin; this will allow for custom interfaces to be created by simply subclassing and wrapping the code already in place. The object editor will be powered by the automatic form handling and object manipulation Erin tags, and these can be extended and hooked to gain finer control over the response taken to a form.
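A Python sketch of the form generator builds one input field per property from a class's property list. build_form, the INPUT_TYPES mapping and the hidden class field are all hypothetical. HTML 4 has no numeric input type, so integers fall back to text fields, and validation would be left to the custom validation code mentioned above.

```python
from html import escape

# Hypothetical mapping from Erin property types to HTML 4 input types.
INPUT_TYPES = {"string": "text", "integer": "text", "flag": "checkbox"}


def build_form(class_name, properties, action, values=None):
    """Build an edit form with one input field per property.

    properties: list of (name, erin_type) pairs, e.g. [("title", "string")].
    values: optional dict of current property values, for editing an object.
    """
    values = values or {}
    rows = [f'<form action="{escape(action)}" method="post">']
    for name, erin_type in properties:
        input_type = INPUT_TYPES.get(erin_type, "text")
        value = values.get(name, "")
        field_name = escape(name)
        if input_type == "checkbox":
            checked = " checked" if value else ""
            field = f'<input type="checkbox" name="{field_name}" id="{field_name}"{checked}>'
        else:
            field = (f'<input type="text" name="{field_name}" id="{field_name}" '
                     f'value="{escape(str(value))}">')
        rows.append(f'<label for="{field_name}">{field_name}</label> {field}<br>')
    rows.append(f'<input type="hidden" name="class" value="{escape(class_name)}">')
    rows.append('<input type="submit" value="Save">')
    rows.append("</form>")
    return "\n".join(rows)
```

Feeding it the StandardImageLinkBox property list would produce a six-field form, pre-filled when an existing object's values are supplied.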

Of course, for larger-scale data entry and syndication input more direct means will be supported.

AJAX object editing UI

One particularly nice way to edit a site would be to use AJAX to write an editor. To start with, pages would bear a padlock icon in the corner:

This might be shown only in “admin mode” or for recognised IP addresses, to avoid having a potentially obtrusive icon on every page. Clicking it will prompt for a password, checked against the server:

This would switch the page into object edit mode. This mode allows for inline editing of the page’s core object, which subclasses Page:

Every property of the object becomes adorned with an edit control, and the object itself gains several controls. The controls are as follows:

Changes can be made directly to any child object of any page object. At this stage it would be hard to describe in more detail how most objects would be edited, but text objects would probably switch to a <textarea> tag for editing.

This system could also be made to let you edit the HTML and CSS directly (there is a precedent for this) but the main priority is site changes. It may also be wise to include a method for editing classes (such as the class of the current page) and objects referenced by iterators within that class such as local navigation. Editing of higher objects such as those used to contain global site content and navigation would also be useful, but some of this would be in the realm of the general site editor system.

HTML and HTTP content

Pages on websites need not only the body HTML, but the header HTML as well. In some cases, custom HTTP is also required. This document does not as yet deal with how this is all constructed but it must be borne in mind that such support will need to be implemented.

The <response> tag

The <response> tag is a way to signal that a page should return a response other than 200 OK. The tag can be used to call upon a predefined error template page and furnish it with the appropriate HTTP error code. For example, the 404 response page may include a reporting system to allow visitors to report 404s and request a response from the site maintainer. You can call the tag thusly:

<response code="404">

The response tag can also perform redirects, again using template pages if necessary:

<response href="/new/page/address" mode="meta,5">

In the example above, mode="meta,5" instructs Erin to return a page containing a meta redirect to the new URL, with an automatic redirect after 5 seconds. Using code="301" instead would send a 301 HTTP response pointing to the new URL. Relative URLs are automatically fully qualified when using HTTP redirects.
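The <response> attributes map onto HTTP fairly directly. In this Python sketch, build_response is hypothetical and so are its defaults: 302 for an href with no code, and a plain error body standing in for the template page. HTTP/1.1 (RFC 2616) specifies an absolute URI in the Location header, which is why relative redirects are qualified against the request URL.

```python
from html import escape
from urllib.parse import urljoin


def build_response(request_url, code=None, href=None, mode=None):
    """Map <response> attributes to a (status, headers, body) triple."""
    if href is not None and mode and mode.startswith("meta"):
        # mode="meta,5": a normal page carrying a meta refresh after 5 seconds.
        _, _, delay = mode.partition(",")
        seconds = int(delay or 0)
        target = escape(href)
        body = (f'<meta http-equiv="refresh" content="{seconds}; url={target}">\n'
                f'<p>This page has moved to <a href="{target}">{target}</a>.</p>')
        return 200, {"Content-Type": "text/html"}, body
    if href is not None:
        # HTTP redirect: fully qualify relative URLs for the Location header.
        status = int(code or 302)
        return status, {"Location": urljoin(request_url, href)}, ""
    # Error response: a real system would fill in the error template page here.
    status = int(code or 200)
    return status, {"Content-Type": "text/html"}, f"<h1>Error {status}</h1>"
```

For a request to http://example.com/old/page, code="301" with href="/new/page/address" yields a Location of http://example.com/new/page/address.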

Fatal errors and wrapper HTML

One of the most difficult aspects of the likes of PHP, MacASP and CFML is that on experiencing a fatal error, there is no prescribed method of wrapping up the page. At the point where the error occurs, the HTML output might be in the middle of several nested divs, a table or a list, as well as inside the body and html elements whose </body> and </html> tags have not yet been written. Page rendering could be severely damaged by not correctly closing all open tags, and page navigation, aesthetics and house style could be compromised by omitting later fragments of HTML.

I am not sure how to handle this conveniently in Erin. The use of goto is still practised as a very practical way to jump to error-handling code. HTML preprocessing to clean up missing closing tags could be applied. Pages composed of excessive levels of nested if statements are troublesome to maintain and very difficult to follow. Because HTML output is generally sent to the user agent in a continuous stream, one cannot undo any output already sent unless continuous output is turned off and the page is buffered.
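One possible form of the "clean up missing closing tags" idea is to track open elements as output is produced. After a fatal error, the tracker can emit whatever closing tags are still outstanding. This Python sketch uses the standard html.parser module. OpenTagTracker is a hypothetical name, and this is only one candidate approach, not a settled design. Elements whose end tags HTML 4 treats as optional, such as p and li, are simply closed explicitly, which is harmless.

```python
from html.parser import HTMLParser

# HTML 4 elements that never take a closing tag.
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


class OpenTagTracker(HTMLParser):
    """Track which elements are still open in a stream of HTML output."""

    def __init__(self):
        super().__init__()
        self.stack = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open element, if there is one.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def closing_tags(self):
        """Closing tags for everything still open, innermost first."""
        return "".join(f"</{tag}>" for tag in reversed(self.stack))
```

Fed the output '<html><body><div class="main"><ul><li>Item<br>' before an error, the tracker would report '</li></ul></div></body></html>'.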

For now, it will suffice that this issue has been clearly raised, to be addressed at a later date.

Link elements and site navigation

Sander Tekelenburg explains the <link> element and why and how to use it. Since Erin knows about the pages and sections of your site, it seems sensible and wise to make use of this knowledge to permit Erin to automatically populate the HTML header with <link> tags that point to all the appropriate pages.
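If Erin knows a page's position within an ordered section, the navigation <link> elements follow mechanically. This Python sketch (link_elements is a hypothetical helper) emits start, up, prev and next. HTML 4 defines start, prev and next as link types. up is a widely used extension that is not part of the HTML 4 list.

```python
from html import escape


def link_elements(pages, current, parent=None, start=("/", "Home")):
    """Build <link> elements for one page within an ordered section.

    pages: ordered list of (url, title) pairs for the section.
    current: URL of the page being rendered.
    parent: optional (url, title) of the enclosing section.
    """
    urls = [url for url, _ in pages]
    i = urls.index(current)
    links = [("start", *start)]
    if parent is not None:
        links.append(("up", *parent))
    if i > 0:
        links.append(("prev", *pages[i - 1]))
    if i + 1 < len(pages):
        links.append(("next", *pages[i + 1]))
    return "\n".join(
        f'<link rel="{rel}" href="{escape(url)}" title="{escape(title)}">'
        for rel, url, title in links
    )
```

The harder part, as noted next, is deciding which other relations (contents, index, help and so on) a given site actually has.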

To an extent, this process can be automated, but to be fully effective, some understanding has to be had of the structure and nature of the site. This document does not propose how this metadata is to be defined and used, but as before, the issue has at least been raised for later attention.

Last-Modified header field

There are various applications available that track site updates, in order to tell you when your favourite sites have updated. The iCab Web browser for the Macintosh includes this ability. The Last-Modified HTTP field states when a resource was last modified. For simple Web pages and other files, this is easy: stat the file on disc for its last-modified date.

It gets a little more complicated for dynamic sites. For simple database content accessed by PHP or ASP you could maintain columns in key tables defining the last time a change was made, compare that with the last time the script was changed, and use the most recent date. But with a content management system it is a lot more complicated. An Erin page will typically be composed of quite a number of objects and iterators which in turn can contain objects and iterators themselves and so forth. To provide a last-modified date, a register needs to be kept of the most recent object modified date seen when reading from objects and when invoking methods on objects that generated output. The final result can be placed into the HTTP header.

This does mean that page content will have to be fully buffered before transmission; the upside is that you get the Content-Length header for free as well. The downside is that this will prevent scripts from providing incremental output, and slower, more complex scripts that generate a lot of output will appear slower still, as the page must be generated in full before transmission. However, with page caching in force, sites which bear infrequently changing pages can use the cached copy to determine the last-modified date and begin transmission immediately. For sites that do require pages to be generated dynamically every time, switching off the Last-Modified header for such pages would be more sensible, and this could be set as a flag on Page subclasses or at runtime.
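The register-and-buffer approach can be sketched in Python. PageRender is a hypothetical class. Each object read or method invocation reports its modification time through touch(), output accumulates in a buffer, and both headers are computed only once the page is complete. email.utils.formatdate with usegmt=True produces the RFC 1123 date format HTTP expects.

```python
from email.utils import formatdate
from io import StringIO


class PageRender:
    """Buffer page output and remember the newest modification time seen."""

    def __init__(self):
        self.buffer = StringIO()
        self.last_modified = 0.0  # POSIX timestamp

    def touch(self, modified):
        # Call whenever an object is read or one of its methods produces output.
        self.last_modified = max(self.last_modified, modified)

    def write(self, html):
        self.buffer.write(html)

    def headers(self):
        # Content-Length counts bytes, not characters, so encode first.
        body = self.buffer.getvalue().encode("utf-8")
        return {
            "Last-Modified": formatdate(self.last_modified, usegmt=True),
            "Content-Length": str(len(body)),
        }
```

For static files the same register can be fed from os.stat(path).st_mtime, matching the simple case described earlier.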

Using collections

Collection displays using <obj> and the Iter class were introduced earlier under Iteration. A few notes are in order as to how to extend collection display into two dimensions with wrapper HTML display (so-called outer HTML). Iterations can be extended by overriding hook methods defined in the Iter class in a child class. A variety of such methods exist, for different types of iterations. For example, to generate a set of <div>s of article summaries:

#class ArticleSummaryIter inherit Iter
param in-obj (object)
<!cell>
<div class="summarydiv">
<obj ID=in-obj call="writePageLink">
$(in-obj.summary)
</div>
</!cell>

Note that param items are global parameters that persist across all hook method calls, to save passing them around. Also note that variables and objects can be passed as primary (tag) parameter values by omitting the quotation marks around the parameter value. ID=in-obj passes the object in-obj as the ID parameter, whereas call="writePageLink" passes a string parameter. No method so far is defined for passing variables inside secondary (object) parameters: parameters passed to objects using the params tag parameter.

A rather inflexible way to generate a set of such <div>s would be to rely on inner HTML: write a method on class Article which generates such a <div>. However, this outer-HTML example is more flexible as you can generate multiple views on the same set of objects.

The <!cell> method is hooked for displaying the content from each object. By default, this merely calls the specified method on the object with no outer HTML (no <div>s or <li>s). For multi-dimensional output, subclasses can register a set of parameters for cell:

<!cell inparams="row (integer) col (integer) is-empty (flag)"> ... </!cell>

There are extra hook methods available for full two-dimensional output, for creating tables. These are hooked for beginning of row and end of row HTML and so forth. I presently lack a good example of such HTML (at HTML 4 level) in order to demonstrate this idea in practice.
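As a rough illustration of those row hooks, this Python sketch lays objects out in a fixed-width HTML 4 table. TableIter, begin_row and end_row are hypothetical names. cell receives the same row, col and is-empty values registered in the inparams example (spelled is_empty in Python), and trailing grid positions are filled with empty cells.

```python
class TableIter:
    """Lay objects out in a grid, calling hook methods per row and per cell."""

    def __init__(self, columns):
        self.columns = columns

    def begin_row(self, row):
        return "<tr>"

    def end_row(self, row):
        return "</tr>\n"

    def cell(self, obj, row, col, is_empty):
        return "<td></td>" if is_empty else f"<td>{obj}</td>"

    def main(self, objs):
        out = ["<table>\n"]
        rows = max(1, -(-len(objs) // self.columns))  # ceiling division; at least one row
        for row in range(rows):
            out.append(self.begin_row(row))
            for col in range(self.columns):
                index = row * self.columns + col
                is_empty = index >= len(objs)
                out.append(self.cell(None if is_empty else objs[index], row, col, is_empty))
            out.append(self.end_row(row))
        out.append("</table>")
        return "".join(out)
```

A subclass could override cell to alternate row classes or call a method on each object, leaving the grid arithmetic untouched.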

Page objects and sectioning

Custom iterators drive one of the most important aspects of Erin and any CMS: site structure and navigation. All pages on the site are objects of classes inherited from class Page. A section of the definition of class Page is given below:

#class Page
derived address (string)
property parentSection (string)
property title (string EXPORT)
property bulk (string) #the main HTML for a basic page
<!writePageLink>
<a href="$(= address)">$(= title)</a>
</!writePageLink>
<!main>
#a dump of bulk goes here
</!main>

Note that derived properties are generated automatically and not stored with the object; they are essentially references to internal functions and represent useful object properties that are never defined anywhere. Note also that, as mentioned under HTML and HTTP content, no means has yet been decided for specifying the HTML head section.

The original idea was possibly that the existence of a section property would drive the site navigation system without the need for explicit section objects. Assuming every property was its own table column, this may be possible, with SQL code such as:

SELECT DISTINCT parentSection FROM Page

(Forgive me if my SQL is erroneous) Depending on the exact nature of the internal implementation, separate tables may get created for each and every class, or a single Objects table may exist with a single column for property data. Using one table per class, and internal SQL code devoted to populating a sections list, would be a rather specialised solution, and would run counter to the iterators idea, probably needing a pseudo-class to iterate. Further, there would be no notion of parent section.

Nor would there be any notion of hierarchy. One could put the full path to each page into each Page object, but that would violate good database design. The use of Section objects, to which a page could point, would help:

#class Section
derived address (string)
property parentSection (string)
property title (string EXPORT)

One could then iterate over class Section, providing that a filter method (akin to SQL’s WHERE) was provided to specify the desired parent section. For example:

<obj id="UnorderedList." params="class: Section; call: showAnchor; filter: parentSection,; HTMLid: topnav">

would filter on parentSection == "", giving just the top-most sections. This would output a list of links to all your site’s top-level sections. Assuming one such section was Artwork, you could filter sub-sections thusly:

<obj id="UnorderedList." params="class: Section; call: showAnchor; filter: parentSection,Artwork; HTMLid: subnav">
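The two filters above, plus the hierarchy that parentSection pointers make possible, can be sketched in Python. The Section data, parse_filter, filtered and path are all hypothetical. An empty value after the comma means "no parent". A full path is computed by walking the pointers upward instead of being stored, and a guard rejects the cyclic structures discussed below.

```python
from dataclasses import dataclass


@dataclass
class Section:
    name: str
    title: str
    parentSection: str = ""  # "" marks a top-level section


SECTIONS = [
    Section("Artwork", "Artwork"),
    Section("Software", "Software"),
    Section("Sketches", "Sketches", "Artwork"),
    Section("Paintings", "Paintings", "Artwork"),
]


def parse_filter(spec):
    """'parentSection,Artwork' -> ('parentSection', 'Artwork'); a trailing comma means ''."""
    field_name, _, value = spec.partition(",")
    return field_name, value


def filtered(objs, spec):
    field_name, value = parse_filter(spec)
    return [o for o in objs if getattr(o, field_name) == value]


def path(section_name, sections=SECTIONS):
    """Walk parentSection pointers up to the root instead of storing full paths."""
    by_name = {s.name: s for s in sections}
    parts, seen = [], set()
    while section_name:
        if section_name in seen:
            raise ValueError(f"cyclic section structure at {section_name}")
        seen.add(section_name)
        parts.append(section_name)
        section_name = by_name[section_name].parentSection
    return "/" + "".join(p + "/" for p in reversed(parts))
```

With this data, "parentSection," selects Artwork and Software, "parentSection,Artwork" selects Sketches and Paintings, and path("Sketches") gives "/Artwork/Sketches/".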

However, it is not necessarily that simple. For example, a page may need to appear in several sections, or one may wish for a site to have a rootless, flat and truly cyclic structure (the way a wiki does), or to have multiple paths leading to the same section or page, for people who might approach the information via different routes through the site. Support for such designs complicates matters considerably.

Listing pages

To list all pages within a section, you could use:

<obj id="UnorderedList." params="class: Page; call: showAnchor; filter: parentSection,MacApps">

The above code would filter the page objects list for all pages in the specified section and call showAnchor on each one. This assumes, of course, that all pages of all Page subclasses share the same database table, or that some complex database footwork is going on to simulate this. Further, for the sake of flexibility, it should be possible to iterate over all objects within a section, for cases when a section contains both pages and sub-sections.

The only way that this would make sense is if Erin supported class interfaces, such that you could have both Page and Section implement the SectionDweller interface, and the following code would then work:

<obj id="UnorderedList." params="class: SectionDweller; call: showAnchor; filter: parentSection,Artwork; sort: class,asc,title,asc">

Depending on the design it might be better in many cases to iterate sections and pages separately, but the above would make a nested list of sections and pages possible.
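Python's abstract base classes can stand in for class interfaces in a sketch of the SectionDweller idea. Page and Section both implement showAnchor. list_dwellers (hypothetical) applies the parentSection filter and the class,asc,title,asc sort from the example above, producing one level of the list. Method and property names keep the document's camel case.

```python
from abc import ABC, abstractmethod


class SectionDweller(ABC):
    """Hypothetical interface: anything that can live inside a section."""

    def __init__(self, title, parentSection, address):
        self.title = title
        self.parentSection = parentSection
        self.address = address

    @abstractmethod
    def showAnchor(self) -> str:
        """Return an HTML link to this object."""


class Page(SectionDweller):
    def showAnchor(self):
        return f'<a href="{self.address}">{self.title}</a>'


class Section(SectionDweller):
    def showAnchor(self):
        # Sections are marked with a class so CSS can tell them apart from pages.
        return f'<a class="section" href="{self.address}">{self.title}</a>'


def list_dwellers(objs, parent):
    """Rough equivalent of: class: SectionDweller; call: showAnchor;
    filter: parentSection,<parent>; sort: class,asc,title,asc"""
    found = [o for o in objs if isinstance(o, SectionDweller) and o.parentSection == parent]
    found.sort(key=lambda o: (type(o).__name__, o.title))
    return "<ul>" + "".join(f"<li>{o.showAnchor()}</li>" for o in found) + "</ul>"
```

Sorting on the class name first groups pages before sub-sections here, simply because "Page" sorts before "Section".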

Custom pages

Most pages on a site will realistically follow a subclass-defined layout. Most likely, each site section will bear its own Page subclass defining the layout. For, say, a section of software for download, each application might be an object thus:

#class ApplicationDownload inherit MyPageDesign
property appName (string)
property versionHistory (VersionHistoryItem[])
property screenshots (ImageLink[])
...

The <!main> method would then draw a whole page. The body of <!main> would contain the HTML code used to display pages. Owing to the disjointed nature of the content, it is especially imperative that as much formatting as possible be left to outside CSS, and the HTML be as semantic as possible.

Note also the [] in the image link and version history item properties. This is because application pages have a one-to-many relationship with such items. VersionHistoryItem itself is a user class because each one contains a date and a version number, where each version number is actually a path to an entry in a changelog file. ImageLink is also a class, as it represents a link to an image with a name and target URL.

Finally, here is a longer page class example:

#class NewsItemPage inherit MyPageDesign
property datePosted (date)
property topic (object)
<!main>
<h1>$(= localPageTitle)</h1>
<p><obj ID=author call="writePageLink">, $(writeDate datePosted @LOCALE.UK.DATE.LONG)</p>
$(= bulk)
</!main>
<!summary>
<h2 class="inline">$(= title)</h2>
<obj ID=topic call="writePageLink">
$(firstPara bulk)
<p><a href="$(= address)">Read more...</a></p>
</!summary>

Quotation marks around literal Erin tag property values are mandatory; without them, the text following the equals sign is taken to be a variable or property name (as with ID=author and ID=topic above).

Erin features

Paged listings

Something that one tends to find oneself re-implementing every single time one writes custom database look-ups (and even file-based look-ups) is a system to divide the results into pages. On Telcontar.net, stdin (the multi-guestbook) is a database-based system with this, and the notices mechanism is a file-based system with this. Re-implementing such look-ups and pagination over and over can be very tedious. The notices system in particular allows reading in both directions (newest to oldest by default, or oldest to newest) and customised numbers of items per page, though not pages whose number of entries is determined by the length of the entries and a defined optimum page length in bytes or words. Bear in mind that the reverse order option changes the human semantics of previous and next page (previous changes from Older to Newer), and there is a lot of URL parameter persistence to maintain while determining just what to show. This is without even showing a page bar of numeric page links.

Erin’s iterator system already goes some way towards alleviating the hassles of showing sets of items. It should not be too hard to adapt it to also support paged-mode displays. Things to consider would be:

One would like to make it as easy as possible to just write a page that shows an iteration and have some small change automatically deal with reading the URL, listing one page worth of items and generating a page list bar or navigation bar. To customise the appearance of results lists, the pagination class would be subclassed and hook methods implemented to provide the HTML.
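The bookkeeping described above (slicing, reading direction, and carrying URL parameters from page to page) fits in one small Python function. paginate is hypothetical, and so are the URL parameter names order, per and page. Items are assumed to arrive sorted oldest to newest. The Older/Newer labels swap with the reading direction, which is exactly the semantic shift noted earlier.

```python
from urllib.parse import urlencode


def paginate(items, page, per_page, newest_first=True, params=None):
    """Return one page of items plus navigation links that preserve URL parameters.

    items must be sorted oldest to newest; page numbers start at 1.
    """
    ordered = list(reversed(items)) if newest_first else list(items)
    pages = max(1, -(-len(ordered) // per_page))      # ceiling division
    page = min(max(page, 1), pages)                   # clamp out-of-range requests
    start = (page - 1) * per_page
    shown = ordered[start:start + per_page]

    # Which label means "page - 1" depends on the reading direction.
    earlier, later = ("Newer", "Older") if newest_first else ("Older", "Newer")
    base = dict(params or {}, order="new" if newest_first else "old", per=per_page)
    links = {}
    if page > 1:
        links[earlier] = "?" + urlencode(dict(base, page=page - 1))
    if page < pages:
        links[later] = "?" + urlencode(dict(base, page=page + 1))
    return shown, links
```

A pagination class in Erin would presumably wrap logic of this kind, with hook methods supplying the HTML for the items and the navigation bar.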

Searches

Every content management system needs a search feature. It would be interesting to see a search results system driven off iteration in Erin. For example, this search form:

<form action="_self" method="get">
Find: <input type="text" name="findwhat"><br>
In: <input type="radio" name="inwhat" value="all"> All /
<input type="radio" name="inwhat" value="news"> News<br>
<input type="submit" name="submit" value="Search">
</form>

This would then be passed to a tag such as:

<obj class="Search" ...>

This would effectively be an iterator subclass or something thereto related, returning all matching objects. Paging as discussed above would be involved to divide up the results into pages. The question becomes, what would go where the ellipsis is? The params property is not really geared up to passing in structs (such as the contents of the URL parameter list or POST formdata parameter list, and the current paged-results status struct). Further, you need something set up to map the form results to object filter parameters. This could be interesting to design. Of course, this system would be related to the form handler used to drive Web content editing, discussed under Working with sites.
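The mapping from form results to object filter parameters might start out as simple as the following Python sketch. search_params and the SCOPES table, which links each inwhat radio value to an Erin class, are hypothetical. The output is a plain dictionary that a Search iterator could consume. urllib.parse.parse_qs decodes the query string, including + as a space.

```python
from urllib.parse import parse_qs

# Hypothetical mapping from the "inwhat" radio values to Erin classes.
SCOPES = {"all": None, "news": "NewsItem"}


def search_params(query_string):
    """Turn the search form's GET parameters into an object filter spec."""
    form = parse_qs(query_string)
    findwhat = form.get("findwhat", [""])[0].strip()
    inwhat = form.get("inwhat", ["all"])[0]
    spec = {"text": findwhat}
    cls = SCOPES.get(inwhat)          # unknown scopes fall back to searching everything
    if cls:
        spec["class"] = cls
    return spec
```

For example, "findwhat=big+fish&inwhat=news&submit=Search" becomes {"text": "big fish", "class": "NewsItem"}. The submit button's value is ignored.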

Conversely, forms could be automatically generated from information describing the sort of search that one wishes to make available. It will also be interesting to tackle the problem of specifying free-text search as an object filter.

Document preprocessing

One is now doubtlessly aware of the idea of a wiki, and how they are used. While a wiki represents a formally structured site too restrained for Erin, there are some interesting ideas in some wiki systems. TWiki for example has an automatic TOC generator based on document headings. Erin is targeted at folk who are already proficient in HTML 4, and such people are not likely to want to get involved in confusing and overly complex wiki syntax, but it would make sense to at least borrow some of the automation tools.

This document has so far only discussed mechanisms used for managing separate objects on a page. However, the fact remains that the largest object on many Web pages is a long passage of text, replete with items such as headings, a table of contents, footnotes, hyperlinks to items on the same site, and so forth. Maintaining such items can prove tedious, and it would be nice to see Erin also able to assist with managing this data.

The most obvious example would be a table of contents tag, thus: <toc>. Insert one of these where you want a TOC to go, and one will be generated on the fly when the page is assembled. My personal preference when hand-coding HTML is to use textual named anchors for section headings, so as to make the URLs clearer to understand. For those who would prefer to be spared the effort of hand-typing <a name="..."></a> around the title of every heading, Erin should also support automatic insertion of numeric named anchors for headings.
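A minimal Python sketch of the <toc> expansion, assuming headings are written as plain <h2>/<h3> elements without attributes. The anchor naming scheme (`s1`, `s2`, ...) and the `toc-h2`/`toc-h3` class names are hypothetical. A real preprocessor would also need to respect anchors the author has already typed by hand.

```python
import re

# Assumes headings are plain <h2>/<h3> elements with no attributes.
HEADING = re.compile(r"<h([23])>(.*?)</h\1>", re.S)
TAG = re.compile(r"<[^>]+>")

def expand_toc(html):
    """Give each heading a numeric named anchor and replace <toc> with a contents list."""
    entries = []

    def add_anchor(match):
        level, title = match.group(1), match.group(2)
        number = len(entries) + 1
        entries.append((level, number, TAG.sub("", title)))  # plain-text title for the TOC
        return f'<h{level}><a name="s{number}">{title}</a></h{level}>'

    body = HEADING.sub(add_anchor, html)
    items = "\n".join(
        f'<li class="toc-h{level}"><a href="#s{number}">{title}</a></li>'
        for level, number, title in entries
    )
    return body.replace("<toc>", '<ul class="toc">\n' + items + "\n</ul>", 1)

page = "<toc>\n<h2>Intro</h2>\n<p>Text</p>\n<h3>Detail</h3>"
result = expand_toc(page)
```

Anchors are inserted in a single pass, so the numbering follows document order, and the TOC entries are built from the same list, which keeps them in step with the headings.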

It would also be nice to support automatic insertion of “Return to top” links throughout the page, as well as section contents lists after each level 2 heading.
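The “Return to top” links could be handled by a similar small operation. The Python sketch below assumes the page starts with an anchor named `top`. The `totop` class name and the choice to skip the first section are hypothetical. It relies on `re.split` splitting on a zero-width lookahead, which Python supports from version 3.7.

```python
import re

def add_top_links(html, text="Return to top"):
    """Insert a 'return to top' link before every <h2> except the first.

    Assumes the page begins with an anchor named "top" (e.g. <a name="top"></a>)."""
    sections = re.split(r"(?=<h2[\s>])", html)
    link = f'<p class="totop"><a href="#top">{text}</a></p>\n'
    # sections[0] is whatever precedes the first <h2>; the rest each start with an <h2>.
    return sections[0] + "".join(
        (link if i > 0 else "") + section for i, section in enumerate(sections[1:])
    )

result = add_top_links('<a name="top"></a><h2>A</h2><p>a</p><h2>B</h2><p>b</p>')
```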

Relocation of pages

Sooner or later, every site is bound to need a page relocated from one section to another, an act which will cause all links to the page to break. I am not going to discuss site management strategies here, but it may be possible to aid this process by using pre-processing to adjust all local links. For example:

<a page="farming-methods" />

would be processed to fill in the complete relative URL and page title from the object. Another approach would, of course, be:

<obj ID="farming-methods" call="writePageLink">

However, the former method could be used to extend the system to external links as well, which are also hopelessly subject to change (at least, if your acquaintances are 16 years old or younger!).
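The first approach could be implemented roughly as follows in Python. `PAGES` is a hypothetical stand-in for the object store, mapping an object ID to its current path and title, and the relative URL is computed from the path of the page being assembled. Unknown IDs are left untouched so that a missing page stays visible rather than silently becoming a dead link.

```python
import posixpath
import re
from html import escape

# Hypothetical page store: object ID -> (path on the site, page title).
PAGES = {"farming-methods": ("/agriculture/farming-methods.html", "Farming Methods")}

PAGE_LINK = re.compile(r'<a\s+page="([^"]+)"\s*/>')

def expand_page_links(html, current_path):
    """Replace <a page="ID" /> with a relative link titled from the page object."""
    base = posixpath.dirname(current_path)

    def replace(match):
        page_id = match.group(1)
        if page_id not in PAGES:
            return match.group(0)  # leave unknown IDs untouched so broken links stay visible
        path, title = PAGES[page_id]
        href = posixpath.relpath(path, base)
        return f'<a href="{escape(href)}">{escape(title)}</a>'

    return PAGE_LINK.sub(replace, html)

# Moving the page only means changing PAGES; every <a page="..."> follows it.
link = expand_page_links('See <a page="farming-methods" />.', "/guides/index.html")
```

For a page at `/guides/index.html` this produces `../agriculture/farming-methods.html`. Relocating the target only requires updating its entry in the store and regenerating the pages that link to it.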

User-friendlier hyperlinks

Sander Tekelenburg explains at length the notion of User-friendlier hyperlinks: displaying meta information for hyperlinks. Such information tells visitors, for instance, the content type of the file behind a download link. As Sander notes, content management systems are the ideal way to make this possible. Quite how this should be done is open to debate: both whether the information belongs on the page as plain text or should be left to browser and user CSS to display as necessary, and whether in Erin it should be implemented in a link class or by the HTML preprocessor.
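As one possible answer to the preprocessor option, the Python sketch below appends the type and size after links to known downloads. `FILE_INFO` is a hypothetical table of metadata that the CMS would already hold for file objects. Wrapping the note in a `span` with a class keeps it on the page as plain text while still letting browser or user CSS hide or restyle it.

```python
import re
from html import escape

# Hypothetical metadata Erin could hold for downloadable objects: href -> (type, bytes).
FILE_INFO = {"/files/report.pdf": ("PDF", 1_482_752)}

LINK = re.compile(r'<a href="([^"]+)">(.*?)</a>', re.S)

def human_size(n):
    """Format a byte count, e.g. 1482752 -> '1.4 MB'."""
    for unit in ("bytes", "KB", "MB"):
        if n < 1024:
            return f"{n} {unit}" if unit == "bytes" else f"{n:.1f} {unit}"
        n /= 1024
    return f"{n:.1f} GB"

def annotate_downloads(html):
    """Append the content type and size after links to known downloads."""
    def replace(match):
        href = match.group(1)
        if href not in FILE_INFO:
            return match.group(0)
        kind, size = FILE_INFO[href]
        note = escape(f"{kind}, {human_size(size)}")
        return f'{match.group(0)} <span class="linkinfo">({note})</span>'
    return LINK.sub(replace, html)

annotated = annotate_downloads('<a href="/files/report.pdf">Annual report</a>')
```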

Content preprocessing

The most obvious example (after formatting this document by hand!) of content preprocessing is code formatting. This takes the form of taking blocks of code written as plain text (with HTML tags not escaped) and drowning them in span soup. Links to the appropriate CSS files would then be added to the page’s header accordingly. Perhaps an <ecode> tag would be used to mark out code for processing. With intelligent parser support, Erin could determine whether an inline (<code>) or block (e.g. <code class="codeblock">) element is needed; otherwise, a selector parameter or two different tags may be needed.
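A deliberately tiny Python sketch of the <ecode> idea follows. The keyword list and the `kw` class are hypothetical, and a real highlighter would need a proper per-language tokenizer, since this one also highlights keywords inside strings. The sketch escapes the raw code first and then adds the spans. It decides between inline and block markup by checking whether the code spans several lines. Adding the stylesheet link to the page head is left out.

```python
import re
from html import escape

KEYWORDS = ("if", "else", "for", "while", "return")  # hypothetical keyword list
ECODE = re.compile(r"<ecode>(.*?)</ecode>", re.S)
KEYWORD = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b")

def format_code(html):
    """Escape and highlight each <ecode> block, choosing inline or block markup."""
    def replace(match):
        source = match.group(1)
        # Escape first so the code's own < and & survive, then add the span soup.
        highlighted = KEYWORD.sub(r'<span class="kw">\1</span>', escape(source, quote=False))
        if "\n" in source.strip("\n"):
            # Multi-line code: block element (the stylesheet is assumed to set white-space: pre).
            return f'<code class="codeblock">{highlighted}</code>'
        return f"<code>{highlighted}</code>"
    return ECODE.sub(replace, html)

inline = format_code("<p><ecode>if a < b: return a</ecode></p>")
```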

Implementing pre-processor operations

There are a multitude of possible pre-processing operations that people may wish Erin to perform on their HTML, and quite a few would be available by default. However, it would not make sense to have them all operating by default as you would waste CPU time performing inapplicable searches against the HTML and making unwanted changes.

Perhaps the best solution would be to base the pre-processor on a regular expression system, with the site author able to add, edit, comment out and remove active pre-processor operations. Basic Perl regular expressions are probably not the best choice, as they seem to make some of the relevant operations, such as nested searches, very hard. For example, to affect every row in a certain table requires a search pattern that begins with <table ...>, contains at least one row, and finishes with </table>. However, within a single basic expression you cannot perform an operation on every row found within that table. Perl does offer some very powerful search and replace operations, but they start to look very complicated to use.

Instead, a similar but fundamentally different system may be needed, or perhaps just a wrapper around regular expressions to allow for easy access to very deep and complex yet conceptually simple functionality.
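As an illustration of the wrapper idea, using Python's `re` module rather than Perl, the sketch below runs an inner substitution only inside each match of an outer pattern. This turns the "every row of one table" case into two simple expressions. `nested_sub` is a hypothetical helper. Nested tables would still defeat the non-greedy outer pattern, so a fuller system would probably want an HTML-aware step for the outer match.

```python
import re

def nested_sub(outer, inner, repl, text, flags=re.S):
    """Apply an inner substitution only inside each match of an outer pattern."""
    inner_re = re.compile(inner, flags)
    return re.sub(outer, lambda m: inner_re.sub(repl, m.group(0)), text, flags=flags)

html = ('<table id="prices"><tr><td>1</td></tr><tr><td>2</td></tr></table>'
        '<table id="other"><tr><td>3</td></tr></table>')

# Add a class to every row of the "prices" table, leaving other tables alone.
marked = nested_sub(r'<table id="prices">.*?</table>', r"<tr>", '<tr class="price">', html)
```

Because `repl` is passed straight to `re.sub`, it may also be a function, which allows per-row logic such as alternating row classes without complicating either pattern.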

Internal Representation

Page caching

Pages, on creation and alteration, are generated once, and a cached copy is stored for re-use. This may either be a file on disc (whose path is written into the object as its cachedFilename property) or an extra column in the database for Page objects. Thereafter, when the page is accessed, the cached HTML is sent to the user agent. The data is re-generated whenever the page is changed. The cached page may have a UID filename, or could actually be stored at the appropriate path on the site with an .html filename, making the site appear to be, and function as efficiently as, a static site.
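A minimal Python sketch of the "store at the appropriate path" variant follows. `StaticPageCache`, the page dict's `path` and `title` keys, and the `render` function are hypothetical stand-ins for Erin's page objects and page assembly. Pages are written under a document root with an .html filename, re-generated when changed and otherwise served from disc.

```python
import tempfile
from pathlib import Path

class StaticPageCache:
    """Hypothetical cache that stores each page as a static .html file under docroot."""

    def __init__(self, docroot, render):
        self.docroot = Path(docroot)
        self.render = render  # function(page) -> HTML; stands in for Erin's page assembly

    def path_for(self, page):
        # page["path"] is a hypothetical site path such as "/news/erin-0-4".
        return self.docroot / (page["path"].strip("/") + ".html")

    def page_changed(self, page):
        """Re-generate the cached copy whenever a page is created or altered."""
        target = self.path_for(page)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(self.render(page), encoding="utf-8")
        return target

    def serve(self, page):
        """Send the cached HTML, generating it first if no copy exists yet."""
        target = self.path_for(page)
        if not target.exists():
            self.page_changed(page)
        return target.read_text(encoding="utf-8")

render_calls = []

def render(page):
    render_calls.append(page["path"])
    return f"<h1>{page['title']}</h1>"

with tempfile.TemporaryDirectory() as docroot:
    cache = StaticPageCache(docroot, render)
    page = {"path": "/news/erin-0-4", "title": "Erin 0.4"}
    first = cache.serve(page)    # generated and written to news/erin-0-4.html
    second = cache.serve(page)   # read back from disc, no re-generation
    page["title"] = "Erin 0.4 released"
    cache.page_changed(page)     # an edit re-generates the cached copy
    third = cache.serve(page)
```

Two renders serve three requests here: one for the first access and one for the edit. Since the files sit at real paths under the document root, a plain Web server could serve them directly.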

This is basically the reverse of MacASP’s #static setting, which instructs MacASP to cache the page and not re-execute the code: rather than opting in to caching, Erin pages would opt out of it. Since the majority of most sites’ content does not change with each access, all pages may as well be cached by default, unless flagged as transient (such as pages bearing RSS feeds).

Of course, a page reached by a POST submission will always be re-evaluated. The other way to change the output of code is through URL parameters, e.g. a GET form submission. With GET in particular, you can have an infinite virtual space of pages, all of which could be cached. MacASP had an interesting take on static pages: it cached pages against the requested URL, so a page accessed via different URL parameters would result in a cache entry for each set of parameters. Care had to be taken in MacASP to ensure that #static was not used on large URL parameter spaces unless you wanted the entire lot cached, which would waste memory; search results pages, for example, would risk causing this problem. Erin-based sites may wish to differentiate between transient (e.g. search results) and cachable (e.g. forum topic) pages, using URL parameters for the former (such as ?name=Bob&state=MN) and virtual paths (/Forum/Tips-and-Tricks/topic-38) for the latter.
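The convention suggested above can be reduced to a small cache-key policy, sketched here in Python. `cache_key` is a hypothetical helper. Unlike MacASP, which keyed its cache on the full requested URL, it refuses to cache anything carrying URL parameters and keys cachable pages on the virtual path alone.

```python
from urllib.parse import urlsplit

def cache_key(method, url, transient=False):
    """Return the key to cache a response under, or None if it must not be cached.

    Hypothetical policy: virtual paths identify cachable pages, while URL
    parameters, POST submissions and transient-flagged pages are never cached."""
    if method.upper() == "POST" or transient:
        return None
    parts = urlsplit(url)
    if parts.query:
        return None  # e.g. search results: the parameter space is too large to cache
    return parts.path
```

A forum topic at `/Forum/Tips-and-Tricks/topic-38` gets a cache entry keyed on that path, whereas `/search?name=Bob&state=MN`, any POST, and a page flagged as transient are always generated afresh.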