Conversation logging improvement proposals

I am a little disappointed with the present state of conversation logging (IRC channels, instant messaging chats etc.) as far as posting such logs onto the Web, or analysing them in other ways, goes. Some chat software only offers plain text output, so turning a log into a Web page for posterity is a fiddly procedure involving a lot of intricate regular expressions. And the software that does provide HTML output still seems to be using HTML 3.2, and quite possibly bad HTML 3.2 at that.

There are at least two alternatives, both of which are demonstrated below: using HTML 4 and using XML.

HTML 4

The simplest change conceptually is to switch from HTML 3.2 to HTML 4. The result is cleaner code, and the formatting is no longer hard-coded into each line of HTML. For example:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<!-- BongoChat 0.5 conversation log -->
<html>
<head>
  <title>Log</title>
  <style type="text/css">
  <!--
  .nick, .timestamp { color: blue; font-weight: bold }
  .selfnick, .selftime { color: red; font-weight: bold }
  .text, .selftext { color: black }
  ...
  -->
  </style>
</head>
<body>
  <p><span class="nick">Joe</span> <span class="timestamp">(4:13 pm)</span> <span class="text">Morning Bill.</span></p>
  <p><span class="selfnick">Bill</span> <span class="selftime">(4:13 pm)</span> <span class="selftext">Hello Joe</span></p>
  ...
</body>
</html>

The above HTML and CSS would result in the following text when rendered:

Joe (4:13 pm) Morning Bill.

Bill (4:13 pm) Hello Joe

[Jesse Barwick would like to point out that the part of the day denoted by pm does not constitute the morning. D’oh. Apparently the above chat fragment would make a cute t-shirt design.]

By altering the stylesheet data at the top, the log can be completely reformatted. An additional benefit is that the resulting file will be smaller, as there are no redundant <font> tags repeated all the way through the HTML.

As of Snak 5.0, Kent Sorenson’s Macintosh IRC client supports HTML logging; apparently based on my HTML code above, Nick Shanks created a default CSS file to provide the formatting. I have posted an example page showing how logging in Snak works.

XML

The above method has a possible disadvantage: although the formatting is abstracted from the content, the content may still be somewhat difficult for a machine to parse. The semantic class names will help greatly in identifying what each piece of text represents, but those wanting something wholly machine-readable could adopt an XML log format instead. Quite how much of an advantage this offers over HTML 4, I am not certain. The class names alone offer a lot of clues as to what is happening; for example, an IRC action could have a class of its own, as could IRC server messages. But for completeness I present an example of a log in XML format.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE irclog ...>
<channel name="test">
  <session join="1093681012">
    <line self="true" nick="joe|hwk" from="joe" type="action" timestamp="1093681444">
      likes cheese
    </line>
    ...
  </session>
</channel>

The above XML could later be parsed to generate an HTML/CSS version, which might render as follows:

[08:16:52] You have joined #test
[08:24:04] * joe|hwk likes cheese
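
As a rough sketch of that conversion, the following Python reads the sample log and produces the two lines shown above. It makes several assumptions that are not part of the proposal: the elided DOCTYPE and the "..." placeholder are left out of the sample, timestamps are treated as Unix epoch seconds and displayed in UTC, and the format for non-action lines is invented purely so the function has something to fall back on.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# The sample log from above, minus the elided DOCTYPE and "..." placeholder.
SAMPLE_LOG = """<?xml version="1.0" encoding="UTF-8"?>
<channel name="test">
  <session join="1093681012">
    <line self="true" nick="joe|hwk" from="joe" type="action" timestamp="1093681444">
      likes cheese
    </line>
  </session>
</channel>"""


def clock(epoch_seconds):
    """Format epoch seconds as [HH:MM:SS]; UTC is an assumption."""
    moment = datetime.fromtimestamp(int(epoch_seconds), tz=timezone.utc)
    return moment.strftime("[%H:%M:%S]")


def render_log(xml_text):
    """Return the log as a list of plain-text lines."""
    channel = ET.fromstring(xml_text.encode("utf-8"))
    name = channel.get("name")
    output = []
    for session in channel.findall("session"):
        output.append(f"{clock(session.get('join'))} You have joined #{name}")
        for line in session.findall("line"):
            # Collapse the whitespace that the pretty-printed XML adds.
            text = " ".join((line.text or "").split())
            stamp = clock(line.get("timestamp"))
            nick = line.get("nick")
            if line.get("type") == "action":
                output.append(f"{stamp} * {nick} {text}")
            else:
                # Assumption: the proposal does not define other line types.
                output.append(f"{stamp} <{nick}> {text}")
    return output


if __name__ == "__main__":
    print("\n".join(render_log(SAMPLE_LOG)))
```

Wrapping each piece in the span markup from the HTML 4 section, rather than joining plain strings, would give the HTML/CSS version proper.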

One point of interest is the from attribute. I am no longer certain what I meant by this when I originally devised this system. It may be a mechanism for tracking nick changes, for the benefit of log analysis software that likes to know who is who and would otherwise be confused by nick changes.
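
If from is read that way, as a stable identity that survives nick changes, an analysis tool could group lines by it. The sketch below is only an illustration of that guess: the second line element, its nick and its timestamp are invented for the example, and falling back to nick when from is missing is likewise an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical log: the second <line> is invented to show a nick change.
SAMPLE_LOG = """<channel name="test">
  <session join="1093681012">
    <line self="true" nick="joe|hwk" from="joe" type="action" timestamp="1093681444">likes cheese</line>
    <line self="true" nick="joe|away" from="joe" type="action" timestamp="1093681500">waves</line>
  </session>
</channel>"""


def nicks_by_person(xml_text):
    """Map each 'from' value to the nicks it used, in order of appearance."""
    root = ET.fromstring(xml_text)
    people = {}
    for line in root.iter("line"):
        nick = line.get("nick")
        # Assumption: with no 'from' attribute, the nick is the identity.
        person = line.get("from") or nick
        seen = people.setdefault(person, [])
        if nick not in seen:
            seen.append(nick)
    return people


if __name__ == "__main__":
    print(nicks_by_person(SAMPLE_LOG))
```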

The self attribute is also worthy of note. With HTML 4/CSS, lines of text that belong to the user may well have separate classes (e.g. selftime vs timestamp), making them effectively distinct regardless of how they are styled afterwards. Here, when the data is turned into HTML/CSS, the self attribute can simply be ignored if the output program does not wish to give special treatment to the user performing the logging.

An alternative to the extra classes in HTML/CSS, though, may be:

<p class="self"><span class="nick">Bill</span> <span class="timestamp">(4:13 pm)</span> <span class="text">Hello Joe</span></p>

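Applying an overall class of self to the paragraph makes for a cleaner approach semantically, even though it will most likely complicate the CSS involved, since the user's own lines then have to be styled through descendant selectors such as .self .nick.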
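
To compare the two markup styles side by side, here is a minimal sketch of how a logger might emit either one. The function name and its parameters are hypothetical; only the class names and the markup itself come from the examples above.

```python
from html import escape


def render_line(nick, time_label, text, is_self=False, style="paragraph"):
    """Return one log line as an HTML 4 paragraph.

    style="span":      own lines use selfnick/selftime/selftext on each span
    style="paragraph": own lines keep the normal span classes and get a
                       single class="self" on the enclosing <p>
    """
    if style == "span" and is_self:
        nick_cls, time_cls, text_cls = "selfnick", "selftime", "selftext"
    else:
        nick_cls, time_cls, text_cls = "nick", "timestamp", "text"
    p_attr = ' class="self"' if (style == "paragraph" and is_self) else ""
    # escape() keeps characters such as < and & in chat text from
    # being interpreted as markup.
    return (
        f'<p{p_attr}><span class="{nick_cls}">{escape(nick)}</span> '
        f'<span class="{time_cls}">({escape(time_label)})</span> '
        f'<span class="{text_cls}">{escape(text)}</span></p>'
    )


if __name__ == "__main__":
    print(render_line("Bill", "4:13 pm", "Hello Joe", is_self=True, style="span"))
    print(render_line("Bill", "4:13 pm", "Hello Joe", is_self=True))
```

With the paragraph style the span classes are identical for every speaker, so a program that does not care who was doing the logging can drop one attribute rather than remapping three class names.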


Updated Thursday 22nd December 2005 to correct bugs and, moreover, to reflect the implementation of this idea found in the Snak IRC client. You all love <abbr> tags, don’t you?