B  File structures and protocols

B.1   report.log structure

Each section of a web-site that is exported or imported has a .htglobule directory containing accounting information for that section. Among other things, this directory holds the report.log. This log contains information that is collected at the origin server to make decisions, compile statistics and produce a merged log of requests (which can be converted into an access.log).

The report.log format differs completely from Apache-style log files. It contains much more information than just requests, and its fields are better suited to a distributed environment than traditional access log formats such as the Common and Combined Log Formats.

To aid future development, the report.log does not have a strict format. It is a free-format file with a limited set of rules for separating records (of requests and other relevant data) and for separating the fields within a record. The format does not prescribe which fields must be present, or in which order.

The report.log is a series of unstructured event records, one record per line. Lines that start with a hash sign (#) are comments and should be ignored. Each line contains one or more fields of data. In principle, fields are separated by one or more spaces or tabs.
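The line-level rules above (one record per line, '#' comment lines, whitespace-separated fields) can be sketched as a small reader. This is a minimal illustration, not Globule's own code; the function name `raw_records` is hypothetical, skipping empty lines is an assumption, and a plain whitespace split does not yet honour the colon rule for the last field, whose value may contain spaces.

```python
import io
from typing import Iterable, Iterator, List


def raw_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield each non-comment report.log line as a list of raw field tokens.

    Hypothetical helper: it splits on runs of spaces/tabs only and does not
    handle a final ':' field whose value contains whitespace.
    """
    for line in lines:
        line = line.rstrip("\n")
        # Lines starting with '#' are comments and are ignored.
        if line.startswith("#"):
            continue
        tokens = line.split()  # str.split() with no argument splits on any whitespace run
        if tokens:  # assumption: empty lines carry no record
            yield tokens


sample = io.StringIO("# comment\nR t=1\tclient;10.0.0.1  path:/index.html\n")
print(list(raw_records(sample)))  # [['R', 't=1', 'client;10.0.0.1', 'path:/index.html']]
```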

A field is either a single letter, which identifies the type of event, or a key-value pair. Key and value are separated by an equal sign (=), a colon (:) or a semi-colon (;). The different separators serve different purposes:

= Separates a key from a value that can only be a number. These numbers should bear some relation to each other. Identifiers, for instance, in principle bear no relation to each other: a person with ID 3 and a person with ID 5 have no logical bond, nor can you infer that there should also be a person with ID 4. A timestamp, however, is suitable for this separator, as time has a logical ordering.
; The general key-value separator, where the value should not be interpreted as a number but as an identifier. Normally, a given key has only a limited number of possible values in the report.log. In other words, you should not expect to see generic text as the value of a semi-colon field, only keywords or identifiers.
: Serves the same purpose as the semi-colon separator, but may only be used for the last key-value pair of a record, and the value that follows the colon may contain spaces and/or tabs.
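The three separator rules above can be turned into a field parser. The sketch below is a hypothetical illustration, not Globule's implementation. The `Record` container and `parse_record` function are invented names. It assumes that the first '=', ';' or ':' in a token splits key from value, and that '=' values are integers, which holds for the apr_time_t timestamps and byte sizes in this log. A ':' field ends the record and keeps the rest of the line, spaces included.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union

Value = Union[int, str]


@dataclass
class Record:
    """One parsed report.log line (hypothetical container)."""
    events: List[str] = field(default_factory=list)        # bare tokens such as "R"
    fields: Dict[str, Value] = field(default_factory=dict)  # key -> value


_TOKEN = re.compile(r"\S+")  # a run of non-whitespace characters


def parse_record(line: str) -> Optional[Record]:
    """Parse one line; return None for comment lines and empty lines."""
    line = line.rstrip("\r\n")
    if line.startswith("#"):
        return None
    rec = Record()
    pos = 0
    while True:
        m = _TOKEN.search(line, pos)
        if m is None:
            break
        token = m.group()
        # The earliest of '=', ';' or ':' in the token separates key from value.
        hits = [i for i in (token.find("="), token.find(";"), token.find(":")) if i >= 0]
        if not hits:
            # Bare token, e.g. the single-letter event type R, U, I, A or E.
            rec.events.append(token)
        else:
            cut = min(hits)
            key, sep = token[:cut], token[cut]
            if sep == ":":
                # Colon field: last in the record; value runs to end of line
                # and may contain spaces and tabs.
                rec.fields[key] = line[m.start() + cut + 1:]
                break
            value = token[cut + 1:]
            # '=' values are numeric (assumed integer); ';' values are identifiers.
            rec.fields[key] = int(value) if sep == "=" else value
        pos = m.end()
    return rec if (rec.events or rec.fields) else None
```

For example, `parse_record("R t=1140998400000000 client;10.0.0.1 sndsize=5120 path:/docs/my page.html")` yields the event `R`, integer `t` and `sndsize`, the identifier `client`, and the path `/docs/my page.html` with its space intact. Taking the earliest separator means an IPv6 value such as `client;::1` is still read as a semi-colon field.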

The following event types can be in the report.log:

R a document has been requested by some browsing user;
U a document update has been detected;
I the document has been invalidated;
A to indicate that the policy of a document has changed;
E to indicate that a document has been evicted from the cache.
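The event letters can be modelled as an enumeration. In this sketch the member names and the `event_type` helper are hypothetical. Unknown letters map to `None` instead of raising an error, on the assumption that a reader should skip event types it does not know, in keeping with the format's free-form, forward-compatible intent.

```python
from enum import Enum
from typing import Optional


class EventType(Enum):
    """report.log event letters (member names are illustrative)."""
    REQUEST = "R"     # a document has been requested by a browsing user
    UPDATE = "U"      # a document update has been detected
    INVALIDATE = "I"  # the document has been invalidated
    POLICY = "A"      # the (replication) policy of a document has changed
    EVICT = "E"       # the document has been evicted from the cache


def event_type(letter: str) -> Optional[EventType]:
    """Map an event letter to EventType, or None if the letter is unknown."""
    try:
        return EventType(letter)
    except ValueError:
        return None  # unknown (e.g. future) event type: let the caller skip it
```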

The following fields can be expected:

t= The timestamp when the event occurred.
path: The path component of a URL, relative to the location from which it was exported (or imported), i.e. without that initial prefix.
old; The previous (replication) policy that has been used.
new; The new (replication) policy to be used on a document.
lastmod= A timestamp with the last modification time of the document.
docsize= The (new) document size.
client; The IP address of the peer (e.g. the browsing user making the request).
elapsed= The time needed to perform an action (serving a request, for instance).
sndsize= The number of bytes reported to be shipped.
browser; The User-Agent reported by the browsing user (optional information).
referer The Referer field in the request reported by the browsing user (optional information).
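The split between numeric '=' fields and identifier fields is what makes statistics straightforward. Numeric values such as sndsize can be summed, and identifier-like values such as path serve as grouping keys. The sketch below is hypothetical. It assumes records have already been parsed into `(event letter, fields dict)` pairs, and it counts R events and total bytes shipped per path.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple


def requests_per_path(
    records: Iterable[Tuple[str, Mapping[str, object]]]
) -> Dict[str, Tuple[int, int]]:
    """Return {path: (number of R events, total sndsize in bytes)}."""
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for event, fields in records:
        # Only request events are counted; records without a path are skipped.
        if event != "R" or "path" not in fields:
            continue
        entry = totals[str(fields["path"])]
        entry[0] += 1
        # sndsize is a numeric '=' field; a missing value counts as 0 bytes.
        entry[1] += int(fields.get("sndsize", 0))
    return {path: (count, size) for path, (count, size) in totals.items()}
```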

Timestamps and durations are expressed in apr_time_t units, normally microseconds. Sizes are in bytes.
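An apr_time_t value counts microseconds, and timestamps count from the Unix epoch (1970-01-01 00:00:00 UTC). The following sketch converts t= and lastmod= values to timezone-aware datetimes, and converts elapsed= durations to seconds. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# apr_time_t timestamps count microseconds since 1970-01-01 00:00:00 UTC.
APR_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def apr_time_to_datetime(usec: int) -> datetime:
    """Convert an apr_time_t timestamp (e.g. t= or lastmod=) to a UTC datetime."""
    # timedelta keeps integer microsecond precision, avoiding float rounding.
    return APR_EPOCH + timedelta(microseconds=usec)


def apr_duration_to_seconds(usec: int) -> float:
    """Convert an apr_time_t duration (e.g. elapsed=) to seconds."""
    return usec / 1_000_000


print(apr_time_to_datetime(1140998400000000).isoformat())  # 2006-02-27T00:00:00+00:00
```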

Normally, though this is not guaranteed, the following fields are present for each event type:
R t, client, elapsed, sndsize, browser, referer, path
U t, lastmod, docsize, path
A t, old, new, path
E t, path
I t, path
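The table of normally-present fields can serve as a diagnostic check. Because the fields are not guaranteed, the sketch below only reports which expected fields are missing and never rejects a record. `EXPECTED` and `missing_fields` are hypothetical names, and unknown event letters are assumed to have no expected fields.

```python
from typing import Dict, List, Mapping, Set

# Fields that are normally, but not necessarily, present per event type.
EXPECTED: Dict[str, Set[str]] = {
    "R": {"t", "client", "elapsed", "sndsize", "browser", "referer", "path"},
    "U": {"t", "lastmod", "docsize", "path"},
    "A": {"t", "old", "new", "path"},
    "E": {"t", "path"},
    "I": {"t", "path"},
}


def missing_fields(event: str, fields: Mapping[str, object]) -> List[str]:
    """Return the sorted list of normally-present fields absent from a record.

    An empty list means the record has every expected field; a non-empty
    list is a hint for diagnostics, not an error.
    """
    return sorted(EXPECTED.get(event, set()) - set(fields))
```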
February 27, 2006
