Wikipedia Workload Analysis
Authors: Guido Urdaneta, Guillaume Pierre, Maarten van Steen.
Source: Technical report IR-CS-041, Vrije Universiteit, September 2007. Revised: June 2008.

An improved version of this paper has been accepted by the Elsevier Computer Networks journal. Please read and cite the journal version rather than this technical report.

Abstract

We study an access trace containing a sample of Wikipedia's
traffic over a 108-day period, with the aim of identifying
appropriate replication and distribution strategies in a fully
decentralized hosting environment. We perform a global analysis
of the whole trace and a detailed analysis of the requests
directed to the English edition of Wikipedia. In our study, we
classify client requests and examine aspects such as the number
of read and save operations, significant load variations, and
requests for nonexistent pages. We conclude that differentiation
is important, but that replica management may be problematic.

Download

The tech report, in PDF (255,819 bytes).

The final journal version (significantly improved over the
technical report) will appear here shortly.

Bibtex Entry
@TechReport{,
  author      = {Guido Urdaneta and Guillaume Pierre and Maarten van Steen},
  title       = {Wikipedia Workload Analysis},
  institution = {Vrije Universiteit},
  year        = {2007 (revised: June 2008)},
  number      = {IR-CS-041},
  address     = {Amsterdam, The Netherlands},
  month       = sep,
  note        = {\url{http://www.globule.org/publi/WWA_ircs041.html}},
}

gpierre@cs.vu.nl