Wikipedia Workload Analysis for Decentralized Hosting
Authors: Guido Urdaneta, Guillaume Pierre, Maarten van Steen.
Source: Technical report IR-CS-041, Vrije Universiteit, September 2007. Revised: June 2008.
Abstract
We study an access trace containing a sample of Wikipedia's
traffic over a 108-day period, aiming to identify appropriate
replication and distribution strategies in a fully decentralized
hosting environment. We perform a global analysis of the whole
trace, and a detailed analysis of the requests directed to the
English edition of Wikipedia. In our study, we classify client
requests and examine aspects such as the number of read and save
operations, significant load variations and requests for
nonexistent pages. We conclude that differentiation is
important, but that replica management may be problematic.
BibTeX Entry
@TechReport{,
author = {Guido Urdaneta and Guillaume Pierre
and Maarten van Steen},
title = {Wikipedia Workload Analysis},
institution = {Vrije Universiteit},
year = {2007 (revised: June 2008)},
number = {IR-CS-041},
address = {Amsterdam, The Netherlands},
month = sep,
note = {\url{http://www.globule.org/publi/WWA_ircs041.html}},
}
gpierre@cs.vu.nl