Scalability of Web Applications in CDN Environments

Author: Tobias Groothuyse
Source: Master's thesis, Vrije Universiteit, July 2006.


The scalability bottleneck of data-intensive Web applications often turns out to be the database. This bottleneck can be alleviated by decreasing the load imposed on the database or by increasing the database throughput. Existing solutions are based either on query caching, which aims to decrease the database load, or on database replication, which aims to increase the database throughput. However, as we show in this thesis, neither of these solutions provides sufficient scalability for demanding applications.

This thesis combines three approaches to alleviate the database bottleneck. First, we study the impact of distributed and hierarchical organizations for database query caches. Second, we propose a novel data replication technique that improves throughput compared to a fully replicated database. Third, we introduce a cost-based query routing policy to improve the load balance of the database nodes at run time. All three approaches exploit the fact that an application's query workload is based on a small set of read and write templates. Our evaluations show that a combination of distributed and hierarchical caching can result in hit ratios of up to 60%. Furthermore, we show that both the novel replication technique and the cost-based routing can significantly increase scalability. Finally, our evaluations show that, for a fixed workload, a combined approach using distributed and hierarchical caching together with the novel data placement and cost-based routing can process 95% of all queries within 100 ms, compared to only 1% for a fully replicated database.
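
To make the idea of cost-based routing over query templates concrete, the following is a minimal sketch and not the policy defined in the thesis. It assumes that each read query carries an estimated cost derived from its template, that each database node holds only a subset of the tables, and that a read is sent to the eligible node with the lowest pending cost. The names `Node`, `route_read`, and `complete`, as well as the cost values in the usage note, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A database node that holds a subset of the tables (assumed model)."""
    name: str
    tables: frozenset
    pending_cost: float = 0.0  # sum of estimated costs of queries in flight


def route_read(nodes, table, cost):
    """Send a read to the node holding `table` with the least pending cost.

    `cost` is an assumed per-template estimate, e.g. a measured average
    execution time for that query template.
    """
    candidates = [n for n in nodes if table in n.tables]
    if not candidates:
        raise LookupError(f"no node holds table {table!r}")
    target = min(candidates, key=lambda n: n.pending_cost)
    target.pending_cost += cost
    return target


def complete(node, cost):
    """Release the estimated cost once the query has finished."""
    node.pending_cost -= cost
```

For instance, with node `a` holding tables `book` and `customer` and node `b` holding only `book`, a costly `book` read followed by a cheap one would land on different nodes, while every `customer` read must go to `a`. A real policy would also have to account for writes, which must reach every node that holds the affected table.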


BibTeX Entry

  author = 	 {Tobias Groothuyse},
  title = 	 {Scalability of Web Applications in CDN Environments},
  school = 	 {Vrije Universiteit},
  address = 	 {Amsterdam, The Netherlands},
  year = 	 {2006},
  month = 	 jul