CloudTPS user manual

This is a prototype implementation of CloudTPS, a scalable transaction manager for cloud data stores. CloudTPS implements strongly consistent transactions, join queries and secondary-key queries, even in the presence of server failures and network partitions. It is implemented as a middleware layer which can run on top of an existing cloud data store (we currently support HBase and SimpleDB backends).

A detailed description of CloudTPS’s design can be found in two papers: the original CloudTPS paper and the “Consistent join for cloud data stores” paper.

Table of Contents

  1. Warning
  2. Installation
  3. Tomcat configuration
  4. CloudTPS configuration
  5. Built-in applications
  6. Developing new applications
  7. Contact

1. Warning

This is experimental software which is intended to be a proof of concept rather than a usable product. We release the code in the hope that it will be useful to the scientific community. WE HOWEVER RECOMMEND NOT USING IT FOR ANY SERIOUS DEPLOYMENT.

2. Installation

CloudTPS is implemented as Java Servlets and is platform-independent. We describe the installation process on Linux.

CloudTPS requires the following software packages to be installed:

  1. JDK 6.0
  2. Tomcat 6.0
  3. Google Protocol Buffers 2.4.0a
  4. AWS SDK for Java 1.1.8
  5. HBase 0.20.6
  6. postgresql-9.0-801.jdbc4.jar (only necessary for performance comparison with PostgreSQL)

The library packages of 3), 4), 5) and 6) should be copied into “$TOMCAT_DIR/lib/”, where “$TOMCAT_DIR” stands for the directory of the Tomcat installation. Note that these packages may depend on other libraries, which must also be copied into “$TOMCAT_DIR/lib/”.
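For example (the exact jar file names depend on the packages you downloaded; the names below are only illustrative):

cp protobuf-java-2.4.0a.jar $TOMCAT_DIR/lib/
cp aws-java-sdk-1.1.8.jar $TOMCAT_DIR/lib/
cp hbase-0.20.6.jar $TOMCAT_DIR/lib/
cp postgresql-9.0-801.jdbc4.jar $TOMCAT_DIR/lib/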

Note that you *must* use the exact versions of JDK 6.0 and Tomcat 6.0 to execute CloudTPS correctly. Newer versions of the other software packages may be compatible with CloudTPS, but we make no guarantees.

Deploying CloudTPS simply requires copying the CloudTPS WAR package to the Tomcat webapps directory:

cp CloudTPS.war  $TOMCAT_DIR/webapps/

After that, start Tomcat, which will automatically deploy CloudTPS into the “CloudTPS” directory.

3. Tomcat configuration

A detailed explanation of Tomcat 6.0 installation and configuration can be found at http://tomcat.apache.org/tomcat-6.0-doc/index.html. We discuss here only the basic configuration steps required to execute CloudTPS. If you are familiar with Tomcat configuration, you can skip this section and go directly to “CloudTPS configuration”.

1. Configure the environment: add the following line at the end of the “.bashrc” file in your home directory:

export JAVA_HOME="Path of your JDK"

2. Increase the memory available to Tomcat.
CloudTPS needs to keep loaded data in memory, so it requires more memory than the default. In “$TOMCAT_DIR/bin/catalina.sh”, add a line such as the following:

export JAVA_OPTS="$JAVA_OPTS -Xms256M -Xmx1024M"

This sets the JVM heap size to a minimum of 256MB and a maximum of 1024MB.

3. Turn on the mapping for the invoker servlet of Tomcat.
You can turn it on in the configuration file “$TOMCAT_DIR/conf/web.xml”: remove the comment markers “<!--” and “-->” around:

<servlet>
 <servlet-name>invoker</servlet-name>
 <servlet-class>
 org.apache.catalina.servlets.InvokerServlet
 </servlet-class>
 <init-param>
 <param-name>debug</param-name>
 <param-value>0</param-value>
 </init-param>
 <load-on-startup>2</load-on-startup>
</servlet>

and

<servlet-mapping>
 <servlet-name>invoker</servlet-name>
 <url-pattern>/servlet/*</url-pattern>
</servlet-mapping>

4. Remove some unnecessary warnings: in “$TOMCAT_DIR/conf/context.xml”, replace:

<Context>

with:

<Context privileged="true">

5. Performance tuning: to push CloudTPS to high load, Tomcat must be tuned properly. We recommend increasing the number of Tomcat threads. To achieve that, the Connector line in file “$TOMCAT_DIR/conf/server.xml” can be changed into:

<Connector port="8080" protocol="HTTP/1.1" acceptCount="1500"
 maxThreads="1000" connectionTimeout="20000" redirectPort="8443" />

You may increase the “acceptCount” and “maxThreads” even further if necessary.

6. Disable unnecessary logs from the Amazon AWS SDK when using SimpleDB: create a file named “log4j.properties” in the “$TOMCAT_DIR/lib/” directory, with the following content:

log4j.rootLogger=WARN, A1
log4j.appender.A1=org.apache.log4j.ConsoleAppender
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
log4j.appender.A1.layout.ConversionPattern=%d [%t] %-5p %c -  %m%n
# Or you can explicitly enable WARN and ERROR messages for  the AWS Java clients
log4j.logger.com.amazonaws=WARN

7. Start Tomcat:

cd $TOMCAT_DIR/bin/
./startup.sh

You can then open the following URL in your browser to see the Tomcat welcome page:

http://localhost:8080/

You can now invoke a servlet with a URL of the following form:

 http://$SERVERIP:$PORT/$PROJECTNAME/servlet/$SERVLETNAME
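For example, with the default port and the built-in “ShowAppData” servlet (see “Built-in applications” below):

http://localhost:8080/CloudTPS/servlet/ShowAppData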

4. CloudTPS configuration

CloudTPS configuration files

To configure CloudTPS to run across a cluster of nodes, or only locally on your computer, a number of correctly formatted configuration files must be placed in a specific directory. In CloudTPS, this directory is hard-coded as “../expConf”, relative to the directory from which Tomcat is started. Note that the path is case-sensitive on Linux.

We therefore recommend ALWAYS starting Tomcat from its “bin” directory:

cd $TOMCAT_DIR/bin/
./startup.sh

You can then use the following commands to create the directory that will contain the configuration files:

cd $TOMCAT_DIR/
mkdir expConf

The following files should be present in this directory:

role.conf                 defining the CloudTPS membership and the role of each node
system.conf               defining the IP addresses of nodes
TP.conf                   defining internal configuration parameters of CloudTPS
id.conf                   defining the id of local server in the membership
datagen.conf              defining the data size scale (for built-in applications)
hbase-site.xml            (If using HBase)
AwsCredentials.properties (If using SimpleDB)

Example configuration files are provided in directory “exampleConfigFiles”, which contains two sets of examples. The set called “local” is for a local deployment of CloudTPS on a single Tomcat instance. The set called “distributed” is for a distributed deployment across three Tomcat instances.

Note that **NO** comments are allowed in configuration files. In the remainder of this document, comments starting with “#” are given only for explanation purposes; they must not be present in the real files.

role.conf

The first line is always “origin”. The rest of the lines are formatted as follows:

 ServerID <space> Role <space> null/all

The ServerID starts from 0.
The assignment of ServerIDs to participating servers must be contiguous.

An example for a locally deployed CloudTPS:

origin
0 Client null                   # The workload generator
0 Servlet all                   # The transaction manager
0 IDService all                 # The timestamp manager
0 DataService all               # The master (obsolete, only for compatibility)
0 DataComponent all             # The HBase node (Master & Region Server)

An example for a distributed CloudTPS deployment with three transaction managers:

origin
0 Client null                   # The workload generator
1 Servlet all                   # The transaction manager 1
2 Servlet all                   # The transaction manager 2
3 Servlet all                   # The transaction manager 3
4 IDService all                 # The timestamp manager
4 DataService all               # The master (obsolete, only for compatibility)
5 DataComponent all             # The HBase node (Master Server)
6 DataComponent all             # The HBase node (Region Server)

Note that in any role.conf configuration, there must always be at least one “Client”, one “Servlet”, one “IDService”, one “DataService” and one “DataComponent”, even when using Amazon SimpleDB. The “DataComponent” with the smallest “ServerID” is identified as the HBase master server.

id.conf

This configuration file contains a single line with the local server ID. Each server has a different ID, starting from 0, as defined in “role.conf”.

Example:

0

system.conf

This file contains the IP address for each server, identified by its server ID.

Example:

0 192.168.0.1
1 192.168.0.2
2 192.168.0.3
3 192.168.0.4
4 192.168.0.5
5 192.168.0.6
6 192.168.0.7

When using Amazon SimpleDB, the server IDs corresponding to “DataComponent” are meaningless, so any IP address will do, e.g., 127.0.0.1.

TP.conf

This configuration file gives control over CloudTPS’s behavior. All parameters have default values in CloudTPS, so omitting some of them should not be an immediate problem. In the following, we describe the functionality of each parameter, including the recommended settings.

TomcatPort is the port used by the Tomcat server. You may sometimes want to use a port other than 8080, such as 4080.

TomcatPort=8080

If you intend to run CloudTPS with multiple Tomcat instances on the same machine, you can set allowMultiPorts=true; each LTM then operates on a different port, so several LTMs can run on the same machine. The port is defined as “4” + #ServerID + “80”. For example, for node #0 the port is 4080; for node #1 it is 4180, etc. Remember to also change the port setting in the corresponding “server.xml” configuration file according to the node’s #ServerID as defined in “id.conf”.

By default, allowMultiPorts is set to false:

allowMultiPorts=false
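For example, the node with ServerID 1 in “id.conf” must listen on port 4180, so the Connector element shown in “Tomcat configuration” would become:

<Connector port="4180" protocol="HTTP/1.1" acceptCount="1500"
 maxThreads="1000" connectionTimeout="20000" redirectPort="8443" />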

projectName defines the name of the CloudTPS WAR package:

projectName=CloudTPS      #the name of the CloudTPS War package must be: CloudTPS.war

numOfReplicas is the number of replicas maintained for each transaction and data item. If set to N, CloudTPS can tolerate N-1 simultaneous transaction manager failures. Do not set it larger than the total number of transaction managers. The minimum value is 1, which means no replication.

numOfReplicas=2

DB defines the nature of the underlying cloud data store. If CloudTPS is deployed on HBase, use “HBase”; if deployed on Amazon SimpleDB, use “SimpleDB”. Note that the value is case-sensitive.

DB=HBase

For HBase, tables must be horizontally partitioned into sub-tables to achieve better load balancing. For SimpleDB, tables must be horizontally partitioned into domains to scale out for more write throughput. The following parameter defines the number of partitions for each table. Be sure to use the same value as was used during data generation. For “TPC-W” on Amazon SimpleDB, this parameter is ignored, as different tables are partitioned into different numbers of partitions; instead, CloudTPS uses the values predefined in the function “getNumOfTablePartitions(DataTable table)” of class “org.zhouw.stps.adapter.SimpleDBAdapter”.

numOfTablePartitions=3

database defines the active databases to be loaded into CloudTPS. The names must be identical to the database names defined in “org.zhouw.stps.app.DataModel.initDataModels()”.

database=JoinApp    #test application for Join
database=tpcw        #tpcw application with join queries and RW transactions

CloudTPS generates a number of performance logs; you can selectively turn them on. We recommend setting only doClientTransactionLog=Y, as it records the client-perceived transaction response time. You must also set the directory where the performance logs are stored.

PerfLogPath=/home/log
doClientTransactionLog=Y    #the client-perceived transaction response time
doIdServiceLog=N        #the time for requesting a timestamp
doloadDataLog=N            #the time of loading a data item from the cloud data store
dotpNodeLog=N            #the trace of accessed data items for each transaction
doDSLog=N              #(obsolete)
docheckPointLog=N        #(obsolete) the time for checkpointing a data item to the cloud data store

Controlling the CloudTPS system behavior:

isDoCheckPoint=Y        #defines whether CloudTPS checkpoints updates back to the cloud data store
isDoFaultTolerant=Y        #defines whether LTMs start the recovery process upon network I/O exceptions
isDebug=N            #if set to "Y", the system prints debug information and shuts down immediately upon a transaction timeout

Thread pool sizes; the following values should be fine:

TPNodeBGThreadPoolSize=800        #the number of workers to send network messages
CPBGThreadPoolSize=80        #the number of workers to access the cloud data store

Configuring the workload generator for built-in applications:

testDuration=600000  # the duration of the performance test, in milliseconds
numOfClients=100     # the number of EBs generating workload
sleepTime=500        # the time interval before an EB sends the next request after receiving a response, in milliseconds

Controlling memory management behavior. Setting doMemoryManagement=N disables the other memory-management parameters. Note that “bufferLimit” is not the same as the actual JVM memory consumption; the actual memory consumption can be several times larger.

doMemoryManagement=Y      # if "N", the following three parameters are disabled
bufferLimit=100000000     # the total size of data items allowed in the local buffer (in bytes)
minimumMemory=N           # if "Y", delete unused data items immediately
allTableShareBuffer=Y     # (obsolete)

The following parameters control the simulation of network partitions, triggered by invoking the servlet “SimulateNetworkPartitionService”:

retryInterval_mini=200    #in milliseconds. The time before retrying to connect to other LTMs is retryInterval_mini + random(0, retryInterval_var)
retryInterval_var=3000    #in milliseconds
NPDuration=10000          #in milliseconds, the duration of the simulated network partition
NPport=29932              #the false port which LTMs will use to communicate, so as to simulate a network partition

Defining the workload of the TPC-W application (implemented in “org.zhouw.stps.app.otpcw.TPCWWorkload”):

numOfROTranForEB=5      #The number of RO-Transactions sent in each round
numOfRWTranForEB=1      #The number of read-write transactions sent in each round
workloadType=2          #0-1:benchmarks for TPC-W
#0:half simple queries, half join queries for RO-Transactions
#1:all join queries for RO-Transactions
#2:micro-benchmarks, the following parameters take effect
joinDepth=1
numOfJoinDataItems=2
isUpdateIndex=Y
numOfWriteDataItems=4

Controlling the consistent hashing across LTMs: CloudTPS uses virtual nodes to achieve better load balancing. The minimum is 1; usually a value of 50-100 works fine.

numOfVirtualNodes=100

datagen.conf

This file is needed only when generating data for the TPC-W application. The recommended values generate 10,000 items, 144,000 customers and 129,600 orders.

NUM_ITEMS=10000  # Number of rows in the Item table
NUM_EBS=50       # Coefficient for determining the number of customers and orders
NUM_TASKS=50     # don't change this
NUM_THREAD=50    # don't change this

hbase-site.xml

Edit hbase-site.xml according to the HBase manual. CloudTPS uses this file to determine the location of the HBase master server. The file can be copied from your HBase configuration directory.
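A minimal sketch of such a file (assuming ZooKeeper-based master discovery as used by HBase 0.20, with an illustrative quorum address; your own HBase configuration may contain more properties):

<configuration>
 <property>
  <name>hbase.zookeeper.quorum</name>
  <value>192.168.0.6</value>
 </property>
</configuration>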

AwsCredentials.properties

Edit AwsCredentials.properties according to the AWS manual. To allow CloudTPS to access Amazon SimpleDB, you must provide your AccessKeyID and SecretAccessKey.
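A minimal sketch of the file, following the property names used by the AWS SDK for Java samples (the values are placeholders to be replaced with your own credentials):

accessKey=YOUR_ACCESS_KEY_ID
secretKey=YOUR_SECRET_ACCESS_KEY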

5. Built-in applications

Two applications are built into CloudTPS; they are located in the package “org.zhouw.stps.app”:

  1. A simple application which manages information about articles and authors. It is available in package “org.zhouw.stps.app.join”.
  2. A simplified version of the TPC-W application, as described in our paper “Consistent join for cloud data stores”. Note that this example showcases the use of join queries. It is available in package “org.zhouw.stps.app.otpcw”.

Be careful if you decide to test the TPC-W application on SimpleDB: it will create almost 90 domains and generate a large amount of data. Every operation on SimpleDB costs real money!

If you look carefully into the source, you may find another application implemented in package “org.zhouw.stps.app.tpcw”. This is a legacy application with no joins that we used for our CloudTPS paper. We recommend ignoring it: part of its functionality has been removed in order to remain compatible with the current implementation. We may remove it completely in the next release.

Data generation for built-in applications

You have to generate data for these applications before running any performance tests. CloudTPS supports generating data for both applications on both HBase and Amazon SimpleDB (as defined in TP.conf). Either execute the “main()” function of the following classes to generate the data, or start Tomcat and invoke the corresponding servlet.

1) Generating data for the “Article” application

Execute the main function of class “org.zhouw.stps.app.join.DataGen”. Alternatively, you can call the DoJoinAppDataGen servlet at the URL:

http://localhost:port/CloudTPS/servlet/DoJoinAppDataGen

2) Generating data for the “TPC-W” application

Execute the main function of class “org.zhouw.stps.app.otpcw.TPCWDataGen”. Alternatively, you can call the OTPCWDataGenService servlet at the URL:

http://localhost:port/CloudTPS/servlet/OTPCWDataGenService

Start CloudTPS

You can now start CloudTPS as follows:

1. Start Hadoop and HBase. If deployed on SimpleDB, this step can be skipped.

2. Start the CloudTPS system:

  • For each node acting as a Local Transaction Manager (LTM), i.e., with role “Servlet” in “role.conf”: just start Tomcat.
  • For the timestamp manager node, i.e., the one with role “IDService”: just start Tomcat.
  • For the “master” node, i.e., the one with role “DataService”: this node is no longer useful.

Lastly, if you want to test the deployment and performance of CloudTPS with the built-in applications, you can execute the following commands on each node acting as a client, i.e., with role “Client” in “role.conf”:

  • Invoke servlet “OTPCWTestService” to start the test on the TPC-W application:
wget http://localhost:8080/CloudTPS/servlet/OTPCWTestService
  • Invoke servlet “ShowAppData” to start the test on the “Article” application:
wget http://localhost:8080/CloudTPS/servlet/ShowAppData

The complete database is then shown on the page, including the automatically generated index entries. In the input box, you can enter simple SQL queries against the data of the “Article” application. A query may only include the following case-insensitive keywords: “SELECT”, “FROM”, “WHERE”, “AND”, “=”, “.”. The data model definition can be found in “org.zhouw.stps.app.join.JoinAppDataModel”. Example queries:

Select * From article, author where ar_id = "?" and au_id = ar_firstAuthorID
Select * From article, author where au_id = "?" and au_id = ar_firstAuthorID
Select * From article, author where author.au_id = "?" and au_id = ar_firstAuthorID and ar_year = "?"

Analyzing performance logs

Each node of CloudTPS, including client nodes, generates a number of performance log files, according to the configuration in “TP.conf”. We provide tools to analyze them, so that you can determine the throughput, response time, memory usage, hit rate, etc. The source code is provided in directory “LogResultGen”. Note that the first and last 3 minutes of the performance logs are ignored by these tools.

The analysis of these logs can be done in two steps:

1) Each CloudTPS node executes the following command on its local performance data:

java org.zhouw.stps.monitor.ResultGenerator inputPath outputPath configurationPath

inputPath: the path of the performance logs
outputPath: the path of the generated results
configurationPath: the path of the configuration files ($TOMCAT_DIR/expConf)
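For example, using the “PerfLogPath” from the TP.conf example above (the output path is illustrative):

java org.zhouw.stps.monitor.ResultGenerator /home/log /home/results $TOMCAT_DIR/expConf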

2) Collect the analyzed performance data from all nodes into the same directory, and then perform the global analysis:

java org.zhouw.stps.monitor.FinalResultReduce inputPath outputPath configurationPath

inputPath: the path of the performance logs
outputPath: the path of the generated results
configurationPath: the path of the configuration files ($TOMCAT_DIR/expConf)

The analysis results are a group of txt files. The most useful one is “AllClientTransaction_Overview”, which records the total response time, throughput, etc. The other files can be visualized using gnuplot. We also provide gnuplot scripts to generate performance figures; they are in the “graphScript” directory.

CloudTPS ShutDown

The servlet “PrepareShutDown” should be called before stopping Tomcat, so as to prevent possible memory leaks:

http://127.0.0.1:8080/CloudTPS/servlet/PrepareShutDown

6. Developing new applications

If you want to deploy new applications onto CloudTPS, please read the document “development.txt” for more details.

In general, application development consists of two parts:

  1. server-side
  2. client-side

Server-side development

The first step is to define the data model of the new application for CloudTPS.

CloudTPS maintains multiple databases, each of which is a collection of tables. Tables in different databases can have the same name. An application may access multiple databases. The active database names must be defined in “TP.conf” so that CloudTPS loads their schemas and applications can access them.

1) The schema of a database is defined by creating a “schema” class, which extends class “org.zhouw.stps.app.DataModel”. Take the built-in TPC-W application as an example: the class “org.zhouw.stps.app.otpcw.TPCWAPPDataModel” defines the schema. In its constructor, it invokes the following functions:

//Firstly, define the unique database name:
DataModel(String name);

//Defines a table
super.inputOneTable(	DataModel dm, 		// dm=this
			String tablename, 	//the name of this table
			String pkname, 		//the name of the primary key
			String[] cols		//the list of names of columns
			)
//Defines an index table on a Secondary-key
super.inputOneIndexTable( String baseTableName, //the original table name
			  String baseKeyName	//the name of the SK
			)

//Defines a foreign-key relationship
inputOneFKtoPK(	String baseTableName, 		//the referenced table name
		String fkTableName, 		//the name of the table that contains the foreign key
		String fkAttrName, 		//the name of the foreign key
		boolean isPKtoFK		//whether queries access the relationship backward, i.e., given a known PK, find the matching FKs;
						//if true, CloudTPS will maintain index attributes for it
		)

Note that at the end of the schema definition, “super.fillTablesByPhysical();” MUST be invoked.
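To make these steps concrete, here is a minimal sketch of a hypothetical schema class based on the functions listed above; the database, table and column names are purely illustrative:

//Hypothetical example; all names are illustrative
public class BlogAppDataModel extends DataModel {
	public BlogAppDataModel() {
		super("blogApp");				//the unique database name
		super.inputOneTable(this, "post", "P_ID",	//table "post" with primary key "P_ID"
			new String[]{"P_TITLE", "P_U_ID"});
		super.inputOneTable(this, "user", "U_ID",	//table "user" with primary key "U_ID"
			new String[]{"U_NAME"});
		super.inputOneIndexTable("user", "U_NAME");	//index table on secondary key "U_NAME"
		inputOneFKtoPK("user", "post", "P_U_ID", true);	//FK post.P_U_ID references user.U_ID
		super.fillTablesByPhysical();			//MUST be invoked at the end
	}
}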

2) Register the defined database

Register the database under its name in the dataModel repository of CloudTPS. Taking “org.zhouw.stps.app.otpcw.TPCWAPPDataModel” as an example, edit the function “org.zhouw.stps.app.DataModel.initDataModels()” to add two lines:

TPCWAPPDataModel tpcwDM=new TPCWAPPDataModel();
result.put(tpcwDM.getDBName(), tpcwDM);

3) Configure “TP.conf”

database={dbName}

Client-side development

We explain how to write and submit join queries and read-write transactions by example. More examples of join queries can be found in “org.zhouw.stps.app.otpcw.TPCWQueries”. Examples of read-write transactions can be found in “org.zhouw.stps.app.otpcw.TPCWTransactions”.

Note that before submitting a join query as a “query plan”, the “query plan” must be processed as follows:

makePlanReady(plan);

Writing Join queries

Case 1: Simple primary-key queries

//SELECT c_fname,c_lname FROM customer WHERE c_id = ?
public static QueryPlan getCustomerNameS(int c_id) 		//the join query is returned as a QueryPlan object
{
	QueryPlan plan=new QueryPlan(dm.getDBName()); 		//initiate the query plan
	JoiningTable table=new JoiningTable(0, "customer"); 	//create a JoinTable
	table.allowedRows.add(String.valueOf(c_id));		//add root records
	table.projections.add("C_FNAME");			//add a column to be returned, "*" means all
	table.projections.add("C_LNAME");			//add a column to be returned
	plan.tables.put(0, table);				//register the JoinTable
	plan.rootTable=table;					//set the root table
	makePlanReady(plan);					//must be invoked before submitting the QueryPlan object
	return plan;
}

Case 2: Join two tables

//SELECT * FROM item,author WHERE item.i_a_id = author.a_id AND i_id = ?
public static QueryPlan getItemAndAuthor(int itemID)
{
	QueryPlan plan=new QueryPlan(dm.getDBName());		//initiate the query plan
	JoiningTable itemTable=new JoiningTable(0, "item");	//create a JoinTable "item", id "0"
	JoiningTable authorTable=new JoiningTable(1, "author"); //create a JoinTable "author", id "1"
	itemTable.allowedRows.add(String.valueOf(itemID));	//add root records
	itemTable.projections.add("*");				//"*" means all
	authorTable.projections.add("*");			//"*" means all
	JoinEdge edge=new JoinEdge(0, "i_a_id".toUpperCase(), 	//creates a JoinEdge, indicating the FK "i_a_id" to PK "a_id"
		JoinEdge.FK, 1, "a_id".toUpperCase(), JoinEdge.PK);	//relationships, JoinTable is identified by its id

	plan.tables.put(0, itemTable);				//register both JoinTables
	plan.tables.put(1, authorTable);
	plan.rootTable=itemTable;				//set the root table

	plan.inputEdge(edge);					//input the edge
	makePlanReady(plan);					//must be invoked before submitting the QueryPlan object
	return plan;
}

Case 3: Self Join

//SELECT J.i_id,J.i_thumbnail from item I, item J where (I.i_related1 = J.i_id) and I.i_id = ?
public static QueryPlan getRelatedItem(int itemID)
{
	QueryPlan plan=new QueryPlan(dm.getDBName());		//initiate the query plan
	JoiningTable itemTableI=new JoiningTable(0, "item");	//create a JoinTable "item", id "0"
	JoiningTable itemTableJ=new JoiningTable(1, "item");	//create a JoinTable with the same name "item", but with id "1"

	itemTableJ.projections.add("i_id".toUpperCase());	//add column to be returned
	itemTableJ.projections.add("i_thumbnail".toUpperCase());//add column to be returned

	plan.tables.put(0, itemTableI);				//register both JoinTables
	plan.tables.put(1, itemTableJ);

	JoinEdge edge=new JoinEdge(0, ("i_related1").toUpperCase(), //creates a JoinEdge, indicating the FK "i_related1" to PK "i_id"
		JoinEdge.FK, 1, "i_id".toUpperCase(), JoinEdge.PK); //relationships, JoinTable is identified by its id
	plan.inputEdge(edge);

	plan.rootTable=itemTableI;				//set the root table
	itemTableI.allowedRows.add(String.valueOf(itemID));	//set the root records
	makePlanReady(plan);					//must be invoked before submitting the QueryPlan object
	return plan;
}

Case 4: Secondary-key queries

// SELECT * FROM customer, address, country WHERE customer.c_addr_id = address.addr_id
// AND address.addr_co_id = country.co_id AND customer.c_uname = ?
public static QueryPlan getCustomer(String c_uname)
{
	QueryPlan plan=new QueryPlan(dm.getDBName()); 		//initiate the query plan
	JoiningTable custTable=new JoiningTable(0, "customer"); //create a JoinTable "customer", id "0"
	JoiningTable addrTable=new JoiningTable(1, "address");	//create a JoinTable "address", id "1"
	JoiningTable ctryTable=new JoiningTable(2, "country");	//create a JoinTable "country", id "2"
	JoiningTable indexTable=new JoiningTable(3, 		//create a JoinTable for the index table, id "3"
		DataModel.getIndexTableLogicalName("customer", "c_uname"));

	custTable.projections.add("*");				//add all columns to be returned
	addrTable.projections.add("*");				//add all columns to be returned
	ctryTable.projections.add("*");				//add all columns to be returned
	plan.tables.put(0, custTable);				//register a JoinTable
	plan.tables.put(1, addrTable);				//register a JoinTable
	plan.tables.put(2, ctryTable);				//register a JoinTable
	plan.tables.put(3, indexTable);				//register a JoinTable

	JoinEdge edge=new JoinEdge(0, "c_addr_id".toUpperCase(), 		//creates a JoinEdge,
		JoinEdge.FK, 1, "addr_id".toUpperCase(), JoinEdge.PK);
	JoinEdge edge2=new JoinEdge(1, "addr_co_id".toUpperCase(), 		//creates a JoinEdge,
		JoinEdge.FK, 2, "co_id".toUpperCase(), JoinEdge.PK);
	JoinEdge edge3=new JoinEdge(0, "c_uname".toUpperCase(), JoinEdge.FK, 3, //creates a JoinEdge,
		DataModel.getIndexTablePKName("customer").toUpperCase(), JoinEdge.PK);

	plan.inputEdge(edge);					//input the edge
	plan.inputEdge(edge2);					//input the edge
	plan.inputEdge(edge3);					//input the edge

	plan.rootTable=indexTable;				//set the root table
	indexTable.allowedRows.add(c_uname);			//set the root records on the root (index) table
	makePlanReady(plan);
	return plan;
}

Submitting Join queries

Applications submit join queries via the following functions of class “org.zhouw.stps.client.TransExecutor”:

public static ROTranResult runROTran(QueryPlan plan);
public static ROTranResult runROTran(QueryPlan plan, String name);

With the second function, you can set a name for the join query; this name will appear in the performance logs of CloudTPS. These functions are blocking: they do not return until CloudTPS has finished executing the join query.
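For example, a minimal sketch that submits the Case 2 query defined above under a name (how to consume the ROTranResult object is application-specific and not shown here):

QueryPlan plan=getItemAndAuthor(42);					//the query plan from Case 2 above
ROTranResult result=TransExecutor.runROTran(plan, "getItemAndAuthor");	//blocks until the query completes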

Writing Read-Write Transactions

A Transaction is composed of a list of SubTransactions, each of which accesses only one data item. The sub-transactions must be implemented first. An application can group any number and any type of sub-transactions into one transaction, but each sub-transaction of a transaction must access a different data item. One can create a SubTransaction by invoking the following functions.

To update a data item:

public static SubTransaction genUpdateSubTranForClient(String dbName, String strTblName, String strRowKey,
			Iterable<String> colsToBeDeleted,			// the list of column names to be deleted
			Iterable<Map.Entry<String, Object>> colsToBeUpdated,	// the list of updates
			Iterable<String> colsToReturn)				// the columns to be returned

To insert a data item (the transaction will abort if the data item already exists):

public static SubTransaction genInsertSubTranForClient(String dbName, String strTblName, String strRowKey, Iterable<Map.Entry<String, Object>> colsToInserted)

To remove a data item:

public static SubTransaction genDeleteAllSubTranForClient(String dbName, String strTblName, String strRowKey)

Finally, creating a Transaction instance is easy; just use one of the following constructors:

public Transaction(
	SubTransaction[] trans  		//an array of SubTransaction instance
)

or

public Transaction(
	SubTransaction tran  		//for single-row transaction
)

The following is an example of updating two items in one transaction:

//	Begin Transaction;
//	UPDATE item SET i_cost = ?, i_image = ?, i_thumbnail = ?, i_pub_date = CURRENT DATE WHERE i_id = ?;
//	UPDATE item SET i_cost = ?, i_image = ?, i_thumbnail = ?, i_pub_date = CURRENT DATE WHERE i_id = ?;
//	Commit Transaction;
public static Transaction getUpdateItems(int itemid1, int itemid2)
{
	Transaction tran;				// the transaction to be returned
	Date d=globalCal.getTime();
	HashMap<String, Object> colsToBeUpdated=new HashMap<String, Object>(); // the updates for item1
	colsToBeUpdated.put("i_cost".toUpperCase(), 100);
	colsToBeUpdated.put("i_image".toUpperCase(), "Image"+itemid1);
	colsToBeUpdated.put("i_thumbnail".toUpperCase(), "thumbnail"+itemid1);
	colsToBeUpdated.put("i_pub_date".toUpperCase(), d.toString());

	HashMap<String, Object> colsToBeUpdated2=new HashMap<String, Object>(); // the updates for item2
	colsToBeUpdated2.put("i_cost".toUpperCase(), 200);
	colsToBeUpdated2.put("i_image".toUpperCase(), "Image"+itemid2);
	colsToBeUpdated2.put("i_thumbnail".toUpperCase(), "thumbnail"+itemid2);
	colsToBeUpdated2.put("i_pub_date".toUpperCase(), d.toString());

	SubTransaction st1=SubTransaction.genUpdateSubTranForClient(dm.getDBName(), //generate update subtransaction for item1
			"item", String.valueOf(itemid1), null, colsToBeUpdated.entrySet(), null);

	SubTransaction st2=SubTransaction.genUpdateSubTranForClient(dm.getDBName(), //generate update subtransaction for item2
			"item", String.valueOf(itemid2), null, colsToBeUpdated2.entrySet(), null);

	tran=new Transaction(new SubTransaction[]{st1, st2});			//group into a Transaction
	tran.name="UpdateItemInfo";						//give a name to the transaction
	return tran;
}

Submitting Read-Write Transactions

Applications submit “Transaction” objects to CloudTPS via “org.zhouw.stps.client.TransExecutor.runTransaction(Transaction t)”, passing the instance as the parameter. The “runTransaction” function is blocking: it returns only after the transaction has either COMMITTED or ABORTED.
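For example, a minimal sketch submitting the two-item update transaction defined above:

Transaction tran=getUpdateItems(1, 2);		//the transaction from the example above
TransExecutor.runTransaction(tran);		//blocks until the transaction COMMITs or ABORTs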

Implementing Transactions with generalized operations

To support transactions with more generalized operation semantics, one can extend the SubTransaction class and create custom SubTransaction subclasses. However, this would require modifying the core of CloudTPS, so we do not recommend it.

Data generation

You may need to partition the data tables, in both HBase and SimpleDB; otherwise, just set numOfTablePartitions=1. The following guidelines apply to both HBase and SimpleDB.

In class “org.zhouw.stps.app.common.DataGenCommon”, CloudTPS provides support for efficiently generating application data. After defining the database schema, one can invoke the following function to create all table partitions according to the setting of “numOfTablePartitions” in “TP.conf”. This is a blocking operation.

public static void createAllTables(DataModel dm)

After all table partitions are created, the application can insert data items via the function:

public static void putOneRow(DataTable tableSchema, String pkvalue, LinkedHashMap<String, Object> rowData)

This function automatically generates the updates of index attributes, so index management is transparent to applications even during data generation. Note, however, that this function is not blocking: it returns before the data item has been written to the cloud data store. So do not send updates too fast, or the request queue may overflow.
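A minimal sketch of a data-generation sequence under the assumptions above, reusing the hypothetical “BlogAppDataModel” schema class sketched earlier; the “getTable()” accessor is assumed for illustration and is not part of the documented API:

DataModel dm=new BlogAppDataModel();			//the hypothetical schema class sketched earlier
DataGenCommon.createAllTables(dm);			//blocking: create all table partitions first
DataTable postTable=dm.getTable("post");		//assumed accessor for the table schema
LinkedHashMap<String, Object> row=new LinkedHashMap<String, Object>();
row.put("P_TITLE", "Hello");
row.put("P_U_ID", "0");
DataGenCommon.putOneRow(postTable, "1", row);		//non-blocking: insert the row with primary key "1"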

Performance Logs

CloudTPS can also provide performance logs for your newly created applications, as long as you have configured them properly in “TP.conf”.

Recompiling CloudTPS

You can compile the source code of CloudTPS to generate the WAR deployment package. A Java IDE such as “Eclipse WTP” can be used to compile the source code. The IDE can be downloaded at http://www.eclipse.org/downloads/ (select “Eclipse IDE for Java EE Developers”). After starting Eclipse with a workspace, create a Dynamic Web Project and copy the CloudTPS source code into its source folder. After refreshing your project, Eclipse will compile CloudTPS automatically. A WAR package can then be exported by right-clicking on your project in the “Project Explorer” and selecting “Export” -> “WAR File”.

Note that the name of WAR package **MUST** be equal to the parameter “projectName” defined in “TP.conf”. The default name is “CloudTPS.war”.

7. Contact

If you encounter problems deploying CloudTPS, you can contact us by email at zhouw@few.vu.nl.

Note that this is an experimental prototype implementation: it still contains bugs and may not be stable. I make no guarantees that a future version will fix these bugs. The comments in the source code may not be up-to-date, and some of them can be misleading. Do not trust them 🙂

Good luck!

-Zhou Wei-