The Streaming Architecture pattern describes a J2EE
architecture that provides asymptotically better memory performance on
the application server than Sun's currently endorsed standards.
This article illustrates the streaming architecture approach through
a practical example and provides hard performance data to back
the theory underlying the approach.
Implementation
In this example, we will be sending presidential data from a
MySQL 5.0 database to the client's
browser. A single record will consist of a presidential thumbnail (7KB
average size), dates of office, name, and a one-paragraph summary, courtesy
of Wikipedia.
This page (viewable only in Firefox) shows the
output for all presidents.
Our solution will consist of the following components:
A data transfer object to represent a row of presidential data.
A page template that is used for producing the output page.
A request handler that receives the client request and sends a
response. In this example, a servlet is used, but if one were using
a framework such as Struts to develop applications, this would be
an Action instead.
Data Transfer Object
The data transfer object is a straightforward POJO (plain old Java object).
It is populated from the result set and then passed to the output page
for rendering.
President.java
public class President implements Serializable {

    private String name;
    private Date officeStart;
    private Date officeEnd;
    private byte[] thumbnail;
    private String summary;

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    ...
}
Page Template
The page template departs significantly from standard J2EE design. The key
difference is that, in the standard approach, data is retrieved in its entirety and stored in
request attributes before control is forwarded to a JSP; here there is
no such restriction. Instead, the page template is created by the servlet,
and, as each row of data is processed, it is immediately formatted using
the template and written to the client. Recall that this is a key aspect
of the streaming architecture: as results are retrieved from the back end, they
are written immediately to the client.
In this example,
StringTemplate is used for output. However, any other templating
engine can be used, including a JSP processor (in which case the lifecycle
methods would include JSP fragments) or
Velocity.
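As a rough illustration of that flexibility, the per-row rendering could be done with Velocity along the following lines. This class is not part of the article's downloadable source; the inline template string is illustrative only, and the Velocity 1.x API is assumed.
VelocityLineRenderer.java (sketch)
import java.io.PrintWriter;

import org.apache.velocity.VelocityContext;
import org.apache.velocity.app.VelocityEngine;

public class VelocityLineRenderer {

    // A stand-in for the 'line' template; the real page would use markup like presidents.stg.
    private static final String LINE_TEMPLATE =
        "<tr><td class='name'>$president.name</td>"
        + "<td class='desc'>$president.summary</td></tr>";

    private final VelocityEngine engine = new VelocityEngine();

    public VelocityLineRenderer() throws Exception {
        engine.init();
    }

    // Merges one President into the template and writes the result straight to
    // the client, mirroring the per-row loop in the servlet shown later.
    public void writeLine(President president, PrintWriter out) throws Exception {
        VelocityContext ctx = new VelocityContext();
        ctx.put("president", president);
        engine.evaluate(ctx, out, "president-line", LINE_TEMPLATE);
    }
}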
presidents.stg
group presidents;
header() ::= <<
<html>
<head>
<style type='text/css'>
th {font-family: arial; font-size: 10pt; background: #efefef; text-align: left;}
.name {width:1.5in;}
.desc {width:3.5in; text-align: justify;}
img {margin-right: 12pt;}
td {vertical-align:top;
padding: 2pt;
padding-top: 6pt;
border-bottom: 1px solid #626262;}
</style>
</head>
<table cellspacing='0'>
<thead>
<th>Picture</th>
<th>Name</th>
<th>From</th>
<th>To</th>
<th>Summary</th>
</thead>
>>
line(president,thumbnail) ::= <<
<tr>
<td><img src='data:image/jpeg;base64,$thumbnail$'/></td>
<td class='name'>$president.name$</td>
<td class='from'>$president.officeStart$</td>
<td class='to'>$president.officeEnd$</td>
<td class='desc'>$president.summary$</td>
</tr>
>>
footer(timeByte1, timeRow1, timeClose) ::= <<
</table>
<pre>
Time till byte 1 = $timeByte1$ ns.
Time till row 1 = $timeRow1$ ns.
Time till close = $timeClose$ ns.
</pre>
</html>
>>
Request Handler
The request handler is the most important piece of a streaming architecture
implementation. Below is the entire code for the request handler class:
StreamingServlet.java
public class StreamingServlet extends HttpServlet {

    private StringTemplateGroup presidentTemplate;

    public void init(ServletConfig config) throws ServletException {
        InputStreamReader in = new InputStreamReader(getClass().getClassLoader()
            .getResourceAsStream("org/ahmadsoft/stream/template/presidents.stg"));
        presidentTemplate = new StringTemplateGroup(in, DefaultTemplateLexer.class);
    }

    public void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
 1      long a = System.nanoTime();
 2
 3      StringTemplate header = presidentTemplate.getInstanceOf("header");
 4      PrintWriter out = new PrintWriter(new OutputStreamWriter(response
 5          .getOutputStream()));
 6      long b = System.nanoTime();
 7      out.print(header.toString());
 8
 9      long c = -1;
10      try {
11          Class.forName("com.mysql.jdbc.Driver");
12          Connection connection = DriverManager
13              .getConnection("jdbc:mysql://localhost/mysql?user=root&password=XXX ↵
                    &defaultFetchSize=10&useCursorFetch=true");
14          Statement stmt = connection.createStatement();
15          stmt.setFetchDirection(ResultSet.FETCH_FORWARD);
16          ResultSet rs = stmt.executeQuery("SELECT * FROM PRESIDENT");
17
18          BASE64Encoder base64 = new BASE64Encoder();
19          while (rs.next()) {
20              if (c == -1)
21                  c = System.nanoTime();
22              President p = new President();
23              p.setName(rs.getString("TX_NAME"));
24              p.setOfficeStart(rs.getDate("DT_OFFICE_ST"));
25              p.setOfficeEnd(rs.getDate("DT_OFFICE_END"));
26              p.setSummary(rs.getString("TX_SUMMARY"));
27              p.setThumbnail(Util.getBytes(rs.getBlob("BL_THUMBNAIL")));
28
29              StringTemplate line = presidentTemplate.getInstanceOf("line");
30              line.setAttribute("president", p);
31              line.setAttribute("thumbnail", base64.encode(p.getThumbnail()));
32              out.print(line.toString());
33          }
34          connection.close();
35      } catch (Exception e) {
36          e.printStackTrace(out);
37      }
38
39      long d = System.nanoTime();
40      StringTemplate footer = presidentTemplate.getInstanceOf("footer");
41      footer.setAttribute("timeByte1", Long.valueOf(b-a));
42      footer.setAttribute("timeRow1" , Long.valueOf(c-a));
43      footer.setAttribute("timeClose", Long.valueOf(d-a));
44      out.print(footer.toString());
45      out.close();
    }
}
In line 3, the page header is created, and in line 7 it is written out, since,
in this case, it is independent of the data. If a flush call were made
after line 7, the user would see output within a split second of submitting
the request, which improves the perceived responsiveness of the application.
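For example, the change would amount to a single extra call after line 7 of the listing above; the flush() call shown below is not in the listing and is included only to illustrate the idea.

out.print(header.toString()); // line 7: the header does not depend on the data
out.flush();                  // push the header to the browser before any JDBC work begins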
Lines 11 through 15 open a connection to the database and configure the statement.
It is critical to configure the connection and statement so that the database server
does not send the entire result set to the application server, but rather sends chunks
on demand. Clearly, if the entire dataset is sent in a single piece to the application
server, we will have nothing to show for our labors! Accomplishing this in MySQL 5
requires setting useCursorFetch=true and setting the fetch direction
of the statement to ResultSet.FETCH_FORWARD. DB2 automatically chunks the data
if the statement is scrollable (either ResultSet.TYPE_SCROLL_INSENSITIVE or
ResultSet.TYPE_SCROLL_SENSITIVE). Last, note that normally a connection
would be obtained from an application server's connection pool, but for the sake
of producing an application that can be deployed easily, this example takes a
shortcut.
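To make the two configurations concrete, here is a brief sketch. The connection URLs, credentials, and the DB2 host and database names are placeholders rather than values from the article's source.
ChunkedFetchExamples.java (sketch)
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ChunkedFetchExamples {

    // MySQL 5 (Connector/J): cursor-based, chunked fetching must be requested explicitly.
    static Statement openMysqlStatement() throws Exception {
        Connection conn = DriverManager.getConnection(
            "jdbc:mysql://localhost/presidents?user=web&password=secret"
            + "&defaultFetchSize=10&useCursorFetch=true");
        Statement stmt = conn.createStatement();
        stmt.setFetchDirection(ResultSet.FETCH_FORWARD);
        return stmt;
    }

    // DB2: making the statement scrollable is enough; the driver then fetches rows in blocks.
    static Statement openDb2Statement() throws Exception {
        Connection conn = DriverManager.getConnection(
            "jdbc:db2://dbhost:50000/PRESDB", "web", "secret");
        return conn.createStatement(
            ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
    }
}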
Lines 19-33 loop through the result set, populating a President object and writing it
to the client on each pass. This is the crux of the streaming architecture: no more
than a single row is kept in memory at once.
Finally, the footer is rendered on line 44 and the output stream to the client
is closed on line 45.
Testing
Having illustrated the concepts behind the streaming architecture with a detailed
example, let us now validate its theoretical performance advantage.
In this section, the streaming architecture is compared against two other
implementations, described below, that follow currently prescribed best
practices:
Remote EJB. A standard remote EJB implementation. A request
handler receives a request, calls a remote EJB, and retrieves an array of
presidents. The presidents are passed to an output page for rendering (for
simplicity, this example reuses the presidents page template, but normally a JSP
would be used; the change does not impact performance significantly).
Local EJB. Identical to the remote EJB implementation except
that there is no serialization overhead when retrieving the list of presidents.
Most modern J2EE applications make use of local EJB calls to improve
performance. A rough sketch of the fetch-all pattern shared by both EJB variants is shown below.
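The core of both EJB-based request handlers looks roughly like the following. The PresidentService interface, the field wiring, and the JSP name are illustrative stand-ins rather than the actual classes in the downloadable source; the essential point is that the entire list is materialized in memory before any output is written.
FetchAllServlet.java (sketch)
import java.io.IOException;
import java.util.List;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class FetchAllServlet extends HttpServlet {

    // Hypothetical business interface standing in for the local or remote EJB.
    interface PresidentService {
        List<President> findAll();
    }

    // In the real implementations this would be looked up via JNDI or injected.
    private PresidentService presidentService;

    public void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        // The entire result set is held in memory here before rendering begins.
        List<President> presidents = presidentService.findAll();
        request.setAttribute("presidents", presidents);
        request.getRequestDispatcher("/presidents.jsp").forward(request, response);
    }
}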
The source code for all three can be downloaded from the Resources section.
Recall that streaming architectures offer, in theory, identical
response times with asymptotically lower memory usage. Lower memory
translates into the ability to handle more simultaneous requests, and hence,
better scalability. Therefore, we will use Microsoft's Web Application
Stress tool to pound each of these implementations continuously for one
minute. The tool will be run with three different concurrency
settings: 4, 16, and 32 concurrent requests. In addition, the amount of
data returned will also be adjusted across the following set of values: 86
rows, 129 rows, 172 rows, 301 rows, 473 rows, and 989 rows. We will be
interested in seeing how well each implementation handles the load.
The table below shows the size of the result page returned by the server
for each number of database rows. Note that the sizes below include the
images.
Rows    Page Size (bytes)
 86        879,530
129      1,318,997
172      1,758,467
301      3,076,877
473      4,834,758
989     10,108,398
The tests were performed on an Intel Core 2 Duo 6400 with 2GB of RAM,
running MySQL Ver 14.12 Distrib 5.0.27, for Win32 (ia32), Tomcat 5.5,
and Java 1.5.0_09 for Windows. RAM allocation for Tomcat was 64MB max.
Results and Analysis
Below are the results of the tests, with throughput measured in requests
per minute.
                    4 threads   16 threads   32 threads
86 rows
  Streaming               329          405          416
  EJB Local               375          495          501
  EJB Remote              467          460          473
129 rows
  Streaming               344          319          300
  EJB Local               354          360          351
  EJB Remote              331          327          328
172 rows
  Streaming               262          246          237
  EJB Local               285          277          268
  EJB Remote              250          241          219
301 rows
  Streaming                95          117          124
  EJB Local               152          145            *
  EJB Remote              148            *            *
473 rows
  Streaming                60           71           75
  EJB Local               102            *            *
  EJB Remote               93            *            *
989 rows
  Streaming                44           27           37
  EJB Local                48            *            *
  EJB Remote                *            *            *

* Indicates that the test did not complete due to an OutOfMemory
error.
There are several interesting trends within this data. First, regardless
of implementation, the throughput of the server does not scale linearly as
the number of simultaneous requests increases from 4 to 16 to 32. The reason
is that, on the Core 2 Duo test machine, 4 simultaneous threads were
enough to saturate both cores. As a corollary, the average response times
increased as the concurrency went up (for full result data, including average
response times, please see Resources).
Second, the 86 row results are clearly an anomaly. For 4 threads, the EJB remote throughput
is higher than the EJB local implementation's throughput, despite the fact
that the only difference between the two implementations is extra serialization
in the former! From this, we conclude that 86 rows of test data was not enough
for implementation differences to dominate the throughput results;
other factors on the machine were the dominant ones instead.
The 129 and 172 row results show that when memory resources on the machine are not
under stress (in this case because a small number of rows are being returned), all three
approaches have roughly comparable throughput. However, the EJB local implementation
consistently outperforms the streaming implementation by about 10-15%. I cannot
explain this result.
The 301, 473, and 989 row results validate claims of the streaming architecture's
scalability. With large datasets and high concurrency, the traditional implementations
place too much stress on available memory and the application server melts down.
The streaming implementation, however, continues to handle the load. In fact, for
the 301 row dataset, the streaming implementation was able to handle a whopping 128
simultaneous requests without going down! That represents 8 times the load handled
by the local EJB implementation, in the same amount of memory!
Several readers have pointed out that by increasing the memory available to Tomcat
to a larger figure, say 1GB, as is common in many production environments, the EJB
implementations could easily have handled the larger datasets. This is a valid point,
but it does not change the fact that a streaming architecture makes far better use of
available memory, and is, therefore, able to handle more simultaneous requests for
large datasets, no matter what the specific free memory setting is.
Feedback
I would welcome feedback on this article, especially on issues that fundamentally affect
the approach outlined here.
Robert Cooper pointed out to me that by passing a "lazy collection" to a JSP—i.e.
a collection that transparently fetches chunks of data from the database server on demand—
one would achieve a similar level of performance. He further pointed out that Hibernate,
TopLink, and other ORM tools provide such a feature.
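To sketch the idea, the following iterator pulls one row at a time as the page asks for it. This is illustrative only, not Hibernate's or TopLink's implementation; it simply reuses the column names from the example schema.
PresidentIterator.java (sketch)
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Iterator;
import java.util.NoSuchElementException;

public class PresidentIterator implements Iterator<President> {

    private final ResultSet rs;
    private boolean advanced;
    private boolean hasRow;

    public PresidentIterator(ResultSet rs) {
        this.rs = rs;
    }

    // Advances the cursor lazily, so rows are pulled from the database only as
    // the rendering loop asks for them.
    public boolean hasNext() {
        try {
            if (!advanced) {
                hasRow = rs.next();
                advanced = true;
            }
            return hasRow;
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }

    // Maps the current row into a President; only one row is ever held in memory.
    public President next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        advanced = false;
        try {
            President p = new President();
            p.setName(rs.getString("TX_NAME"));
            p.setOfficeStart(rs.getDate("DT_OFFICE_ST"));
            p.setOfficeEnd(rs.getDate("DT_OFFICE_END"));
            p.setSummary(rs.getString("TX_SUMMARY"));
            return p;
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }

    public void remove() {
        throw new UnsupportedOperationException();
    }
}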
Resources
mwst.mdb (~1MB) contains the full database of
results obtained during testing. Note: this file must be opened with the Web Application
Stress Tool.
Browse the benchmark source online using OpenGrok.
stream.zip (930KB) contains an Eclipse
project that includes the MySQL DDL, a data uploader to populate the tables for the test, and
all three implementations referenced in this article. Please note that this project
uses Sysdeo's Eclipse Tomcat Launcher plugin.