
Streaming Presidents

The Streaming Architecture pattern describes a J2EE architecture that provides asymptotically better memory performance on the application server than Sun's currently endorsed standards. This article illustrates the streaming architecture approach through a practical example, and provides hard performance data to back the theory underlying the approach.


In this example, we will be sending presidential data from a MySQL 5.0 database to the client's browser. A single record consists of a presidential thumbnail (7KB average size), dates of office, name, and a one-paragraph summary, courtesy of Wikipedia. This page, viewable only in Firefox because the images are embedded as data: URIs, shows the output for all presidents.

Our solution will consist of the following components:

  1. A data transfer object to represent a row of presidential data.
  2. A page template that is used for producing the output page.
  3. A request handler that receives the client request and sends a response. In this example, a servlet is used, but if one were using a framework like Struts to develop applications, this would be an Action Handler instead.

Data Transfer Object

The data transfer object is a straightforward POJO (plain old Java object). It is populated from the result set and then passed to the output page for rendering.
public class President implements Serializable {
	private String name;
	private Date officeStart;
	private Date officeEnd;
	private byte[] thumbnail;
	private String summary;

	public String getName() {
		return name;
	}
	public void setName(String name) { = name;
	}
	// Getters and setters for the remaining fields follow the same pattern.
}

Page Template

The page template departs significantly from standard J2EE design. The key difference is that usually data is retrieved in its entirety and stored in request attributes before control is forwarded to a JSP, but here there is no such restriction. Instead the page template is created by the servlet, and, as each row of data is processed, it is immediately formatted using the page template and written to the client. Recall that this is a key aspect of streaming architecture: As results are retrieved from the back-end, they are written immediately to the client.

In this example, StringTemplate is used for output. However, any other templating engine could be used, such as Velocity, or even a JSP processor (in which case the header, line, and footer templates would be JSP fragments).

group presidents;

header() ::= <<
    <style type='text/css'>
      th {font-family: arial; font-size: 10pt; background: #efefef; text-align: left;}
      .name {width:1.5in;}
      .desc {width:3.5in; text-align: justify;}
      img   {margin-right: 12pt;}
      td    {vertical-align:top;
             padding: 2pt;
             padding-top: 6pt;
             border-bottom: 1px solid #626262;}
    </style>
    <table cellspacing='0'>
>>

line(president,thumbnail) ::= <<
    <tr>
    <td><img src='data:image/jpeg;base64,$thumbnail$'/></td>
    <td class='name'>$$</td>
    <td class='from'>$president.officeStart$</td>
    <td class='to'>$president.officeEnd$</td>
    <td class='desc'>$president.summary$</td>
    </tr>
>>

footer(timeByte1, timeRow1, timeClose) ::= <<
    </table>
Time till byte 1 = $timeByte1$ ns.
Time till row 1  = $timeRow1$ ns.
Time till close  = $timeClose$ ns.
>>

Request Handler

The request handler is the most important piece of a streaming architecture implementation. Below is the entire code for the request handler class:
public class StreamingServlet extends HttpServlet {
    private StringTemplateGroup presidentTemplate;

    public void init(ServletConfig config) throws ServletException {
        // Resource name assumed from the "group presidents;" declaration.
        InputStreamReader in = new InputStreamReader(getClass().getClassLoader()
                .getResourceAsStream("presidents.st"));
        presidentTemplate = new StringTemplateGroup(in, DefaultTemplateLexer.class);
    }

    public void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {

 1      long a = System.nanoTime();
 3      StringTemplate header = presidentTemplate.getInstanceOf("header");
 4      PrintWriter out = new PrintWriter(new OutputStreamWriter(response
 5              .getOutputStream()));
 6      long b = System.nanoTime();
 7      out.print(header.toString());
 9      long c = -1;
10      try {
11          Class.forName("com.mysql.jdbc.Driver");
12          Connection connection = DriverManager
13                  .getConnection("jdbc:mysql://localhost/mysql?user=root&password=XXX&useCursorFetch=true");
14          Statement stmt = connection.createStatement();
15          stmt.setFetchDirection(ResultSet.FETCH_FORWARD);
16          ResultSet rs = stmt.executeQuery("SELECT * FROM PRESIDENT");
18          BASE64Encoder base64 = new BASE64Encoder();
19          while ( {
20              if (c == -1)
21                  c = System.nanoTime();
22              President p = new President();
23              p.setName(rs.getString("TX_NAME"));
24              p.setOfficeStart(rs.getDate("DT_OFFICE_ST"));
25              p.setOfficeEnd(rs.getDate("DT_OFFICE_END"));
26              p.setSummary(rs.getString("TX_SUMMARY"));
27              p.setThumbnail(Util.getBytes(rs.getBlob("BL_THUMBNAIL")));
29              StringTemplate line = presidentTemplate.getInstanceOf("line");
30              line.setAttribute("president", p);
31              line.setAttribute("thumbnail", base64.encode(p.getThumbnail()));
32              out.print(line.toString());
33          }
34          connection.close();
35      } catch (Exception e) {
36          e.printStackTrace(out);
37      }
39      long d = System.nanoTime();
40      StringTemplate footer = presidentTemplate.getInstanceOf("footer");
41      footer.setAttribute("timeByte1", Long.valueOf(b-a));
42      footer.setAttribute("timeRow1" , Long.valueOf(c-a));
43      footer.setAttribute("timeClose", Long.valueOf(d-a));
44      out.print(footer.toString());
45      out.close();
    }
}

In line 3, the page header is created, and in line 7 it is written out, since, in this case, it is independent of the data. If a flush call were made after line 7, the user would see output within a split second of submitting the request, improving the perceived responsiveness of the application.
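The effect of an early flush can be illustrated without a servlet container. In the sketch below (class and method names are illustrative), a ByteArrayOutputStream stands in for the response stream; flushing right after the header means the client already has renderable bytes before the first row is even fetched:

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

public class EarlyFlushDemo {
    // Writes the header and flushes it before any row work begins, so the
    // client can start rendering while rows are still being produced.
    public static int bytesVisibleAfterHeader(String header) {
        ByteArrayOutputStream wire = new ByteArrayOutputStream(); // stands in for the response stream
        PrintWriter out = new PrintWriter(
                new OutputStreamWriter(wire, StandardCharsets.UTF_8));
        out.print(header);
        out.flush(); // without this call, the header may sit in the writer's buffer
        return wire.size(); // bytes already "on the wire" before row 1
    }

    public static void main(String[] args) {
        System.out.println(bytesVisibleAfterHeader("<table>")
                + " bytes flushed before the first row");
    }
}
```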

Lines 11 through 15 open a connection to the database and configure the statement. It is critical to configure the connection and statement so that the database server does not send the entire result set to the application server at once, but rather sends chunks on demand. Clearly, if the entire dataset is sent in a single piece to the application server, we will have nothing to show for our labors! Accomplishing this in MySQL 5 requires setting useCursorFetch=true on the connection and setting the fetch direction of the statement to ResultSet.FETCH_FORWARD. DB2 automatically chunks the data if the statement is scrollable (either ResultSet.TYPE_SCROLL_INSENSITIVE or ResultSet.TYPE_SCROLL_SENSITIVE). Last, note that normally a connection would be obtained from the application server's connection pool, but for the sake of producing an application that can be deployed easily, this example takes a shortcut.
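The MySQL-specific settings can be kept in one helper. This sketch builds the connection URL with the critical useCursorFetch flag; the host, database, and credential values are placeholders, and the commented lines show how the statement would then be configured against a live database:

```java
import java.sql.ResultSet;

public class StreamingJdbcConfig {
    // Direction to set on the Statement; FETCH_FORWARD pairs with useCursorFetch.
    public static final int FETCH_DIRECTION = ResultSet.FETCH_FORWARD;

    // Builds a MySQL JDBC URL that asks the driver to stream rows through a
    // server-side cursor instead of materializing the whole result set.
    // Host, database, and credential values here are placeholders.
    public static String streamingUrl(String host, String db,
                                      String user, String password) {
        return "jdbc:mysql://" + host + "/" + db
                + "?user=" + user + "&password=" + password
                + "&useCursorFetch=true"; // the critical MySQL 5 flag
    }

    public static void main(String[] args) {
        System.out.println(streamingUrl("localhost", "mysql", "root", "XXX"));
        // With a live database, the statement is then configured as in the servlet:
        //   Statement stmt = connection.createStatement();
        //   stmt.setFetchDirection(StreamingJdbcConfig.FETCH_DIRECTION);
    }
}
```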

Lines 19-33 loop through the result set, populating a president object and writing it to the client in each pass. This is the crux of the streaming architecture: No more than a single row is kept in memory at once.
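The memory property can be seen in isolation from JDBC. In the sketch below (names are illustrative), rows arrive through an Iterator, much as they do from a cursor-backed ResultSet; each row is formatted and written before the next is fetched, so the heap never references more than one row regardless of result-set size:

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

public class RowStreamer {
    // Formats and writes each row as soon as it is produced; at no point is
    // more than one row referenced, so memory use is O(1) in the row count.
    public static <T> int stream(Iterator<T> rows, Function<T, String> format,
                                 Writer out) throws IOException {
        int written = 0;
        while (rows.hasNext()) {
            T row =;                // the only live row
            out.write(format.apply(row));      // rendered and released before the next fetch
            written++;
        }
        out.flush();
        return written;
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        int n = stream(List.of("Washington", "Adams", "Jefferson").iterator(),
                       name -> "<tr><td>" + name + "</td></tr>", out);
        System.out.println(n + " rows streamed: " + out);
    }
}
```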

Finally, the footer is rendered on line 44 and the output stream to the client is closed on line 45.


Performance Comparison

After illustrating the concepts behind the streaming architecture with a detailed example, let us validate its theoretical performance advantage. In this section, the streaming architecture is compared against two other implementations, described below, that follow currently prescribed best practices:

  1. Remote EJB. A standard remote EJB implementation. A request handler receives a request, calls a remote EJB, and retrieves an array of presidents. The presidents are passed to an output page for rendering (for simplicity, this example reuses the presidential page template, but normally a JSP would be used; the change does not significantly impact performance).
  2. Local EJB. Identical to the remote EJB implementation except that there is no serialization overhead when retrieving the list of presidents. Most modern J2EE applications make use of local EJB calls to improve performance.
The source code for all three can be downloaded from the Resources section.

Recall that streaming architectures offer, in theory, identical response times with asymptotically lower memory usage. Lower memory usage translates into the ability to handle more simultaneous requests, and hence, better scalability. Therefore, we will use Microsoft's Web Application Stress tool to pound each of the three implementations continuously for one minute. The tool will be run with three different concurrency settings: 4, 16, and 32 concurrent requests. In addition, the amount of data returned will be varied across the following values: 86, 129, 172, 301, 473, and 989 rows. We will be interested in seeing how well each implementation handles the load.

The table below shows the size of the result page returned by the server for each number of database rows. Note that the sizes below include the images.

Rows    Page Size (bytes)

The tests were performed on an Intel Core 2 Duo 6400 with 2GB of RAM, running MySQL Ver 14.12 Distrib 5.0.27, for Win32 (ia32), Tomcat 5.5, and Java 1.5.0_09 for Windows. RAM allocation for Tomcat was 64MB max.

Results and Analysis

Below are the results of the tests, with throughput measured in requests per minute.
              4 threads   16 threads   32 threads
86 rows
  Streaming       329         405          416
  EJB Local       375         495          501
  EJB Remote      467         460          473
129 rows
  Streaming       344         319          300
  EJB Local       354         360          351
  EJB Remote      331         327          328
172 rows
  Streaming       262         246          237
  EJB Local       285         277          268
  EJB Remote      250         241          219
301 rows
  Streaming        95         117          124
  EJB Local       152         145           *
  EJB Remote      148           *           *
473 rows
  Streaming        60          71           75
  EJB Local       102           *           *
  EJB Remote       93           *           *
989 rows
  Streaming        44          27           37
  EJB Local        48           *           *
  EJB Remote        *           *           *

* Indicates that the test did not complete due to an OutOfMemory error.

There are several interesting trends in this data. First, regardless of implementation, server throughput does not scale linearly as the number of simultaneous requests increases from 4 to 16 to 32. The reason is that on the Core 2 Duo test machine, 4 simultaneous threads were already enough to saturate both cores. As a corollary, average response times increased as concurrency went up (for full result data, including average response times, please see Resources).

Secondly, the 86 row results are clearly an anomaly. For 4 threads, the EJB remote throughput is higher than the EJB local implementation's throughput, despite the fact that the only difference between the two implementations is extra serialization in the former! From this, we conclude that 86 rows of test data was not enough for implementation differences to dominate the throughput results; other factors on the machine dominated instead.

The 129 and 172 row results show that when memory resources on the machine are not under stress (in this case because a small number of rows are being returned), all three approaches have roughly comparable throughput. However, the EJB local implementation consistently outperforms the streaming implementation by about 10-15%. I cannot explain this result.

The 301, 473, and 989 row results validate claims of the streaming architecture's scalability. With large datasets and high concurrency, the traditional implementations place too much stress on available memory and the application server melts down. The streaming implementation, however, continues to handle the load. In fact, for the 301 row dataset, the streaming implementation was able to handle a whopping 128 simultaneous requests without going down! That represents 8 times the load handled by the local EJB implementation, in the same amount of memory!

Several readers have pointed out that by increasing the available memory to Tomcat to a larger number, say 1GB as is common in many production environments, the EJB implementations could have easily handled the larger datasets. This is a valid point, but it does not change the fact that a streaming architecture makes far better use of available memory, and is, therefore, able to handle more simultaneous requests for large datasets, no matter what the specific free memory setting is.


Feedback

I would welcome feedback on this article, especially on issues that fundamentally affect the approach outlined here.

Robert Cooper pointed out to me that by passing a "lazy collection" to a JSP (i.e., a collection that transparently fetches chunks of data from the database server on demand), one would achieve a similar level of performance. He further pointed out that Hibernate, TopLink, and other ORM tools provide such a feature.
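Cooper's suggestion can be sketched without an ORM. In the illustrative class below, an Iterable fetches one page of rows at a time from a supplier function (standing in for a LIMIT/OFFSET query against the database), so a JSP iterating over it never forces the whole result set into memory; all names here are hypothetical:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.IntFunction;

public class LazyPresidentList implements Iterable<String> {
    // Maps a page number to its rows; an empty list signals the end of data.
    private final IntFunction<List<String>> fetchPage;

    public LazyPresidentList(IntFunction<List<String>> fetchPage) {
        this.fetchPage = fetchPage;
    }

    @Override public Iterator<String> iterator() {
        return new Iterator<String>() {
            private int page = 0;
            private Iterator<String> current = fetchPage.apply(0).iterator();
            @Override public boolean hasNext() {
                while (!current.hasNext()) {
                    List<String> next = fetchPage.apply(++page); // fetched only on demand
                    if (next.isEmpty()) return false;
                    current = next.iterator();
                }
                return true;
            }
            @Override public String next() { return; }
        };
    }

    public static void main(String[] args) {
        // Fake two-row pages over five names; a real version would issue a paged query.
        List<String> all = List.of("Washington", "Adams", "Jefferson", "Madison", "Monroe");
        LazyPresidentList lazy = new LazyPresidentList(p -> {
            int from = p * 2;
            return from >= all.size() ? List.<String>of()
                                      : all.subList(from, Math.min(from + 2, all.size()));
        });
        List<String> seen = new ArrayList<>();
        for (String name : lazy) seen.add(name);
        System.out.println(seen);
    }
}
```

Note that this achieves chunked (rather than strictly row-at-a-time) memory use: at most one page of rows is live at once, which is the same property the ORM-provided lazy collections give.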