Publish once, deploy everywhere

If you are an architect, you are often asked to design an application serving ten thousand or more users in a thousand or more locations. Many companies and government organizations have offices or agencies in every town. You don’t need expensive marketing studies to recognize this. As a consumer and citizen, wherever you live, you expect to find banks, travel agencies and government offices. However, when the RFP lands on your desk, you know that you will suffer. OK, it doesn’t have to be a bed of roses, but things should be simpler today.

Basically you have to choose between Charybdis, the whirlpool, aka the fat client with a client/server protocol, and Scylla, the reef, aka the Web server farm. Either way, you are shipwrecked. The customer maintains a comprehensive list of all client/server projects that failed in the last couple of years and asks questions about deployment. He knows Scylla less well, but he learns quickly about bandwidth requirements and outages of redundant, fault-tolerant server farms. The fact is that the customer is right in both cases.

Technology has made enough progress during the last couple of years to consider another approach, PageBox. To explain it, I first introduce the technology that it uses.

Java

Java is well known for its portability and comprehensive network support. It is the language that allows downloading a piece of code from the Internet and running it safely in a browser. It provides:

1. Class loaders able to download code from sources such as HTTP servers

2. Java 2 security, which allows running code with rights that depend on its signature, that is, on how much the host trusts its certificate.
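
As a rough illustration of the first point, the sketch below builds a class loader whose search path is a remote archive, using the standard java.net.URLClassLoader. The repository URL and the class name in the comment are hypothetical placeholders, and this is not taken from the PageBox code itself.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;

// Minimal sketch: a class loader that resolves classes from an HTTP location.
class RemoteCodeLoader {

    // Builds a loader for the given archive URL. As usual in Java, the loader
    // asks its parent first and only then searches the remote archive.
    static URLClassLoader loaderFor(String archiveUrl) {
        try {
            URL url = URI.create(archiveUrl).toURL();
            return new URLClassLoader(new URL[] { url },
                    RemoteCodeLoader.class.getClassLoader());
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + archiveUrl, e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical archive location (assumption, not a real server).
        try (URLClassLoader loader =
                 loaderFor("http://repository.example.com/presentation.jar")) {
            System.out.println("Search path: " + loader.getURLs()[0]);
            // No class is loaded in this sketch. A real host would go on with
            // something like (class name is an assumption):
            // Class<?> c = loader.loadClass("com.example.Presentation");
        }
    }
}
```

What the downloaded code is then allowed to do is a separate matter, decided by the security policy of the host as described in the second point.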

J2EE application servers

A J2EE application server is a set of containers that provides an environment for Web and EJB archives. Web archives and EJB archives are deployment units. The most important for our needs is the Web archive. It gathers servlets, JavaServer Pages, taglibs, beans, Java classes, libraries and resources.

As the interface between containers and deployment archives is well specified, it is relatively easy to insert a package that the Application Server sees as a Web Archive and that the Web archive sees as a container.

Publishing framework

Web archives have a drawback for large-scale deployment: they include procedural (Java) code. On the one hand this is good, as it allows applying Object Oriented Programming concepts and Design Patterns to presentation; on the other hand it is tough and expensive to get the same level of robustness as with, for instance, XSL templates.

Therefore it is useful to consider alternatives such as Publishing frameworks. An example of Publishing framework is the Apache project’s Cocoon.

When a user asks for a page, the framework starts a pipeline whose first step is a producer. The producer produces an XML document. The next step is a processor.

A processor is something that handles an input XML document and generates an output XML document. What I call a network processor is a generic processor that parses the input document, uses that information to call a network service (RMI, CORBA, EJB, SOAP, XML-RPC) and uses the response to generate the output message. A special case of network processor is a Web service processor. In that case, the processor is generated from the WSDL definition of a Web service.

The next step is a standard XSL processor. It transforms the document that it receives from the network processor using an XSL style sheet. The last step is a formatter that converts the document that it receives from the XSL processor into a stream returned to the user.
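
The four steps above can be sketched with the standard JAXP classes only. This is a simplified illustration, not Cocoon code: the class and method names, the element names, the style sheet and the stubbed service call are all assumptions made for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class PublishingPipeline {

    // Hypothetical style sheet used by the XSL step.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/offices'>"
      + "<p><xsl:value-of select='@city'/>: <xsl:value-of select='.'/> offices</p>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static Document newDocument() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    }

    // Step 1, producer: builds the XML document that describes the request.
    static Document produce(String city) throws Exception {
        Document doc = newDocument();
        Element request = doc.createElement("request");
        request.setAttribute("city", city);
        doc.appendChild(request);
        return doc;
    }

    // Stand-in for an RMI, CORBA, EJB, SOAP or XML-RPC invocation (assumption).
    static String callService(String city) {
        return "3";
    }

    // Step 2, network processor: reads the input document, calls the service
    // and builds the output document from the response.
    static Document process(Document input) throws Exception {
        String city = input.getDocumentElement().getAttribute("city");
        Document out = newDocument();
        Element offices = out.createElement("offices");
        offices.setAttribute("city", city);
        offices.setTextContent(callService(city));
        out.appendChild(offices);
        return out;
    }

    // Step 3, XSL processor: applies the style sheet to the document.
    static Document style(Document input) throws Exception {
        Transformer xslt = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        DOMResult result = new DOMResult(newDocument());
        xslt.transform(new DOMSource(input), result);
        return (Document) result.getNode();
    }

    // Step 4, formatter: serializes the document to a character stream.
    static String format(Document input) throws Exception {
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.transform(new DOMSource(input), new StreamResult(out));
        return out.toString();
    }

    // Runs the whole pipeline for one request.
    static String run(String city) {
        try {
            return format(style(process(produce(city))));
        } catch (Exception e) {
            throw new IllegalStateException("Pipeline failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("Paris"));
    }
}
```

Each step only sees an XML document, so a different network processor or style sheet can be swapped in without touching the others.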

A publishing framework runs in an Application server, and the Application server sees the publishing framework as a Web archive. XML and XSL files are regular resources and network processors are regular Java classes. We have a layered architecture like this:

The figures above are a bit contrived. It makes sense to install EJBs on the same Application server as the presentation because EJBs are not only a remote invocation protocol but also a model for object management. This is not the case for SOAP or XML-RPC, which are designed for invocations across the Internet.

Presentation and data access on different machines

Therefore the diagram below makes more sense:

It addresses our initial problem well. The Web archive or presentation doesn’t access central databases. We can install as many of them as we want and as close as we need to the users. We can have roughly the same bandwidth requirement as client/server applications because we use client/server protocols, and we no longer need a big central server farm. However the customer objects: “With that solution, we have extra servers to administer, with FTP as the main tool.” And he is right again; we need a better solution.

PageBox

We can insert a layer between the Application server and the Web archive or Publishing framework:

This PageBox layer acts as an Application server for the Web archive and the Publishing framework and acts as a Web archive for the Application server. The PageBox layer takes care of deployment and security with a simple principle: download the Presentation or the Web archive just like a browser downloads an applet and run it in a sandbox.

The whole picture is no more complex: Support libraries exist to avoid downloading libraries shared by different presentations or web archives. An issue remains however: we have to configure the PageBox to let it know where it can download a presentation.

Publish once, deploy everywhere

We have a population of PageBoxes, which need to know where to download presentations, and a population of Presentation providers, who want to deploy their presentations. We can use a Publish and Subscribe protocol to automate the deployment:

1. The PageBox host subscribes to a repository.

2. The presentation author publishes (uploads) its presentation to the repository.

3. The repository notifies its subscribers that a new presentation is available using a deploy command.

4. The PageBox downloads the presentation from the repository.

As we use only HTTP, the repository itself is a Web Application.
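
The four steps of the protocol can be sketched in memory as follows. This is only an illustration of the message flow: plain method calls stand in for the HTTP requests, and the class and method names are assumptions made for the example, not the actual PageBox API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DeploymentSketch {

    // A PageBox host: reacts to deploy commands by downloading the archive.
    static class PageBox {
        final String name;
        final Map<String, byte[]> installed = new HashMap<>();

        PageBox(String name) {
            this.name = name;
        }

        // Receives the deploy command (step 3), then downloads (step 4).
        void deploy(Repository repository, String presentation) {
            installed.put(presentation, repository.download(presentation));
        }
    }

    // A repository: stores published archives and notifies its subscribers.
    static class Repository {
        private final List<PageBox> subscribers = new ArrayList<>();
        private final Map<String, byte[]> archives = new HashMap<>();

        // Step 1: a PageBox host subscribes.
        void subscribe(PageBox box) {
            subscribers.add(box);
        }

        // Step 2: an author publishes; step 3: every subscriber is notified.
        void publish(String presentation, byte[] archive) {
            archives.put(presentation, archive);
            for (PageBox box : subscribers) {
                box.deploy(this, presentation);
            }
        }

        // Step 4: a PageBox downloads the archive.
        byte[] download(String presentation) {
            return archives.get(presentation);
        }
    }

    public static void main(String[] args) {
        Repository repository = new Repository();
        PageBox paris = new PageBox("paris");
        PageBox lyon = new PageBox("lyon");
        repository.subscribe(paris);
        repository.subscribe(lyon);
        repository.publish("booking", new byte[] { 1, 2, 3 });
        System.out.println(paris.installed.keySet() + " " + lyon.installed.keySet());
    }
}
```

The author uploads once and never needs to know how many PageBoxes exist; the repository does the fan-out.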

The model has a couple of interesting properties:

a) The PageBox host can subscribe to many repositories

b) The presentation author can publish to many repositories

c) Everyone can create PageBoxes or repositories

d) PageBoxes can call Web services or cooperate using a P2P protocol:

Let’s look at some use cases.

Portal

The simplest case is the case where you provide a Portal.

You want to include a site that has a PageBox repository. The site stores the repository URL in an RSS file. You use that URL to subscribe to the repository, and you automatically get the site’s presentations on your Application server. You serve your customers faster, and the remote site saves machines because it no longer serves presentations.

Constellation

The customer still has to install and maintain a server per location, and he wastes resources. An Application server installed on a single-processor PC can typically serve twenty requests per second, or eighty to two hundred concurrent users, whereas each local Application server only has to serve ten users on average.

We cannot address these issues if we keep the Application servers in the company. We have to move them to the ISP/ASP side. There are several benefits in doing so:

1. Presentation application servers are better used. They can be shared between many customers.

2. ISPs and ASPs have bigger links than companies, and these links cost them less. One reason is that they avoid paying for the last mile: they rent racks or are hosted in network nodes.

3. ASPs are hosting specialists.

We can create constellations of PageBoxes. In a constellation, a set of ISPs and ASPs host PageBoxes and subscribe to Presentation providers’ repositories. PageBoxes access the Web services of Data providers and are accessed by browsers.

We show here two worldwide configurations, white and blue on the figure. When a user makes a request, she or he is served by one of the closest PageBoxes.

We get five benefits:

1. Reduced latency. As the PageBox is close to the user, she or he gets the response faster. This is not negligible for professional applications: the latency across the Atlantic is about 130ms in the best case.

2. Lower server cost. Data providers no longer need to operate complex server farms.

3. Fewer round trips. The PageBox maintains reference data and a cache. It invokes the Data provider only for updates.

4. Cheap fault tolerance. A user can be served by many PageBoxes.

5. Bandwidth. As with client/server applications, only the data goes on the wire.
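
The third benefit, fewer round trips, can be illustrated with a small sketch of a local reference-data cache. It assumes a hypothetical DataProvider interface standing in for the remote Web service; the names and the counting of remote calls are there only for the example.

```java
import java.util.HashMap;
import java.util.Map;

class ReferenceDataCache {

    // Hypothetical view of the remote Data provider (assumption).
    interface DataProvider {
        String fetch(String key);
        void store(String key, String value);
    }

    private final Map<String, String> local = new HashMap<>();
    private final DataProvider provider;
    int remoteCalls = 0; // counts round trips to the Data provider

    ReferenceDataCache(DataProvider provider) {
        this.provider = provider;
    }

    // Reads go to the Data provider only the first time a key is requested.
    String read(String key) {
        if (!local.containsKey(key)) {
            remoteCalls++;
            local.put(key, provider.fetch(key));
        }
        return local.get(key);
    }

    // Updates always go to the Data provider; the local copy is refreshed.
    void update(String key, String value) {
        remoteCalls++;
        provider.store(key, value);
        local.put(key, value);
    }

    public static void main(String[] args) {
        Map<String, String> central = new HashMap<>();
        central.put("CDG", "Paris Charles de Gaulle");
        ReferenceDataCache cache = new ReferenceDataCache(new DataProvider() {
            public String fetch(String key) { return central.get(key); }
            public void store(String key, String value) { central.put(key, value); }
        });
        cache.read("CDG");
        cache.read("CDG");
        System.out.println("Round trips: " + cache.remoteCalls);
    }
}
```

A real PageBox would also need a way to learn about changes made elsewhere, which this sketch leaves out.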

Conclusion

Note that PageBox is just the implementation for dynamic content of something that already exists for static content. Have you never downloaded from an Akamai server?

In the long term, the PageBox model can have a significant impact. As it allows deploying Web Applications anywhere, Presentation can become an Internet commodity. If data owners provide Web services, anyone can compete to provide the best presentation and publish it. PageBox repositories are presentation yellow pages, just like UDDI directories are Web service yellow pages. You can find which presentations are available and who hosts them.

PageBox doesn’t only allow the development of Web applications with sub-second response times; it also allows creating a more cooperative infrastructure and new kinds of applications.

Resources

We develop PageBox as Open Source (GNU LGPL). You can download it from http://pagebox.net or http://pagebox.sourceforge.net/doc.html. We also provide a simplified installation procedure at http://pagebox.net/install.htm. The product is, however, still in beta.

Author Biography

Alexis Grandemange is an architect and developer for a Computerized Reservation System. A Java developer with 19 years of experience in the computer field, Alexis previously worked on large Intranet solutions at BEA Systems. He can be contacted at alexis.grandemange@pagebox.net.