PageBox for Java deployment with relays

Foreword

We first presented the deployment with relays in the Grid API V2 document. The implementation described here uses the SOAP protocol and handles aspects such as security and error management.

Objective

This document presents the deployment with relays implemented in PageBox for Java.

Application

The deployment with relays aims to reduce the deployment duration on large PageBox constellations. This deployment model can also be used with other point-to-point protocols (TCP, for instance).

When a point-to-point protocol is used, the deployment with relays is useful in two cases:

To send the same data set to a large set of targets (Single Data Multiple Targets, broadcast)
To send different data sets to a large set of targets (Multiple Data Multiple Targets)

We only needed to implement the first case in PageBox.

Credits

Though our design and goal are different, we were deeply influenced by the Message Passing Interface (MPI) collective operations and by MPI implementations like MPICH, which we present on our Grid page.

Principle

The deployment with relays involves three entities:

The Repository
Relay PageBoxes
Non-relay PageBoxes

When it deploys an archive, the Repository piggybacks on the deployment request a list of PageBoxes on which the archive must also be deployed.

The Relay PageBox deploys the archive on these PageBoxes and then reports the deployment status to the Repository.

Repository

The relay deployment is a recursive function.

At each call the relay function processes an array of PageBoxes. Initially the array contains all the PageBoxes on which the archive must be deployed.

The relay function randomly selects half the PageBoxes to be piggybacked on its Deployment request.
The relay function picks the first PageBox that is not in the piggybacked list to be the Relay PageBox. The relay function deploys the archive on this PageBox with the piggybacked list.
Finally the relay function calls itself with an array containing the PageBoxes that were neither deployed nor piggybacked.
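
The three steps above can be sketched in Java as follows. This is a minimal illustration and not the PageBox source: the class name, the Deployer interface (a stand-in for the SOAP Deploy call) and the rounding of "half" down to n / 2 are assumptions, and the PageBox URLs are assumed to be unique.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of the Repository relay function; names are illustrative. */
class RepositoryRelaySketch {

    /** Stand-in for the SOAP Deploy call (assumption, not the PageBox API). */
    interface Deployer {
        void deploy(String target, List<String> piggybacked);
    }

    static void relay(List<String> pageBoxes, Random random, Deployer deployer) {
        if (pageBoxes.isEmpty()) {
            return; // nothing left to deploy
        }
        // Step 1: randomly select half of the PageBoxes (rounded down) to piggyback.
        List<String> shuffled = new ArrayList<>(pageBoxes);
        Collections.shuffle(shuffled, random);
        List<String> piggybacked = new ArrayList<>(shuffled.subList(0, shuffled.size() / 2));

        // PageBoxes that are not piggybacked, kept in their original order.
        List<String> remaining = new ArrayList<>(pageBoxes);
        remaining.removeAll(piggybacked);

        // Step 2: the first PageBox that is not piggybacked becomes the Relay PageBox.
        String relayTarget = remaining.remove(0);
        deployer.deploy(relayTarget, piggybacked);

        // Step 3: recurse on the PageBoxes that were neither deployed nor piggybacked.
        relay(remaining, random, deployer);
    }

    public static void main(String[] args) {
        List<String> boxes = List.of("PB1", "PB2", "PB3", "PB4", "PB5", "PB6", "PB7");
        relay(boxes, new Random(),
              (target, list) -> System.out.println(target + " <- piggybacked " + list));
    }
}
```

With seven PageBoxes the sketch issues three requests whose piggybacked lists hold three, one and zero URLs; which URLs end up in each list depends on the random selection.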

Relay PageBox

A PageBox becomes a relay PageBox when it receives a deployment request with a piggybacked list of PageBoxes.

The relay PageBox handles the deployment request normally:

Inflates the archive
Installs the archive
Deploys the archive on the Application server
Returns the deployment status to the caller, which can be a Repository or another relay PageBox

The relay PageBox also forwards the PageBox piggybacked list to a Relay thread:

The relay thread selects the first half of the PageBoxes to be piggybacked on its Deployment request.
The relay thread picks the next PageBox in the piggybacked list to be the Relay PageBox. The relay thread deploys the archive on this PageBox with the new piggybacked list.
The relay thread repeats the previous steps with an array containing the PageBoxes that were neither deployed nor piggybacked.
Once the relay thread has no more PageBoxes to deploy, it reports the deployment status to the Repository.
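
A relay thread along these lines could look like the sketch below. It follows the order of the steps literally (the first half of the list is piggybacked and the next PageBox becomes the relay). The Deployer and Notifier interfaces are hypothetical stand-ins for the SOAP calls, and reporting only the status of the PageBoxes this relay contacted directly is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical relay thread; not the PageBox implementation. */
class RelayThreadSketch implements Runnable {

    /** Stand-in for the Deploy call; returns the status reported by the target. */
    interface Deployer {
        boolean deploy(String target, List<String> piggybacked);
    }

    /** Stand-in for the notification sent to the Repository. */
    interface Notifier {
        void notifyRepository(Map<String, Boolean> statuses);
    }

    private final List<String> piggybacked;
    private final Deployer deployer;
    private final Notifier notifier;

    RelayThreadSketch(List<String> piggybacked, Deployer deployer, Notifier notifier) {
        this.piggybacked = piggybacked;
        this.deployer = deployer;
        this.notifier = notifier;
    }

    @Override
    public void run() {
        List<String> pending = new ArrayList<>(piggybacked);
        Map<String, Boolean> statuses = new LinkedHashMap<>();
        while (!pending.isEmpty()) {
            int half = pending.size() / 2;
            // The first half is piggybacked on the next request.
            List<String> forwarded = new ArrayList<>(pending.subList(0, half));
            // The next PageBox in the list becomes the relay for that half.
            String target = pending.get(half);
            statuses.put(target, deployer.deploy(target, forwarded));
            // Continue with the PageBoxes neither deployed nor piggybacked.
            pending = new ArrayList<>(pending.subList(half + 1, pending.size()));
        }
        // No PageBox left: report the collected statuses to the Repository.
        notifier.notifyRepository(statuses);
    }
}
```

Because the class implements Runnable, the request handler can return the deployment status to its caller immediately and start the forwarding with `new Thread(new RelayThreadSketch(list, deployer, notifier)).start()`.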

Error handling

The errors that require special handling are those that occur in relay PageBoxes.

Deployment error

A deployment can fail for two reasons:

An error occurs on the target side, for instance for a security reason. If the target is a relay PageBox, the error doesn’t prevent the deployment on the piggybacked PageBoxes.
An error occurs at the invocation of the target Deploy service: the target is not able to handle a deployment request.

In the latter case, the relay retries the deployment with the next PageBox in the array, and so on. If the deployment fails at invocation for all PageBoxes in the array, the relay tries to deploy on the PageBoxes of the piggybacked list.
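
The retry rule can be sketched as follows. The InvocationException type, the Deployer interface and the findRelay method are hypothetical names introduced for illustration. A target-side error is modeled as a false return value, which still counts as an accepted request because the target can forward the piggybacked list.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of the retry on invocation failure; not the PageBox API. */
class RelayRetrySketch {

    /** Assumed exception type: the target Deploy service could not be invoked. */
    static class InvocationException extends Exception {
        InvocationException(String message) {
            super(message);
        }
    }

    /** Stand-in for the Deploy call; false means a target-side error. */
    interface Deployer {
        boolean deploy(String target, List<String> piggybacked) throws InvocationException;
    }

    /**
     * Tries each candidate in turn as the relay for the piggybacked list.
     * Returns the first candidate whose Deploy service could be invoked,
     * or an empty Optional if the invocation failed for all of them.
     */
    static Optional<String> findRelay(List<String> candidates,
                                      List<String> piggybacked,
                                      Deployer deployer) {
        for (String candidate : candidates) {
            try {
                deployer.deploy(candidate, piggybacked);
                return Optional.of(candidate);
            } catch (InvocationException e) {
                // Invocation failed: retry with the next PageBox in the array.
            }
        }
        return Optional.empty();
    }
}
```

When findRelay returns an empty Optional, the caller would treat the piggybacked list as its new array and deploy on those PageBoxes itself.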

Notification error

The Repository sets the PageBoxes in piggybacked lists to a "maybe" status.

When it receives notifications from relay PageBoxes, the Repository updates the status of these PageBoxes to their status in the notification message.

A relay PageBox can fail between the handling of the deployment request and the Repository notification. In that case the PageBoxes of its piggybacked list remain in "maybe" status.

After a parameterized delay the Repository retries the deployment on the PageBoxes in "maybe" status.
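
The bookkeeping described in this section might be sketched as below. Only the "maybe" status comes from the text: the DEPLOYED and FAILED values, the class and its method names are assumptions, and the clock is passed in explicitly to keep the example easy to test.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical Repository-side status table; not the PageBox implementation. */
class MaybeStatusSketch {

    enum Status { MAYBE, DEPLOYED, FAILED }

    private final Map<String, Status> statuses = new LinkedHashMap<>();
    private final Map<String, Long> piggybackedAt = new LinkedHashMap<>();

    /** Called when URLs are piggybacked on a deployment request. */
    void markPiggybacked(List<String> urls, long nowMillis) {
        for (String url : urls) {
            statuses.put(url, Status.MAYBE);
            piggybackedAt.put(url, nowMillis);
        }
    }

    /** Called when a relay PageBox notifies the Repository. */
    void onNotification(Map<String, Status> reported) {
        statuses.putAll(reported);
    }

    /** PageBoxes still in "maybe" status once the configured delay has elapsed. */
    List<String> dueForRetry(long nowMillis, long delayMillis) {
        List<String> due = new ArrayList<>();
        for (Map.Entry<String, Status> entry : statuses.entrySet()) {
            String url = entry.getKey();
            long since = piggybackedAt.getOrDefault(url, nowMillis);
            if (entry.getValue() == Status.MAYBE && nowMillis - since >= delayMillis) {
                due.add(url);
            }
        }
        return due;
    }
}
```

The Repository would call dueForRetry periodically and deploy again on the URLs it returns, for instance by feeding them back to its relay function.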

Example

In this example the Repository must deploy an archive on seven PageBoxes.

At time t the Repository sends a Deployment request to Relay PageBox1 with three piggybacked PageBox URLs (PageBox3, PageBox4 and PageBox6).

At time t + 1:

The Repository sends a deployment request to Relay PageBox2 with one piggybacked PageBox URL (PageBox5)
Relay PageBox1 sends a deployment request to PageBox3 with one piggybacked PageBox URL (PageBox4)

The two deployments occur in parallel using independent machines and links.

At time t + 2:

The Repository sends a deployment request to PageBox7
Relay PageBox1 sends a deployment request to PageBox6
Relay PageBox2 sends a deployment request to PageBox5
Relay PageBox3 sends a deployment request to PageBox4

The four deployments occur in parallel using independent machines and links.
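
The number of time steps in this example can be checked with a short recursion. The sketch assumes that every deployment request takes exactly one time step and that each sender piggybacks half of its list, rounded down. The class and method names are hypothetical.

```java
/** Hypothetical helper that counts time steps; not part of PageBox. */
class RelayScheduleSketch {

    /**
     * Time steps a sender needs to cover n PageBoxes, assuming every
     * deployment request takes exactly one time step.
     */
    static int timeSteps(int n) {
        if (n <= 0) {
            return 0;
        }
        int piggybacked = n / 2;             // handed over to the relay
        int remaining = n - 1 - piggybacked; // kept by the sender
        // After the first request the relay and the sender work in parallel.
        return 1 + Math.max(timeSteps(piggybacked), timeSteps(remaining));
    }

    public static void main(String[] args) {
        System.out.println(timeSteps(7));    // prints 3: times t, t + 1 and t + 2
        System.out.println(timeSteps(1000)); // prints 10
    }
}
```

Under these assumptions seven PageBoxes need three time steps, as in the example, where a purely sequential Repository would need seven. A thousand PageBoxes need ten.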

Comments

The deployment with relays is a holistic approach: we try to minimize the deployment time with the simplest algorithm and a simple query/reply protocol.

It is often possible to further reduce the deployment time by issuing many requests in parallel from the same machine. We didn’t implement parallel requests in PageBox because sequential requests are not a real drawback for a deployment facility. PageBox must be fast enough without overusing server resources, and sending requests one after the other is an effective form of throttling.

The deployment with relays is not very bandwidth sensitive. However, the total deployment duration can be reduced if the Repository and the first Relay PageBoxes have fast links. There are two ways to identify fast links:

Configuration
Bandwidth measurements and prediction

Because the Repository randomly selects the Relay PageBoxes, resource usage should be evenly balanced between PageBoxes, which is a requirement when PageBoxes are operated by different organizations.