|
|
|
|
|
|
|
Improved distribution with relays
Table of content | |
|
The archive deployment is functionally a form of multicast: the PageBox Repository deploys the archive to a group made of all PageBoxes that have subscribed to this Repository or for this archive on the Repository. Multicast is supported with IP but:
Multicast groups cannot be easily set up on Internet.
It is also difficult to set up a multicast group on two or more sub-network.
Firewalls block multicast traffic.
Multicast only supports datagram protocols. The most common datagram protocol, UDP limits the message size typically to 64 KB.
Unicast raises another issue. It can’t handle acknowledgement. When it deploys an archive a Repository must receive a message from each PageBox telling it if the deployment was successful and if not why. Whereas the deployment is a one-to-many request the deployment acknowledgement is a many-to-one traffic.
Therefore to shorten the deployment time in the PageBox for Java version we used a relay mechanism described first in the Grid V2 document. The PageBox for Java implementation is described in the deployment with Relays document. This relay deployment is effective but it can be further improved if we select the PageBoxes with the fastest connections to be the Relay PageBoxes.
A PageBox constellation is made of one Repository and n PageBoxes.
The deployment problem consists in deploying an archive from the Repository to each PageBox. The archive is big (100KBs to several MBs.)
With relayed deployment the Repository selects a Relay PageBox and sends a first deployment request to this PageBox. The Repository piggybacks half its deployment requests on this request.
In the next step the Repository chooses a second Relay PageBox and sends a second deployment request to this PageBox. The Repository piggybacks half its remaining deployment requests on this request. In the same step, the first Relay PageBox:
Installs the archive on its site
Selects a Relay PageBox in the piggybacked list and sends another deployment request to this PageBox. The Relay PageBox piggybacks half the deployment requests in the piggybacked list on this request.
And so on. At the last step n / 2 Relay PageBoxes issue deployment requests in parallel.
When the Repository and the PageBoxes have the same kind of IP link (bandwidth and latency) and when links are independent (when two or PageBoxes don’t share the same link) this model achieve two objectives:
Fast deployment
No traffic burst because at a given time there is at most one transfer on one link
Note on link independence:
On Internet the link between two PageBoxes is made of a large number of links. This link list is not static and we can practically ignore shared links except on the end points.
When the Repository and the PageBoxes don’t have the same kind of link the solution is no more optimal. In this paper we explore solutions addressing this problem.
PageBox subscription is dynamic. Therefore a first solution is to ask the subscriber to grade the subscribed PageBox: For instance if the PageBox has a DSL link, the subscriber will give a grade of one. It the PageBox has an OC3 link the subscriber will give a grade of four. The Repository and the Relay PageBoxes will first deploy on the highest grade PageBoxes.
Strong points:
Effective and fast
Allows scheduling multiple deployments in parallel on fastest links to shorten the deployment time
Best suited for:
Intranet constellations. In these constellations it is easy to grade PageBoxes because there are only two or three sorts of links. The network is simple and its behavior is easy to predict.
One of the goals of the PageBox project is to enable a pervasive architecture where millions of PageBoxes are subscribed to hundreds of Repositories. There is no central point of administration. Many kinds of links coexist. Some PageBoxes and Repositories are linked to Internet using dialup and DSL connections.
A PageBox or Repository requires a permanent connection (like DSL connections) but doesn’t require a permanent IP address. PageBox 0.0.8 and above even include a dynamic DNS update function.
A deployment involves:
One Repository
The PageBoxes subscribed to that Repository
We can represent the Repository and the subscribed PageBox set with an undirected and complete graph G where the Repository and the subscribed PageBoxes are vertices. For background information about graphs you can read our graph introduction. At the first deployment we randomly explore this graph from the Repository. We can record deployment times and build a second graph Gd, which is a weighted dag:
Directed: from the Repository to a set of PageBoxes, from Relay PageBoxes to other PageBoxes
Acyclic: the Repository and the Relay PageBoxes can be the origin of more than one edge but a PageBox is called only once.
Weighted: The edge weight is the bandwidth from the origin PageBox/Repository to the destination PageBox.
We compute the bandwidth between an origin x and a destination y with the formula:
Bandwidth(x,y) = archive_size / deployment_time
Our knowledge of G is minimum for two reasons:
Exploring the graph more than needed would increase the total deployment time
The deployment time between a PageBox/Repository and a PageBox is a transient and inaccurate data. The link can be a backup link and a file transfer can use most of the available bandwidth.
The problem consists in analyzing Gd to create a Gd’ graph used to make a second deployment. For the third deployment we analyze Gd’ to build a Gd” graph and so on.
The problem is further complicated by the dynamic nature of constellations: PageBoxes subscribed for the first deployment can have been unsubscribed before the second deployment and new PageBoxes can have been subscribed between the first and the second deployment.
We ignore this issue for clarity in the simplified algorithm.
The deployment time from one Repository or PageBox x to a PageBox y
d(x,y) = latency(x,y) + archive_size / bandwidth(x,y)
The size of an archive is most of the time > 100 KB. Therefore we probably can ignore the latency. The bandwidth is the bandwidth of the slowest link, most probably the bandwidth of the connection link of x or x.
A Repository or a Relay is the origin of many deployments. Therefore we can compare the deployment times from this origin. If the deployment times are the same, the origin can be the slowest link. If the deployment times are different we can assume that the destination with the shortest deployment time has the fastest connection and the destination with the longest deployment time has the slowest connection. We can use this information to create Gd’. In this graph we put the PageBoxes with the fastest connection closer to the Repository (we promote them.) These PageBoxes will act as relays and have a bigger contribution to the deployment.
We scan the Gd graph. For each vertex we compare the weight of the outbound edges. If the weights are of the same order of magnitude and high the destinations have fast connections and we can promote all of them. If the weights are of the same order of magnitude and low the origin might have a slow connection, we don’t know. If the weights are different we promote the vertices with the highest weight.
We represent the Gd graph with an adjacency list.
The Gd graph is created at the first deployment.
Once the deployment is completed we scan the adjacency list and we update each subscribed PageBox object with a rank:
Bandwidth if the Pagebox get promoted
0 if we don’t know if the connection is fast or slow
-1 if the connection is slow
Then we serialize the updated subscribed PageBox objects into subscribers.xml.
At the next deployment, we sort the subscribed PageBox objects by bandwidth and we first deploy on the PageBoxes whose connection is certainly fast and next we exercise the PageBoxes whose connection speed is unknown. Because these PageBoxes receive then deployment requests from PageBoxes with fast connections we can determine if they have fast or slow connections.
At each deployment the deployment graph is further improved up to the point where either:
All PageBoxes have a known rank (bandwidth or -1) or
All Relay Pageboxes have a fast connection (bandwidth rank)
The rank of a PageBox moves from unknown to known but not from known to unknown: after the first deployment unknown means that the deployment origin has a slow connection and the PageBox bandwidth is very likely to be still as fast as at the previous deployment.
When a PageBox is unsubscribed the corresponding subscribed PageBox object is removed. When a PageBox is subscribed the subscribed PageBox object is created with a 0 rank.
Java PHP .NET
Reservation Controls Java controls
Polaris Grid Coordinator Grid V2
Distribution Installation NWS
Contact:support@pagebox.net
©2001-2004 Alexis Grandemange
Last modified