PageBox: servlet running in sandbox on J2EE PageBox


Cuckoo customisation guide

Conventions

Cuckoo is designed to work with a minimal configuration. As a consequence:

  1. You are asked for a target HTML file name, say D:/mySite/myProduct/xxxx.html, and for a Style directory, say D:/myInstall.

  2. If you don’t set the target HTML file name, Cuckoo names the HTML file after your Word file: for a Word file D:/myDir/xxxx.doc, it creates D:/myDir/xxxx.html.

  3. Cuckoo expects to find an XSL file named cuckoo.xsl in your target directory, here in D:/mySite/myProduct/cuckoo.xsl. If Cuckoo doesn’t find cuckoo.xsl there it copies cuckoo.xsl from the Style directory. You can change the name of the XSL file with the Settings macro.

  4. Cuckoo also expects to find a CSS file named cuckoo.css in your target directory, here in D:/mySite/myProduct/cuckoo.css. If Cuckoo doesn’t find cuckoo.css there it copies cuckoo.css from the Style directory. You can change the name of the CSS file with the Settings macro.

  5. Cuckoo also expects to find a JavaScript file named cuckoo.js in your target directory, here in D:/mySite/myProduct/cuckoo.js. If Cuckoo doesn’t find cuckoo.js there it copies cuckoo.js from the Style directory. You can change the name of the JavaScript file with the Settings macro.

  6. Cuckoo generates an intermediate XML file named xxxxW.xml in your target directory, here in D:/mySite/myProduct/xxxxW.xml

  7. Cuckoo also generates images in the target directory named xxxxi.gif or xxxxi.jpg where i is the image number in the document.
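The naming conventions above can be summarised in a few lines of Python. This is only an illustrative sketch, not Cuckoo's own code: the function names are hypothetical, and numbering images from 1 is an assumption.

```python
from pathlib import PureWindowsPath


def default_html_name(word_file: str) -> str:
    """Convention 2: the HTML file is named after the Word file."""
    return PureWindowsPath(word_file).with_suffix(".html").as_posix()


def derived_names(html_file: str, image_count: int = 0) -> dict:
    """Conventions 3 to 7: files expected or generated next to the HTML file."""
    target = PureWindowsPath(html_file)
    folder, stem = target.parent, target.stem
    return {
        "xsl": (folder / "cuckoo.xsl").as_posix(),
        "css": (folder / "cuckoo.css").as_posix(),
        "js": (folder / "cuckoo.js").as_posix(),
        "xml": (folder / f"{stem}W.xml").as_posix(),
        # The extension (.gif or .jpg) depends on the image, so only the
        # stem is built here. Starting the numbering at 1 is an assumption.
        "image_stems": [
            (folder / f"{stem}{i}").as_posix() for i in range(1, image_count + 1)
        ],
    }


print(default_html_name("D:/myDir/xxxx.doc"))                    # D:/myDir/xxxx.html
print(derived_names("D:/mySite/myProduct/xxxx.html")["xml"])     # D:/mySite/myProduct/xxxxW.xml
```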

XML

If you already use XML and XSL you can skip this section. If you don’t use them yet, we recommend that you start now.

XML

XML is just another markup language. What makes it really cool is the support of the industry that delivered excellent XML parsers such as Xerces or MSXML3 for free. However XML handling through DOM or SAX is still programming.

An XML document is mainly made of elements identified by starting and ending tags:

<tag1>element-content</tag1>

An element can have attributes:

<tag1 attribute1="xxxx">element-content</tag1>

You can also collapse an empty element like this:

<tag2/>
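To see these three forms handled by a parser, here is a small sketch using Python's standard xml.etree.ElementTree module. The enclosing <root> element is an assumption added only because an XML document needs a single root.

```python
import xml.etree.ElementTree as ET

# A tiny document using the forms shown above: an element with content and
# an attribute, and a collapsed (empty) element.
doc = ET.fromstring(
    "<root>"
    '<tag1 attribute1="xxxx">element-content</tag1>'
    "<tag2/>"
    "</root>"
)

tag1 = doc.find("tag1")
tag2 = doc.find("tag2")

print(tag1.text)                # element-content
print(tag1.get("attribute1"))   # xxxx
print(tag2.text, len(tag2))     # None 0  (no text, no children)
```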

XHTML

XML looks like HTML because both XML and HTML derive from an older language, SGML.

However:

  1. HTML has a more relaxed syntax. You don’t need to close elements such as <br> and <input>.

  2. HTML elements have a meaning, at least for browsers

XHTML is an XML dialect with basically the same tags and meaning as HTML. Because browsers are mistake-tolerant they can even accept well-formed XML documents. You can manipulate and transform XHTML just like any other XML.
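The difference between the relaxed HTML syntax and XHTML can be checked with any XML parser. The sketch below uses Python's standard library; is_well_formed is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


html_style = "<p>first line<br>second line</p>"     # <br> is never closed
xhtml_style = "<p>first line<br/>second line</p>"   # <br/> closes itself

print(is_well_formed(html_style))    # False
print(is_well_formed(xhtml_style))   # True
```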

Cuckoo generates files that combine XML and XHTML.

To create a site file, you just do the same. Here is an example:

<site>

<header>

<h1>This is my site!</h1>

</header>

<map>

<a href="ratata.html">My cat, Ratata</a><br/>

<a href="hobby.html">My hobbies</a><br/>

<a href="friends.html">My friends</a><br/>

</map>

<footer>

Alexis

</footer>

</site>

XSL

The Extensible Stylesheet Language (XSL) is a way to manipulate XML with an XML flavor. More specifically Cuckoo uses XSL Transformations (XSLT), which is a language for transforming XML documents into other XML documents. XSLT is not hard to code if you already have some knowledge about grammars. We quote here the XSLT specification: "A transformation expressed in XSLT describes rules for transforming a source tree into a result tree. The transformation is achieved by associating patterns with templates. A pattern is matched against elements in the source tree. A template is instantiated to create part of the result tree."

To summarize, in XSLT you write templates (<xsl:template>) that include <xsl:apply-templates> elements to invoke other templates.
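The pattern/template idea can be illustrated without an XSLT processor. The following Python sketch is a toy analogue, not XSLT itself: the element names doc, title and note are made up, and the fallback rule is simpler than the XSLT built-in rules.

```python
import xml.etree.ElementTree as ET

# Each rule plays the role of an <xsl:template match="...">: it receives the
# matched source element and returns the text it contributes to the result.
RULES = {}


def template(match):
    """Register a function as the rule for elements named `match`."""
    def register(func):
        RULES[match] = func
        return func
    return register


def apply_templates(node):
    """Rough analogue of <xsl:apply-templates/>: process each child in order."""
    return "".join(apply_one(child) for child in node)


def apply_one(node):
    rule = RULES.get(node.tag)
    if rule is None:
        # Simplified fallback when no rule matches: emit the element's text.
        return node.text or ""
    return rule(node)


@template("doc")
def doc_rule(node):
    return "<html><body>" + apply_templates(node) + "</body></html>"


@template("title")
def title_rule(node):
    return "<h1>" + (node.text or "") + "</h1>"


source = ET.fromstring("<doc><title>Hello</title><note>plain</note></doc>")
result = apply_one(source)
print(result)   # <html><body><h1>Hello</h1>plain</body></html>
```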

Cuckoo model

Cuckoo generates an intermediate XML file containing three XHTML elements:

  1. <info> that contains the page title and meta tags

  2. <content> that contains the core of the Word conversion

  3. <toc> that contains a table of contents

Construction of the document from a Site XML file and from the document produced by Cuckoo

Besides this content, the page should contain site parts, which are the same for all pages on your site, typically:

  • A header with images, banners, button rows...

  • A footer with a disclaimer, a last modified date...

  • Typically a site map or navigation bar, news...

The default cuckoo.xsl assumes that your site file contains three XHTML elements:

  1. <header> that contains the header

  2. <map> that contains a site map and similar content

  3. <footer> that contains the footer

cuckoo.xsl merges the content and site data and structures the document in tables and other elements.
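As a rough illustration of this merge, the Python sketch below combines a page document and a site document into a two-column layout. It only mimics the idea; the sample data and the helper names inner and merge are assumptions, and the real work is done by the XSLT processor with cuckoo.xsl.

```python
import xml.etree.ElementTree as ET

page_xml = """<cuckoo>
  <info><title>My page</title></info>
  <content><p>Converted Word text</p></content>
  <toc><a href="#s1">Section 1</a></toc>
</cuckoo>"""

site_xml = """<site>
  <header><h1>This is my site!</h1></header>
  <map><a href="hobby.html">My hobbies</a></map>
  <footer>Alexis</footer>
</site>"""


def inner(parent, name):
    """Serialize the leading text and the children of parent/name."""
    node = parent.find(name)
    text = (node.text or "").strip()
    return text + "".join(
        ET.tostring(child, encoding="unicode").strip() for child in node
    )


def merge(page, site):
    """Content and footer in the left column, toc and map in the right one."""
    return (
        "<html><head>" + inner(page, "info") + "</head><body>"
        + inner(site, "header")
        + "<table><tr><td>" + inner(page, "content") + inner(site, "footer") + "</td>"
        + "<td>" + inner(page, "toc") + inner(site, "map") + "</td></tr></table>"
        + "</body></html>"
    )


html = merge(ET.fromstring(page_xml), ET.fromstring(site_xml))
print(html)
```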

Default xsl file

<?xml version="1.0" encoding="windows-1252"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:template match="cuckoo">

<html>

<head>

<xsl:copy>

<xsl:apply-templates select="info"/>

</xsl:copy>

<link rel="stylesheet" href="cuckoo.css" type="text/css"/>

<script src="cuckoo.js"></script></head><body>

<div id="tooltip" style="position:absolute;visibility:hidden;border:1px solid black;font-size:14px;layer-background-color:lightyellow;background-color:lightyellow;padding:1px">

</div>

<xsl:copy>

<xsl:apply-templates select="document('pagebox.xml')/site/header"/>

</xsl:copy>

<table><tr><td valign="top">

<xsl:copy>

<xsl:apply-templates select="content"/>

</xsl:copy>

<p align="center">

<xsl:copy>

<xsl:apply-templates select="document('pagebox.xml')/site/footer"/>

</xsl:copy>

</p>

</td><td valign="top" align="left" width="250">

<xsl:copy>

<xsl:apply-templates select="toc"/>

</xsl:copy>

<br/>

<xsl:copy>

<xsl:apply-templates select="document('pagebox.xml')/site/map"/>

</xsl:copy>

</td></tr>

</table></body></html>

</xsl:template>

...

</xsl:stylesheet>

You can modify this file.

The part from the <link> element to the closing </div> tag requires special care:

  1. You should include a Cascading Style Sheet file but you can give it a different name

  2. You must include a JavaScript part at least for mouse over handling but you can give it a different name

  3. You must include the <div> element for mouse over handling

How to customize the layout

Let's assume that you want to display pages like this:

Alternate document format with three columns

You simply need to add a new column:

<?xml version="1.0" encoding="windows-1252"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:template match="cuckoo">

<html>

<head>

<xsl:copy>

<xsl:apply-templates select="info"/>

</xsl:copy>

<link rel="stylesheet" href="cuckoo.css" type="text/css"/>

<script src="cuckoo.js"></script></head><body>

<div id="tooltip" style="position:absolute;visibility:hidden;border:1px solid black;font-size:14px;layer-background-color:lightyellow;background-color:lightyellow;padding:1px">

</div>

<xsl:copy>

<xsl:apply-templates select="document('pagebox.xml')/site/header"/>

</xsl:copy>

<table><tr><td valign="top">

<xsl:copy>

<xsl:apply-templates select="document('pagebox.xml')/site/map"/>

</xsl:copy>

<br/>

<xsl:copy>

<xsl:apply-templates select="document('cuckoo-news.xml')/cuckoo/content"/>

</xsl:copy>

</td><td valign="top">

<xsl:copy>

<xsl:apply-templates select="content"/>

</xsl:copy>

<p align="center">

<xsl:copy>

<xsl:apply-templates select="document('pagebox.xml')/site/footer"/>

</xsl:copy>

</p>

</td><td valign="top" align="left" width="250">

<xsl:copy>

<xsl:apply-templates select="toc"/>

</xsl:copy>

</td></tr>

</table></body></html>

</xsl:template>

...

</xsl:stylesheet>

We move the map (coming from pagebox.xml) and we add a news part (coming from cuckoo-news.xml) in the new column. cuckoo-news.xml has been authored in Word and saved as XML with Cuckoo. We can retrieve its content with document('cuckoo-news.xml')/cuckoo/content. As you can see, we can build an HTML page from two or more Word documents. You can use this feature to reuse content.

How to generate server pages

Suppose now that you want to support dynamic updates.

This implies that your page includes another page. You can use frames, but they are often inconvenient for users:

  • They cannot bookmark the page

  • They often don’t know how to save your page

Besides frames you can use <iframe> with Internet Explorer, Netscape 6, Mozilla and Opera, or <layer> with Netscape 4. You can use both in your page, but it is almost impossible to get the same look and feel in all browsers. Try this to see if it fits your needs:

<ilayer><layer src="cuckoo-news.html"></layer></ilayer>

<iframe src="cuckoo-news.html" align="bottom"></iframe>

Most of the time it is better to use server page technology. We will show you now how to do that with ASP, JSP and PHP.

Our recommendation is to keep a static version for WYSIWYG display when you author the document.

Once you are ready to publish, run a batch generation with an XSL file specific to the server technology that you use.

We present below three examples:

  • cuckoo-asp.xsl: ASP generation

  • cuckoo-jsp.xsl: JSP generation

  • cuckoo-php.xsl: PHP generation

You can choose a more radical option and generate your pages at run time.

Pros:

  • You don’t need to generate your pages in batch. Your administration is simplified.

  • You can deliver pages optimized for the user browser version and screen size

  • You can personalize your content

Cons:

  • Your server uses more resources, is slower and less scalable

  • Static content can often be hosted for free, whereas you may have to pay your application service provider for dynamic content hosting.

Batch generation

A batch generation script named cuckoo-gen.js is included in the deliveries.

It can also be used to regenerate a static site after a change of the site file:

Batch generation using a WSH script, cuckoo-gen.js

Usage

cuckoo-gen.js /dir:source-directory|/file:file [/toDir:target-directory|/toFile:target-file] [/xsl:xsl-file]

dir: directory of XML source files (ending with W.xml)

file: XML file name. It must end with W.xml when combined with the toDir option.

xsl: XSLT file

toDir: directory of target HTML files

If a source file is named xxxxW.xml then the target HTML file is named xxxx.html.

toFile: target HTML file name.

Example:

cuckoo-gen.js /dir:D:\cuckoo /toDir:D:\cuckoo /xsl:D:\cuckoo\cuckoo.xsl
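The option syntax and the xxxxW.xml to xxxx.html naming rule can be sketched as follows. This is a Python illustration of the conventions described above, not the code of cuckoo-gen.js; parse_options and target_for are hypothetical names.

```python
from pathlib import PureWindowsPath


def parse_options(args):
    """Split '/name:value' arguments into a dict, e.g. '/dir:D:\\cuckoo'."""
    options = {}
    for arg in args:
        if not arg.startswith("/") or ":" not in arg:
            raise ValueError(f"unexpected argument: {arg}")
        # Split on the first colon only, so drive letters stay in the value.
        name, value = arg[1:].split(":", 1)
        options[name] = value
    return options


def target_for(source_file, to_dir):
    """Map a source file xxxxW.xml to xxxx.html in the target directory."""
    stem = PureWindowsPath(source_file).stem
    if not stem.endswith("W"):
        raise ValueError("source file name must end with W.xml")
    return str(PureWindowsPath(to_dir) / (stem[:-1] + ".html"))


opts = parse_options(
    [r"/dir:D:\cuckoo", r"/toDir:D:\cuckoo", r"/xsl:D:\cuckoo\cuckoo.xsl"]
)
print(opts["xsl"])   # D:\cuckoo\cuckoo.xsl
print(target_for(r"D:\cuckoo\cuckoo-customW.xml", opts["toDir"]))   # D:\cuckoo\cuckoo-custom.html
```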

ASP generation

In cuckoo-asp.xsl we use ASP #include to include a news file in the HTML page:

<?xml version="1.0" encoding="windows-1252"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:template match="cuckoo">

<html>

<head>

<xsl:copy>

<xsl:apply-templates select="info"/>

</xsl:copy>

<link rel="stylesheet" href="cuckoo.css" type="text/css"/>

<script src="cuckoo.js"></script></head><body>

<div id="tooltip" style="position:absolute;visibility:hidden;border:1px solid black;font-size:14px;layer-background-color:lightyellow;background-color:lightyellow;padding:1px">

</div>

<xsl:copy>

<xsl:apply-templates select="document('pagebox.xml')/site/header"/>

</xsl:copy>

<table><tr><td valign="top">

<xsl:copy>

<xsl:apply-templates select="content"/>

</xsl:copy>

<p align="center">

<xsl:copy>

<xsl:apply-templates select="document('pagebox.xml')/site/footer"/>

</xsl:copy>

</p>

</td><td valign="top" align="left" width="250">

<div style="background-color:#99ff99">

<xsl:comment>#include file="cuckoo-news.html"</xsl:comment>

</div>

<p> </p>

<xsl:copy>

<xsl:apply-templates select="toc"/>

</xsl:copy>

<br/>

<xsl:copy>

<xsl:apply-templates select="document('pagebox.xml')/site/map"/>

</xsl:copy>

<table><tr><td><img src="cuckoo.gif" alt="Cuckoo mark"/></td><td width="5"> </td>

<td width="120" style="font-family:Verdana;color:#ff8080;background-color:#99ff99;font-weight:bold;margin-top:5px;margin-bottom:5px">

Cuckoo generated</td></tr></table>

</td></tr>

</table></body></html>

</xsl:template>

We generated an asp file with this command:

E:\cuckoo\cuckoo-gen.js /file:E:\cuckoo\cuckoo-customW.xml /toFile:E:\cuckoo\cuckoo-custom.asp /xsl:E:\cuckoo\cuckoo-asp.xsl

Because ASP doesn’t process the included file, that file has to be in HTML format.

The delivery includes a cuckoo-min.xsl file that converts the XML format to a minimal HTML translation in which only the <content> element is used.

JSP generation

In cuckoo-jsp.xsl we use JSP <jsp:include> to include a news file in the HTML page:

<?xml version="1.0" encoding="windows-1252"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"

xmlns:jsp="http://java.sun.com/products/jsp/dtd/jsp_1_0.dtd">

<xsl:template match="cuckoo">

<html>

<head>

<xsl:copy>

<xsl:apply-templates select="info"/>

</xsl:copy>

<link rel="stylesheet" href="cuckoo.css" type="text/css"/>

<script src="cuckoo.js"></script></head><body>

<div id="tooltip" style="position:absolute;visibility:hidden;border:1px solid black;font-size:14px;layer-background-color:lightyellow;background-color:lightyellow;padding:1px">

</div>

<xsl:copy>

<xsl:apply-templates select="document('pagebox.xml')/site/header"/>

</xsl:copy>

<table><tr><td valign="top">

<xsl:copy>

<xsl:apply-templates select="content"/>

</xsl:copy>

<p align="center">

<xsl:copy>

<xsl:apply-templates select="document('pagebox.xml')/site/footer"/>

</xsl:copy>

</p>

</td><td valign="top" align="left" width="250">

<div style="background-color:#99ff99">

<xsl:element name="jsp:include">

<xsl:attribute name="page">cuckoo-news.html</xsl:attribute>

<xsl:attribute name="flush">true</xsl:attribute>

</xsl:element>

</div>

<p> </p>

<xsl:copy>

<xsl:apply-templates select="toc"/>

</xsl:copy>

<br/>

<xsl:copy>

<xsl:apply-templates select="document('pagebox.xml')/site/map"/>

</xsl:copy>

<table><tr><td><img src="cuckoo.gif" alt="Cuckoo mark"/></td><td width="5"> </td>

<td width="120" style="font-family:Verdana;color:#ff8080;background-color:#99ff99;font-weight:bold;margin-top:5px;margin-bottom:5px">

Cuckoo generated</td></tr></table>

</td></tr>

</table></body></html>

</xsl:template>

We generated a jsp file with this command:

E:\cuckoo\cuckoo-gen.js /file:E:\cuckoo\cuckoo-customW.xml /toFile:E:\cuckoo\cuckoo-custom.jsp /xsl:E:\cuckoo\cuckoo-jsp.xsl

This is slightly more complex than in ASP because the JSP directive is itself XML. If you simply write <jsp:include page="cuckoo-news.html" flush="true"/> in the stylesheet, the XSLT processor treats jsp: as a namespace prefix that has to be declared. To create the element in the output we therefore use the <xsl:element> instruction and declare the JSP namespace on <xsl:stylesheet>.
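The same principle, building a namespace-qualified element through an API instead of writing it literally, can be tried outside XSLT. The sketch below uses Python's standard ElementTree as a stand-in; it is not part of Cuckoo, and the exact serialization differs from what an XSLT processor emits.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the xmlns:jsp declaration in cuckoo-jsp.xsl.
JSP_NS = "http://java.sun.com/products/jsp/dtd/jsp_1_0.dtd"
ET.register_namespace("jsp", JSP_NS)

div = ET.Element("div", {"style": "background-color:#99ff99"})
# "{uri}local" is ElementTree's notation for a namespace-qualified name.
ET.SubElement(
    div, f"{{{JSP_NS}}}include", {"page": "cuckoo-news.html", "flush": "true"}
)

markup = ET.tostring(div, encoding="unicode")
print(markup)
```

The printed markup contains a jsp:include element together with the xmlns:jsp declaration, which is what lets an XML-aware tool accept the prefix.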

Note:

We tested with Tomcat 3.2.2. Depending on your application server, minor changes may be needed.

PHP generation

In cuckoo-php.xsl we use the PHP <?php include(...); ?> statement to include a news file in the HTML page:

<?xml version="1.0" encoding="windows-1252"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:template match="cuckoo">

<html>

<head>

<xsl:copy>

<xsl:apply-templates select="info"/>

</xsl:copy>

<link rel="stylesheet" href="cuckoo.css" type="text/css"/>

<script src="cuckoo.js"></script></head><body>

<div id="tooltip" style="position:absolute;visibility:hidden;border:1px solid black;font-size:14px;layer-background-color:lightyellow;background-color:lightyellow;padding:1px">

</div>

<xsl:copy>

<xsl:apply-templates select="document('pagebox.xml')/site/header"/>

</xsl:copy>

<table><tr><td valign="top">

<xsl:copy>

<xsl:apply-templates select="content"/>

</xsl:copy>

<p align="center">

<xsl:copy>

<xsl:apply-templates select="document('pagebox.xml')/site/footer"/>

</xsl:copy>

</p>

</td><td valign="top" align="left" width="250">

<div style="background-color:#99ff99">

<xsl:processing-instruction name="php">

include("cuckoo-news.html"); ?</xsl:processing-instruction>

</div>

<p> </p>

<xsl:copy>

<xsl:apply-templates select="toc"/>

</xsl:copy>

<br/>

<xsl:copy>

<xsl:apply-templates select="document('pagebox.xml')/site/map"/>

</xsl:copy>

<table><tr><td><img src="cuckoo.gif" alt="Cuckoo mark"/></td><td width="5"> </td>

<td width="120" style="font-family:Verdana;color:#ff8080;background-color:#99ff99;font-weight:bold;margin-top:5px;margin-bottom:5px">

Cuckoo generated</td></tr></table>

</td></tr>

</table></body></html>

</xsl:template>

We generated a php file with this command:

E:\cuckoo\cuckoo-gen.js /file:E:\cuckoo\cuckoo-customW.xml /toFile:E:\cuckoo\cuckoo-custom.php /xsl:E:\cuckoo\cuckoo-php.xsl

If you simply write <?php ... ?> in the stylesheet, the XSLT processor treats it as a processing instruction belonging to the stylesheet and does not copy it to the result. To create a <?php ... ?> in the result you need to use <xsl:processing-instruction>.

Note:

MSXML3 omits the trailing question mark, so we add it to the instruction ourselves.
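For comparison, here is how a processing instruction is created through an API in Python's standard ElementTree. This is only a sketch to show the shape of the result. An XML serializer closes the instruction with ?>, whereas the XSLT 1.0 html output method closes processing instructions with > alone, which may explain the MSXML3 behaviour noted above.

```python
import xml.etree.ElementTree as ET

div = ET.Element("div")
# Target "php" plus the instruction text; the serializer adds "<?" and "?>".
div.append(ET.ProcessingInstruction("php", 'include("cuckoo-news.html"); '))

markup = ET.tostring(div, encoding="unicode")
print(markup)   # <div><?php include("cuckoo-news.html"); ?></div>
```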

CSS

Cascading Style Sheets (CSS) is a simple mechanism for adding style (e.g. fonts, colors, spacing) to Web documents. A CSS file is a set of style elements that define how to display HTML elements.

A style element can apply to an HTML tag, for instance:

UL {

padding-top: 1px;

padding-bottom: 1px;

margin-top: 1px;

margin-bottom: 1px;

}

applies to <ul> elements.

A style element can also apply to a user-defined class, for instance:

.title {

font-size: 20pt;

font-family: Arial, Helvetica;

color: #336699;

}

applies to all elements with a class="title" attribute: <font class="title">...</font>.

cuckoo.css includes both kinds of style elements:

| Style element | Use | Creation | Purpose |
| --- | --- | --- | --- |
| H1 | All <h1> elements. | | |
| H2 | All <h2> elements. | | |
| H3 | All <h3> elements. | | |
| OL | All <ol> elements. | | |
| P | All <p> elements. | | |
| UL | All <ul> elements. | | |
| .cuckoo-table | <table class="cuckoo-table"> | WordReactor VBA class | Word tables |
| .cuckoo-td | <td class="cuckoo-td"> | WordReactor VBA class | Word tables |
| .mouse-over | <a class="mouse-over"> | XmlProcessor VBA class | Mouse over placeholders |
| .toc-table | <table class="toc-table"> | XmlProcessor VBA class | Table of contents |
| .toc-title | <th class="toc-title"> | XmlProcessor VBA class | Table of contents |
| .toc-h1 to .toc-h6 | <a href="..." class="toc-h1"> | XmlProcessor VBA class | Table of contents |
| .map-table | <table class="map-table"> | cuckoo.xsl | Site map |
| .map-title | <th class="map-title"> | cuckoo.xsl | Site map |
| .ttip | | cuckoo.js | Mouse over text with Netscape |
| .test-style | <p class="test-style"> or <font class="test-style"> | XmlProcessor VBA class | Example |
| .test-style2 | <p class="test-style2"> or <font class="test-style2"> | XmlProcessor VBA class | Example |
| .test-style3 | <p class="test-style3"> or <font class="test-style3"> | XmlProcessor VBA class | Example |

We defined three styles, test-style, test-style2 and test-style3, in the cuckoo.dot Word template. Cuckoo therefore generates the corresponding class attributes, and cuckoo.css has to define test-style, test-style2 and test-style3.

As far as look and feel is concerned, you can change whatever you want. However, we recommend creating a separate CSS file for your changes: when you install a newer version of Cuckoo, you then don’t need to reapply them to the new cuckoo.css. You just need to reference your file in cuckoo.xsl:

<link rel="stylesheet" href="cuckoo.css" type="text/css"/>

<link rel="stylesheet" href="my.css" type="text/css"/>

For more information about CSS you can visit http://www.blooberry.com/indexdot/css/index.html.

Implementation

Cuckoo object model

Cuckoo uses two main classes:

  • WordReactor, responsible for parsing the Word document

  • XmlProcessor, responsible for generating the xxxxW.xml file, calling the XSLT processor and invoking the default browser

Therefore if you want to support another feature of Word, you must primarily update WordReactor whereas if you want to generate different XML/XHTML data you must primarily update XmlProcessor.

Let's see the different classes in more detail.

Cuckoo has four classes, WordReactor, ParWrapper, TagStack and XmlProcessor

WordReactor

A Word document is made of Paragraphs that can contain Tables made of Cells containing other Paragraphs.

A Paragraph can contain text, Comments, images and hyperlinks.

The main operation of WordReactor is named process. It browses the Word document and invokes processParagraph for every Paragraph that it finds. When the Paragraph contains a Table, processParagraph invokes processTable. Eventually processParagraph2 is invoked, either for a root Paragraph or for a Table Cell Paragraph.

processParagraph2 invokes checkForm to handle Form styles, processImage to handle images, processComment to handle Comments. It also invokes the processParagraph2 of XmlProcessor to write the parsed data.
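The call chain described above can be pictured with a simplified Python model. This is a sketch only: the real classes are written in VBA against the Word object model, and the Paragraph, Writer and Reactor classes below are hypothetical stand-ins that ignore forms, images and comments.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Paragraph:
    text: str = ""
    # Cells of an embedded table; each cell is a list of Paragraphs.
    table_cells: List[List["Paragraph"]] = field(default_factory=list)


class Writer:
    """Stand-in for XmlProcessor: records what it is asked to write."""

    def __init__(self):
        self.written = []

    def process_paragraph2(self, text, in_cell):
        self.written.append(("cell" if in_cell else "root", text))


class Reactor:
    """Stand-in for WordReactor: walks the document and feeds the writer."""

    def __init__(self, writer):
        self.writer = writer

    def process(self, paragraphs):
        for par in paragraphs:
            self.process_paragraph(par, in_cell=False)

    def process_paragraph(self, par, in_cell):
        if par.table_cells:
            self.process_table(par.table_cells)
        else:
            self.process_paragraph2(par, in_cell)

    def process_table(self, cells):
        for cell in cells:
            for par in cell:
                self.process_paragraph(par, in_cell=True)

    def process_paragraph2(self, par, in_cell):
        # The real method also handles forms, images and comments.
        self.writer.process_paragraph2(par.text, in_cell)


writer = Writer()
Reactor(writer).process([
    Paragraph("Intro"),
    Paragraph(table_cells=[[Paragraph("A1")], [Paragraph("B1")]]),
    Paragraph("End"),
])
print(writer.written)
# [('root', 'Intro'), ('cell', 'A1'), ('cell', 'B1'), ('root', 'End')]
```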

ParWrapper

ParWrapper is a convenient way to put together all parameters passed by WordReactor to XmlProcessor.

XmlProcessor

XmlProcessor has three main methods:

  1. init that checks the environment, opens the site file (sname) and the target XML file (fnameW.xml)

  2. processParagraph2 invoked by WordReactor

  3. The class destructor (Class_Terminate in VBA) that invokes the XSLT transformation and builds the target HTML file (fname)

TagStack

The processParagraph2 method of XmlProcessor has to process bullets and numbers. Their implementation in HTML uses <ol> and <ul> and requires the use of a stack. push adds a new <ol> or <ul> on the stack and pop removes it.
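A stack is needed because nested lists must be closed in the reverse order of their opening. The Python sketch below shows the idea under simplifying assumptions: the (level, tag, text) input format and the render_list function are hypothetical, and nested lists are emitted directly inside their parent list.

```python
class TagStack:
    """Minimal stack of open list tags ('ol' or 'ul')."""

    def __init__(self):
        self._tags = []

    def push(self, tag):
        self._tags.append(tag)
        return f"<{tag}>"

    def pop(self):
        return f"</{self._tags.pop()}>"

    def __len__(self):
        return len(self._tags)


def render_list(items):
    """items: (level, tag, text) tuples, level 1 being the outermost list."""
    stack, out = TagStack(), []
    for level, tag, text in items:
        while len(stack) > level:     # leaving nested lists
            out.append(stack.pop())
        while len(stack) < level:     # entering nested lists
            out.append(stack.push(tag))
        out.append(f"<li>{text}</li>")
    while len(stack):                 # close whatever is still open
        out.append(stack.pop())
    return "".join(out)


markup = render_list([
    (1, "ol", "one"),
    (2, "ul", "a"),
    (2, "ul", "b"),
    (1, "ol", "two"),
])
print(markup)
# <ol><li>one</li><ul><li>a</li><li>b</li></ul><li>two</li></ol>
```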

Sequence diagram

Sequence diagram: the toHTML macro creates a FileSelection form. FileSelection initializes an XmlProcessor and asks WordReactor to process the document.

When you click on the toHTML button, the toHTML macro creates a FileSelection form where you are prompted for the target name of your HTML file and for your style directory. FileSelection creates and initializes an XmlProcessor and a WordReactor. Then FileSelection calls the process method of WordReactor with the active Word document as parameter.

WordReactor invokes the processParagraph2 method of XmlProcessor for each content that it has to write.

Once process has completed its task, FileSelection terminates and XmlProcessor invokes the XSL processor and the default browser.

Character processing

Word characters must be translated into a form that can be processed by XSL translators and by browsers.

We need to make a decision almost case by case for ISO-8859-1, Unicode and especially for mathematical characters.

For performance reasons, we chose to implement this function in an ATL COM component, MathGreek.

The implementation of MathGreek is named CMathGreek. Its constructor builds two tables:

  • mgCodes, an array of mathematical and Greek symbols that require special processing

  • normCodes, an array of symbols requiring special processing (ISO-8859-1)

MathGreek has two methods:

  • convert, that processes mathematical and Greek symbols and uses mgCodes

  • normalize, that processes symbols and uses normCodes
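A table-driven conversion of this kind can be sketched in a few lines of Python. The table contents below are hypothetical excerpts chosen for illustration, since the real mgCodes and normCodes tables are not reproduced in this guide, and mapping to numeric character references is an assumption about the output form.

```python
# Hypothetical excerpts; the real mgCodes and normCodes tables are not
# reproduced in this guide.
MG_CODES = {
    "\u03b1": "&#945;",    # Greek small letter alpha
    "\u03c0": "&#960;",    # Greek small letter pi
    "\u2211": "&#8721;",   # n-ary summation
}
NORM_CODES = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    "\u00e9": "&#233;",    # ISO-8859-1 e with acute accent
}


def _translate(text, table):
    """Replace each character found in the table, keep the others."""
    return "".join(table.get(ch, ch) for ch in text)


def convert(text):
    """Mathematical and Greek symbols, driven by MG_CODES."""
    return _translate(text, MG_CODES)


def normalize(text):
    """Other special symbols, driven by NORM_CODES."""
    return _translate(text, NORM_CODES)


# In this sketch normalize has to run first: otherwise the "&" of the
# references produced by convert would be escaped a second time.
print(convert(normalize("caf\u00e9 & 2\u03c0r")))   # caf&#233; &amp; 2&#960;r
```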

Glossary

| Name | Meaning |
| --- | --- |
| XML | eXtensible Markup Language. |
| XHTML | Well-formed HTML. Tags must be closed, for instance <br/>. You cannot write <p><font>something</p></font> but: <p><font>something</font></p>. Because XHTML is well formed, XML processors and XSL translators can process it. |
| ASP | Active Server Pages. Microsoft technology running on Internet Information Server (IIS). |
| JSP | JavaServer Pages. Java technology running on Java application servers such as Tomcat and Resin. |
| PHP | Hypertext Preprocessor. Server-side HTML-embedded scripting language. Just like Cuckoo needs Word, PHP needs a Web server (IIS, Apache, iPlanet...). |
| Meta tag | Field used to categorize a page. Used by search engines. |
| Title | Field displayed only in the page properties and used by search engines. |
| XSLT | XSL Transformations. An XML language for transforming an XML document into another document, typically XML or XHTML. |
| CSS | Cascading Style Sheets. Defines how a document (HTML or XML) should be displayed. |

Contact: support@pagebox.net
2001-2004 Alexis Grandemange.