Java PageBox 
 New PageBox 
 Other products 

Patent search - search

Patent organization

Patent lifecycle

The process for getting a patent is:

  1. The inventors file a patent application at a national patent office.

  2. The patent office makes a novelty search.

  3. An Examiner examines the patentability of the invention in light of the report of the novelty search. An invention is not novel if all the elements that are mentioned in a claim are also found in a single document that is part of the prior art. Any document from any source can in fact be used as prior art. It does not matter in which language the document is written, in how many copies it was made available, or whether any copies were in fact bought or read by third parties. The key point is that the document had to be published before the date of filing. As a consequence a web page can only considered as prior art if it can be proven that this web page was publicly accessible the day before the filing date and with its content the day before the filing date.

  4. The inventors and their attorney can amend the application if it was found obvious or not novel.

  5. The application is published 18 months after it has been filed. It is still an application and can still be rejected. Very often the inventors add a statement "patent pending" on their site when the application is not yet published to deter competitors to develop similar products.

  6. The examiner issues a notice of allowance and the patent is granted to inventors.

The granted patent protects the invention only in the application country. As we seen in the example above the patent can still be invalidated by a court as it was the case for the first PanIP patent. In USA it can also be re-examined by the patent office as it is currently the case for the Eolas patent and for other PanIP patents. For more information on the examination and re-examination you can read the re-examination page.

Patent cooperation treaty

To get patent protection in other countries the inventor file one single patent application to a World Intellectual Property Organization (WIPO) office in which it designates the countries in which he wants to have patent protection. Thank to the Patent Cooperation Treaty (PCT) the patent application will be applied in parallel in these countries with the filing date of the original application country.

Somewhat the same process as above is applied again:

  1. An international bureau makes a search for prior art

  2. National patent office, which can also make searches for prior art, examine the patent

This promotion process implies that governments agree on a common understanding of patents. This is defined by an agreement signed at the end of the Uruguay round, the Agreement on Trade-Related aspects of Intellectual Property rights (TRIPs). This document leaves some space for interpretation. You can look at a Review of the implementing legislation for government answers to questions asked by other governments. The fact is that software patentability is one of the most debated questions. You can download answers in a WordPerfect format that you can read with Word.

Patent offices




The home page is You can find the search engines of Intellectual Property Australia at The easiest to use search engine is

Canada contains interesting documents. For searches use

Europe and are two URLs of interest. The country sites are on For searches use The EPO Online file inspection is the equivalent of the USPTO’s PAIR.


For searches use

New Zealand

The Intellectual Property site is


The Intellectual Property site of Singapore (IPOS) is For searches use and The former URL is the URL of the search engine of the Singapore Intellectual Property database. The latter URL is the URL of a portal allowing searching the Japanese and Taiwanese Intellectual Property databases too.


The Intellectual Property site is For searches use To inspect the correspondence between the applicant and the USPTO use


The URL is For searches use

The European Patent Office (EPO) is a prerequisite to European Union adhesion. However the EPO does not depend on European institutions (Commission and Parliament). The EPO site holds data on patents from 71 countries (45 millions at the beginning of 2004) and applications published by WIPO. Since 2003 their Online file inspection allows reading and downloading the correspondence between the between applicants and examiners thank to an Article 128 of the European Patent Convention, according to which the public is entitled to inspect the complete contents of the files relating to all European patent applications after they have been published. Since September 2004 the USPTO’s Patent Application Information Retrieval (PAIR) implements the same function. The USPTO hold data on US patents since 1790 and US applications published since 2001.

You may look at for a more comprehensive list of patent databases and at for a fairly comprehensive list of patent offices.

Patent search

Patent search is almost like Web search. The differences are:

  1. All patents and only patents are available. In that respect patent search engines compare more to search engines like htdig setup to index a given site than to public search engines like Google, which only indexes the most popular pages regardless of their nature. The most competitive words of a patent search engine are not the same as the most competitive words of a public search engine. If you look for seat and computer with a public search engine you will get inconsistent results whereas if you make the same search on a Patent search engine you will mainly find transportation industry patents. On the other hand acronyms, which are effective with Web search engine thank to the popularity rule do not work with patent search engines

  2. A patent is a structured content, therefore you can search for instance on claims or abstract

  3. Patents are classified and you search for patents in a certain class.

  4. A public search engine like Google indexes 3 billions patents. There are millions and not billions patents.

In this section we will take USPTO as an example.

USPTO search

The patent search page is Select Advanced search in Published Applications. You should get this:

Let’s assume we want to find patent applications related to PageBox. PageBox is a computer program that allows deploying applications to remote locations. The best keywords are computer, deployment, deploy and remote. If we were using a Web search engine, we would rather use Web service, deployment, J2EE, Java.

Then we will restrict our research to the text of the claims. The query will be:

aclm/computer and aclm/remote and aclm/deploy$

We get a list of 46 patent applications on Nov 26, 2002. 20020129129 (System and method for deploying and implementing software applications over a distributed network) is pretty close in some respect to PageBox. 20020129129 has been classified as 709/220, 709/218 by USPTO and G06F 015/16; G06F 015/177. We can think that if PageBox was patented it would also be in this classes:

709 stands for "MULTIPLE COMPUTER OR PROCESS COORDINATING", 220 for "Network computer configuring" and 218 for "Using interconnected networks." G06F 015/16 stands for " Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes" and G06F 015/177 for "Initialisation or configuration control."

Computer often but not necessarily exists in the claims of a PageBox related patent but we found that such a patent should be in 709/220. We improve our result with:

aclm/remote and aclm/deploy$ and ccl/709/220

When a competitor claims to have a patent pending we use that sort of fast search. To identify patents that a project or product we use different method that consists:

  1. In writing a reference text or at least a definition

  2. Analyzing the reference text to extract the keywords

Our advice is to forget common sense. A patent is a text made of sentences. The most effective solution is therefore to find texts that look like a reference text. It is basically the same concept but not the same process as the NSA patent 5,937,422, "Automatically generating a topic description for text and searching and sorting text by topic using the same."


In the example above we used "Advanced search in Published Applications" that allows displaying patent applications, which are no longer secret and not yet granted. On you can also select "Advanced search in Issued Patents" to display granted patents.

Which company filed the application?

In case of 20020129129 and 5,937,422 the assignee name is the name of the company. However, quite often, the assignee name is the name of the patent attorney.

If you suspect that the assignee name is not the company, look:

  1. for the patent in the EPO site, and if you don’t find the application,

  2. at the inventor names,

  3. at the specification examples and drawings,

  4. at the inventor addresses.


To look for US patents on the EPO site this is simple: add a us prefix and remove the commas. To look for 5,937,422 enter us5937422. For applications you may need to make two trials. 20020129129 is actually 2002/0129129. So you must try with and without the zero after the slash, so us20020129129 or us2002129129 (in this case us2002129129 is correct.)

Inventor name

If the "inventor" is an Intellectual Property specialist you are out of luck.

Our experience is that there are three sorts of inventors in case of company application:

  1. Inventors who designed the patented invention

  2. People who contributed to the invention for instance by checking the timecards and the expense reports of the first category

  3. VP, C*O who motivated the first category. They are quoted in press releases.

Our experience is that there are three types of inventors who design inventions:

  1. The living treasure. The living treasure has a big office. He represents his company in technical conferences.

  2. The maverick. The maverick does not like his job. He has a web site with his resume.

  3. The modest person. Because modest persons do not even know that they invent, we collectively call them dark matter: we postulate that they exist because something was invented.

This is easy to identify VP, C*O, living treasures and mavericks with Web search engines. Except for living treasures who like stability you need to find out where they were working at the date the application was filed. The living treasure or the maverick are often at the beginning of the inventor list and the VP/C*O are often at the end of the inventor list.

Examples and drawings

If the examples chosen in the detailed description of the preferred embodiment always relate to the same company there is a good chance that this company is the applicant. Look also at the figures. Quite frequently for business method and software application there are a couple of screenshots with the logo and the name of the company.

Inventor address

The city and the state where the inventors live are provided. With a site like MapQuest you can easily figure how distant are the inventors and which city is at a commuting distance from inventors. Usually geographical data are not sufficient to be sure that a patent was filed by a company for the reason that in USA companies working in a given sector are usually located in the same area.


Text analysis

Word count statistics give good result in patent search.

The principle is to measure the word frequency in a reference text and to use the most frequently used words in the search. Related patents are likely to exhibit similar word frequencies. The results are better when the reference text is of the same type as the searched section: use most frequent words in reference claims to search on claims and use most frequent words in a reference description to search on description.

If the reference text is written in Word you can use the WordCount macro of Cuckoo. This macro writes a report document like this:

91 2% patent

33 1% search patents

25 0% http

24 0% was ://

23 0% invention

22 0% process

20 0% claim

18 0% use claims www system

17 0% same

16 0% section application

15 0% art also

14 0% software 359

13 0% prior most lockwood text

12 0% exist company information only office travel web

11 0% business find American when because

10 0% page filed from uspto inventor experience court


This macro was designed primarily to improve the wording and the page performance on public search engines. Regarding the latter aspect you can find other tools on the Web. The goal is to facilitate the work of the search engines’ robots. These robots use word frequencies in the displayed text to determine the page subject, the idea being that a user who makes a search with the page’s most frequent words should find the page interesting.

We wrote a standalone tool to measure word frequencies in html pages also called WordCount.


WordCount is written in Java (JDK 1.3 or 1.4) and uses the Kizna HTML parser.

HTMLParser is a library released under LGPL, which allows you to parse HTML (HTML 4.0 supported.) You can download HTMLParser from





Example of exclusion file


WordCound is a command line tool. Assuming that you have included the JDK on your path and the WordCount.class directory and htmlparser.jar on your CLASSPATH you can measure the word frequencies in the 20020129129 patent application with:

java WordCount



&s1='20020129129'.PGNR.&OS=DN/20020129129&RS=DN/20020129129" excl.txt

Where the first parameter is the URL of the patent application on the USPTO site and excl.txt is a list of words to exclude from the measurement. We should get a report like this:

Found 5783 words

2% 119 component$

1% 91 client$

1% 88 server$

1% 86 handler$

1% 82 device$

1% 81 fil$

1% 71 download$

1% 70 system$

1% 68 network$

1% 58 document$

0% 57 install$

0% 55 element$ script$

0% 53 contain$

0% 51 method$

0% 49 application$

0% 47 logic computer$

0% 45 object$

0% 41 operat$

0% 39 window$

0% 38 browser$

0% 37 frame$

0% 35 internet

0% 34 platform$

0% 33 function$

0% 32 provide$

0% 31 require$ factory deploy$

0% 30 invention

0% 29 database

0% 27 render$ specific interface$

0% 26 embedd$ implement$ schema

0% 25 process$ button$

0% 24 operation$

0% 23 version$ enabl$

0% 22 visual assembler$ installation manager wireles$

0% 21 standard$ invoke$ us$ layout

0% 20 support$ present$ event$ starter task$


The report contains one line per word frequency that contains:

  • The word percentage

  • The word count

  • The words whose word count occurrences were found. In some cases the word is used with different suffixes. In this case only the beginning of the word is displayed followed by a $.

WordCount analyzes the text in the HTML elements. It excludes the words in excl.txt and applies a few English-dependent rules:

  1. It eliminates numbers

  2. If a word differs of another word only by an ending s, d, ed, ing, both words are considered to be the same (see the ignore method in the source code)

Therefore for other languages you probably need to slightly modify the code.

We do not believe that an improved WordCount would be much more effective. Once you have generated the list you must manually select the unexpected frequent words to be the keywords of your queries.

Search revisited

Advanced search tools in Patent search engines and in Web search engines share a common drawback: they do not allow finding ALL documents of interest for you. It is usually a minor drawback in case of Web search but in case of patent search you need to be reasonably sure that your process infringes no patent.

With the help of WordCount we can build an unordered keyword vector, for instance in case of 20020129129 V = { "server", "client", "download", "deploy$", "remote" }

Then we can define a patent distance dV to the reference text, using the vector V as

  • 0 when the patent contains all keywords in V

  • 1 when the patent contains all keywords in V except one

  • 2 when the patent contains all keywords in V except two and so on

Then we can look at the patents at a given distance or closer. The idea is to recognize that there is some uncertainty in the wording of patents. We can expect that related patents use most of the words in the keyword bag but not necessarily all of them.

We can compare the lists found with this distance and with Boolean searches:

  • If we make a search with server AND client AND download AND deploy$ AND remote we find all patents at a distance of 0.

  • If we make a search with server OR client OR download OR deploy$ OR remote we find all patents at a distance of 4 or closer (size(V) –1).

  • We cannot find patents at a distance of 2 or closer.

It would be nice if search engines could recognize a syntax like this:

["server", "client", "download", "deploy$", "remote" ]2

to mean that we are looking at all patents at a distance of 2 (that contain at least three words in { "server", "client", "download", "deploy$", "remote" }.)

This syntax is not supported. Therefore we need to write our own tools:

  • Uspto processes queries to USPTO

  • Intersect compares the responses returned by Uspto queries

Here is an example:

java Uspto uspto.txt "aclm/server and ccl/709/220 and (aclm/client or aclm/deploy$ or aclm/remote)" server

java Uspto uspto.txt "aclm/client and ccl/709/220 and (aclm/server or aclm/deploy$ or aclm/remote)" client

java Uspto uspto.txt "aclm/deploy$ and ccl/709/220 and (aclm/server or aclm/client or aclm/remote)" deploy

java Uspto uspto.txt "aclm/remote and ccl/709/220 and (aclm/client or aclm/deploy$ or aclm/server)" remote

java Intersect intersect.html server20021130.txt client20021130.txt deploy20021130.txt remote20021130.txt

At each Uspto call we set one keyword as mandatory and the other keywords in an OR statement.

Then Intersect combines the Uspto reports to display a list of patents sorted by growing distance (patents that contain all keywords, then patents that contain all keywords except one and so on.)


Uspto is written in Java (JDK 1.3 or 1.4.)




Uspto.class, Uspto$TextUrl.class

Example of keyword file


Uspto is a command line tool whose syntax is:

Uspto keyword_file advanced_query_string report_file

Assuming that you have included the JDK on your path and the Uspto.class directory on your CLASSPATH you can list patents whose class is 709/220 and whose claims contain "server" and either "client" or "deploy" or "remote" with:

java Uspto uspto.txt "aclm/server and ccl/709/220 and (aclm/client or aclm/deploy$ or aclm/remote)" server

Here is an example of keyword file:

publish 2

subscrib 2

network 2

install 2

method 2

assembl 2

server 2 client

Uspto calls the USPTO site in HTTP using the query parameter. Then Uspto parses the response and identifies the patent titles and URLs. Uspto calls again the USPTO site if the patent list cannot fit on one page. Next Uspto computes the patent title weights using the keyword file data.

A line in the keyword file contains two or more fields:

  • If there are only two fields then when it finds this field in the patent Uspto adds the number in the second field to the patent weight

  • If there are more than two fields then Uspto adds the number in the second field to the patent weight when the title contains the first field AND at least one other field (third field and next)

The weight idea is that we only read patents whose title sounds interesting and that it is boring to read 50 and more patent titles. Weights aim to facilitate this first selection. When you write a keyword file keep in mind that a title contains between two and twenty and more words. You cannot be specific.

Uspto creates two files:

  • A HTML report named report_file + timestamp + ".html", for instance server20021130.html where Uspto displays patents with a weight > 0

  • A text report named report_file + timestamp + ".txt", for instance server20021130.txt where Uspto stores the patent application number, weight, title and URL


Intersect is written in Java (JDK 1.3 or 1.4.)




Intersect.class, Intersect$AppliData.class

Intersect is a command line tool whose syntax is:

Intersect report_file uspto_report [uspto_report]*

Assuming that you have included the JDK on your path and the Intersect.class directory on your CLASSPATH you can compare reports issued by Uspto on November 30, 2002 with:

java Intersect intersect.html server20021130.txt client20021130.txt deploy20021130.txt remote20021130.txt

Here is a report returned with the command listed in the "Search revisited" section. The report contains four colums:

  • Application number

  • Count = the number of query in which the patent was found. It is the opposite of the distance. With four Uspto reports a count of four means that the distance is zero, a count of three means that the distance is one and so on

  • Weight: same meaning as on the Uspto report

  • Summary: patent title

You can click on the application number to display the patent. However the link is a string like "





which means that it can interpreted in a different way by the USPTO dynamic search engine the day after.

Our experience is that these tools help finding patents of interest. Because they are simple you can adapt them easily to your needs. Then you can run them from time to time to check new patent applications. The key issue when we use or modify these tools is to keep a broad-enough spectrum to be reasonably sure that all patents of interest are in it and small enough to be able to analyze the patents.

Prosecution analysis


You may want to see the prosecution history of a patent (like 5,937,422) or an application 20020129129. For patents things are simple, just go on the PAIR (, select Patent number in the Search method dropdown list and enter the patent number. For applications you must first find the correct application number.

First go on and search application by number. In the application file you should have something like this:

The correct application number is made of the Series Code followed by / and by the Serial No. In this case this is 10/081921. Now you can go on and enter this number:

When you click on the Submit button you should get something like this:

In this case you have five tabs. The transaction history contains what we called so far the prosecution history. The continuity data are quite useful to list the other applications in the same family. Published data usually points the patent text we have seen above. Because the examination has not yet started in this case (status = Ready for Examination) the image tab is missing. This is the case also when the patent has been granted.

This image tab is present for the previous application (10/081921):

Click on a checkbox and then on the Start Download button to download the corresponding file in PDF format. It looks like the USPTO borrowed this not-so-obvious modus operandi from the EPO. Note that "Non-Final Rejection", "Applicant Arguments or Remarks Made in an Amendment" and the "Claim" coming with "Applicant Arguments or Remarks Made in an Amendment" are of interest:

  • In a "Non-Final Rejection" the examiner explains why he rejected the application.

  • In an "Applicant Arguments or Remarks Made in an Amendment" the applicant explains why its application is patentable with the amended claims in the "Claim".


The EPO site contains prosecution files for applications and patents whose prosecution took place in Europe. To download these files go to and enter a European application number, for instance EP1368755 (Method and system for providing message publishing on a dynamic page builder on the internet from IBM). You should have this:

Click on the “Open this application” icon. You should get this:

Click on a checkbox and then on the Start Download button to download the corresponding file in PDF format. Of a particular interest are:

  • the international search report and examination (under PCT),

  • the priority document (the original application, for instance a provisional application for an application first filed in the USPTO),

  • the examinations,

  • the amendments.

Open Source

As you can see PageBox is reasonably distant from patents covering similar processes. Furthermore the development started at the end of 1999, which means that there is prior art to oppose to an attempt of patenting a similar process. Normally this happens silently. The patent examiner checks that claims are novel. For this checking the examiner uses patent databases and Internet archives like (Internet Way back machine) where many PageBox pages are recorded. If she finds that a claim is not novel the examiner rejects the patent application.

Open Source developers do not necessarily check if their method or process is patented. This is understandable because the patentee usually sues the richest parties between the publisher and the product users.


Therefore a company that would believe that PageBox infringes its patent would sue PageBox users rather than us and those users would have no means to complain to us. Large firms frequently perceive this aspect as a significant drawback of Open source. We must however clarify this issue.

The contract of a commercial program typically has a warranty and liabilities article that contains a clause like this: "The supplier must indemnify and hold harmless the customer against any claim concerning intellectual property relating to the program, provided that:

  • The customer informs in writing the supplier of such claim in due time

  • The customer assists the supplier and provides it with the information necessary for the enforcement of the above provision

  • The supplier has the exclusive control of the defence and the entire freedom to sign a possible transaction."

The buyer of a commercial product may be involved in a patent dispute and if it does not settle a transaction with the patentee the publisher may just reimburse the paid fees to the buyer and not the training and switching expenses. These expenses are usually higher for a commercial product than for an Open Source product.

Patents Presentation Search Issues Strategies Business Methods Patentability Analysis MercExchange eBay Trial Reexamination Business view Granted patents Examination USPTO EPO PanIP Eolas 1-click family 1-click analysis 1-click prior art Trademark Copyright
2001-2005 Alexis Grandemange   Last modified