Document clustering


To give a sample of the benefits of using XML, I wrote a program in 1998, as part of my PhD work, to cluster AML (Astronomical Markup Language) documents. It used both the meaningful links between the documents and the keywords associated with them, applied a noising partitioning technique, and displayed the result on a topic map. The documents could be retrieved automatically from various sources, starting from an initial document and following the AML links to the related documents. It was a success, but like much cool PhD software, it disappeared from the web because it could no longer be maintained.

In 2004, I needed a program to cluster other documents and couldn't find any free software for this simple task. I decided to resurrect the project, and found a way to specify the list of documents, keywords and links in an external XML document. As a result, it now works for any collection of documents, even non-XML ones.

Using the program

Here is a sample document list, with keywords and links. The DTD is included in the package.

    <DOCUMENT id="108">
        <TITLE>Vitesse orbitale</TITLE>
            <LINK toid="110"/>
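
For readers who want to build or check such a list with a script, here is a minimal Python sketch that reads documents, titles and links. The element names DOCUMENT, TITLE and LINK and the attributes id and toid come from the sample above; the `<DOCUMENTS>` wrapper, the second document and the function name `read_document_list` are assumptions made for this illustration. The DTD in the package remains the reference for the real format.

```python
import xml.etree.ElementTree as ET

# Hypothetical document list: the <DOCUMENTS> wrapper and document 110 are
# invented for this sketch; DOCUMENT, TITLE and LINK come from the sample.
SAMPLE = """
<DOCUMENTS>
    <DOCUMENT id="108">
        <TITLE>Vitesse orbitale</TITLE>
        <LINK toid="110"/>
    </DOCUMENT>
    <DOCUMENT id="110">
        <TITLE>Document 110</TITLE>
    </DOCUMENT>
</DOCUMENTS>
"""


def read_document_list(xml_text):
    """Return ({id: title}, [(from_id, to_id), ...]) from a document list."""
    root = ET.fromstring(xml_text.strip())
    titles = {}
    links = []
    for doc in root.iter("DOCUMENT"):
        doc_id = doc.get("id")
        titles[doc_id] = doc.findtext("TITLE", default="")
        # iter() finds LINK elements at any depth below the document,
        # so the sketch does not depend on how links are nested.
        for link in doc.iter("LINK"):
            links.append((doc_id, link.get("toid")))
    return titles, links


titles, links = read_document_list(SAMPLE)
```

Ids are kept as strings, exactly as they appear in the attributes, so that links can be matched against documents without any conversion.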

When the document list is ready, the clustering program can be launched (just double-click on Clustering.jar).


The clustering algorithm first spreads the documents randomly on the grid, then moves them so as to progressively reduce the "cost". After a while it stops, and the result is recorded in a grid.xml file.
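
The general idea of random placement followed by cost reduction can be sketched in a few lines of Python. This is not the program's actual code: the real software uses a noising technique and also takes keywords into account, whereas this sketch uses a plain greedy descent and an assumed cost (the sum of grid distances between linked documents). The names `layout_cost` and `cluster_on_grid` and all parameters are hypothetical.

```python
import random


def layout_cost(position, links):
    """Assumed cost: sum of Manhattan distances between linked documents."""
    return sum(
        abs(position[a][0] - position[b][0]) + abs(position[a][1] - position[b][1])
        for a, b in links
    )


def cluster_on_grid(doc_ids, links, width, height, steps=2000, seed=0):
    """Spread documents randomly, then keep only moves that do not raise the cost."""
    doc_ids = list(doc_ids)
    rng = random.Random(seed)
    cells = [(x, y) for x in range(width) for y in range(height)]
    if len(doc_ids) > len(cells):
        raise ValueError("grid too small for the number of documents")
    rng.shuffle(cells)
    position = dict(zip(doc_ids, cells))  # random initial spread
    occupant = {cell: doc for doc, cell in position.items()}
    cost = layout_cost(position, links)

    for _ in range(steps):
        doc = rng.choice(doc_ids)
        target = rng.choice(cells)
        old = position[doc]
        if target == old:
            continue
        other = occupant.get(target)  # document already in the target cell, if any

        # Try the move (a swap when the target cell is occupied).
        position[doc] = target
        occupant[target] = doc
        if other is not None:
            position[other] = old
            occupant[old] = other
        else:
            del occupant[old]

        new_cost = layout_cost(position, links)
        if new_cost <= cost:
            cost = new_cost
        else:
            # Undo the move: it made the layout worse.
            position[doc] = old
            occupant[old] = doc
            if other is not None:
                position[other] = target
                occupant[target] = other
            else:
                del occupant[target]
    return position, cost


position, cost = cluster_on_grid(
    ["a", "b", "c", "d"], [("a", "b"), ("c", "d")], width=4, height=4
)
```

A greedy descent like this one can get stuck in a poor layout; accepting some worsening moves early on, as noising methods do, is one way to reduce that risk.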

This grid XML file can then be displayed with the DispGrid applet, with an HTML file containing this code:

<applet code="dispgrid.DispGrid" archive="DispGrid.jar" width="100" height="100">
    <param name="url" value="http://server/grid.xml">
</applet>



The software is available under the GPL licence.



Some web browsers prevent applets from opening a new window: Internet Explorer on Windows XP SP2 (it worked before SP2), the Google Toolbar popup blocker, and Firefox 1.5 (it worked before version 1.5). In these cases the applet cannot display a selected web page.

Author: Damien Guillaume