Yet Another DOM: CTCDOM

Author: Martyn Cutcher

email martyn@cutthecrap.biz

website www.cutthecrap.biz

Why on earth should I provide yet another W3C DOM implementation?

The simple answer is for testing, compatibility and performance purposes. This is also the right answer, but not quite complete.

The other reason is a fascination with the different nature of system designs and trying to gain insight into different computational structures.

CTCDOM: Derived from PDOM

PDOM is a persistent DOM implementation based on the Generic Persistent Object Model.

CTCDOM is fundamentally a refactoring of PDOM, but based on CTCMap. For more information on CTCMap, check out the whitepaper.

This implementation required additional methods for CTCMap and by the end I could be confident that CTCMap was a fully functional class.

A Big Job?

No, the refactoring task was straightforward, perhaps totalling 4-5 hours after testing with sample XML files and adding a few methods to CTCMap.

Performance

The most obvious performance comparison for a straightforward in-memory object model is Xerces. Since CTCDOM is based on the generic CTCMap object, any figures within an order of magnitude of Xerces would be satisfactory.

Parsing

The parse time for a 5.2MB XML file was slightly slower than Xerces, taking 2900ms compared with 2500ms.

However, CTCDOM builds its final objects eagerly during the parse, while Xerces defers some object creation until the nodes are first needed.
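To compare like with like, the Xerces side can be timed with deferred node creation switched off. The sketch below is not the benchmark used for the figures above; it uses only the standard JAXP API plus the Xerces feature name "http://apache.org/xml/features/dom/defer-node-expansion". The class name ParseTiming and its helper methods are illustrative, and the code assumes the JAXP parser on the classpath is Xerces-based (otherwise the feature is silently skipped).

```java
import java.io.ByteArrayInputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

class ParseTiming {
    // Xerces feature controlling lazy (deferred) DOM node creation.
    static final String DEFER =
        "http://apache.org/xml/features/dom/defer-node-expansion";

    /** Builds a JAXP factory, asking Xerces for deferred or eager node creation. */
    static DocumentBuilderFactory factory(boolean deferred) {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        try {
            f.setFeature(DEFER, deferred);
        } catch (ParserConfigurationException e) {
            // Assumption failed: parser is not Xerces-based, keep its default.
        }
        return f;
    }

    /** Parses the given bytes into a W3C DOM document. */
    static Document parse(DocumentBuilderFactory f, byte[] xml) {
        try {
            return f.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /** Returns the wall-clock time of one parse, in milliseconds. */
    static long timeParseMillis(DocumentBuilderFactory f, byte[] xml) {
        long start = System.nanoTime();
        parse(f, xml);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("deferred: " + timeParseMillis(factory(true), xml) + "ms");
        System.out.println("eager:    " + timeParseMillis(factory(false), xml) + "ms");
    }
}
```

A single run like this is dominated by JVM warm-up, so any serious comparison should repeat each parse several times and discard the first results.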

Processing

A benchmark to evaluate a full recurse of the DOM objects took Xerces 350ms on first pass, and 50ms thereafter.

The full recurse with CTCDOM took 50ms on each pass.
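A full recurse of this kind can be sketched against the standard W3C interfaces, so the same walk runs unchanged over any DOM implementation. This is an illustration rather than the original benchmark code: the class name DomWalk, the tiny inline document and the three-pass loop are all assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class DomWalk {
    /** Parses an XML string with whatever JAXP parser is on the classpath. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /** Visits n and every descendant (attributes excluded), returning the count. */
    static int countNodes(Node n) {
        int count = 1;
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            count += countNodes(c);
        }
        return count;
    }

    public static void main(String[] args) {
        Document doc = parse("<a><b>x</b><c/></a>");
        // Repeat the walk: with a deferring parser the first pass also pays
        // for node creation, later passes only for the traversal itself.
        for (int pass = 1; pass <= 3; pass++) {
            long start = System.nanoTime();
            int nodes = countNodes(doc);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println("pass " + pass + ": " + nodes + " nodes in " + ms + "ms");
        }
    }
}
```

Timing several passes separately is what exposes the first-pass cost reported for Xerces above.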

Observation

Taken together, these figures indicate that the total cost of parsing plus object creation is broadly similar for the two implementations, and that processing of the DOM objects, once they exist, is equivalent.

Java Memory

Once the DOM objects are instantiated, Xerces required around 31MB and CTCDOM around 24.5MB. So CTCDOM appears to get by with roughly 20% less memory than Xerces.
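For readers who want to reproduce this kind of measurement, here is a minimal sketch using only java.lang.Runtime. It is not the method used to obtain the figures above: the class name HeapUsage and both helpers are illustrative, and heap readings taken this way are approximate because System-level garbage collection is only a request.

```java
class HeapUsage {
    /** Approximate heap currently in use, in bytes, after requesting a GC. */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        rt.gc(); // only a hint to the JVM, so treat the result as a rough figure
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Percentage by which candidate undercuts baseline. */
    static double percentSaved(double baseline, double candidate) {
        return 100.0 * (baseline - candidate) / baseline;
    }

    public static void main(String[] args) {
        long before = usedBytes();
        // ... build the DOM here and keep a reference to it ...
        long after = usedBytes();
        System.out.println("DOM heap cost: " + (after - before) + " bytes");

        // The figures quoted in the text: 31 for Xerces, 24.5 for CTCDOM.
        System.out.printf("saving: %.1f%%%n", percentSaved(31.0, 24.5));
    }
}
```

With the quoted figures the saving works out to (31 - 24.5) / 31, or just under 21%.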

Typical Session

Below is a Java code fragment used to test CTCDOM.

import cutthecrap.ctcdom.*;
import cutthecrap.xpq.*;
import java.io.FileInputStream;
import java.util.Iterator;
import org.w3c.dom.*;

CTCDOM pd = new CTCDOM();

String f = "D:/testxml/Opera.xtm";

CTCDocument doc = pd.createDocument("test", new FileInputStream(f));

// the standard W3C API can be used
Element el = doc.getElementById("ninetta");

// plus extended XPath support!
Iterator rs1 
  = doc.queryXPath(
    "id('ninetta')/instanceOf/topicRef/id(@xlink:href)"
    + "/baseName/baseNameString/text()" );

while (rs1.hasNext()) {
  System.out.println("Name: " + ((Node) rs1.next()).getNodeValue());
}

Note the "extended" use of id(@xlink:href) allowing navigation of the XML structure based on ID values referenced by attribute values of "related" nodes.
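To make the navigation concrete, the same lookup can be written out by hand against the plain W3C DOM API. This is only a sketch of what the query expresses, not how CTCDOM evaluates it. The class IdRefWalk and its helpers are hypothetical, and the code makes several assumptions: IDs live in an attribute literally named "id", the xlink:href value is an ID optionally prefixed with "#", and only the first instanceOf/topicRef is followed, whereas the XPath query would return a result for every match.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class IdRefWalk {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /** Depth-first search for the first element whose "id" attribute equals id. */
    static Element findById(Node n, String id) {
        if (n instanceof Element && id.equals(((Element) n).getAttribute("id"))) {
            return (Element) n;
        }
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            Element hit = findById(c, id);
            if (hit != null) {
                return hit;
            }
        }
        return null;
    }

    /** First child element with the given tag name, or null (null-safe on parent). */
    static Element child(Element parent, String name) {
        if (parent == null) {
            return null;
        }
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element && name.equals(c.getNodeName())) {
                return (Element) c;
            }
        }
        return null;
    }

    /** Follows topic -> instanceOf/topicRef -> referenced topic -> base name text. */
    static String typeName(Document doc, String topicId) {
        Element ref = child(child(findById(doc, topicId), "instanceOf"), "topicRef");
        if (ref == null) {
            return null;
        }
        String href = ref.getAttribute("xlink:href");
        if (href.startsWith("#")) {
            href = href.substring(1); // assumed "#id" fragment form
        }
        Element name = child(child(findById(doc, href), "baseName"), "baseNameString");
        return name == null ? null : name.getTextContent();
    }
}
```

The hand-written version needs two separate ID lookups joined by string handling; the single query expresses the same hop from the referencing attribute to the referenced node in one path.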

Conclusion

As a functional W3C DOM API implementation, CTCDOM can prove useful, and in the test above it needed roughly 20% less memory than Xerces.

The purpose of the exercise, though, was to test the CTCMap class and to investigate its performance and functionality compared with the Generic Persistent Model on the one hand and the conventional but much refined Xerces implementation on the other.

With this in mind the results are very pleasing. CTCMap could clearly be used as a highly functional and well performing utility class.

It is hoped that this will encourage the use of CTCMap and improve understanding of the Generic Model and Cut The Crap Software.

Feedback * Feedback * Feedback

This is the usual plea for developer feedback!

Please mail me with any comments, criticisms and suggestions. Even harsh comments are welcomed since at least it shows you can be bothered.