Why on earth should I provide yet another W3C DOM implementation?
The simple answer is for testing, compatibility and performance purposes. This is also the right answer, but not quite complete.
The other reason is a fascination with how differently systems can be designed, and a wish to gain insight into different computational structures.
PDOM is a persistent DOM implementation based on the Generic Persistent Object Model.
CTCDOM is fundamentally a refactoring of PDOM, but based
on CTCMap. For more information on CTCMap, check out the
whitepaper.
This implementation required additional methods for CTCMap, and by the
end I could be confident that CTCMap was a fully functional class.
No, the refactoring task was straightforward, perhaps totalling 4-5 hours after
testing with sample XML files and adding a few methods to CTCMap.
The most obvious performance comparison for a straightforward in-memory object
model is Xerces. Since CTCDOM is based on the generic
CTCMap object, any figures within an order of magnitude of Xerces
would be satisfactory.
The parse time for a 5.2MB XML file was slightly slower than Xerces, taking
2900ms compared with 2500ms.
However, CTCDOM built the final objects eagerly, while Xerces
defers some object creation until needed.
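To make the eager versus deferred distinction concrete, here is a minimal sketch of deferred creation in general. It is not how Xerces or CTCDOM work internally; the names DeferredDemo, LazyNode and walk are invented for illustration. Each node only builds its child list the first time somebody asks for it, so the first full walk pays the construction cost and later walks do not.

```java
import java.util.ArrayList;
import java.util.List;

class DeferredDemo {
    // how many child lists have been materialised so far
    static int built = 0;

    // Hypothetical node of a binary tree whose children are created on demand.
    static class LazyNode {
        private final int depth;
        private List<LazyNode> children; // stays null until first requested

        LazyNode(int depth) { this.depth = depth; }

        List<LazyNode> children() {
            if (children == null) {          // first access: do the deferred work
                children = new ArrayList<>();
                if (depth > 0) {
                    children.add(new LazyNode(depth - 1));
                    children.add(new LazyNode(depth - 1));
                }
                built++;
            }
            return children;
        }
    }

    // Full recursion over the tree, returning the number of nodes visited.
    static int walk(LazyNode node) {
        int count = 1;
        for (LazyNode child : node.children()) {
            count += walk(child);
        }
        return count;
    }

    public static void main(String[] args) {
        LazyNode root = new LazyNode(3);
        for (int pass = 1; pass <= 2; pass++) {
            int visited = walk(root);
            System.out.println("pass " + pass + ": visited " + visited
                    + ", child lists built so far " + built);
        }
    }
}
```

An eager design would do all of that construction inside the constructor instead, which moves the cost into what a parser would report as parse time.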
A benchmark to evaluate a full recurse of the DOM objects took
Xerces 350ms on first pass, and 50ms thereafter.
The full recurse with CTCDOM took 50ms on each pass.
Taken together, these figures indicate that parsing plus
object creation is broadly similar (2500ms + 350ms for Xerces against
2900ms + 50ms for CTCDOM), and processing of the DOM objects equivalent.
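For anyone wanting to repeat this kind of measurement, a full recurse can be sketched against the standard W3C API alone. This is not the original benchmark code: the class name RecurseBenchmark, the tiny inline document and the use of whichever JAXP parser the JDK supplies are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class RecurseBenchmark {

    // Visit the given node and everything below it; return the node count.
    // Attributes are not children in the DOM, so they are not counted.
    static int countNodes(Node node) {
        int count = 1;
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            count += countNodes(child);
        }
        return count;
    }

    // Parse an XML string with the default JAXP DocumentBuilder.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        // Stand-in document; a real test would load a large file instead.
        Document doc = parse(
                "<topicMap><topic id=\"a\"><baseName/></topic><topic id=\"b\"/></topicMap>");
        for (int pass = 1; pass <= 3; pass++) {
            long start = System.nanoTime();
            int nodes = countNodes(doc);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println("pass " + pass + ": " + nodes + " nodes in " + micros + "us");
        }
    }
}
```

Timing several passes in a row, as above, is what separates any one-off cost on the first pass from the steady-state traversal cost.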
Once the DOM objects are instantiated, Xerces required around
31MB and CTCDOM around 24.5MB. So
CTCDOM appears to get by with roughly 20% less memory than
Xerces.
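The saving is simple arithmetic on the two figures above, and the sketch below just spells it out. The usedHeapMb method shows one common way of estimating heap use from inside a JVM; it is an assumption for illustration and not necessarily how the figures above were obtained. The class and method names are invented.

```java
class MemorySaving {

    // Relative saving of the smaller figure against the baseline, as a percentage.
    static double percentSaved(double baselineMb, double smallerMb) {
        return 100.0 * (baselineMb - smallerMb) / baselineMb;
    }

    // Rough estimate of heap currently in use, in megabytes.
    // Runtime.gc() is only a hint to the JVM, so treat the result as approximate.
    static double usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        rt.gc();
        return (rt.totalMemory() - rt.freeMemory()) / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // (31 - 24.5) / 31, using the figures quoted in the text
        System.out.printf("saving: %.1f%%%n", percentSaved(31.0, 24.5));
        System.out.printf("heap in use now: %.1f MB%n", usedHeapMb());
    }
}
```

Calling usedHeapMb before and after building a document, and subtracting, gives a usable if noisy estimate of what the DOM objects cost.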
Below is a Java code fragment used to test CTCDOM.
import cutthecrap.ctcdom.*;
import cutthecrap.xpq.*;
import java.io.FileInputStream;
import java.util.Iterator;
import org.w3c.dom.*;

// load the sample topic map into a CTCDOM document
CTCDOM pd = new CTCDOM();
String f = "D:/testxml/Opera.xtm";
CTCDocument doc = pd.createDocument("test", new FileInputStream(f));

// the standard W3C API can be used
Element el = doc.getElementById("ninetta");

// plus extended XPath support!
Iterator rs1
    = doc.queryXPath(
        "id('ninetta')/instanceOf/topicRef/id(@xlink:href)"
        + "/baseName/baseNameString/text()" );
while (rs1.hasNext()) {
    System.out.println("Name: " + ((Node) rs1.next()).getNodeValue());
}
Note the "extended" use of id(@xlink:href) allowing navigation
of the XML structure based on ID values referenced by attribute
values of "related" nodes.
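The navigation that id(@xlink:href) performs can be imitated by hand with nothing but the standard W3C API, which may help show what the expression is doing. The sketch below is not CTCDOM's implementation. The class name IdRefNavigation, the SAMPLE document and the helper names are invented, and dropping a leading '#' from the reference is an assumption about how same-document references are written.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class IdRefNavigation {

    // Invented sample: topic "ninetta" points at topic "character" by reference.
    static final String SAMPLE =
        "<topicMap xmlns:xlink=\"http://www.w3.org/1999/xlink\">"
        + "<topic id=\"ninetta\"><instanceOf>"
        + "<topicRef xlink:href=\"#character\"/></instanceOf></topic>"
        + "<topic id=\"character\"><baseName>"
        + "<baseNameString>Character</baseNameString></baseName></topic>"
        + "</topicMap>";

    // Index every element that carries an "id" attribute by that value.
    static Map<String, Element> indexById(Document doc) {
        Map<String, Element> index = new HashMap<>();
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element el = (Element) all.item(i);
            if (el.hasAttribute("id")) {
                index.put(el.getAttribute("id"), el);
            }
        }
        return index;
    }

    // Follow the reference held in the named attribute of 'from'.
    // Assumption: a leading '#' marks a same-document reference and is dropped.
    static Element follow(Map<String, Element> index, Element from, String attrName) {
        String ref = from.getAttribute(attrName);
        if (ref.startsWith("#")) {
            ref = ref.substring(1);
        }
        return index.get(ref);
    }

    // Start at a topic, hop through its topicRef, return the target's base name.
    static String typeName(String xml, String topicId) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, Element> index = indexById(doc);
        Element topic = index.get(topicId);
        Element ref = (Element) topic.getElementsByTagName("topicRef").item(0);
        Element target = follow(index, ref, "xlink:href");
        return target.getElementsByTagName("baseNameString").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Name: " + typeName(SAMPLE, "ninetta"));
    }
}
```

The single XPath expression in the fragment above replaces the index lookup, the hop through the attribute and the final descent, which is the attraction of the extended syntax.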
As a functional W3C DOM API implementation, CTCDOM
can prove useful and offers better memory utilisation.
The purpose of the exercise, though, was to test the CTCMap class
and to investigate performance and functionality compared with the Generic Persistent
Model on the one hand and the conventional but much refined Xerces implementation on the other.
With this in mind, the results are very pleasing. CTCMap could
clearly be used as a highly functional and well-performing utility class.
It is hoped that this will encourage the use of CTCMap and improve
understanding of the Generic Model and Cut The Crap Software.
This is the usual plea for developer feedback!
Please mail me with any comments, criticisms and suggestions. Even harsh comments are welcome, since at least they show you can be bothered.