XML processing is broadly divided into DOM-based and SAX-based approaches: either a complete document-level object model is built, or events are fired for each element as it is parsed. A simple combined approach is suggested here, offering the advantages of each.

DOM vs. SAX

DOM processing builds a single in-memory object model of the imported XML document. The advantage is that all the data can be accessed conveniently for whatever further processing is required. The main disadvantage is that the whole document model has to be built and held in memory, which leads to resource problems with large files.

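SAX parsers, which are increasingly used, tend to perform much more efficiently. Their main advantage is that they do not run into resource problems when processing large files. The main disadvantage is that events arrive one element at a time with no surrounding context, so the application has to track that context itself.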

MiniDOM

The idea for MiniDOM came after working with various software components that use SAX parsers.

It became clear that a number of techniques were needed to introduce context-based event processing, and that such processing typically built temporary data structures to be accessed when specific elements were closed, via an endElement event.

Essentially, fragments of a DOM model were being built on a case-by-case basis, to be processed atomically by a later event. Rather than being built generically, as they would be by a DOM parser, each fragment was built by special code. This just seems like a huge waste of time, as well as being unreliable, unmaintainable and unscalable.
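To make the pattern concrete, here is a minimal sketch of the kind of special-purpose code described above, written against the standard Java SAX API. The element names (order, item) and the class name HandRolledOrders are purely hypothetical; the point is that the handler has to track its own context and assemble its own temporary structure, which is only usable for this one document shape.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: collects every <order> as a map of child name to text.
final class HandRolledOrders {

  static List<Map<String, String>> parse(String xml) throws Exception {
    List<Map<String, String>> orders = new ArrayList<>();

    DefaultHandler handler = new DefaultHandler() {
      private Map<String, String> current;                 // temporary structure for the open <order>
      private final StringBuilder text = new StringBuilder();

      @Override
      public void startElement(String uri, String local, String qName, Attributes atts) {
        if (qName.equals("order")) {                        // context tracking, by hand
          current = new HashMap<>();
          current.put("id", atts.getValue("id"));
        }
        text.setLength(0);
      }

      @Override
      public void characters(char[] ch, int start, int length) {
        if (current != null) {
          text.append(ch, start, length);
        }
      }

      @Override
      public void endElement(String uri, String local, String qName) {
        if (qName.equals("order")) {
          orders.add(current);                              // the fragment is processed as a unit here
          current = null;
        } else if (current != null) {
          current.put(qName, text.toString().trim());       // child of the open <order>
        }
        text.setLength(0);
      }
    };

    SAXParserFactory.newInstance().newSAXParser()
        .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
    return orders;
  }
}
```

Every new element of interest needs another variation of this bookkeeping, which is exactly the duplication a generic fragment builder removes.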

The Alternative

The MiniDOM approach is simply to provide a wrapper for a SAX parser, with which a programmer can register clients that are called when specific elements are built.

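import java.io.FileInputStream;

public class HandlerApp {

  // "throws Exception" covers the FileInputStream constructor and any
  // checked exceptions that MiniDOM.process may declare.
  public void processFile(String fname) throws Exception {
    MiniDOM md = new MiniDOM();

    // Called each time a TAG1 element is closed.
    md.register("TAG1", new MiniDOM.IHandler() {
      public void handleIt(Element elem) {
        processTAG1(elem);
      }
    });

    // Called each time a TAG2 element is closed.
    md.register("TAG2", new MiniDOM.IHandler() {
      public void handleIt(Element elem) {
        processTAG2(elem);
      }
    });

    // Close the stream once parsing has finished.
    try (FileInputStream in = new FileInputStream(fname)) {
      md.process(in);
    }
  }

  public void processTAG1(Element elem) {
    //...
  }

  public void processTAG2(Element elem) {
    //...
  }
}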

In this way, whenever the TAG1 element is ended, the processTAG1 method will be called with the relevant Element object.

Whether a document contains 10 or 10 million such elements, the DOM disadvantage of building the whole document model is avoided; instead, each element is delivered for processing at the granularity required.

Moreover, if at any point no handlers are active, no data objects are built or retained unnecessarily.
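One way such a wrapper could work is sketched below. This is not the MiniDOM source: the class name FragmentDispatcher, the use of java.util.function.Consumer for handlers, and the use of org.w3c.dom.Element for fragments are all assumptions made for illustration. It shows the two properties described above: a fragment is handed to its handler when its element closes, and nothing is built while no registered element is open.

```java
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch of a SAX wrapper that builds DOM fragments on demand.
final class FragmentDispatcher extends DefaultHandler {
  private final Map<String, Consumer<Element>> handlers = new HashMap<>();
  private final Document doc;   // used only as a node factory, never populated
  private Element current;      // innermost open element of the fragment, or null

  FragmentDispatcher() throws Exception {
    doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
  }

  void register(String tag, Consumer<Element> handler) {
    handlers.put(tag, handler);
  }

  void process(InputStream in) throws Exception {
    SAXParserFactory.newInstance().newSAXParser().parse(in, this);
  }

  @Override
  public void startElement(String uri, String local, String qName, Attributes atts) {
    // Outside any fragment and no handler for this tag: build nothing.
    if (current == null && !handlers.containsKey(qName)) {
      return;
    }
    Element e = doc.createElement(qName);
    for (int i = 0; i < atts.getLength(); i++) {
      e.setAttribute(atts.getQName(i), atts.getValue(i));
    }
    if (current != null) {
      current.appendChild(e);
    }
    current = e;
  }

  @Override
  public void characters(char[] ch, int start, int length) {
    if (current != null) {
      current.appendChild(doc.createTextNode(new String(ch, start, length)));
    }
  }

  @Override
  public void endElement(String uri, String local, String qName) {
    if (current == null) {
      return;                                   // element was never part of a fragment
    }
    Element done = current;
    Node parent = done.getParentNode();
    current = (parent instanceof Element) ? (Element) parent : null;

    Consumer<Element> handler = handlers.get(qName);
    if (handler != null) {
      handler.accept(done);                     // deliver the completed fragment
    }
    // When current is null again, the dispatcher holds no reference to the fragment.
  }
}
```

A caller would register a handler with something like register("TAG1", elem -> processTAG1(elem)) and then call process on an input stream. Because the dispatcher drops its reference to a fragment as soon as the outermost registered element closes, memory use is bounded by the largest single fragment rather than by the size of the document.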

Note that MiniDOM is case sensitive, as XML itself is. The only strings that the Element object looks for explicitly are ID and name: it assumes that an element's identifier may be specified by either an ID attribute or a name attribute.