The Generic Persistent Object Model (from now on referred to as GPO)
is the basis for most of the Cut The Crap software. But it is often difficult for
programmers to appreciate what such a model achieves and how it relates to more
familiar database technology.
This whitepaper attempts to introduce GPO by comparing it with better-understood relational structures. GPO is first used to reproduce similar database structures, and the model is then modified to represent the problem domain more directly. The example system chosen is derived from the Microsoft Northwind database schema.
At the most fundamental level a database is simply a medium that supports the storage and retrieval of data.
But mention "database" to a computer programmer or designer, and other expectations will be raised.
A common description of a database may well be:
"A system where data is arranged in a number of tables"
The arrangement of data in tables seems very intuitive. For example, a database that includes information on "Contacts" may well have a "Contact Table". This table will have "columns" that correspond to the properties of a "Contact", for example: "Name", "Address", "TelephoneNumber" and "Email".
A "Table" is populated by a number of "Rows" of similar data.
For our Northwind-based example, we'll also need tables for Products,
Orders and OrderItems, the purpose of each should be clear as we
proceed.
A database is a global data "system", and "tables" define identifiable "sets" of data.
In GPO there is no direct equivalent, but it might be interesting to
examine this a little more closely.
A "Table" is a globally identifiable "object" within a database, so we can simply define such an object. To start with though, we'll create an object that we can identify as the "database":
GPOMap db = new GPOMap(om);
om.remember("Database", db);
Now we can create our table:
GPOMap contTable = new GPOMap(om);
...and follow a naming convention so that we can retrieve it from the Database object:
db.set("table:Contact", contTable);
The table can also hold a reference back to the database:
contTable.set("Database", db);
Now we can easily retrieve the database and the contact table:
GPOMap db = om.recall("Database");
GPOMap contTable = db.get("table:Contact");
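To make the naming convention concrete, here is a minimal in-memory sketch of what `remember`/`recall` and the `"table:Contact"` key amount to. `ObjectManager`, `MapObject` and `lookupTable` are hypothetical stand-ins written for illustration, not the GPO classes, and nothing in this sketch is actually persisted.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the object manager `om`; nothing is persisted here.
class ObjectManager {
    private final Map<String, MapObject> roots = new HashMap<>();

    // Register a named root object so it can be found again later.
    void remember(String name, MapObject obj) { roots.put(name, obj); }

    // Look up a previously remembered root object (null if unknown).
    MapObject recall(String name) { return roots.get(name); }
}

// Hypothetical stand-in for GPOMap: just a bag of named properties.
class MapObject {
    private final Map<String, Object> props = new HashMap<>();
    void set(String key, Object value) { props.put(key, value); }
    Object get(String key) { return props.get(key); }
}

class NamedTableDemo {
    // Naming convention from the text: tables are stored under "table:<Name>".
    static MapObject lookupTable(ObjectManager om, String tableName) {
        MapObject db = om.recall("Database");
        return (MapObject) db.get("table:" + tableName);
    }

    public static void main(String[] args) {
        ObjectManager om = new ObjectManager();
        MapObject db = new MapObject();
        om.remember("Database", db);
        MapObject contTable = new MapObject();
        db.set("table:Contact", contTable);
        System.out.println(lookupTable(om, "Contact") == contTable); // true
    }
}
```

Because the key is only a string, a misspelled table name simply yields `null`; there is no standard way to list every table, which is the shortcoming the text turns to next.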
The next stage is to be able to add "row" data to the table. All rows will have a reference to the table to which they belong:
GPOMap row1 = new GPOMap(om);
row1.set("table", contTable);
row1.set("Name", "Contact One");
GPOMap row2 = new GPOMap(om);
row2.set("table", contTable);
row2.set("Name", "Contact Two");
In GPO we can retrieve the rows as a set of objects - called a "LinkSet":
ILinkSet rows = contTable.getLinkSet("table");
This is defined as the set of objects that reference the contTable object with
the property "table".
This set object can be used directly, for example:
GPOMap row3 = new GPOMap(om);
rows.addMember(row3);
The rows.addMember(row3) achieves the same result as:
row3.set("table", contTable);
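The following sketch models a link set as the set of objects whose given property refers to the owner, and shows `addMember` as nothing more than setting that property on the member. It assumes a plain in-memory reverse index kept on each object; `Node` and `LinkSet` are hypothetical names, and GPO's real storage may well work differently.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical object with properties plus a reverse index of who points at it.
class Node {
    private final Map<String, Object> props = new HashMap<>();
    // property name -> objects whose property of that name refers to this node
    private final Map<String, Set<Node>> referrers = new HashMap<>();

    void set(String prop, Object value) {
        Object old = props.put(prop, value);
        if (old instanceof Node) {
            ((Node) old).referrersVia(prop).remove(this);   // leave the old owner's set
        }
        if (value instanceof Node) {
            ((Node) value).referrersVia(prop).add(this);    // join the new owner's set
        }
    }

    Object get(String prop) { return props.get(prop); }

    LinkSet getLinkSet(String prop) { return new LinkSet(this, prop); }

    // Creates the insertion-ordered set on first use.
    Set<Node> referrersVia(String prop) {
        return referrers.computeIfAbsent(prop, k -> new LinkedHashSet<>());
    }
}

// The set of objects that reference `owner` through property `prop`.
class LinkSet implements Iterable<Node> {
    private final Node owner;
    private final String prop;

    LinkSet(Node owner, String prop) { this.owner = owner; this.prop = prop; }

    // Adding a member is exactly the same as setting the member's property.
    void addMember(Node member) { member.set(prop, owner); }

    int size() { return owner.referrersVia(prop).size(); }

    @Override
    public Iterator<Node> iterator() {
        return Collections.unmodifiableSet(owner.referrersVia(prop)).iterator();
    }
}
```

Since membership is derived from the property, re-pointing a row at another table moves it out of the old set automatically; there is no second structure to keep in step by hand.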
At first glance this seems close to a database. But it would feel better if it were possible to retrieve all "tables" from the "database" in some standard way, and to access an individual "table" by a "name" property associated with that "table".
The way to achieve this is to recognize that the "table" objects are just a set of objects within the "database" just as the "row" objects were a "set" of objects defined by the "table" they belonged to.
To "look up" members of the set by some named property,
GPO provides an "Index" object called a "Classifier". This can be registered
to "classify" any set defined by such a property.
om.registerClassifier("DatabaseTables", "Name");
ILinkSet tables = db.getLinkSet("DatabaseTables");
Now we can add the table objects to the set:
contTable.set("Name", "Table: Contact");
tables.addMember(contTable);
GPOMap prodTable = new GPOMap(om);
prodTable.set("Name", "Table: Product");
tables.addMember(prodTable);
Rather than using a "hard-wired" object reference to retrieve a table object, we would now access it by its "Name" property:
GPOMap aTable = tables.getClassifier("Name").getValue("Table: Contact");
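A classifier can be pictured as a hash index from a property value to the member carrying it, updated whenever the set changes. This is a hedged sketch under two assumptions the text does not state: property values are unique within the set, and the indexed property is not modified after a member is added. `IndexedSet`, `Item` and `lookup` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical set member: a bag of named properties.
class Item {
    private final Map<String, Object> props = new HashMap<>();
    Item with(String key, Object value) { props.put(key, value); return this; }
    Object get(String key) { return props.get(key); }
}

// Hypothetical set with per-property classifiers (value -> member indexes).
class IndexedSet {
    private final List<Item> members = new ArrayList<>();
    private final Map<String, Map<Object, Item>> classifiers = new HashMap<>();

    // Build an index for `prop`, covering members that are already present.
    void registerClassifier(String prop) {
        Map<Object, Item> index = new HashMap<>();
        for (Item m : members) index.put(m.get(prop), m);
        classifiers.put(prop, index);
    }

    // Every registered index is updated as the member joins the set.
    void addMember(Item m) {
        members.add(m);
        for (Map.Entry<String, Map<Object, Item>> e : classifiers.entrySet()) {
            e.getValue().put(m.get(e.getKey()), m);
        }
    }

    void removeMember(Item m) {
        members.remove(m);
        for (Map.Entry<String, Map<Object, Item>> e : classifiers.entrySet()) {
            e.getValue().remove(m.get(e.getKey()), m); // only if it still maps to m
        }
    }

    // Similar in spirit to tables.getClassifier("Name").getValue(...).
    Item lookup(String prop, Object value) {
        Map<Object, Item> index = classifiers.get(prop);
        return index == null ? null : index.get(value);
    }
}
```

A lookup then costs one hash probe instead of a scan of the set, which is the point of registering a classifier rather than iterating over the members.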
In a relational database, to retrieve all the rows of a table, a SELECT
statement is executed, returning a set of values:
SELECT * FROM CONTACT
In GPO you would access the set of rows defined by the table:
ILinkSet rows = contTable.getLinkSet("table");
From the ILinkSet the "member" objects would be retrieved using the
iterator() method of the set.
Note that this is similar to the use of a database cursor in that objects are not all retrieved in one go, but only as they are requested.
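The cursor-like behavior can be sketched as an iterator that holds only member identifiers and loads each object when `next()` is called. `LazySet` and its `loader` function are assumptions for illustration; how GPO actually fetches members from storage is not described here.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical set that stores member ids and loads members on demand.
class LazySet<T> implements Iterable<T> {
    private final List<Long> memberIds;
    private final Function<Long, T> loader;   // stands in for a read from storage

    LazySet(List<Long> memberIds, Function<Long, T> loader) {
        this.memberIds = memberIds;
        this.loader = loader;
    }

    @Override
    public Iterator<T> iterator() {
        Iterator<Long> ids = memberIds.iterator();
        return new Iterator<T>() {
            @Override public boolean hasNext() { return ids.hasNext(); }
            // Nothing is loaded until the caller asks for the next member.
            @Override public T next() { return loader.apply(ids.next()); }
        };
    }
}

class LazyDemo {
    public static void main(String[] args) {
        int[] loads = {0};
        LazySet<String> rows = new LazySet<>(Arrays.asList(1L, 2L, 3L),
                id -> { loads[0]++; return "row" + id; });
        Iterator<String> it = rows.iterator();
        System.out.println(loads[0]);  // 0: nothing fetched yet
        System.out.println(it.next()); // row1
        System.out.println(loads[0]);  // 1
    }
}
```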
In the "Northwind" data model, a "Product" has a "Name" and "Price", but also has a "Supplier". In our model the "Supplier" will refer to a "Contact".
In a relational database, a "Contact" will have a "ContactId" as a "primary key". In order to reference a "Contact" as a "Supplier" of a "Product", the value of the "Supplier" field in a "Product" row would be set to the "ContactId". This is known as a "foreign key", and it can be used to look up the data in the "Contact" table.
In GPO pretty much exactly the same thing is achieved, but the concepts of
"primary keys" and "foreign keys" are not needed, since they are subsumed by the idea of
persistent "object identity". But it's really the same thing. We would just set the "Supplier"
property of a specific "Product" to the selected "Contact":
GPOMap prod1 = new GPOMap(om);
prod1.set("Name", "Widget");
prod1.set("table", prodTable);
prod1.set("Supplier", row1);
So the value of prod1.get("Supplier").get("Name") will be
"Contact One".
In a relational database, the primary keys provide efficient access to the individual rows they identify.
In a one-to-many relationship the foreign key column of the many table should also be indexed to allow for efficient retrieval of the associated set of rows.
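The difference between following a foreign key and following an object reference can be shown with plain Java classes, independent of GPO. `Contact`, `ProductFk`, `ProductRef` and `SupplierDemo` are illustrative names; the map plays the role of the Contact table indexed by its primary key.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a Contact row with a primary key.
class Contact {
    final int contactId;
    final String name;
    Contact(int contactId, String name) { this.contactId = contactId; this.name = name; }
}

// Relational style: the product stores the supplier's key.
class ProductFk {
    final String name;
    final int supplierId;
    ProductFk(String name, int supplierId) { this.name = name; this.supplierId = supplierId; }
}

// Object style: the product stores the supplier itself.
class ProductRef {
    final String name;
    final Contact supplier;
    ProductRef(String name, Contact supplier) { this.name = name; this.supplier = supplier; }
}

class SupplierDemo {
    // Relational navigation: resolve the key against the indexed Contact "table".
    static String supplierName(ProductFk p, Map<Integer, Contact> contactTable) {
        Contact c = contactTable.get(p.supplierId);
        return c == null ? null : c.name;   // a dangling key finds nothing
    }

    // Object navigation: the reference already identifies the row, no lookup needed.
    static String supplierName(ProductRef p) {
        return p.supplier.name;
    }
}
```

The foreign-key version needs the index on `contactId` to be fast and can silently fail on a dangling key; the reference version has neither concern.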
In GPO object references are more similar to a rowId than a primary
key. In a many-to-one association, the set of objects is defined by a common property
reference as we have already seen - "tables" all reference the same "database", "rows" reference
a common "table".
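The link structure implemented within GPO "winds" through the set members: it is
just a doubly-linked list of object references. The effect, though, is an efficient
built-in index: the members of a set can be reached directly from the object that defines it,
and a member can be added or removed without searching the set.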
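A doubly-linked list that "winds" through its members can be sketched as an intrusive list, where each member stores its own `prev`/`next` references. This sketch assumes each member belongs to a single set; an object in several sets would need one pair of links per set. `Owner` and `Member` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical set member carrying its own list links.
class Member {
    final String name;
    Member prev, next;   // links "wind" through the members themselves
    Owner owner;         // the object that defines the set
    Member(String name) { this.name = name; }
}

// Hypothetical owner object: the set is just a pointer to the first member.
class Owner {
    private Member head;

    // O(1): push onto the front of the chain (assumes m is not in any set yet).
    void add(Member m) {
        m.owner = this;
        m.prev = null;
        m.next = head;
        if (head != null) head.prev = m;
        head = m;
    }

    // O(1): unlink using the member's own pointers, no search needed.
    void remove(Member m) {
        if (m.owner != this) return;
        if (m.prev != null) m.prev.next = m.next; else head = m.next;
        if (m.next != null) m.next.prev = m.prev;
        m.prev = m.next = null;
        m.owner = null;
    }

    // Walk the chain from the owner to list the members.
    List<String> names() {
        List<String> out = new ArrayList<>();
        for (Member m = head; m != null; m = m.next) out.add(m.name);
        return out;
    }
}
```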
Any arbitrary set can be created in GPO simply by referring to some common object
with a common property.
Although Northwind does not require any many-to-many associations, it would be easy to introduce some. For example, we could define a product classification in which products might be in more than one category, such as "BOOK and FICTION", or "CD and CLASSICAL and MUSAK".
To represent such structures in a relational database requires a special table called a join table. In such a table columns define foreign keys to the rows they join. For example:
**PRODUCT**

| ID  | Name       | Price |
|-----|------------|-------|
| 123 | Abbey Road | 12.99 |

**CATEGORY**

| ID | Name      |
|----|-----------|
| 23 | CD        |
| 35 | CLASSICAL |
| 44 | POP       |

**PRODUCT_CATEGORY**

| Product | Category |
|---------|----------|
| 123     | 23       |
| 123     | 44       |
where PRODUCT_CATEGORY is a join table that "joins" the "PRODUCT" and
"CATEGORY" tables, defining the many-to-many association.
In order to extract data efficiently, a relational database must maintain indexes on both columns of the join table. A query across the table structure might look something like this:
SELECT PRODUCT.Name, CATEGORY.Name FROM PRODUCT_CATEGORY JOIN PRODUCT ON (PRODUCT_CATEGORY.Product=PRODUCT.ID) JOIN CATEGORY ON (PRODUCT_CATEGORY.Category=CATEGORY.ID)
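For comparison, the join table above can be modeled in memory with one index per column, which is what lets the association be navigated efficiently from either side. `JoinTable` is an illustrative class, not part of any database API, and the IDs in the usage note come from the example tables.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative in-memory PRODUCT_CATEGORY join table with both columns indexed.
class JoinTable {
    private final Map<Integer, Set<Integer>> byProduct = new HashMap<>();
    private final Map<Integer, Set<Integer>> byCategory = new HashMap<>();

    // Each inserted row updates both indexes.
    void insert(int productId, int categoryId) {
        byProduct.computeIfAbsent(productId, k -> new TreeSet<>()).add(categoryId);
        byCategory.computeIfAbsent(categoryId, k -> new TreeSet<>()).add(productId);
    }

    // Navigate product -> categories using the Product-column index.
    Set<Integer> categoriesOf(int productId) {
        return byProduct.getOrDefault(productId, Collections.emptySet());
    }

    // Navigate category -> products using the Category-column index.
    Set<Integer> productsIn(int categoryId) {
        return byCategory.getOrDefault(categoryId, Collections.emptySet());
    }
}
```

With the rows `insert(123, 23)` and `insert(123, 44)`, `categoriesOf(123)` gives `[23, 44]` and `productsIn(35)` is empty. Dropping either map would turn one direction of the query into a full scan.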
In GPO the same fundamental structure is required. However, the join
structure can be provided by another GPO object that can be stored with either one of the
linked objects. The effect of this is that navigating many-to-many associations is of
the same order of efficiency as navigating a one-to-many association. Okay, not exactly the
same, and there is a small "space" penalty, but it is not significant.
Here is what it would look like using GPO:
GPOMap abbey = new GPOMap(om);
abbey.set("Name", "Abbey Road");
abbey.set("Price", 12.99);
GPOMap cd = new GPOMap(om);
cd.set("Name", "CD");
GPOMap pop = new GPOMap(om);
pop.set("Name", "POP");
new ManyManyLink("product-category", abbey, "category-product", cd);
new ManyManyLink("product-category", abbey, "category-product", pop);
In order to retrieve the categories associated with abbey we
simply get the LinkSet associated with "product-category" and
request an iterator.
Iterator cats = abbey.getLinkSet("product-category").iterator();
[The default action of the LinkSet iterator is to "resolve"
any many-to-many references.]
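A many-to-many link, and an iterator that resolves it, might look roughly like this: the link object is registered in a set on each end, and iterating a set yields the objects at the far end rather than the links themselves. `Obj`, `Link` and `iterator(role)` are hypothetical names that mirror the `ManyManyLink` call above; they are not the GPO implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical linked object holding its link sets, keyed by role name.
class Obj {
    final String name;
    private final Map<String, List<Link>> links = new HashMap<>();
    Obj(String name) { this.name = name; }

    void attach(String role, Link link) {
        links.computeIfAbsent(role, k -> new ArrayList<>()).add(link);
    }

    // Resolving iterator: yields the far-end objects rather than the links.
    Iterator<Obj> iterator(String role) {
        Iterator<Link> it = links.getOrDefault(role, Collections.emptyList()).iterator();
        return new Iterator<Obj>() {
            public boolean hasNext() { return it.hasNext(); }
            public Obj next() { return it.next().otherEnd(Obj.this); }
        };
    }
}

// Hypothetical link object: plays the part of one join-table row.
class Link {
    final Obj a, b;

    // Registers the link with both ends, one role name per direction.
    Link(String roleA, Obj a, String roleB, Obj b) {
        this.a = a;
        this.b = b;
        a.attach(roleA, this);
        b.attach(roleB, this);
    }

    Obj otherEnd(Obj from) { return from == a ? b : a; }
}
```

Because the same link sits in a set on each end, walking from a category back to its products costs the same as walking from a product to its categories.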
Referential integrity just means that any reference from one table to another must be valid: if a foreign key value is provided, a matching primary key value must exist in the "foreign" table.
It is easy to see why this is such a big deal. The association of data using foreign keys is fundamental to database design.
Relational databases provide a number of tools to help with managing referential
integrity, from CONSTRAINTS to TRIGGERS. It is mostly
down to the database designer, though, to specify how these should be applied, and this
can get pretty complicated (which tends to be why you need a Database Administrator
to fix things when they go wrong).
In GPO the intrinsic object linking structures guarantee that
referential integrity is maintained. It is simply not possible to hold a reference
to an object that does not exist.
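One way to see why two-way linking gives referential integrity: if every object also records who refers to it, deleting an object can clear those incoming references instead of leaving them dangling. The text does not say exactly what GPO does on deletion, so clearing the references is an assumed policy here, and `Entity` is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical object that tracks references in both directions.
class Entity {
    final String name;
    private final Map<String, Entity> refs = new HashMap<>();           // outgoing: property -> target
    private final Map<Entity, Set<String>> referrers = new HashMap<>(); // incoming: referrer -> properties

    Entity(String name) { this.name = name; }

    // Setting a reference updates the target's record of incoming references.
    void set(String prop, Entity target) {
        Entity old = refs.remove(prop);
        if (old != null) old.forgetReferrer(this, prop);
        if (target != null) {
            refs.put(prop, target);
            target.referrers.computeIfAbsent(this, k -> new HashSet<>()).add(prop);
        }
    }

    Entity get(String prop) { return refs.get(prop); }

    private void forgetReferrer(Entity from, String prop) {
        Set<String> props = referrers.get(from);
        if (props != null) {
            props.remove(prop);
            if (props.isEmpty()) referrers.remove(from);
        }
    }

    // Assumed delete policy: clear every reference to and from this object.
    void delete() {
        for (Map.Entry<Entity, Set<String>> e : new ArrayList<>(referrers.entrySet())) {
            for (String prop : new ArrayList<>(e.getValue())) e.getKey().set(prop, null);
        }
        for (String prop : new ArrayList<>(refs.keySet())) set(prop, null);
    }
}
```

The designer never writes a constraint or trigger here; integrity follows from the fact that every reference is recorded at both ends.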
From the point of view of system/design scalability this is a huge bonus. It is possible to design object structures without consideration for how the referential integrity should be managed - the system will look after its own referential integrity.
You may have noticed that recent GPO snippets have not bothered to create
"Table" objects. In GPO we would tend to address the object associations
directly and not worry about mappings to data in tables.
In a database, the root object is the "database" itself. And the "tables" are globally accessible from that object.
Initially, we created a global "database" object in GPO. We can now
think about how we would like to organize our objects to implement the "Northwind" model. Rather
than a global "database", we can instead create and remember a root "Northwind" object:
GPOMap nw = new GPOMap(om);
nw.set("Name", "Northwind");
om.remember("Root", nw);
The next time we start up the system, we will be able to recall the root
object:
GPOMap nw = om.recall("Root");
Our root "Northwind" object will have one-to-many associations with "Contacts", "Products" and "Orders". It should be able to look up a "Contact" by its name, and maybe a "Product" also.
om.registerClassifier("contact/northwind", "Name");
om.registerClassifier("product/northwind", "Name");
ILinkSet contacts = nw.getLinkSet("contact/northwind");
ILinkSet products = nw.getLinkSet("product/northwind");
We can now create some contacts and products:
GPOMap c1 = new GPOMap(om);
c1.set("Name", "Mr Brown");
contacts.addMember(c1);
GPOMap p1 = new GPOMap(om);
p1.set("Name", "Widget");
products.addMember(p1);
Initially this seems very similar to the global "Table" model, with "Northwind" simply replacing the "Database", but as the model develops it will move further from the flat "table" structure.
We can create an "Order" object, add "OrderItems" to the order, specify which "Contact" is the "Customer" for the "Order", which "Contact" the "Order" should be "DeliveredTo", and so on.
We become less interested in global "Orders" - and certainly not global "OrderItems". Instead "OrderItems" are simply objects that are referenced from an "Order" and a "Product" object.
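Navigating such a graph replaces the join queries a relational design would need. In this sketch an `OrderItem` is reached only through its `Order` and refers to a `Product`; the class names, the `quantity` field and the `total()` calculation are illustrative assumptions, not part of the Northwind implementation described in the other whitepaper.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Illustrative Northwind-style classes; not the generated Alchemist API.
class Contact {
    final String name;
    Contact(String name) { this.name = name; }
}

class Product {
    final String name;
    final BigDecimal price;
    Product(String name, BigDecimal price) { this.name = name; this.price = price; }
}

// An OrderItem references its Order and a Product; there is no global item "table".
class OrderItem {
    final Order order;
    final Product product;
    final int quantity;
    OrderItem(Order order, Product product, int quantity) {
        this.order = order; this.product = product; this.quantity = quantity;
    }
}

class Order {
    final Contact customer;
    final Contact deliveredTo;
    final List<OrderItem> items = new ArrayList<>();

    Order(Contact customer, Contact deliveredTo) {
        this.customer = customer;
        this.deliveredTo = deliveredTo;
    }

    OrderItem addItem(Product product, int quantity) {
        OrderItem item = new OrderItem(this, product, quantity);
        items.add(item);
        return item;
    }

    // Navigate Order -> OrderItems -> Product instead of joining tables.
    BigDecimal total() {
        BigDecimal sum = BigDecimal.ZERO;
        for (OrderItem i : items) {
            sum = sum.add(i.product.price.multiply(BigDecimal.valueOf(i.quantity)));
        }
        return sum;
    }
}
```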
You may now be interested in taking a look at another whitepaper, one that
introduces "The Alchemist" system generator. In that
whitepaper "Northwind" is fully implemented using "The Alchemist". Rather
than using GPOMap objects directly, specialized Java
classes provide a programming API to the
implemented object model.
GPO can be used to represent similar data structures to a relational
database. If we choose to, we can even represent the sets of objects as "tables" within
a "database".
The intrinsic object linking implemented by GPO provides guaranteed
referential integrity and efficient navigation of object associations.
Property value based indexing is provided using "Classifier" objects over a specified one-to-many or many-to-many set.