body { margin:0; padding:0; font-family:times; font-size:9pt; } p { margin-top:5; margin-bottom:5; font-weight:plain; font-family:serif; font-size:9pt; } h1 { margin-top:5; font-size:16pt; font-weight:bold; text-align:center; font-family:sans-serif; } h2 { margin-top:20; font-size:10pt; font-weight:bold; text-align:left; font-family:sans-serif; } h3 { margin-top:20; font-size:9pt; font-weight:bold; text-align:left; font-family:sans-serif; } .appendix { margin-top:20; font-size:12pt; font-weight:bold; text-align:left; font-family:sans-serif; } pre { color:#993333; font-size:9pt; } code { color:#993333; } .section { text-align:justify; padding:5; }
Probably the first popular spreadsheet was VisiCalc. This ran on the first Apple personal computer.
However, the first "modern" spreadsheet was probably Lotus 123. It was 123 that introduced the conventions of cell addressing still used today.
Despite a few minor variations spreadsheets have not really changed very much since they were introduced. Sure, integration of graphics utlities have made them prettier and an aid to management reports and presentations. But as a computational model not much has changed.
Scalability is addressed somewhat with the idea of "linked worksheets" but the fundamentals remain the same - a spreadsheet is a static grid of cells and formulas are defined that define values for some target range of cells from some source range of cells.
Not many people really get on with programming computers. But a significant number can use spreadsheets to some extent.
I think the main reason for this is that a spreadsheet provides a model that is fairly intuitive and upfront.
You create an empty sheet and type some values into a number of cells. These values might be simple strings. You can see when you enter a value that the cell has some identity - maybe 'A3' for example.
Maybe you have entered a number "4" in cell A3, and a number "5" in A4. If you now select
cell A5, you can enter a formula like =A3+A4. You can straight away see that the
value of this cell has been computed to "9". This is pretty cool. The immediacy of the feedback
and the direct accessibility of the model provide an excellent experimentation environment in
which to learn.
Before too long you'll discover how to write a simple formula to operate over a range of cells and once you've done that your away.
But the most interesting thing about spreadsheets is how functional they are. It is possible to use a spreadsheet to create very sophisticated models of complex data. Not only that, but the dependency mechanisms ensure that processing is minimised to re-calculate only those values that may be effected.
It seems strange then, that pretty much everywhere else in computing, programmers use a variety of sequential programming languages to define the computations they need. Why is this?
One reason is that the spreadsheet world is constrained by this idea of cell addressing. This mechanism seems to be a given. The way you address a cell is by referring to its global row and column co-ordinates. But why? It may be convenient in a desktop environment but are the cell addressing schemes really so fundamental? Could there be an alternative?
Around 1987 the Trapeze spreadsheet was available on the Apple Macintosh. Its main feature was that you created named cellular objects. One might be a grid of 10x20 cells, another might be a single cell.
Any formula was always relative to some named object. This achieved two things (at least). Visually you could drag these objects around and arrange them how you liked, and more importantly you now had a spreadsheet structuring mechanism equivalent in some sense to modular programming in a conventional programming language.
I remember reading a review by a journalist where they described Trapeze as suitable for the novice who liked to give elements names, but that a real expert would prefer the flexibility of directly addressing global cells. This is the equivalent of saying that you would rather program in raw machine code than 'C'. Well, maybe for some very specialized applications - but in general?
I have always felt sorry for the producers of Trapeze since intentionally or not they produced a really cool product that should have been far more influential.
Databases tend to be used for two - possibly conflicting - purposes. The primary purpose is usually the running of a business. Storing transaction data for example, or customer details. The "other" purpose is often the production of reports.
Sometimes two or more database may be used. It is increasingly common to extract data into a multi-dimensional database to support a variety of reports and queries. If a single database is used the overhead of using a single schema to support different purposes is usually to the detriment of each.
A fairly common usage tho' is to extract certain data from a database to populate a spreadsheet, and then use the spreadsheet to provide different views and summaries of the data - and maybe the odd pie-chart as well.
But could we do better than this?
In developing the GPO model I was frequently struck by similarities between the
dependency mechanism needed to maintain GPO integrity and the kind of mechanism
that would be needed for dataflow (spreadsheet) computations.
In addition, the mechanism to support the classification objects seemed straightforward to generalise to other forms of set-oriented computation.
An IPropertyValueListener interface provides notification to objects registered
against a specific object property. The classifier objects are registered against the property of
each member object that is the basis for the classification - for example, if a set of objects
are classified on their "name" property, then the classifier will register as a listener
for any change to that property on each of the members. Any change will trigger a
re-calculation of the classification entry.
In addition to monitoring current set contents, the classifiers need to be notified when
objects are added to or removed from the set. The ISetListener interface
provides these notifications.
It should be evident that these two dependency mechanisms are theoretically able to provide the basis for event based notifications to support spreadsheet-type computations.
In a spreadsheet a set of values is defined by a cell range, say "A2:A12" would define a rectangular grid in column 'A' of rows 2 thro' 12.
Normally this set of values is a subset of a larger rectangle - say "A2:F12" where columns B thro' F might represent different but related values.
In a relational database this is like specifying a column within a table of records.
In GPO a set is defined by an object association. If a number of objects all
reference object P using the property "parent", then that set of objects can be
addressed using P.getLinkSet("parent"). This set of objects now needs to be
further refined to identify a specific value by providing the name of the "value" property.
Therefore the spreadsheet formula =SUM(A2:A12) might be equivalent to
SUM(P,"parent","value").
In fact a similar computation was used to test the theory. I created a SUMSET object
that implemented ISetListener and IPropertyValueListener and maintained a
summation of some values of the objects within the defined set.
The first move towards formula-based properties was defining the IComputation
interface. Objects that implemented this interface could be set to some GPOMap
property. When the value of the property was accessed, the object would be requested to
compute the value to be returned.
Object computeValue(IGPOMap context);
By making the SUMSET class implement IComputation a SUMSET
instance could be set to some property, and its value retrieved by simple property access.
gpm.set("total", new SUMSET(gpm, "parent", "value"));
...
System.out.println(gpm.get("total"));
To make the SUMSET work as intended there was a little work to be done. As well
as registering as a ISetListener on the target LinkSet. On each set member
a IPropertyValueListener had to be registered on the value property.
Now it was possible to update the computation whenever objects were inserted into or removed from the set, and when any value was modified.
To trigger any dependency on the computed value itself, the SUMSET had to trigger
update events when listeners were registered on any properties to which it was set.
To support this new hijacking of the Listener structures the support framework was
re-implemented to use a new linking structure. The new structure replaced the use of Arrays
to register listeners with generically extendable linked list mechanisms. This ensured
that storage requirements were spread evenly between objects so that in theory millions of objects
might listen to a specific value.
But another aspect of scalability is one of complexity. Was it really feasible to write a library
of IComputation objects and how would they be combined by a programmer to be used
effectively?
It was soon clear that using these java objects directly was not going to be convenient. But what might the alternative be?
The development of the IComputation interface and considering its applications led
to a number of realisations.
Primarily was that any computation had to be carried out within the context of a GPOMap
object.
What I was after was something akin to a specialised method, but one that was triggered by data change and would propagate any change. Methods can't do that, mainly since they have no "instance" identity - they are "class-based" properties.
A key difference of the IComputation object was that it was just that - an object.
And as such could be registered as a listener to data change events.
KAWA is an implementation of Scheme in Java. I have played with it a little. Of interest for me was that KAWA was not a Scheme interpreter, but rather compiles Scheme expressions into a Java object structure.
This approach seemed appropriate.
KAWA could not itself be used. The computational environment is quite different, KAWA's scheme world is closed and it would be a major project in itself to consider what it would mean to integrate within a persistent environment.
Instead I thought a while, considering scheme s-expressions, GPOMaps, IComputation
and spreadsheet formulas.
The first formula I wrote on the whiteboard was something like:
(times 0.175 value)
After some thought I realised that the method of extracting a value from the GPOMap
context needed to be more explicit, so this changed to:
(times 0.175 (lookup value))
and soon became:
(* 0.175 (-> value))
I then developed the s-expression parser that would generate the IComputation objects.
Allowing the following to be written:
IGPOMap gpm = new GPOMap(om);
gpm.set("value", 10);
IComputation eval = EVAL_SEXPR.compile(gpm, "(* 0.175 (-> value))");
gpm.set("vat", eval);
This was too much to type tho', so some syntax support was added:
gpm.define("vat", "(* 0.175 (-> value))");
seemed a lot neater, and directly reflects that the computation is defined within
the context of a specific GPOMap object.
The s-expression (* 0.175 (-> value)) was compiled to two IComputation
objects - one instance of EVAL_TIMES and one instance of EVAL_LOOKUP. The
EVAL_LOOKUP instance listened to updates to the "value" property and propagated
update events to its parent computation EVAL_TIMES after recalculating its own
value.
The SUMSET object was re-implemented so that:
gpm.define("net", "(sum parent value)");
would create the appropriate set-based computation.
The way that the s-expression compiler creates the formula objects is via a dictionary of registered classes, for example:
EVAL_SEXPR.registerSexpress("->", EVAL_LOOKUP.class);
Mechanisms are also provided to define these mappings in XML files and to
register these when initializing a GPO system, for example:
<desc>
<gpo objectManager="cutthecrap.oms.ObjectManager"
storefile="/ctc/db/store.rw"/>
<formula-dict import="F:/ctc/formula/stats.xml"/>
</desc>
Where the stats.xml file would look something like.
<formula-dict> <define name='stddev' class='mygpo.formula.stats.EVAL_STDEV'/> <define name='avg' class='mygpo.formula.stats.EVAL_AVG'/> </formula-dict>
With these it is possible to introduce new s-expression primitives. Utility methods
on the EVAL_SEXPR class mean that this is not hard. Here is the difinition
of EVAL_PLUS:
package cutthecrap.gpo.computation;
public class EVAL_PLUS extends EVAL_SEXPR {
public Object recalc() {
double result = 0;
Iterator args = getNodes();
while (args.hasNext()) {
result += makeDouble(args.next(), 0);
}
return new Double(result);
}
}
The makeDouble method handles any recursive formula evaluations.
Interestingly, identifying the primitive EVAL_LOOKUP class has been crucial. since
it is only this class that needs to register property value listeners.
Additionally, EVAL_SEXPR derives from GPOMap which means that
complex computation objects - such as an incremental standard deviation - are able to store
values as normal GPOMap properties rather than needing specialized serialization.
For users of the Generic Persistent Object Model it means that they are able to
define formula based values in a fairly intuitive manner.
That they can avoid having to write many methods.
That they can classify objects on computed values.
I truly believe that this is a highly significant advance in persistent object modelling. Or, in fact, of object modelling full-stop.
It is not clear. Certainly it is possible write programs which could exhibit similar behaviour for specific examples. But I suspect this would be equivalent - at best - to the early experiments and could be very complicated to write.
It had been clear for a while that the generic property change listener mechanism should be able to be used for this kind of functionality. I do not know of other persistent systems that have such structural support.
I would be very interested to hear of any similar system.
Yes, if you visit www.cutthecrap.biz/software/downloads.html and download the full software dsitribution you should be able to use the system as described here.
Appendix : Builtin Formula Primitives
Here are the currently available built-in primitives.
| Lookups | Example |
(-> prop1 ...) |
(-> name) |
(#-> gprop ...) |
(#-> root name)
uses IObjectManager.recall to access the first value |
| Strings | Example |
(str val ...) |
(str 'My name is ' (-> name))
Simple concatenation of all the string values passed as arguments |
(substr val begin [end]) |
(substr 'Mr Brown' 3)
Returns the substring from the begin index to either the end of
the string, or the end index if supplied |
(str-upcase val ...) |
(str-upcase (-> name) ' ' (-> surname))
Returns the uppercase representation of a concatenation of all the string arguments provided. |
(str-downcase val ...) |
(str-downcase (-> name) ' ' (-> surname))
Returns the lowercase representation of a concatenation of all the string arguments provided. |
| Debug | Example |
(watch val ...) |
(watch '''Name'' property is now ' (-> name))
Specializes the str primitive to echo the string value to stdout
whenever it is modified. |
| Set Oriented | Example |
(sum link property) |
(sum parent value)
will sum the value of the "value" property for all objects that defined by the set gpm.getLinkSet("parent") |
(avg link property) |
(avg parent value) |
(count link) |
(count parent) |
(stddev link property) |
(stddev parent value)
incrementally computes the standard deviation of the values in the set |
| Maths | Example |
(+ v1 ...) |
(+ (-> vat) (-> net))
(+ 2 3 4 5 6) evaluates to 20 |
(* v1 ...) |
(* 0.175 (-> net)) |
(- v1 ...) |
(- (-> value) 10)
(- 10) as a single argument returns the negative: -10 |
(/ v1 ...) |
(/ (sum linkset value) (count linkset))
- same value as (avg linkset value)
(/ 3) as a single argument returns the inverse: 1/3 |
(sqr v1) |
(sqr (-> value) 2)
computes the square of the value |
(sqrt v1) |
(sqrt 2)
computes the square root of the value |
(expt v1 v2)(^ v1 v2)
| (expt (-> value) 2) or (^ (-> value) 2)
- same value as (sqr (-> value)) |
| Comparison | Example |
(= v1 v2 ...) |
(= (-> value1) (-> other value1))
short circuit equality test for first argument against all others |
(!= v1 v2 ...) |
(!= (-> value1) (-> other value1))
short circuit test for first argument being not equal to all others |
(> v1 v2 ...) |
(> (-> value) 10.0) |
(>= v1 v2 ...) |
(>= (-> value) 10.0) |
(< v1 v2 ...) |
(< (-> value) 10.0) |
(<= v1 v2 ...) |
(<= (-> value) 10.0) |
(and v1 v2 ...) |
(and (> (-> value) 10.0) (< (-> value) 20.0))
Short circuit test returning true if all arguments evaluate to true |
(or v1 v2 ...) |
(or (> (-> value) 10.0) (< (-> value) 20.0))
Short circuit test returning true if any arguments evaluate to true |
(not v1) |
(not (> (-> value) 10.0))
is equivalent to (<= (-> value) 10.0) |
| Regular Expressions | Example |
(match regex value) |
(match '^(\w+)' (-> name))
Would retrieve the first word from the value of the 'name' property. match returns the value of the first group in the expression or the
whole match if no group is defined. |
| Charts | Example |
(pie-chart linkset label data [explode]) |
(pie-chart parent label data)
For a set of objects defined by gpm.getLinkSet("parent"), returns a PieChart
object taking labels from each members "label" property and values from
it's "data" property that can be rendered to a Java2D Graphics
surface, or write "jpeg" and "png" formats to an outputstream. |
(pie-chart linkset label data) |
(bar-chart parent label data)
For a set of objects defined by gpm.getLinkSet("parent"), returns a BarChart
object taking labels from each members "label" property and values from
it's "data" property that can be rendered to a Java2D Graphics
surface, or write "jpeg" and "png" formats to an outputstream. |
| Security | Example |
(md5 v1 ...) |
(md5 (-> name))
Concatanates a list of string values and returns a base 64 string representing the message digest computed using the MD5 algorithm |
| Conditionals | Example |
(cond (t1 v1) ...) |
(cond ((< 10 (-> value)) 'lessThan') (#t 'greaterOrEqual')))
general conditional expression, returning value of last expression of the first successfully tested block |
(case val (t1 v1) ...) |
(case (-> name) ('tony' 'PM') ('cherie' 'QC')))
variation that tests single test value for equality against other expressions, returning the last expression of the block that matches |
| Example | |
(email smtp from to subject content) |
(email smtp.server me@mycompany.com you@yourcompany.com 'Data Change' (-> data))
will send an email whenever the value of the data property is updated |
| Java Callback | Example |
(call [target] meth a1 ...) |
(call doMethod (-> name))
will invoke the method doMethod on the local object with
the value of the name lookup passed as first parameter.
(call (-> assoc) doMethod (-> name))
will do something similar but invoke the method on the object returned by the lookup on the assoc property. |
| Code Blocks | Example |
(let ([(n v)...]) ...) |
(let ((n (-> name))) (str 'hi there ' n))
binds the result of the lookup (-> name) to the symbol n
which is then used in any following formula - in the case above equivalent to:
(str 'hi there ' (-> name)) |