- Revisited Relational Database API
- Really Relational Database API *JDH My fave.
- Rational Relational Database API *Istvan: Rational Software(tm) probably won't let you use this moniker *DWM: They could blow me. The word 'rational' is still part of the English language. :-) *JDH: Beware the lawyers - a couple of weeks ago I saw on a billboard the phrase "Yes, It Does!"(TM) - how the <expletive deleted> can one trade mark such a general phrase?
Some mention has been made on this wiki of developing a database API as a profitable venture. (See Business Arrangements, Other Business Ideas.) I'll try to explain here what this idea is and where it comes from. This should be particularly helpful to Istvan. John and Dave are intimately familiar with all of this, and Jim hears a lot about it from me at work. I'll write this as a bit of a linear historical narrative, because I find that easier, and Istvan may find it mildly interesting.
All of us except Istvan work at Creative Solutions, a maker of 'shrink-wrap' accounting software, primarily for small accounting firms, in Dexter, Michigan. Creative Solutions is often anachronistically referred to as 'CSI', or Creative Solutions Incorporated. In fact, CSI is no longer incorporated, but is wholly owned by Thomson Corporation; this transition from public ownership occurred before I was hired in December of 1998.
The group that John, Dave, and I work in is responsible for the development of CSI's Checkbook Solutions product (CBS), which is a general-purpose accounting package for small businesses, akin to Intuit's QuickBooks. CBS is unusual among CSI's products. CSI's direct customers are accountants, but CBS is used by the accountant's clients. CBS exchanges data with CSA, a program which does some similar things, but is used by the accountants to track the finances of their customer's companies and provide services to them. (There is some overlap in functionality, but CBS provides additional modules such as Accounts Payable and Accounts Receivable, which rarely handled remotely by an accountant.) Jim and Andy work on CSA (among other things, I believe), and so did Frank Swierz until recently. As you might imagine, both CBS and CSA are database-intensive programs; however, being shrink-wrap products, they are much more like desktop applications than typical IT database applications. There is no large centralized database, no client-server architecture; we cannot diagnose or upgrade databases in a central location. All database administration must be highly automated, because it all happens at the customers' sites. Both products currently use Microsoft's Jet database engine, which is (was?) the default underpinning of their Access database application development environment.
When I first started working at CSI, my exact duties were not all that clear. It was evident that our manager, Greg, wanted me to focus on the severe performance problems in the product, but he also pushed me to learn MFC programming. Since I dislike UI programming, I mostly focused on performance, and began familiarizing myself with the way in which the database is used by the product. Since I didn't have a strong background in databases prior to this, I spent a lot of time educating myself.
I had always disdained database work in the past, because I considered it a boring and well-understood subfield of computer science. I spent some time researching object-oriented database technology, because I had been a long-time advocate of OOP. But I also attempted to give myself a good grounding in its predecessor, relational database theory. And the things I found out opened my eyes.
I found that I was both right and wrong about database technology being boring. It turns out that the most important aspects of database theory are in fact very well understood, in the form of relational database theory. But nobody uses the fruits of that understanding! DBMS vendors 'standardize' on SQL, but SQL itself is a very poor attempt to implement relational database concepts, few if any of its implementations actually achieve compliance with the standard, and most implementations diverge from the standard in various ways.
In fact, there seem to be no DBMS products that are truly relational. They all allow violations of basic tenets of relational theory, which can make application of theory to database design and programming difficult or impossible. More importantly, they all fall far short of the capabilities that the theory allows for, which seems to be the primary motivation for the ill-advised diversion of the industry into 'object-oriented' databases.
At the same time that I was learning all this on the side, my investigations of CBS convinced me that the primary source of performance problems was the stupid way (c'mon Dan, say what you really mean! ;-)) in which the program used the database. Combining data was done 'by hand', avoiding the use of SQL altogether in the newer parts of the program. The main reason for this, I believe, was an overreaction to, or incorrect choice of remedy for, resource usage issues in the program's interaction with the DBMS.
Fundamental to working with a lower-class DBMS like Jet is the fact that the database itself cannot represent all of the business logic or enforce all of the data constraints for a complex database. (And, seeing as CBS is a general-purpose accounting package, the database is quite complex.) So one needs a layer over the database to arbitrate between the DBMS and the application where these rules can be encoded. This layer also serves to isolate the application somewhat from the details of the particular DBMS being used. I began desiging a replacement for this layer that would rely more on the database's native query capabilities, called CsiDb.
While this was going on, I continued my hunt for a better understanding of relational theory, I very fortunately stumbled across a book by Chris Date and Hugh Darwen, subtitled The Third Manifesto (TTM). This book is a cry of the theorist against the industry's current infatuation with baroque, ad hoc, and ill-defined 'object-oriented database' concepts. Date & Darwen take a fairly detailed look at relational theory and simplify it further. They examine the issues of data types and inheritance in detail, and show how type inheritance can fit into the relational model without losing any of the model's benefits.
TTM describes a straw-man database language alternative to SQL, called Tutorial D. I started using D as a model for the C++ API that I was creating for CBS. I have been working on this project on-and-off for about two and half years now; much of that time I worked alone on it, fending off Greg's concerns about what the heck I was working on as best I could. In late 1999, together with Frank Drew (mentioned in the Member Directory), I jammed an early version of CsiDb into CBS to help with the new Accounts Payable module. This was only a partial success, for a variety of reasons, but I do think that it helped with performance.
Since then, CsiDb has become central to a major performance improvement and multi-user feature effort. John and Dave are now helping me with it, and the system has undergone a complete rewrite in the past few months. I have occasional correspondence with Chris Date, who has cleared up some of my theory misunderstandings. By June, the read-path through CsiDb should be fairly complete, when we expect to ship a new version of CBS. After that we will begin work on the write-path.
So what, exactly, does CsiDb do for a programmer? ; API based on relational algebra: Relational algebra is easier to learn than SQL. For reasons that are obscure, SQL ended up being a strange hybrid between relational algebra and relational calculus, and this is confusing. My own opinion is that most programmers are more comfortable with algebraic notation than they are with calculus notation. (It's certainly true for me.) ; DBMS-independent framework: CsiDb by itself is only a framework. The database designers need to provide a layer on top of it to specify their database's schema, and to bind the framework to utilities for interacting with a particular DBMS API. (This layer is currently called CbsRdb in our system. We only support Microsoft's DAO API, to access Jet databases, but I'm trying hard to keep the core framework ready for an eventual migration to SQL Server.) An expanded version of CsiDb could probably do away with this layer, replacing it with off-line tools to build the database from scratch via CsiDb, but in CBS it's more important that we work with the existing database and co-exist with older database access code. ; Complete isolation from DBMS details: SQL dialects vary annoyingly and significantly. When using CsiDb, you don't have to deal with SQL at all anymore. (John recently demonstrated to me that even some loe-level CbsRdb-internal DBMS-specific work could be done using the CsiDb relational expression API, without resorting to SQL or DAO.) ; Solid theoretical basis: CsiDb's interfaces are solidly grounded in relational database theory. This prevents or detects a host of common mistakes in database interactions. If you learn the theory, then CsiDb's APIs are easy to understand. Contrast this with SQL, where there so much time is spent learning the idiosyncracies of the language (and of the various DBMS' APIs) that database programmers often never get around to learning the theory well, with bad consequences for programs.
The thought occurred to me a month or so ago that it might be worthwhile to productize CsiDb. Since this is well outside of CSI's market, perhaps rights to the current code could be purchased from CSI, or CSI could be convinced to invest in a startup to productize it (even if the investment was simply a granting of the rights to the code). Worst case, I could recreate a similar system from scratch, in much less (elapsed) time. I doubt that anyone could argue a violation of IP rights, because CSI is not in the database tool business.
A lot more work would be needed to turn it into a useable system. The most important things are:
- The write path. Non-trivial, for sure, but the basic design ideas are firmly burned into my brain.
- Improve DBMS modularity, and support multiple popular systems.
- Improve the type system. John knows a lot about my desires in this direction, he's been helping with CsiDb's type system quite a bit recently.
- Schema definition and other essential database administrative tools. The method used currently for schema definition is ad-hoc, for lack of time to do something better. There are various ways of approaching this.
Possible future directions:
- Build a new DBMS from scratch based on the same principles as the API - a 'native' DBMS to go with the API.
Some relevant links:
- [DBDebunk]: Fabian Pascal site that ascerbically criticizes common database management and programming practices. Recently active again after two months of cryogenetic freeze.
- [The Third Manifesto]: A site by Date & Darwen that hasn't really got underway yet.
In spite of my obvious enthusiasm for this topic, I'm unsure as to the appropriateness of pursuing DB API development as a money-making venture. The IT sector is pretty entrenched in its ways, and this technology may not be of great interest to most of the Spooky Distance members. It's also worth mentioning, though, that an MMOG is likely to include a fair amount of database work, and I would love to have something like CsiDb as an interface to it. Perhaps two ventures could be started, a 'subsidiary' company for the database subsystem. Or even generalize that idea further, and identify additional subsystems of an MMOG that have the potential to be separately productized.
JDH: Somewhat independently of Dan and shortly after I began some more serious work on CsiDb I came to a similar conclusion about the possibility of this being a potentially saleable project. From my perspective every project I've worked upon has used the underlying DBMS poorly - it's not enough to simply bemoan the JET DBMS engine when it's being used so wrong-headedly (hey, I'm politer than Dan!) Up until CsiDb I've been of the opinion that people should do far more with SQL - SQL engines generally have the best performance characteristics (even bad ones). Dan is starting to educate me as to the shortcomings of SQL (although I'd still maintain that CBS would be in a better position today had more use of managed SQL been utilized at an early stage in the project. But I'm a CsiDb convert - it really is a powerful addition to RDBMS technoloy. Although Dan is having to implement this on top of a substandard SQL engine (JET) that's also a blessing because it's forcing data independence (which was one of the original corner stones from CJ Date (the granddaddy of RDBMSs).
Anyway, enough of the ramblings - here are my closing remarks. The majority of this effort belongs to Dan - and so it's up to Dan what, if anything he'd like to do and what, if anything he'd like to ask for outside help with. I suddenly realized quite how presumptuous even suggesting this as some sort of SpookyD "pre-game" project is! But I suspect that if any product were to come of this then there would be more than enough work to go around (Dan, yell at me if I'm wrong). My thought is that this technology could be developed to a fairly complete proto-typical stage and then try and interest a heavy hitter (MS, Oracle, Sybase, IBM etc.) To get a foot in the door with one of those guys' R&D department one would probably need a name associated with the product (or a letter of introduction) - so Dan, keep writing to Chris ;-)
OK, more rambling - what has this got to do with SpookyD? Contrary to Dan I think this has money-making potential. I think to make money from it will require some serious effort. I think my fellow SpookyD'ers would be able (and hopefully willing) to invest some of that effort. If this plays out then the money made could help supply the freedom to make SpookyD the game.
Somebody remind me to edit this page if Greg, our manager, should ever join SDx. :) And you just know that if he hears about this, he'll at least ask to see the site!
To clarify: I wouldn't have a problem with doing this as an SDx project, on the contrary. I hope the other members weigh in with an honest assessment of their interest in this.
I like the idea of getting a 'letter of introduction' or some other form of endorsement from Chris. We'd have to see if he was impressed with the results, first, though. :)
The API by itself probably could make some money. I'm sceptical about getting another company to pick it up; I suspect we'd more likely end up peddling it ourselves. But I really don't have any knowledge or experience on which to base my scepticism. If we did end up peddling it ourselves, then I think the follow-on project of developing a native DBMS for it would be even more attractive.
BTW, this project I would be willing to pursue as Open Source, without hesitation. That throws most of the money-making potential out the window, but I'd get great satisfaction in producing something that I know could help raise the standards of the DBMS industry. And as Open Source software, it's probably even more likely to make a splash with the likes of Date & Darwen. (Not that they appear to be Open Source advocates; but they are working relentlessly in their own way to re-educate the DBMS community.)