Why use an ORM at all, anyway ?
By Nico on Friday, December 2 2005, 22:44 - General - Permalink
Update [2006/04/06] : This post from Fredrik Lundh gives a neat trick on how to iterate on DBAPI cursors. This should be written in the documentation !
Well, I don't know about you, but I find that Java's JDBC and Python's DBAPI 2.0 are lousy APIs. They are both more or less based on ODBC parlance, which dates back to (Google, help me here)... 1992. And it shows. Granted, you don't need utra brite APIs to manipulate relational data, but come on...
1) Both API do no allow the developer to use their favorite language's iterator construct.
You cannot use for(Row row in statement.execute()) (Java) or for row in cursor.execute() or any construct like that. You have to use things like resultSet.next() or cursor.fetchone() and test for false or None and feel like in the eighties (with a little bit of effort you can ear the first few bars of Enola Gay somewhere in the back of your head). Enumeration, Iterator, generators anyone ? Nope, you don't get to use that.
The situation in Python is rather ugly since you cannot use the affectation operator in an if statement. So you you have to write boring and unpythonic code like this :
cursor.execute('select * from foobar')
row = cursor.fetchone()
while row:
# do something
row = cursor.fetchone()
See that row = cursor.fetchone() being written twice ? It's as if the people how designed the DBAPI chose the design that would directly leverage the weaknesses of Python. This, in a language that support iterators and generators, is unbearable. The net result is that you start to write a generator wrapper around this construct, and bam, you're on your way to write a full-fledged ORM in less time than needed to say "Object-relational mapper". First little step :
def resultset(cursor,request):
cursor.execute(request)
row = cursor.fetchone()
while row:
yield row
row = cursor.fetchone()
# and then you can do things like :
for row in resultset(cursor,'select * from foobar'):
print row
2) Indexing columns by number is a pain in the ass
In Python, a row as obtained above is a tuple (that, at least, stick pretty much to pure relational algebra). You use integer indexing to get the value of a column. Which of course means that if you add a column in the middle of the request, you're in deep pain. This also means that using select * from ... is a big no-no. So very rapidly you get the habit of wrapping the row in a dictionary, using meta-data from the cursor :
def resultset(cursor,request):
cursor.execute(request)
column_names = [column[0] for column in cursor.description]
row = cursor.fetchone()
while row:
yield dict(zip(column_names,row))
row = cursor.fetchone()
Thus you get a dictionary per row (1). And from a dictionary per row to a class instance per row, the path is pretty straightforward... Which is too bad because a lot can be done just with a dict, especially in a dynamically typed language like Python.
In Java, it's much better. The ResultSetResultset class has at least the gusto to provide both getString(int columnIndex) and getString(String columnName) methods. I just remember back in the day when I was still coding in Java the anxiety I felt about losing a few bytecodes for the name resolution... Aaah those were the days.
3) Java programmer crave for classes, not dictionaries
So JDBC got it right by providing a no-hassle way to get a column value given its name. Except... except that using Map-like structures to pass around data feels weird in Java, so you promptly wrap the data you get from JDBC into a class instance. Soon you begin to write extremely boring boilerplate like aPerson.name = resultset.getString("name") and in a matter of minutes you tell yourself "humpf, why not try to generate this code automagically ?".
Seriously, when I switched from Java to Python three years ago, I had to follow an intensive detoxication program. I had to stop writing a class for each and every piece of data that needed to be moved around and use a dict instead (2). I had to submit before the power and speed of Python's dict and stop thinking that I could do better. I feel much better now.
Since what comes out of SQL request is typed out of Java's scope (think about partial requests, views, schema evolution etc.), why not admit it and carry around maps ? When your application is mostly a pipeline from the database to some HTML code on a web page, forcing the data into an object hierarchy is just overkill.
Of course, one day you really need to build a business object hierarchy and implement business rules in methods, but you know what ? A lot of business rules have been implemented and ran for decades now without the slightest hint of an object hierarchy. It's dirty, it's not fun, but it can handle millions of rows in a blaze. Can your nice object hierarchy and ORM machinery do the same thing ?
4) Prepared statements are a pain to use in both APIs
And that's too bad, since prepared statements are the best way to shield your code against SQL injection vulnerabilites, on top of being theoretically faster than standard statements.
In JDBC, setting PreparedStatement parameters is awful : you have to use the correct setXYZ(int index,XYZ value), XYZ being the type of the parameter. If the JDBC driver is nice enough, you can use setObject and hope everything will be converted automagically (and correctly...). The problem (before Java 1.5 and its autoboxing feature) was that to set, say, an integer value, you had to wrap it in an Integer instance, or use setInt and then have problems when you need a SQL NULL value, so you'd have to use setNull AND provide the type of the NULL... Need I say more ? Awful, I told you.
On top of that, parameters are not named, you have to set them by giving their index, and contrary to Python, there are no way to quickly write a list or tuple in Java. So you write pStatement.setInt(1,42); pStatement.setInt(2,777); and you just hope you'll never have to add a parameter in the middle of that 10-parameters request, for fear of having to manually reindex the setXXX calls...
In Python, that's where you have to tackle the fun panaché of parameters styles defined in PEP 249. If you only ever program for one database software in your Python coder's life, that's okay. But if you want to write code that's a tiny wee bit portable from one database to another, then either you're lucky and they use the same parameters styles, or you have to build an API that will wrap this mess. And we're gone again to building a better database API, where the temptation to write an ORM is great...
Another pretty annoying thing in Python is that with positioned parameters styles, the parameters have to be passed into a tuple (with named parameters, dicts are allowed). The DBAPI implementations that I used fo far insist on the tuple thing : I cannot pass a list or any other indexed object. When you have a single parameter, it forces you to write an ugly cursor.execute(request,(parameter,)). Also, you sometimes have to dynamically build a request, so your parameters list is dynamic too, and you have to convert in into a tuple. To help me with prepared statement parameters, I've found this kind of code useful :
def resultset(cursor,request,*args,**kwargs):
cursor.execute(request,args or kwargs)
column_names = [column[0] for column in cursor.description]
row = cursor.fetchone()
while row:
yield dict(zip(column_names,row))
row = cursor.fetchone()
# Now you can write :
for row in resultset(cursor,'''
select * from foobar where answer=? and whatever=?
''',21*2,"hello, 'world'"):
print row
The *args, **kwargs syntax is one of the little glitches of Python in term of elegance (meaning it looks like Perl ;-). It is just the syntax for variable positioned parameters (args is therefore a tuple of parameters) and variable named parameters (kwargs being a dict of named parameters).
5) And that's only from the top of my head
Contributions are welcomed. Doubly welcome, I'd say, since given the number of comments (~20) on my one-year old blog, I'm beginning to feel a bit lonely ;-).
Conclusion
To be clear, my point is that the common use case of ORM is not really about object-relational mapping. ORMs are simply used by the developer to get an API in which they feel more comfortable than the default database API. A strong dislike of SQL is also a reason for a lot of developers to turn to an ORM, not really for the object semantics but rather as a way to flee SQL. In this blog entry, one of the reasons Titus Brown needs an ORM is that understanding joins, especially left joins, is too difficult... If that's not a nice case of programmer's lazyness (the one where you end up doing twice the work by programming instead of folding up your sleeves and doing the hard work the hard way), I don't know what it is !
From my experience, the ease of use criteria are more appealing than the need for an object view. People quickly grew tired of writing the same ugly code to do basic things, so they looked for something better. I guess the object-relational part comes naturally since "writing a better database API" sounds as lame as "building a better mousetrap". So people started thinking about how they could make things a little better for us poor developers and got a little carried away by the "hey, let's make this relational database look as a bunch of object" idea. I know, I've done it, too :-).
If only relational APIs like JDBC or DBAPI had been a tad more user friendly, the need for something better would not have been so strong, and the many mistakes (and few big successes) of ORM would not have been done. Maybe that's why it's now that some mid-level APIs could become more popular, now that developers feel a bit disappointed by the lures of ORM. At least that's what I though when I heard about PAT.
Notes
(1) And again, I've not even scratched the surface of the many weird ideas you can get when writing such code, like the one where I decided that building a dict per row was too much (as if it mattered in Python) and that I'd rather precompute the index for each column name and stick to rows being tuples.
(2) The irony, of course, is that classes are largely and cleverly built on top of dictionary and functions.

