Categories

Why use an ORM at all, anyway ?

Update [2006/04/06] : This post from Fredrik Lundh gives a neat trick on how to iterate on DBAPI cursors. This should be written in the documentation !

Well, I don’t know about you, but I find that Java’s JDBC and Python’s DBAPI 2.0 are lousy APIs. They are both more or less based on ODBC parlance, which dates back to (Google, help me here)… 1992. And it shows. Granted, you don’t need utra brite APIs to manipulate relational data, but come on…

1) Both API do no allow the developer to use their favorite language’s iterator construct.

You cannot use for(Row row in statement.execute()) (Java) or for row in cursor.execute() or any construct like that. You have to use things like resultSet.next() or cursor.fetchone() and test for false or None and feel like in the eighties (with a little bit of effort you can ear the first few bars of Enola Gay somewhere in the back of your head). Enumeration, Iterator, generators anyone ? Nope, you don’t get to use that.

The situation in Python is rather ugly since you cannot use the affectation operator in an if statement. So you you have to write boring and unpythonic code like this :

cursor.execute('select * from foobar')
row = cursor.fetchone()
while row:
     # do something
     row = cursor.fetchone()

See that row = cursor.fetchone() being written twice ? It’s as if the people how designed the DBAPI chose the design that would directly leverage the weaknesses of Python. This, in a language that support iterators and generators, is unbearable. The net result is that you start to write a generator wrapper around this construct, and bam, you’re on your way to write a full-fledged ORM in less time than needed to say « Object-relational mapper ». First little step :

def resultset(cursor,request):
     cursor.execute(request)
     row = cursor.fetchone()
     while row:
         yield row
         row = cursor.fetchone()

# and then you can do things like :
for row in resultset(cursor,'select * from foobar'):
     print row

2) Indexing columns by number is a pain in the ass

In Python, a row as obtained above is a tuple (that, at least, stick pretty much to pure relational algebra). You use integer indexing to get the value of a column. Which of course means that if you add a column in the middle of the request, you’re in deep pain. This also means that using select * from ... is a big no-no. So very rapidly you get the habit of wrapping the row in a dictionary, using meta-data from the cursor :

def resultset(cursor,request):
     cursor.execute(request)
     column_names = [column[0] for column in cursor.description]
     row = cursor.fetchone()
     while row:
         yield dict(zip(column_names,row))
         row = cursor.fetchone()

Thus you get a dictionary per row (1). And from a dictionary per row to a class instance per row, the path is pretty straightforward… Which is too bad because a lot can be done just with a dict, especially in a dynamically typed language like Python.

In Java, it’s much better. The ResultSetResultset class has at least the gusto to provide both getString(int columnIndex) and getString(String columnName) methods. I just remember back in the day when I was still coding in Java the anxiety I felt about losing a few bytecodes for the name resolution… Aaah those were the days.

3) Java programmer crave for classes, not dictionaries

So JDBC got it right by providing a no-hassle way to get a column value given its name. Except… except that using Map-like structures to pass around data feels weird in Java, so you promptly wrap the data you get from JDBC into a class instance. Soon you begin to write extremely boring boilerplate like aPerson.name = resultset.getString("name") and in a matter of minutes you tell yourself « humpf, why not try to generate this code automagically ?« .

Seriously, when I switched from Java to Python three years ago, I had to follow an intensive detoxication program. I had to stop writing a class for each and every piece of data that needed to be moved around and use a dict instead (2). I had to submit before the power and speed of Python’s dict and stop thinking that I could do better. I feel much better now.

Since what comes out of SQL request is typed out of Java’s scope (think about partial requests, views, schema evolution etc.), why not admit it and carry around maps ? When your application is mostly a pipeline from the database to some HTML code on a web page, forcing the data into an object hierarchy is just overkill.

Of course, one day you really need to build a business object hierarchy and implement business rules in methods, but you know what ? A lot of business rules have been implemented and ran for decades now without the slightest hint of an object hierarchy. It’s dirty, it’s not fun, but it can handle millions of rows in a blaze. Can your nice object hierarchy and ORM machinery do the same thing ?

4) Prepared statements are a pain to use in both APIs

And that’s too bad, since prepared statements are the best way to shield your code against SQL injection vulnerabilites, on top of being theoretically faster than standard statements.

In JDBC, setting PreparedStatement parameters is awful : you have to use the correct setXYZ(int index,XYZ value), XYZ being the type of the parameter. If the JDBC driver is nice enough, you can use setObject and hope everything will be converted automagically (and correctly…). The problem (before Java 1.5 and its autoboxing feature) was that to set, say, an integer value, you had to wrap it in an Integer instance, or use setInt and then have problems when you need a SQL NULL value, so you’d have to use setNull AND provide the type of the NULL… Need I say more ? Awful, I told you.

On top of that, parameters are not named, you have to set them by giving their index, and contrary to Python, there are no way to quickly write a list or tuple in Java. So you write pStatement.setInt(1,42); pStatement.setInt(2,777); and you just hope you’ll never have to add a parameter in the middle of that 10-parameters request, for fear of having to manually reindex the setXXX calls…

In Python, that’s where you have to tackle the fun panaché of parameters styles defined in PEP 249. If you only ever program for one database software in your Python coder’s life, that’s okay. But if you want to write code that’s a tiny wee bit portable from one database to another, then either you’re lucky and they use the same parameters styles, or you have to build an API that will wrap this mess. And we’re gone again to building a better database API, where the temptation to write an ORM is great…

Another pretty annoying thing in Python is that with positioned parameters styles, the parameters have to be passed into a tuple (with named parameters, dicts are allowed). The DBAPI implementations that I used fo far insist on the tuple thing : I cannot pass a list or any other indexed object. When you have a single parameter, it forces you to write an ugly cursor.execute(request,(parameter,)). Also, you sometimes have to dynamically build a request, so your parameters list is dynamic too, and you have to convert in into a tuple. To help me with prepared statement parameters, I’ve found this kind of code useful :

def resultset(cursor,request,*args,**kwargs):
     cursor.execute(request,args or kwargs)
     column_names = [column[0] for column in cursor.description]
     row = cursor.fetchone()
     while row:
         yield dict(zip(column_names,row))
         row = cursor.fetchone()

# Now you can write :
for row in resultset(cursor,'''     select * from foobar where answer=? and whatever=? ''',21*2,"hello, 'world'"):
    print row

The *args, **kwargs syntax is one of the little glitches of Python in term of elegance (meaning it looks like Perl ;-) . It is just the syntax for variable positioned parameters (args is therefore a tuple of parameters) and variable named parameters (kwargs being a dict of named parameters).

5) And that’s only from the top of my head

Contributions are welcomed. Doubly welcome, I’d say, since given the number of comments (~20) on my one-year old blog, I’m beginning to feel a bit lonely ;-) .

Conclusion

To be clear, my point is that the common use case of ORM is not really about object-relational mapping. ORMs are simply used by the developer to get an API in which they feel more comfortable than the default database API. A strong dislike of SQL is also a reason for a lot of developers to turn to an ORM, not really for the object semantics but rather as a way to flee SQL. In this blog entry, one of the reasons Titus Brown needs an ORM is that understanding joins, especially left joins, is too difficult… If that’s not a nice case of programmer’s lazyness (the one where you end up doing twice the work by programming instead of folding up your sleeves and doing the hard work the hard way), I don’t know what it is !

From my experience, the ease of use criteria are more appealing than the need for an object view. People quickly grew tired of writing the same ugly code to do basic things, so they looked for something better. I guess the object-relational part comes naturally since « writing a better database API » sounds as lame as « building a better mousetrap ». So people started thinking about how they could make things a little better for us poor developers and got a little carried away by the « hey, let’s make this relational database look as a bunch of object » idea. I know, I’ve done it, too :-) .

If only relational APIs like JDBC or DBAPI had been a tad more user friendly, the need for something better would not have been so strong, and the many mistakes (and few big successes) of ORM would not have been done. Maybe that’s why it’s now that some mid-level APIs could become more popular, now that developers feel a bit disappointed by the lures of ORM. At least that’s what I though when I heard about PAT.


Notes

(1) And again, I’ve not even scratched the surface of the many weird ideas you can get when writing such code, like the one where I decided that building a dict per row was too much (as if it mattered in Python) and that I’d rather precompute the index for each column name and stick to rows being tuples.

(2) The irony, of course, is that classes are largely and cleverly built on top of dictionary and functions.

View Comments à to “Why use an ORM at all, anyway ?”

  • Hi Nicolas,
    have you had a look at MySQL’s handler call?
    dev.mysql.com/doc/refman/…
    this may be a nice alternative to your dislike of SQL, and seems like a nice fit for ORM type operations

  • Alexander Kozlovsky:

    See PEP 249, Section "Optional DB API Extensions". In many DBAPI implementations the cursor object have .next() and .__iter__() methods, hence instead of fetchone you can write:

    cursor.exectute(sql)
    for column1, column2, column3 in cursor:
    process row

  • Be careful about the fetchone & yield when using MySQL. It gets a lock on the table and doesn’t unlock it until ALL the rows have been fetched. For MySQL you are better off returning cursor.fetchall().

  • I think you should go back to 1996 when Michael Lorton and I designed the DBAPI. You’ll find Python was a very different beast at that time. Most of your rant just wouldn’t apply.

    Also, you don’t seem to understand the "design point" of the API. What were we trying to accomplish?

    You’re saying, "it should have made it easy for me to write my code". Well, that’s wrong, so your rant is so far off base, it is silly.

    We designed it to make it easy for the EXTENSION WRITER. Not the application. It has always been so easy to write those extra layers in Python, that it just wasn’t important to force every single extension writer to put it into their code. Let it be handled at the Python level. Again, let’s go back to 1996. Why make it easy for the extension writer? There were about two or three extensions for databases back then. That’s it. And they all sucked balls. The DBAPI design made it easy to write an extension, so they WERE written.

    The DBAPI dramatically changed the landscape of database access in Python. But we choose a different design criteria than what you believe we should have. I say "phooey". We made the right choice. If you don’t like the API, then you can write a SMALL amount of Python (as you’ve demonstrated) to remedy the situation.

  • Hi Greg,

    I certainly didn’t want to specifically diss on DBAPI 2.0. Note that I don’t find Java any better in the database API department. The API you designed is indeed good for what it was meant – a database abstraction API. And as you said, you made it easy to build a DBAPI layer for any database.

    The only thing I disagree with in your comment is that making life easier for a lot of application developers seems more important to me than making life easier for a few extension writers (granted, I am an application writer, so the contrary would be surprising).

    The net result of this choice is that there are a lot a ways to use a cursor, which is Not a Good Thing in Python-land. The point of my article is that we wouldn’t have to cope with tens of ORM implementations today if the DBAPI was a little more friendly to application writers from the beginning.

    Yet again, as today’s update shows, there are a lot of tricks that can improve the usage of the DBAPI without adding a single ligne of Python code. The for row in iter(cursor.fetchone, None) form should be documented right within the DBAPI specs as « The one and only true way to iterate on a cursor, unless you want to use fetchall()« .

  • Our view was that "making it easy for application developers" could be done in Python, rather than by the extension. Expose the functionality up to Python, then let it take over. I think we can agree that writing code in Python is easier than in C :-)

  • Jack Diederich:

    Be careful about the fetchone & yield when using MySQL. It gets a lock on the table and doesn't unlock it until ALL the rows have been fetched. For MySQL you are better off returning cursor.fetchall().

Laisser un commentaire

blog comments powered by Disqus