Archive for the ‘General’ category

Awesome Python trick : tail recursion implementation through a decorator

« A new tail recursion decorator that eliminates tail calls for recursive functions is introduced. » Link.

The second version is pretty smart : it does not use any particular stack inspection trick. The tricky thing is that the recursive call to factorial() is in fact calling the decorated version, which condenses three different behaviors : a driver loop, a simple return of its arguments (among which the accumulated result), or a recursive call to the original function. This makes my head hurt a little, but it’s brilliant :) .
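To make the three behaviors concrete, here is a minimal sketch of a decorator built on the same idea. It is not the linked recipe itself: the names tail_recursive, sentinel and state are assumptions chosen for this illustration, and the sketch is neither thread-safe nor reentrant.

```python
import functools


def tail_recursive(func):
    """Run a tail-recursive function in a loop instead of growing the stack."""
    sentinel = object()  # returned to signal "a tail call was requested"
    state = {"running": False, "pending": None}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if state["running"]:
            # Behavior 2: an inner (tail) call only records its arguments.
            state["pending"] = (args, kwargs)
            return sentinel
        # Behavior 1: the outermost call acts as the driver loop.
        state["running"] = True
        try:
            while True:
                # Behavior 3: call the original, undecorated function.
                result = func(*args, **kwargs)
                if result is not sentinel:
                    return result
                args, kwargs = state["pending"]
        finally:
            state["running"] = False

    return wrapper


@tail_recursive
def factorial(n, acc=1):
    if n <= 1:
        return acc
    # This name resolves to the decorated wrapper, not to the raw function.
    return factorial(n - 1, acc * n)


print(factorial(5))
```

Because every inner call returns immediately, the stack never holds more than two frames of factorial, so arguments far beyond the interpreter's recursion limit work.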

We miss you, Bazouf.

My paternal grandfather, whom we nicknamed Bazouf, left us suddenly yesterday. I loved him deeply and his passing leaves a great void.

For the past ten years or so, he and my grandmother had lived far enough away that we did not see each other as often as we would have liked. On October 2, I had placed his great-granddaughter Violette in his arms, and my father took this photo. You can easily recognize him in that lopsided smile.

Bazouf was for me everything that a grandfather passionate about trains, planes, history and genealogy can be for a curious grandson : a gold mine of knowledge, which he knew how to put into practice by taking me to see a steam locomotive passing through Lisieux, by taking me on my first flight, by patiently explaining his latest genealogy research or his work as a correspondent for La Vie du Rail.

But Bazouf was not only learned : he wielded irony, spoonerisms and puns, and had an ever-renewed astonishment at the stupidity of our contemporaries, which he shared with us by recounting, in his very particular style, anecdotes that had happened to him.

His absence leaves me with a multitude of little details that keep coming back to me. He was a bon vivant, the only person I have ever seen sprinkle brown sugar on his melon, and he had a way all his own of stroking an imaginary moustache to show us how much he was enjoying himself. From his years in Brazil, working for Simca do Brasil, Bazouf had kept the habit of saying « pronto » very loudly. And, of course, he had a very particular way of sneezing, which could be heard for miles, and which I inherited from him without the slightest doubt.

Only one day since you left us and already we miss you, Bazouf.

Strings mess

WARNING : I may use a little bit of coarse language in this blog entry, so please take your children away from the screen.

So the other day I was writing that the problem with C++ was not in the language, but rather in the build environment. Well, it turns out that I had forgotten about the situation with strings.

Man, I was going to write a very long and thorough survey of the many types of strings in the C/C++ world, but those guys have already done it for the Win32 world. Please be their guest and read The Complete Guide to C++ Strings Part I – Win32 Character Encodings and Part II – String Wrapper Classes.

That was nice, wasn’t it ? No ? My, my, my.

So, basically, you have three levels of language (C, C++, C++ with templates), times two methods for string length storage (zero-terminated strings versus Pascal-like strings), times four encoding strategies (assuming that the ANSI codepage is used, assuming that a default encoding is used, MBCS and Unicode), times different memory management strategies (caller management, callee management, reference counting…), times the number of development frameworks Microsoft released (MFC, OLE, VB, ATL…), times the number of times a developer team had the hubris of thinking that none of the already existing string types is as cool as the one they can build, equals a huge mess.

It’s not even scary, it has reached the point where it is ridiculous. And since not all the combinations have been tried, we can only expect the mess to grow. For instance, some guys from Microsoft’s WMI team found that char*, wchar_t*, BSTR, _bstr_t, CComBSTR, CString, std::string and a bunch of other types were not sufficient, and that it was necessary to invent a new string type, CHString. Oh, I see it’s deprecated, but you know, there is some code using it somewhere and one day or another you’ll have to learn about this class if you want to interface with it.

At this point I must say I’m pretty happy to understand how Unicode works, what encodings are and how Unicode strings and byte strings are related. Apparently not everybody is up to date on this, and it’s a shame, because it’s not that hard. I really feel sorry for the people trying to find their way in this pile of string types without understanding that a Unicode character is not the same thing as its 16-bit encoding. Then again, maybe some of those people are the cause of this mess… Anyway.
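The difference between a character and its 16-bit encoding is easy to check. The sketch below is only an illustration, written in Python for convenience; the helper name describe and the two sample characters are assumptions made for the example.

```python
import unicodedata

clef = "\U0001D11E"   # MUSICAL SYMBOL G CLEF, a code point above U+FFFF
e_acute = "\u00e9"    # LATIN SMALL LETTER E WITH ACUTE


def describe(char):
    """Return (code point, UTF-8 byte count, number of UTF-16 code units)."""
    utf8 = char.encode("utf-8")
    utf16 = char.encode("utf-16-le")  # little-endian, no byte-order mark
    return ord(char), len(utf8), len(utf16) // 2


for char in (clef, e_acute):
    print(unicodedata.name(char), describe(char))
```

Each string holds exactly one character, yet the clef needs two 16-bit units (a surrogate pair) in UTF-16 while the accented letter needs one. Code that equates "one character" with "one wchar_t" silently breaks on the first kind.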

Of course, you can decide to stick to the « standard » and use the STL strings. Great, but did you notice that ‘s’ at the end of strings ? Yeah, even in STL, you have to choose your character type, so either you write all your code against std::basic_string<T> and decide about it later, or you choose between std::string and std::wstring (which are typedefs of std::basic_string<char> and std::basic_string<wchar_t> respectively) and regret it when you have to interface with some code that has chosen otherwise.

Anyway, in Effective STL, Scott Meyers lists at least four different ways std::basic_string<T> is implemented, each way requiring you to think differently about how you should use strings. For example, some implementations use reference counting, so pass-by-value is cheap and memory is automatically managed. For other implementations, pass-by-value means copying the whole string, which is not the same at all in terms of performance.

Of course, the open source world isn’t free of this problem, for example KDE introduces its QString, Gnome its gchar* type, Mozilla its nsAString hierarchy, and so on and so forth.

This balkanization of the string types is not only inefficient at development time, it is also inefficient at runtime. All this work converting from one type to another isn’t free (though sometimes it is only a matter of compile-time cast so it is effectively free). Some data has to be copied and transformed just because it has to go from one string universe to another.

Of course, all this is not a problem in a sanely designed language like Java (released 10 years ago) or C# (released 5 years ago). For this not to happen, the language design must feature a true, unique, human-centered string type, not something like an array of bytes. It is quite surprising, then, to notice that there are not so many languages in which the string is a first-class type, and not just a collection of bytes.

Python is a bit in the middle ground, since it has something like a string type but with two versions, an old-style 8-bit character str and a Unicode string type named unicode. Of course, this poses a lot of problems, especially since there are discrepancies in the manner str instances are converted into unicode and vice-versa. So sometimes you provide a str to some code that wants a unicode instance and everything breaks down (well, you get an exception, not like the mayhem you could get when mixing character types in C/C++).
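Here is a small sketch of that failure mode. One loud assumption: it uses the type names of later Python 3 releases (bytes and str) so that it runs on a current interpreter, whereas the Python of this post calls the same pair str and unicode and attempts the conversion implicitly with the ASCII codec. The helper join_labels is a hypothetical name.

```python
raw = "café".encode("utf-8")   # bytes, as they would arrive from a file or socket
text = "café"                  # text, a sequence of Unicode characters


def join_labels(left, right):
    """Concatenate two labels, decoding any bytes explicitly as UTF-8."""
    if isinstance(left, bytes):
        left = left.decode("utf-8")
    if isinstance(right, bytes):
        right = right.decode("utf-8")
    return left + " / " + right


try:
    raw + text                 # mixing the two types is refused outright
except TypeError as exc:
    mixing_error = exc
else:
    mixing_error = None

try:
    raw.decode("ascii")        # what an implicit ASCII conversion would attempt
except UnicodeDecodeError as exc:
    ascii_error = exc
else:
    ascii_error = None

print(join_labels(raw, text))
```

Either way the mistake surfaces as an exception at the point of mixing, and the fix is to decode once, explicitly, with the right codec.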

For all its merits, D has failed to see this and only provides us a char[] type and its Unicode equivalent. We can expect a lot of the aforementioned problems to appear in the D world due to this lack of design.

Let’s sum it all up by saying that a true, unique, Unicode string type is something that should not be forgotten during the design of any computer language. Scorning the string type and trying to abstract it as a collection of bytes is a sure way to make your language less efficient, both in development time and at runtime. The blooming of Java and .NET development is surely related to the way both platforms can handle text in a simple and uniform way.

And now, I’m back to my workbench trying to transform those dang BSTR* into std::wstring.

Performancing test

This is a test of how Performancing supports Dotclear, or vice versa. So far it looks cool, though the WYSIWYG editing mode in Performancing is a bit weird with paragraph spacing. Plus, I can’t find the fixed-spacing font settings ! How am I supposed to insert code fragments in my blog ?

Update : this is an edit test.

A Brief Look at C++0x

C++ inventor Bjarne Stroustrup describes the next version of the C++ standard.

For all the coolness that this next version is supposed to introduce (no, it’s not the name), nothing is said about what seems to me the biggest issue of the C/C++ world, namely the diversity of build standards.

C++ is a pretty tough language with a zillion built-in ways to shoot yourself in the foot. Writing C++ is a tad less scary and a little more user friendly than juggling with chainsaws, in that the error messages do not involve band-aids or ER procedures. But that’s all. This being said, a C++ compiler is an extremely powerful tool which, given the right set of safety measures (courtesy of Mr Scott Meyers, need I say this) can help you produce some of the most blazingly fast code ever seen on earth. Template metaprogramming is not a feature of C++, it’s THE feature. Have a look at Boost or STXXL to see what I mean.

BUT. The whole process of running the compiler and linking with third party libraries is a mess. In the Java or .NET world, integrating a third party library is pretty easy : you take the JAR file or Assembly DLL, put it in a place where your compiler can find it, and bam, you’re done. Nowadays, building a Java program is done the same way everywhere, provided you use ANT which is a de facto standard.

In the C/C++ world, you have Makefiles with their 70s coding style (a perfect case for the nastiness of significant whitespace), configure scripts, static and dynamic libraries in different, mutually incompatible formats. One compiler suite sometimes won’t accept libraries built with another compiler suite. You have configure scripts which are only there to try to build a kind of standard environment to develop against, unless of course the platform you’re trying to build your program on doesn’t support them. You have different build tools like make, nmake or even bjam for Boost. Each time a new version of a compiler is released, the build configuration files must adapt or wither and die. Library developers have to support a hundred different build environments, which frankly is rarely what they were expecting to do when writing their library.

Python tries to save the developer some time (and succeeds) with its setup system (the distutils module), but unfortunately binaries built from Python 2.3 under MSVC 6 aren’t compatible with those built under MSVC 7.1. If you can’t afford an old license of MSVC 6, you can try using the MinGW suite but that’s a brand new set of kung-fu moves to learn. But you won’t ever, never succeed in building a Python 2.3 extension in MSVC 7.1 or a Python 2.4 extension with MSVC 6. Maybe. I don’t know. I can’t bear it anymore.

At the end of the day, I don’t know about you, but I for one get a lot more headaches struggling with the build mess than tackling C++ intricacies. C and C++ are inherently more portable than the build environments that surround them. How’s that for a paradox ? The complex is portable where the supposedly simple is not.

Unfortunately the only solution is once again to reinvent the wheel and try to force the whole world to drink the new Kool-Aid. I’m not convinced this has a chance to succeed… But it’s really sad to see the C++ baby thrown out with the dirty bath water that the build system can be.

Quick performance note about DBAPI implementations for Win32

Believe it or not, but according to my crude bench, adodbapi is about 50 times slower than mxODBC for a basic scan of a table of 900,000 rows from a Microsoft SQL Server 2000 server (227 seconds versus 4 seconds). This is not especially surprising since adodbapi is written in pure Python and uses the pywin32 API to access ADO COM objects, whereas mxODBC is a Python module written in pure C.

I’ve been using mxODBC for two years now as a part of the mxODBC Zope DA, and I have never been disappointed (except by Zope but that’s another story). I’ve just realized that I have been using mxODBC as a DBAPI driver on another production server (no Zope this time) without a proper license (mxODBC is not free for commercial use), so I’ve bought two (one per CPU), for a pretty good price (150€ total) given the performance :) .

Cool faces

Faces is a powerful, flexible and free project management tool.

I’m currently on the lookout for a new piece of project management software (PMS). I can’t bear Microsoft Project anymore. For maybe ten years now Microsoft Project has consistently refused to support more than one level of undo. Suppose you have a nice project which is all balanced and tidy and cool. You make a little edit that breaks something, but you don’t notice yet. You make another edit. Duh, that’s when you notice your first mistake. You’re stuck, pal. Undoing will undo your last edit, and undoing again will redo your last edit. Sigh… So basically when I’m building a project schedule, I’m wasting a lot of time saving & committing my work to a Subversion repository, so that I can roll back to a good state if I’ve broken something in the project. This, along with the fact that it’s very easy to break anything, due to the disastrous tendency of Project to let default computations overwrite carefully hand-filled data, is too much. We’re in 2005, aren’t we ?

And I won’t write about the curious arithmetic and rounding behaviour that MS Project features. You know, the kind where 0.25 days + 0.75 days = 0.997754 days. How fun it is when you present your schedules to your boss or customer.

It seems like the limitation on undo levels is an industry standard, though. I imagine PMS designers, with totally uncool 80s wool sweaters, checking features on a piece of punched paper : « Stubborn one-level undo, check ! ». The whole scene would take place in a neon-lit, windowless basement room. Indeed, Open Workbench, the free, open-sourced version of Computer Associates’ Clarity, is almost as good as Microsoft Project in this domain. I mean, they have the same one-level undo feature. How cool.

I don’t know how PS8.5 behaves in the undo department, but I highly doubt that PSNext, the new, web-based version, has more than one level of undo. I wonder if it even has one level of undo. Heck, managing database consistency when multiple users are working simultaneously is difficult enough, with a set of pessimistic or optimistic locking strategies, but I can’t imagine what the problem becomes if you try to enter undoable actions in the process.

One of the funniest things to do when you use a PMS is changing the project structure. For various reasons, on one of my biggest projects, I had to present three different cost structures to my customer, all based on the same project. The first time, I structured the project phases and tasks the way I was used to. Then my client requested that I divide the work another way to suit its comparison grids (that was during presales). Then once we won the project, my client requested another structure for the project, to suit its purchasing department’s needs. Believe me, there isn’t any PMS out there that will be actually helpful while doing those various restructurings. Quite the contrary.

The end result was that I quit using MS Project and reverted to Excel, which has more than one level of undo, making sure that the project budget was the same before and after the restructuring. That was when I discovered the weird rounding mistakes Project makes. Unfortunately (or not), Excel does not make the same mistakes, so you have to manually alter some computations so that the final budget matches the previous one to the cent. Otherwise, it would not look serious enough, I mean, if you cannot rebalance your budget without changing the net result, you might as well drop the whole thing.

Using Excel for project management is not a bad choice, as Joel Spolsky demonstrated. But you know, a PMS could provide added value over doing everything manually with Excel, like computing critical paths, trying to schedule resources, and actually providing help when tens of tasks are changed, when new deadlines are introduced, when additional work appears and you have to justify elapsed & remaining time to your customer so that you may eventually charge him for the new features.

I’m writing all that because, you know, the project with three major budget restructurings ? I have to rebuild it a fourth time, taking into account elapsed time, new features, new deadlines and so on. It was difficult enough when the project hadn’t started and everything was pristine, but now, well, it’s quite overwhelming. Anyway, onward !

That’s why I find faces very cool : the editing interface is a text editor with a clever code completion engine (need I say that it supports multiple undo levels ?), and what you edit is Python code ! The code looks like this :

class Bob(Resource):
  pass

class Alice(Resource):
  pass

def My_Project():
  resource = Alice | Bob

  def Task1():
    start = "2005-1-16"

    def Foobar():
      effort = "5d"
    
    def Buffer():
      resource = Alice & Bob
      effort = "4d"

  def Task2():
    start = up.Task1.end
    effort = "1w"

  def Task3():
    start = up.Task1.Foobar.end
    effort = "5d"

project = BalancedProject(My_Project)

From this project description, faces computes a schedule (there are multiple scheduling algorithms), and can show you various diagrams and reports, including of course Gantt charts. It can even generate HTML pages so that your boss/customer doesn’t have to learn Python to understand the project structure :) . Since the file format is Python, you can use a whole bunch of already existing tools, including Subversion which should be handy to compare two different versions of the same project. And of course, you’ve got a built-in scripting language to edit your project structure (or is it a built-in mini-language to describe the structure ???) so you can easily work around missing things like recurrent tasks (just build them in a loop).

Of course, this is totally a tool for programmers or the kind of people who would rather write code than enter data in Excel… Which is why I can’t use it professionally, not being the only one who has to edit the project. That’s too bad ! But I’d like to send Michael Reithinger a big thumbs up for his work on faces. It’s very cool, extremely well designed, and professionally packaged (including a standalone installer for Windows) !

Why use an ORM at all, anyway ?

Update [2006/04/06] : This post from Fredrik Lundh gives a neat trick on how to iterate on DBAPI cursors. This should be written in the documentation !

Well, I don’t know about you, but I find that Java’s JDBC and Python’s DBAPI 2.0 are lousy APIs. They are both more or less based on ODBC parlance, which dates back to (Google, help me here)… 1992. And it shows. Granted, you don’t need Ultra Brite APIs to manipulate relational data, but come on…

1) Neither API allows the developer to use their favorite language’s iterator construct.

You cannot use for (Row row : statement.execute()) (Java) or for row in cursor.execute() or any construct like that. You have to use things like resultSet.next() or cursor.fetchone() and test for false or None and feel like you are back in the eighties (with a little bit of effort you can hear the first few bars of Enola Gay somewhere in the back of your head). Enumeration, Iterator, generators anyone ? Nope, you don’t get to use that.

The situation in Python is rather ugly since you cannot use the assignment operator in an if or while condition. So you have to write boring and unpythonic code like this :

cursor.execute('select * from foobar')
row = cursor.fetchone()
while row:
    # do something
    row = cursor.fetchone()

See that row = cursor.fetchone() being written twice ? It’s as if the people who designed the DBAPI chose the design that would directly leverage the weaknesses of Python. This, in a language that supports iterators and generators, is unbearable. The net result is that you start to write a generator wrapper around this construct, and bam, you’re on your way to writing a full-fledged ORM in less time than needed to say « Object-relational mapper ». First little step :

def resultset(cursor, request):
    cursor.execute(request)
    row = cursor.fetchone()
    while row:
        yield row
        row = cursor.fetchone()

# and then you can do things like :
for row in resultset(cursor, 'select * from foobar'):
    print(row)
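Another way to get rid of the duplicated fetchone() line, without any wrapper, is the two-argument form of the built-in iter(), which calls a function repeatedly until it returns a sentinel value. Whether this is exactly the trick from the post mentioned in the update above is an assumption. The sketch uses the standard sqlite3 module as a stand-in DBAPI driver, and the columns id and label are made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("create table foobar (id integer, label text)")
cursor.executemany("insert into foobar values (?, ?)",
                   [(1, "a"), (2, "b"), (3, "c")])

cursor.execute("select id, label from foobar order by id")
rows = []
# iter(callable, sentinel) keeps calling fetchone() until it returns None,
# which is what the DBAPI specifies when no more rows are available.
for row in iter(cursor.fetchone, None):
    rows.append(row)
print(rows)
```

The sentinel must be None rather than a false value in general, so a row that happens to be falsy would not end the loop early, unlike the while row: version.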

2) Indexing columns by number is a pain in the ass

In Python, a row as obtained above is a tuple (that, at least, sticks pretty much to pure relational algebra). You use integer indexing to get the value of a column. Which of course means that if you add a column in the middle of the request, you’re in deep pain. This also means that using select * from ... is a big no-no. So very rapidly you get into the habit of wrapping the row in a dictionary, using meta-data from the cursor :

def resultset(cursor, request):
    cursor.execute(request)
    # cursor.description holds one sequence per column; item 0 is its name
    column_names = [column[0] for column in cursor.description]
    row = cursor.fetchone()
    while row:
        yield dict(zip(column_names, row))
        row = cursor.fetchone()

Thus you get a dictionary per row (1). And from a dictionary per row to a class instance per row, the path is pretty straightforward… Which is too bad because a lot can be done just with a dict, especially in a dynamically typed language like Python.

In Java, it’s much better. The ResultSet class at least has the good taste to provide both getString(int columnIndex) and getString(String columnName) methods. I just remember, back in the day when I was still coding in Java, the anxiety I felt about losing a few bytecodes for the name resolution… Aaah those were the days.

3) Java programmers crave classes, not dictionaries

So JDBC got it right by providing a no-hassle way to get a column value given its name. Except… except that using Map-like structures to pass around data feels weird in Java, so you promptly wrap the data you get from JDBC into a class instance. Soon you begin to write extremely boring boilerplate like aPerson.name = resultset.getString("name") and in a matter of minutes you tell yourself « humpf, why not try to generate this code automagically ? ».

Seriously, when I switched from Java to Python three years ago, I had to follow an intensive detox program. I had to stop writing a class for each and every piece of data that needed to be moved around and use a dict instead (2). I had to submit to the power and speed of Python’s dict and stop thinking that I could do better. I feel much better now.

Since what comes out of a SQL request is typed outside of Java’s scope (think about partial requests, views, schema evolution etc.), why not admit it and carry around maps ? When your application is mostly a pipeline from the database to some HTML code on a web page, forcing the data into an object hierarchy is just overkill.
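As a rough illustration of such a pipeline, here is a sketch in Python that takes one dict per row and renders an HTML table with no class in between. The function render_table and the sample data are hypothetical, made up for this example.

```python
from html import escape


def render_table(rows, columns):
    """Render a list of row dicts as an HTML table, escaping every value."""
    header = "".join("<th>%s</th>" % escape(name) for name in columns)
    lines = ["<table>", "<tr>%s</tr>" % header]
    for row in rows:
        cells = "".join("<td>%s</td>" % escape(str(row[name]))
                        for name in columns)
        lines.append("<tr>%s</tr>" % cells)
    lines.append("</table>")
    return "\n".join(lines)


# Sample rows, shaped like what a dict-per-row generator would yield.
rows = [{"name": "Alice", "city": "Paris"},
        {"name": "Bob <admin>", "city": "Rouen"}]
page = render_table(rows, ["name", "city"])
print(page)
```

Adding a column to the request only means adding its name to the columns list; no class definition has to follow the schema around.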

Of course, one day you really need to build a business object hierarchy and implement business rules in methods, but you know what ? A lot of business rules have been implemented and run for decades now without the slightest hint of an object hierarchy. It’s dirty, it’s not fun, but it can handle millions of rows in a blaze. Can your nice object hierarchy and ORM machinery do the same thing ?

4) Prepared statements are a pain to use in both APIs

And that’s too bad, since prepared statements are the best way to shield your code against SQL injection vulnerabilities, on top of being theoretically faster than standard statements.

In JDBC, setting PreparedStatement parameters is awful : you have to use the correct setXYZ(int index,XYZ value), XYZ being the type of the parameter. If the JDBC driver is nice enough, you can use setObject and hope everything will be converted automagically (and correctly…). The problem (before Java 1.5 and its autoboxing feature) was that to set, say, an integer value, you had to wrap it in an Integer instance, or use setInt and then have problems when you need a SQL NULL value, so you’d have to use setNull AND provide the type of the NULL… Need I say more ? Awful, I told you.

On top of that, parameters are not named, you have to set them by giving their index, and contrary to Python, there is no way to quickly write a list or tuple in Java. So you write pStatement.setInt(1,42); pStatement.setInt(2,777); and you just hope you’ll never have to add a parameter in the middle of that 10-parameter request, for fear of having to manually reindex the setXXX calls…

In Python, that’s where you have to tackle the fun panaché of parameter styles defined in PEP 249. If you only ever program for one database software in your Python coder’s life, that’s okay. But if you want to write code that’s a tiny wee bit portable from one database to another, then either you’re lucky and they use the same parameter style, or you have to build an API that will wrap this mess. And we’re off again to building a better database API, where the temptation to write an ORM is great…
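PEP 249 lists five styles (qmark, numeric, named, format and pyformat), and each driver advertises its preferred one in a module-level paramstyle attribute. The sketch below shows two of them with the standard sqlite3 module, used here only as a conveniently available driver; the table and its columns are made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("create table foobar (answer integer, whatever text)")
cursor.execute("insert into foobar values (?, ?)", (42, "hello, 'world'"))

# qmark style: anonymous positional placeholders, values given as a sequence.
cursor.execute("select whatever from foobar where answer = ?", (42,))
by_position = cursor.fetchall()

# named style: each placeholder has a name, values given as a mapping.
cursor.execute("select whatever from foobar where answer = :answer",
               {"answer": 42})
by_name = cursor.fetchall()

# The driver's declared style; portable code has to branch on this value.
print(sqlite3.paramstyle, by_position, by_name)
```

A driver that only understands format or pyformat would reject both requests as written, which is exactly the portability problem.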

Another pretty annoying thing in Python is that with positional parameter styles, the parameters have to be passed in a tuple (with named parameters, dicts are allowed). The DBAPI implementations that I have used so far insist on the tuple thing : I cannot pass a list or any other indexed object. When you have a single parameter, it forces you to write an ugly cursor.execute(request,(parameter,)). Also, you sometimes have to dynamically build a request, so your parameter list is dynamic too, and you have to convert it into a tuple. To help me with prepared statement parameters, I’ve found this kind of code useful :

def resultset(cursor, request, *args, **kwargs):
    # positional parameters arrive as a tuple, named ones as a dict
    cursor.execute(request, args or kwargs)
    column_names = [column[0] for column in cursor.description]
    row = cursor.fetchone()
    while row:
        yield dict(zip(column_names, row))
        row = cursor.fetchone()

# Now you can write :
for row in resultset(cursor, '''
        select * from foobar where answer=? and whatever=?
        ''', 21*2, "hello, 'world'"):
    print(row)

The *args, **kwargs syntax is one of the little glitches of Python in terms of elegance (meaning it looks like Perl ;-) ). It is just the syntax for variable positional parameters (args is therefore a tuple of parameters) and variable named parameters (kwargs being a dict of named parameters).
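To see what args or kwargs hands over to cursor.execute(), here is a tiny sketch with no database involved. The function name pick_parameters is hypothetical and exists only for this demonstration.

```python
def pick_parameters(*args, **kwargs):
    """Return what would be passed to cursor.execute() as its parameters."""
    # An empty tuple is false, so named parameters are used as a fallback.
    return args or kwargs


positional = pick_parameters(21 * 2, "hello, 'world'")
named = pick_parameters(answer=42)
nothing = pick_parameters()

print(positional, named, nothing)
```

Positional arguments win when both kinds are given, and a call without parameters passes an empty dict, so mixing the two styles in one call silently drops the named ones.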

5) And that’s only from the top of my head

Contributions are welcome. Doubly welcome, I’d say, since given the number of comments (~20) on my one-year old blog, I’m beginning to feel a bit lonely ;-) .

Conclusion

To be clear, my point is that the common use case of ORM is not really about object-relational mapping. ORMs are simply used by developers to get an API in which they feel more comfortable than the default database API. A strong dislike of SQL is also a reason for a lot of developers to turn to an ORM, not really for the object semantics but rather as a way to flee SQL. In this blog entry, one of the reasons Titus Brown needs an ORM is that understanding joins, especially left joins, is too difficult… If that’s not a nice case of programmer’s laziness (the one where you end up doing twice the work by programming instead of rolling up your sleeves and doing the hard work the hard way), I don’t know what it is !

From my experience, the ease of use criteria are more appealing than the need for an object view. People quickly grew tired of writing the same ugly code to do basic things, so they looked for something better. I guess the object-relational part comes naturally since « writing a better database API » sounds as lame as « building a better mousetrap ». So people started thinking about how they could make things a little better for us poor developers and got a little carried away by the « hey, let’s make this relational database look like a bunch of objects » idea. I know, I’ve done it, too :-) .

If only relational APIs like JDBC or DBAPI had been a tad more user friendly, the need for something better would not have been so strong, and the many mistakes (and few big successes) of ORM would not have been made. Maybe that’s why some mid-level APIs could become more popular now that developers feel a bit disappointed by the lures of ORM. At least that’s what I thought when I heard about PAT.


Notes

(1) And again, I’ve not even scratched the surface of the many weird ideas you can get when writing such code, like the one where I decided that building a dict per row was too much (as if it mattered in Python) and that I’d rather precompute the index for each column name and stick to rows being tuples.
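For the curious, that precomputed-index idea looks roughly like the sketch below. It uses the standard sqlite3 module as a stand-in driver; the helper column_index and the columns id and label are assumptions made for the example.

```python
import sqlite3


def column_index(cursor):
    """Map each column name of the last executed request to its position."""
    return {column[0]: position
            for position, column in enumerate(cursor.description)}


conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("create table foobar (id integer, label text)")
cursor.executemany("insert into foobar values (?, ?)", [(1, "a"), (2, "b")])

cursor.execute("select id, label from foobar order by id")
index = column_index(cursor)  # computed once per request, not once per row
labels = [row[index["label"]] for row in cursor.fetchall()]
print(index, labels)
```

Rows stay plain tuples, and the only per-row cost is one dictionary lookup for each column actually read.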

(2) The irony, of course, is that classes are largely and cleverly built on top of dictionaries and functions.

Spyced : A point for ORM developers to remember

At the risk of proceeding to beat a dead horse, didn’t anyone look at those code samples and think, « wow, our ORM code is way the hell uglier than the vanilla SQL? »

Source: Spyced

I must confess I wrote my own ORM in Python. Twice. The second try is a bit better (using metaclasses and all), and you don’t have to jump through hoops to make SQL requests. But in the latest version, I made the stupid mistake of trying to build a request micro-language, using operator overloading (& and | like in Django).

Well, this is a bad idea. Indeed, invariably, what is easily written in SQL looks a tad uglier in Python, and what is not easily written in SQL cannot be written in Python. So my decision is made : I’ll revert all my « ooooh-I’m-so-cleverly-using-operator-overloading » request generator code and stick to writing SQL with a little bit of help from the ORM (because, you know, DBAPI 2.0 is somewhat dumb about query parameters).
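For readers who have never seen such a micro-language, here is a minimal sketch of the operator-overloading approach. It is neither my ORM’s code nor Django’s API : the classes Condition and Column and the generated SQL shape are assumptions made for the illustration.

```python
class Condition(object):
    """A SQL condition fragment plus the parameters it needs."""

    def __init__(self, sql, params=()):
        self.sql = sql
        self.params = tuple(params)

    def __and__(self, other):
        return Condition("(%s AND %s)" % (self.sql, other.sql),
                         self.params + other.params)

    def __or__(self, other):
        return Condition("(%s OR %s)" % (self.sql, other.sql),
                         self.params + other.params)


class Column(object):
    """A column name whose comparisons build Condition objects."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, value):
        return Condition("%s = ?" % self.name, [value])

    def __gt__(self, value):
        return Condition("%s > ?" % self.name, [value])


# Parentheses are mandatory : & and | bind tighter than == and >.
where = (Column("answer") == 42) & (
    (Column("age") > 18) | (Column("name") == "Bob"))
print(where.sql, where.params)
```

The mandatory parentheses are one small example of the ugliness : the Python expression is longer and more fragile than the one-line WHERE clause it generates.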