2

Apr

Filed in Django, PostgreSQL, iBegin with 3 comments |

Last week I made the decision to begin the conversion from MySQL to PostgreSQL. Well, more of a benchmark conversion, but none the less. So we begin the installation..

I installed PostgreSQL server, version 8.2, painlessly both locally and on our staging server. I then installed psycopg2 via Python setuptools. Nice, quick and easy, right? Segmentation Fault when I try to sync up the database. So I figure maybe setuptools is outdated, and I removed psycopg2 and grabbed it with Darwin Ports. “Warning: […]” everytime its initialized. I finally managed to get it working by downloading the source and compiling it. About an hour later, the only thing left to do is import the data…

First I needed to sync the database, manage.py syncdb. Yep, Django makes it quick and easy. Things worked fine, no problems yet. I began importing the www.ibegin.com database (12 million business listings, among other meta data). About 10% through California (which is massive) I get “Cannot Adapt”. First thing that comes through my head is “What the fuck kind of error message is that?”. I come from a strictly MySQL background, and I stand behind it to this day. After a little bit of searching, and pestering #postgresql on freenode, I learned that somehow the primary key sequence had corrupted itself.

Correcting the sequence was fairly easy. You select the max value currently, and then do setval() or nextval() respectively on the sequence to fix it. I had no idea why this had occurred however. I get done about 70% of the way through California, and notice my importer is slowing down. Oh, that’s right, my version of Django is still horrendously slow because it’s missing #17.

I quickly optimize my script to store everything in memory, and turn off settings.DEBUG because it wastes too damn much memory. I then continue the import from where it left off, and I notice that every so often it hangs. I check my indexes, and my select queries, and they’re all blazing fast. After a bit more pestering of the #postgresql people, I learned that PGSQL sucks hardcore on count queries. Now I didn’t plan to use them like I was in the importer, but that’s still a pretty big setback. So, the 1,000,000 extra meta rows that would exist, will have to wait until the import is fully finished.

I get done with California, and move on to Florida, another fairly large state. I screen it and go to bed. I wake up to Cannot Adapt Again, “What the fuck!?”. This was getting very tiresome, and still is. I’m still only about 50% done with the import and I’ve had this come up at least a half dozen times already.

All in all, PostgreSQL is neat. Transactions are great. Row-locking is great. MySQL’s InnoDB isn’t. It has a GIS framework. It causes a lot of headaches. Weighing the pros vs the cons, we’re going to stick with it. We’ll be deploying the new iBegin.com page (powered by Django) running a PostgreSQl 8.3 database, which will be pretty massive, to which I’m hoping that I don’t have to patch Django for ticket #17 to handle it.

3 Responses to "Segmentation Fault"

Subscribe to this topic with RSS or get the Trackback URL
feedchris (Apr 2nd):

interesting post, why the switch to postgre, if you don’t mind me asking? i have a background with mysql as well, have fought the urge so far to switch.

depesz (Apr 3rd):

hi, i use some postgresql myself, and if you will have any issues with migration and later on - i will be happy to help. just mail me.

trbs (Apr 3rd):

Before upgrading to the asome PostgreSQL 8.3 release, please be aware that Django has a small bug for this release:

http://code.djangoproject.com/ticket/6523

PgSQL 8.3 further improved there SQL specs and therefor disabled the autocasting to ::text for types. This is overall a very good thing and hopefully Django trunk will be patched soon.

Leave A Reply

 Username (*required)

 Email Address (*private)

 Website (*optional)

Note: Comments moderation may be active so there is no need to resubmit your comment.