I was helping someone in the Django IRC channel today when a question came up about storing a denormalized data set in a single field. I typically handle this either by serializing the data or by separating the values with a token (a comma, for example).
Django has a built-in CommaSeparatedIntegerField, but most of the time I'm storing strings, since I already have the integers available elsewhere. I began answering the question with an example of serialization plus custom properties, then realized it would be much easier to just write this as a Field subclass.
So I quickly did, and replaced a few lines of repetitive code with two new field classes in our source:
Update: There were some issues with my understanding of how the metaclass was working. I’ve corrected the code and it should function properly now.
SerializedDataField
This field is typically used to store raw data such as a dictionary or a list of items, and it can even be used for more complex objects.
    from django.db import models

    try:
        import cPickle as pickle
    except ImportError:
        import pickle

    import base64


    class SerializedDataField(models.TextField):
        """Because Django for some reason feels it's necessary to repeatedly
        call to_python even after the value has been converted, this field
        does not support strings."""
        __metaclass__ = models.SubfieldBase

        def to_python(self, value):
            if value is None:
                return
            # Anything that is not a string has already been unpickled.
            if not isinstance(value, basestring):
                return value
            value = pickle.loads(base64.b64decode(value))
            return value

        def get_db_prep_save(self, value):
            if value is None:
                return
            return base64.b64encode(pickle.dumps(value))
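To see the round trip the field performs without a Django project, here is a minimal standalone sketch. It assumes Python 3 (so `str` stands in for `basestring`), and the helper names `dumps_field` and `loads_field` are hypothetical stand-ins for `get_db_prep_save` and `to_python`; they are not part of Django.

```python
import base64
import pickle


def dumps_field(value):
    """Mirror get_db_prep_save: pickle the value, then base64-encode it to text."""
    if value is None:
        return None
    return base64.b64encode(pickle.dumps(value)).decode("ascii")


def loads_field(value):
    """Mirror to_python: decode only when handed the stored text form."""
    if value is None:
        return None
    if not isinstance(value, str):
        # Already a Python object, so a repeated call leaves it untouched.
        return value
    return pickle.loads(base64.b64decode(value))


stored = dumps_field({"tags": ["a", "b"], "count": 2})
restored = loads_field(stored)
again = loads_field(restored)  # simulates to_python being called a second time
```

The `isinstance` check is what makes the second call harmless for dictionaries and lists. It is also why plain strings are not supported: a decoded string cannot be told apart from the stored text, so a second call would try to decode it again.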
SeparatedValuesField
An alternative to the CommaSeparatedIntegerField, it allows you to store any separated values. You can also optionally pass a token parameter to change the separator, which defaults to a comma.
    from django.db import models


    class SeparatedValuesField(models.TextField):
        __metaclass__ = models.SubfieldBase

        def __init__(self, *args, **kwargs):
            # Optional separator token; defaults to a comma.
            self.token = kwargs.pop('token', ',')
            super(SeparatedValuesField, self).__init__(*args, **kwargs)

        def to_python(self, value):
            if not value:
                return
            # Already converted, so return it unchanged.
            if isinstance(value, (list, tuple)):
                return value
            return value.split(self.token)

        def get_db_prep_value(self, value):
            if not value:
                return
            assert isinstance(value, (list, tuple))
            return self.token.join([unicode(s) for s in value])

        def value_to_string(self, obj):
            value = self._get_val_from_obj(obj)
            return self.get_db_prep_value(value)
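The join and split logic can be tried on its own, outside Django. The sketch below assumes Python 3 (so `str` replaces `unicode`), and the function names `join_values` and `split_values` are hypothetical stand-ins for `get_db_prep_value` and `to_python`.

```python
def join_values(values, token=","):
    """Mirror get_db_prep_value: turn a list or tuple into one delimited string."""
    if not values:
        return None
    if not isinstance(values, (list, tuple)):
        raise TypeError("expected a list or tuple")
    return token.join(str(v) for v in values)


def split_values(value, token=","):
    """Mirror to_python: turn the stored string back into a list."""
    if not value:
        return None
    if isinstance(value, (list, tuple)):
        # Already converted, so hand it back as a list.
        return list(value)
    return value.split(token)


stored = join_values(["red", "green", "blue"])
colors = split_values(stored)
piped = join_values([1, 2, 3], token="|")
numbers = split_values(piped, token="|")
```

Two consequences of this design are worth keeping in mind: every item comes back as a string (the integers above return as "1", "2", "3"), and an item that itself contains the token will be split apart on load.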

8 Responses to "Custom Fields in Django"
Perhaps I’m missing something obvious, but I fail to understand the need to check for an ending “=” char in the encoded value before decoding and unpickling.
For instance, consider:
>>> import base64
>>> import cPickle as pickle
>>> base64.b64encode(pickle.dumps(['t','e','s','t']))
'KGxwMQpTJ3QnCmFTJ2UnCmFTJ3MnCmFTJ3QnCmEu'
Section 2.2 of RFC 3548 seems to indicate that the ‘=’ padding is not required.
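As a quick check of when the padding shows up, the sketch below encodes inputs of different lengths with the standard library. It assumes Python 3, so it uses bytes literals rather than the Python 2 strings of the session above. The encoder appends "=" only when the input length is not a multiple of three bytes, so an encoded value may or may not end with it.

```python
import base64

# Inputs of 1, 2 and 3 bytes: only the last length is a multiple of three.
one = base64.b64encode(b"a")      # b'YQ=='
two = base64.b64encode(b"ab")     # b'YWI='
three = base64.b64encode(b"abc")  # b'YWJj'

# The encoded pickle from the session above has 40 characters and no padding,
# so it decodes to exactly 30 bytes.
payload = base64.b64decode("KGxwMQpTJ3QnCmFTJ2UnCmFTJ3MnCmFTJ3QnCmEu")
print(one, two, three, len(payload))
```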
Ahh, seems I don’t know base64 as well as I thought. Thanks David!
We’ll just have to hope Django’s example of checking the value first is a waste of time.
hi,
this is very nice. You said “So I quickly did, and replaced a few lines of repetitive code with two new field classes in our source:” Does this mean that this will get into Django?
I doubt it will go into trunk, but the above code wasn’t quite working. I’ve found more issues with it, due to Django’s custom field classes not being very well documented.
For future readers, the above code *is* working.
I'm running into the problem you documented, where Django's enthusiasm for to_python() means that this doesn't (seem to) work for Fields intended to handle strings.
We're trying to write a legacy-compatible Django app that can run alongside a legacy, non-unicode-aware PHP app. MySQL has the legacy tables as 'latin1' and the PHP is writing and reading UTF-8 bytestrings as payload in the textual columns.
One option we're tossing around is to change settings.DATABASE_OPTIONS['use_unicode'] to False, which means we get back the same bytestrings in Django, which we can then .decode('utf8'), but we're leery of this.
I came across this trick that will un-munge the munged unicode strings we get back if we leave 'use_unicode' set to True: value = munged_string.encode('iso-8859-1').decode('utf8'). Oof.
So maybe we can put this trick in to_python()—and put the converse in get_db_prep_value()—in a custom Field. But to_python seems to be called more than once, as you say.
Any advice?
P.S. Thanks for your djangosphinx module. Kudos. We're using that too, in the rewrite of our site. It's positively a lifesaver.
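The re-encoding trick from the comment above, and one possible way to make it tolerate repeated to_python calls, can be sketched as follows. This assumes Python 3, and the function names `munge` and `unmunge` are hypothetical. The guard is a heuristic only: it relies on the second pass raising an error, which holds for typical text but not for every possible string.

```python
def munge(text):
    """Reproduce the damage: UTF-8 bytes misread as ISO-8859-1."""
    return text.encode("utf8").decode("iso-8859-1")


def unmunge(value):
    """Undo the damage, returning the value unchanged if it looks clean."""
    try:
        return value.encode("iso-8859-1").decode("utf8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside ISO-8859-1, or its bytes
        # are not valid UTF-8; treat it as already converted.
        return value


original = "caf\u00e9"
munged = munge(original)
fixed = unmunge(munged)
fixed_twice = unmunge(fixed)  # simulates a second to_python call
```

Pure ASCII text passes through both functions unchanged, so the guard only matters for strings that contain non-ASCII characters.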
Hi, and thanks for this post. Let me bring up one issue with SerializedDataField, which I suspect is a bug. I have a model class A with a serialized field s, and a class B with a foreign key to A called a. A.objects.get(pk=…).s works just fine, but B.objects.filter(…).values('a__s') returns the raw base64-encoded pickled string. I am sorry my example is not executable, but the real thing is a bit more complicated than that. Let me know if you are looking into this and I will try to make a real example if that helps.