Contact: fumanchu@aminus.org


I think I've seen this ORM somewhere before...

Ticket #45 (defect)

Opened 5 years ago

Last modified 4 years ago

memorize fails on unicode with umlaut in list

Status: closed (fixed)

Reported by: m. dietrich <mdt@emdete.de>
Assigned to: fumanchu
Priority: critical
Milestone: 1.5
Component: Storage
Keywords:
Cc:
Estimate (total hours): 3

If I store a list containing a unicode string with an umlaut, such as:

(u'äbc', )

then memorize() fails with:

    dejavu/storage/db.py", line 1048, in reserve
        values = u", ".join(values)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 7: ordinal not in range(128)
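The error comes from Python 2's implicit coercion: joining with a unicode separator silently decodes every byte-string element using the ASCII codec, which fails on the raw byte 0xE4. Python 3 no longer does this implicitly, so the minimal sketch below emulates it explicitly. The helper name py2_unicode_join and the sample byte value are hypothetical, chosen only to reproduce the same kind of failure.

```python
def py2_unicode_join(sep, values):
    """Hypothetical emulation of Python 2's u", ".join(values): any byte
    string in values is implicitly decoded with the ASCII codec first."""
    return sep.join(v.decode("ascii") if isinstance(v, bytes) else v for v in values)

# A mix of unicode text and an already-encoded value holding the raw byte 0xE4
# (hypothetical data, standing in for a serialized u'äbc').
values = ["1", "(V\xe4bc".encode("latin-1")]

try:
    py2_unicode_join(", ", values)
except UnicodeDecodeError as exc:
    print(exc)
```
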

Change History

02/27/06 23:49:55: Modified by fumanchu

  • status changed from new to closed.
  • resolution set to fixed.
  • component changed from Misc to Storage.
  • milestone set to 1.5.

Fixed in [172]. All AdapterToSQL coercions now MUST return encoded strings, not unicode objects.
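A minimal sketch of the rule "coercions return encoded strings", written in Python 3. The function name coerce_unicode_to_sql and its UTF-8 default are hypothetical and are not Dejavu's actual AdapterToSQL API; the point is only that once every coerced value is already bytes, joining them never triggers an implicit decode.

```python
def coerce_unicode_to_sql(value: str, encoding: str = "utf-8") -> bytes:
    """Hypothetical coercion: quote a unicode value as an SQL string literal
    and return it already encoded, so nothing downstream has to decode it."""
    quoted = "'" + value.replace("'", "''") + "'"  # double embedded single quotes
    return quoted.encode(encoding)

# Every coerced value is bytes, so the join never mixes bytes and text.
values = [coerce_unicode_to_sql(v) for v in ("äbc", "it's")]
sql_fragment = b", ".join(values)
print(sql_fragment)
```
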

08/11/06 12:14:39: Modified by fumanchu

  • priority changed from major to critical.
  • status changed from closed to reopened.
  • estimate set to 3.
  • resolution deleted.

This wasn't entirely fixed. It works with PostgreSQL in SQL_ASCII encoding, but not in UNICODE encoding. Apparently the pickle module encodes unicode strings in one of two ways: UTF-8 (in binary protocols 1 and 2) or 'raw-unicode-escape' (in protocol 0). We can't use the UTF-8 mode, because other parts of the binary protocols insert null bytes into the result, which PostgreSQL, for one, won't accept in an SQL string. Unfortunately, the custom and undocumented 'raw-unicode-escape' encoding leaves characters in the range 128 to 255 as single raw bytes; PostgreSQL accepts those in SQL_ASCII mode but rejects them in UNICODE mode.
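The two failure modes can be seen by pickling the tuple from the original report. This sketch runs under Python 3 rather than the Python 2 of the ticket's era; for a short string like this, protocol 0 still goes through 'raw-unicode-escape' and protocol 1 still writes UTF-8 behind a 4-byte length prefix.

```python
import pickle

value = ("äbc",)  # the tuple from the original report

# Protocol 0 (text): U+00E4 is written as the single raw byte 0xE4,
# which is not valid UTF-8 on its own.
text_pickle = pickle.dumps(value, protocol=0)

# Protocol 1 (binary): the text is UTF-8 encoded (0xC3 0xA4), but the
# opcode's little-endian length field and memo indices contain NUL bytes.
binary_pickle = pickle.dumps(value, protocol=1)

print(repr(text_pickle))
print(repr(binary_pickle))
```

Neither result is safe to embed in a UNICODE-encoded PostgreSQL string: the first is not valid UTF-8, and the second contains NUL bytes.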

08/11/06 23:12:36: Modified by fumanchu

  • status changed from reopened to closed.
  • resolution set to fixed.

Fixed in [282] with some aggressive octal escaping.
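The changeset itself isn't reproduced here. As an illustration only, one way to apply "aggressive octal escaping" is to rewrite every byte that is not printable ASCII (plus the backslash and single quote) as a backslash and three octal digits, so the stored text is pure ASCII under either server encoding, then reverse the mapping on load. The names octal_escape and octal_unescape are hypothetical. Also, if the escaped text is placed in an ordinary SQL string literal, whether PostgreSQL treats those backslashes literally depends on the standard_conforming_strings setting, so they may need doubling.

```python
import pickle
import re

# One escape: a backslash followed by three octal digits (\000 to \377).
_OCTAL_ESCAPE = re.compile(rb"\\([0-3][0-7]{2})")

def octal_escape(data: bytes) -> str:
    """Hypothetical helper: turn arbitrary bytes into printable ASCII.

    Printable ASCII passes through unchanged; every other byte, plus the
    backslash and single quote themselves, becomes \\ooo (three octal digits).
    """
    out = []
    for byte in data:
        if 0x20 <= byte <= 0x7E and byte not in (0x5C, 0x27):
            out.append(chr(byte))
        else:
            out.append("\\%03o" % byte)
    return "".join(out)

def octal_unescape(text: str) -> bytes:
    """Reverse octal_escape(): every backslash in its output starts an escape."""
    return _OCTAL_ESCAPE.sub(lambda m: bytes([int(m.group(1), 8)]),
                             text.encode("ascii"))

payload = pickle.dumps(("äbc",), protocol=0)  # contains the raw byte 0xE4
stored = octal_escape(payload)                 # pure printable ASCII
print(stored)
```

Escaping the backslash itself is what makes the mapping reversible: after escaping, any backslash in the stored text can only be the start of an escape sequence.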