Ticket #45 (defect)
Opened 5 years ago
Last modified 4 years ago
memorize fails on unicode with umlaut in list
Status: closed (fixed)
| Reported by: | m. dietrich <mdt@emdete.de> | Assigned to: | fumanchu |
|---|---|---|---|
| Priority: | critical | Milestone: | 1.5 |
| Component: | Storage | Keywords: | |
| Cc: | | Estimate (total hours): | 3 |
If I store a list with a unicode string containing an umlaut, like:
(u'äbc', )
then memorize() fails with:
dejavu/storage/db.py", line 1048, in reserve
values = u", ".join(values)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 7: ordinal not in range(128)
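The traceback is consistent with Python 2's implicit ASCII decoding: when a unicode separator joins a list that contains a byte string, the byte string is first decoded as ASCII. The sketch below makes that decode explicit so it also runs on Python 3. The byte string is a hypothetical stand-in; the ticket does not show what db.py actually put into `values`.

```python
# Hypothetical byte string standing in for one element of `values`;
# the real contents built by db.py are not shown in the ticket.
value = b"'\xe4bc'"

try:
    # Python 2 performs this decode implicitly inside u", ".join(values)
    # whenever `values` contains a byte string.
    value.decode("ascii")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
```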
Change History
02/27/06 23:49:55: Modified by fumanchu
- status changed from new to closed.
- resolution set to fixed.
- component changed from Misc to Storage.
- milestone set to 1.5.
08/11/06 12:14:39: Modified by fumanchu
- priority changed from major to critical.
- status changed from closed to reopened.
- estimate set to 3.
- resolution deleted.
This wasn't entirely fixed. It works with PostgreSQL in SQL_ASCII encoding, but not in UNICODE. Apparently the pickle module chooses between two encodings for unicode strings: utf-8 and 'raw-unicode-escape'. We can't use the utf-8 mode (protocol 1 or 2), because other pieces of those protocols stick null bytes into the result, which PostgreSQL, for one, won't accept in an SQL string. Unfortunately, the custom and undocumented 'raw-unicode-escape' encoding (protocol 0) leaves some characters as raw bytes between 128 and 255, which PostgreSQL accepts in SQL_ASCII mode but not in UNICODE mode.
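The difference between the two pickle modes can be seen directly. This is a small sketch using only the standard `pickle` module; the variable names are illustrative, and the tuple is the one from the report written with an escape sequence.

```python
import pickle

value = (u"\xe4bc",)  # the tuple from the report: (u'äbc', )

text_pickle = pickle.dumps(value, protocol=0)    # 'raw-unicode-escape' text
binary_pickle = pickle.dumps(value, protocol=2)  # length-prefixed UTF-8

# Protocol 0 keeps U+00E4 as the single raw byte 0xE4.
has_high_byte = b"\xe4" in text_pickle

# Protocol 2 stores binary length fields, which contain null bytes.
has_null_in_binary = b"\x00" in binary_pickle
has_null_in_text = b"\x00" in text_pickle

# A lone 0xE4 followed by ASCII is not valid UTF-8, which is presumably
# why a UNICODE-encoded database rejects the protocol 0 output.
try:
    text_pickle.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
```

So neither output is safe to embed as-is: protocol 0 is not valid UTF-8, and protocol 2 contains null bytes.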
08/11/06 23:12:36: Modified by fumanchu
- status changed from reopened to closed.
- resolution set to fixed.
Fixed in [282] with some aggressive octal escaping.
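The changeset itself is not reproduced here, so the following is only a sketch of what octal escaping of a pickle could look like. The helper names `octal_escape` and `octal_unescape` are hypothetical, as is the choice of which bytes to escape (backslash, single quote, and everything outside printable ASCII).

```python
import pickle


def octal_escape(data):
    """Hypothetical helper: turn bytes into an ASCII-only string."""
    out = []
    for byte in bytearray(data):
        # Escape control bytes, non-ASCII bytes, the single quote,
        # and the backslash itself as a three-digit octal sequence.
        if byte < 0x20 or byte > 0x7E or byte in (0x27, 0x5C):
            out.append("\\%03o" % byte)
        else:
            out.append(chr(byte))
    return "".join(out)


def octal_unescape(text):
    """Hypothetical inverse of octal_escape."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "\\":
            # Every backslash starts a three-digit octal escape.
            out.append(int(text[i + 1:i + 4], 8))
            i += 4
        else:
            out.append(ord(text[i]))
            i += 1
    return bytes(out)


pickled = pickle.dumps((u"\xe4bc",), protocol=0)
escaped = octal_escape(pickled)
restored = pickle.loads(octal_unescape(escaped))
```

The escaped form contains only printable ASCII, so it carries neither null bytes nor bytes above 127, and it round-trips back to the original tuple.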

Fixed in [172]. All AdapterToSQL coercions now MUST return encoded strings, not unicode.
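As an illustration of that rule, a coercion that always returns an encoded byte string lets the join operate on a single type. This is a sketch only: `coerce_text` is a hypothetical name, not an AdapterToSQL method, and the utf-8 default is an assumption.

```python
def coerce_text(value, encoding="utf-8"):
    """Hypothetical coercion: always return an encoded byte string."""
    if isinstance(value, bytes):
        return value
    # Assumption: unicode values are encoded with the given encoding.
    return value.encode(encoding)


values = [coerce_text(u"\xe4bc"), coerce_text(b"abc")]

# Joining with a byte-string separator never triggers an implicit decode.
joined = b", ".join(values)
```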