pyshapelib unicode saga

Bram de Greve bram.degreve at gmail.com
Thu Mar 15 15:53:12 CET 2007


Hi there,

For a moment there I thought I'd seen my chance to support unicode for the
filenames.  But it was only for a moment =)

I've looked in Python's source code to see how they handle things for their
own file object, and I've mimicked it as far as I could.
The key aspect seems to be to parse the string argument using the "et" format
instead of "s", and to use Py_FileSystemDefaultEncoding as the encoding.
Except that it doesn't work ...

First of all, Py_FileSystemDefaultEncoding is only defined for Windows (mbcs)
and Apple (utf-8),
and not for Linux (NULL, meaning the default encoding, meaning ascii).  So
Linux still gets plagued by the same error Didrik had before.
And yet, Python's file() seems to be able to cope with unicode filenames on
Linux.
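
To make the Linux case concrete, here is a rough sketch in modern Python 3
(not pyshapelib code).  encode_filename is a made-up helper, and treating a
missing encoding as ascii is my assumption about what the NULL case boils
down to:

```python
import sys


def encode_filename(name, encoding=None):
    """Encode a text filename to bytes for a char*-only C API.

    encoding=None stands in for a NULL Py_FileSystemDefaultEncoding and
    falls back to ascii (an assumption made for this illustration).
    """
    return name.encode(encoding or "ascii")


name = "r\u00f8ros.shp"  # filename containing a non-ASCII character

# What the running interpreter reports as its filesystem encoding.
print(sys.getfilesystemencoding())

# With an explicit UTF-8 encoding the name converts without trouble.
print(encode_filename(name, "utf-8"))

# With the ascii fallback the conversion fails.
try:
    encode_filename(name)
except UnicodeEncodeError as exc:
    print("ascii fallback failed:", exc.reason)
```

The failure in the last step is the kind of error a unicode filename runs
into when no filesystem encoding is defined.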

Secondly, on Windows mbcs is used, which is a lossy encoding (not all
unicode can be represented using mbcs).
This is necessary because the original shapelib library only uses the narrow
(char*) API, and on Windows that means mbcs encoding.
To get full unicode support, the wide character API must be used instead
(_wfopen), but shapelib simply doesn't support that.
(Python's file() does precisely that on Windows: when given a unicode
filename, it tries to use the wide character API.)
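
A small sketch of what "lossy" means here, again in modern Python 3.  The
mbcs codec only exists on Windows, so I use cp1252 as a stand-in for one
typical ANSI code page, and errors="replace" to imitate characters being
substituted.  Both choices are assumptions, and survives_codepage is a
hypothetical helper:

```python
def survives_codepage(name, codepage="cp1252"):
    """Return True if `name` round-trips through a narrow code page."""
    # errors="replace" substitutes "?" for characters the code page lacks.
    narrow = name.encode(codepage, errors="replace")
    return narrow.decode(codepage) == name


for candidate in ("caf\u00e9.shp", "\u6771\u4eac.shp"):
    print(ascii(candidate), survives_codepage(candidate))
```

The first name fits in the code page and comes back intact.  The second one
does not, so a narrow (char*) API can never be handed the real filename.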

Then there's also the issue of the encoding of the field names and the
string values.  The easiest solution would be to standardize everything
on UTF-8, but I believe we could do better.  It should be possible to specify
the encoding when opening or creating a DBFFile, defaulting
to perhaps something specified by the locale.
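
Roughly what I have in mind, as a Python 3 sketch.  FieldCodec and its
encoding argument are hypothetical names for illustration, not an existing
pyshapelib API:

```python
import locale


class FieldCodec:
    """Hypothetical helper converting DBF field text to and from bytes."""

    def __init__(self, encoding=None):
        # Fall back to the locale's preferred encoding when none is given.
        self.encoding = encoding or locale.getpreferredencoding(False)

    def decode(self, raw):
        """Turn raw bytes read from the file into text."""
        return raw.decode(self.encoding)

    def encode(self, text):
        """Turn text into the bytes to be written to the file."""
        return text.encode(self.encoding)


codec = FieldCodec("latin-1")
print(codec.decode(b"Troms\xf8"))
print(FieldCodec().encoding)  # depends on the machine's locale
```

The encoding would be chosen once, when the file is opened or created, and
then applied to both field names and string values.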

There's also the issue of backwards compatibility.  Getting strings into the
DBFFile isn't a problem, since we can check whether the
caller passes a unicode or a classic string, but getting them out is.  Should
we always return unicode strings and risk some
incompatibilities with calling code, or should we try to diversify (perhaps
based on the encoding used,
so that ascii encoding could return classic strings, or maybe based on another
flag ...)?
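
One possible shape for that policy, sketched in modern Python 3, where bytes
plays the role of the classic string and str the role of unicode.  The
function names and the return_unicode flag are made up for illustration:

```python
def to_stored_bytes(value, encoding):
    """Accept either kind of string on the way in."""
    if isinstance(value, bytes):
        return value  # classic string: store as given
    return value.encode(encoding)  # unicode string: encode first


def read_value(raw, encoding="ascii", return_unicode=None):
    """Pick the return type on the way out.

    By default ascii files keep returning classic strings and every
    other encoding returns unicode.  An explicit flag overrides that.
    """
    if return_unicode is None:
        return_unicode = encoding.lower() not in ("ascii", "us-ascii")
    return raw.decode(encoding) if return_unicode else raw


print(read_value(b"Oslo"))
print(read_value(b"Troms\xc3\xb8", "utf-8"))
print(read_value(b"Oslo", "ascii", return_unicode=True))
```

Existing callers working with ascii data would see no change, while anyone
opting into another encoding gets unicode back.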

Bram

-- 
hi, i'm a signature viruz, plz set me as your signature and help me spread
:)