pyshapelib unicode saga
Philippe Le Grand
aemphil at gmail.com
Thu Mar 15 16:25:15 CET 2007
The DBF format specifies that field names and the contents of
character fields are ASCII, using the OEM code page (a.k.a. IBM PC
code page, a.k.a. code page 437; see Wikipedia).
I believe FoxPro uses a flag to identify alternate codepages at offset
1Dh in the header of the file, but whether that is actually part of
the standard is unclear to me.
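To make the two points above concrete, here is a rough Python sketch. It only reads the byte at offset 1Dh from a raw header and decodes a character field as code page 437. The function names are made up for this mail (they are not part of pyshapelib or shapelib), and I deliberately do not try to map flag values to code pages, since I am not sure that mapping is standardised.

```python
DBF_HEADER_MIN_SIZE = 32      # the fixed part of a DBF header is 32 bytes
CODEPAGE_FLAG_OFFSET = 0x1D   # offset of the flag mentioned above


def read_codepage_flag(header: bytes) -> int:
    """Return the raw byte stored at offset 1Dh of a DBF header.

    Hypothetical helper: it does not interpret the value in any way.
    """
    if len(header) < DBF_HEADER_MIN_SIZE:
        raise ValueError("need at least the first 32 bytes of the file")
    return header[CODEPAGE_FLAG_OFFSET]


def decode_cp437_field(raw: bytes) -> str:
    """Decode a character field as code page 437, dropping padding.

    Assumes the field is padded with trailing spaces or NUL bytes.
    """
    return raw.rstrip(b"\x00 ").decode("cp437")
```

In practice you would pass the first 32 bytes of the .dbf file to read_codepage_flag(); a value of 0 would presumably mean that no alternate code page was declared, but that is an assumption on my part.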
You can find dbf file specs at:
http://www.dbf2002.com/dbf-file-format.html (dbf III+, IV)
or http://www.dbase.com/KnowledgeBase/int/db7_file_fmt.htm (dbf VII)
The dbf associated with shapefiles is version III+, I believe.
For portability (which is the only relevant purpose of shapefiles as
far as I am concerned), you might want to restrict yourself to the
most common features of the standard, i.e. ASCII field names and
character field contents.
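If you go the ASCII-only route, the check is cheap. A minimal sketch (the helper name is hypothetical, not an existing pyshapelib function):

```python
def is_portable_ascii(text: str) -> bool:
    """Return True if text can be stored as plain 7-bit ASCII."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True
```

A writer could call this on every field name and string value and either refuse or warn when it returns False; which of the two is the better policy is for you to decide.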
Thanks for your work. I hope to start testing soon and to give you
feedback.
On 3/15/07, Bram de Greve <bram.degreve at gmail.com> wrote:
> Then there's also the issue of the encoding of the field names and the
> string values. The easiest solution would be to fix everything
> on UTF-8 but I believe we could do better. It should be possible to specify the
> encoding when opening or creating a DBFFile, defaulting
> to perhaps something specified by the locale.
> There's also the issue of backwards compatibility. Getting strings into the
> DBFFile isn't a problem since we can check whether the
> caller passes a unicode or a classic string, but getting them out is. Should we
> always return unicode strings and risk some
> incompatibilities with calling code, or should we try to diversify (perhaps
> based on the used encoding,
> ascii encoding could return classic strings, or maybe based on another flag