pyshapelib unicode saga

Bram de Greve bram.degreve at gmail.com
Thu Mar 15 16:33:56 CET 2007


Philippe Le Grand wrote:
> Bram,
>
> The DBF format specifies that field names and the contents of
> character fields are ASCII, using the OEM code page (a.k.a. IBM PC
> code page, a.k.a. code page 437; see wikipedia).
> I believe FoxPro uses a flag to identify alternate codepages at offset
> 1Dh in the header of the file, but whether that is actually part of
> the standard is unclear to me.
> You can find dbf file specs at:
> http://www.dbf2002.com/dbf-file-format.html (dbf III+, IV)
> or http://www.dbase.com/KnowledgeBase/int/db7_file_fmt.htm (dbf VII)
>
> The dbf associated with shapefiles is version III+, I believe.
>   
Great, I'll take a look at those ...
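For reference, a minimal sketch of what checking that flag could look
like in Python. It assumes the usual 32-byte dBASE III+ header layout
(record count at offset 4, header and record lengths at offsets 8 and
10) and the byte at offset 1Dh that Philippe mentions. The function
name and the small mark-to-codec table are hypothetical; the three
marks listed are the ones I understand FoxPro to use, and none of this
is pyshapelib API.

```python
import struct

# Assumed mapping for a few FoxPro code page marks. This is an
# illustration only and is not part of the dBASE III+ spec.
CODE_PAGE_MARKS = {0x01: "cp437", 0x02: "cp850", 0x03: "cp1252"}


def read_dbf_header(data):
    """Parse the fixed 32-byte prefix of a DBF header (hypothetical helper)."""
    if len(data) < 32:
        raise ValueError("a DBF header needs at least 32 bytes")
    version = data[0]
    # Little-endian: uint32 record count, uint16 header length,
    # uint16 record length, starting at offset 4.
    num_records, header_len, record_len = struct.unpack_from("<IHH", data, 4)
    mark = data[0x1D]  # the flag at offset 1Dh
    return {
        "version": version,
        "num_records": num_records,
        "header_len": header_len,
        "record_len": record_len,
        "code_page_mark": mark,
        # Fall back to ASCII when the mark is zero or unknown.
        "encoding": CODE_PAGE_MARKS.get(mark, "ascii"),
    }
```

Passing it the first 32 bytes of a .dbf file opened in binary mode
should be enough to see whether a given file sets the flag at all.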
> For portability (which is the only relevant purpose of shapefiles as
> far as I am concerned), you might want to restrict yourself to the
> most common features of the standard, i.e. ASCII field names and
> character field contents.
>   
OK, so basically no work needs to be done if we follow the specs
strictly, since the current implementation only allows ASCII anyway =)
But, OTOH, if Thuban aspires to use Unicode all the way, we have hit a
barrier here.  Anyway, if there were some extension to support other
encodings, at the very least the default should be ASCII ...
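To make the "default should be ASCII" idea concrete, here is a rough
sketch of how such an extension might behave. The helper names and the
encoding parameter are hypothetical and do not exist in pyshapelib;
the only real calls are Python's own str.encode and bytes.decode.

```python
def encode_field(value, encoding="ascii"):
    """Encode a text value for a DBF character field (hypothetical helper).

    With the default, anything outside ASCII raises UnicodeEncodeError,
    which matches the strict reading of the spec.
    """
    return value.encode(encoding)


def decode_field(raw, encoding="ascii"):
    """Decode raw field bytes, dropping trailing space or NUL padding."""
    return raw.rstrip(b" \x00").decode(encoding)
```

A caller who knows the file uses another code page would have to opt
in explicitly, for example encode_field(name, "cp437"), so files
written without that argument stay portable.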
> Thanks for your work. I hope to be able to start testing soon and
> give you feedback.
>
>   
That would be great.

Bram


