Tuesday, December 9, 2008

Working with UTF-16 in Python

I got a file from a client that had been exported from SQL Server and was encoded as UTF-16. I needed to do some work on it, and I had to google around a bit to find out how to handle the gobbledygook I was seeing when I opened it the usual way:

>>> f = open("f:\\contact.txt","r")
>>> l = f.readline()
>>> l
'\xff\xfe6\x003\x003\x00D\x003\x00A\x008\.....\n'
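The leading '\xff\xfe' is the UTF-16 little-endian byte order mark, and the '\x00' after each character is the second byte of each 16-bit code unit. A minimal sketch of that, using a short made-up sample (the variable names and the sample string are mine, not from the real file):

```python
import codecs

# Hypothetical sample: a few characters encoded as little-endian UTF-16 with a BOM
raw = codecs.BOM_UTF16_LE + u"633D3A8".encode("utf-16-le")

# The first two bytes are the byte order mark, '\xff\xfe'
has_bom = raw.startswith(codecs.BOM_UTF16_LE)

# The "utf-16" codec reads the BOM to pick the byte order and then drops it
decoded = raw.decode("utf-16")
```

So the data is fine; it just needs to be decoded before it is treated as text.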

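Here is how to do it: open the file with codecs.open and tell it the encoding, so that each line comes back as a decoded unicode string.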

>>> import codecs
>>> f = codecs.open("f:\\contact.txt", "r", "utf16")
>>> l = f.readline()
>>> l
u'633D3A84-3870-4A93-9755-000215260850,8568,NULL,Scooby,Shaggy,NULL,,NULL,mymail@address.com,1902-06-01 00:00:00.000,NULL\r\n'
>>>
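The session above is Python 2. On Python 3 the built-in open() takes an encoding argument, so codecs.open should not be needed. A rough sketch, assuming Python 3; it writes a small throwaway file first so it can run on its own, and the path and sample row are made up rather than taken from the client's file:

```python
import os
import tempfile

# Hypothetical sample row standing in for a line of the real export
sample = "633D3A84,8568,NULL,Scooby,Shaggy\r\n"

# Write a throwaway UTF-16 file (the codec adds a BOM) so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), "contact.txt")
with open(path, "w", encoding="utf-16", newline="") as f:
    f.write(sample)

# Read it back; newline="" keeps the original \r\n instead of translating it to \n
with open(path, "r", encoding="utf-16", newline="") as f:
    line = f.readline()

# Strip the line ending and split into columns
fields = line.rstrip("\r\n").split(",")
```

A plain split(",") is only good enough for a quick look; for real work on an export like this the csv module is probably a safer choice, since fields can contain quoted commas.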
