Python decode utf8

9/15/2023

Unfortunately, the string.encode() method is not always reliable. In Python 2, calling encode() on a byte string (str) that already holds encoded data makes Python decode it first with the default ASCII codec, and that implicit decode fails on any non-ASCII byte.

You don't need to encode data that is already encoded. When you try to do that, Python will first try to decode it to unicode before it can encode it back to UTF-8. That is what is failing here:

>>> data = u'\u00c3'            # Unicode data
>>> data = data.encode('utf8')  # encoded to UTF-8
>>> data.encode('utf8')         # Try to *re*-encode it
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

Just write your data directly to the file; there is no need to encode already-encoded data.

If you build up unicode values instead, you would indeed have to encode those before they can be written to a file. In that case use codecs.open(), which returns a file object that will encode unicode values to UTF-8 for you.

You also really don't want to write out the UTF-8 BOM, unless you have to support Microsoft tools that cannot read UTF-8 otherwise (such as MS Notepad).

For your MySQL insert problem, you need to do two things:

Add charset='utf8' to your connect() call.

Use unicode objects, not str objects, when querying or inserting, but use SQL parameters so the MySQL connector can do the right thing for you:

artiste = artiste.decode('utf8')  # it is already UTF8, decode to unicode
c.execute('SELECT COUNT(id) AS nbr FROM artistes WHERE nom=%s', (artiste,))
c.execute('INSERT INTO artistes(nom,status,path) VALUES(%s, 99, %s)', (artiste, artiste + u'/'))

It may actually work better if you used codecs.open() to decode the contents automatically instead:

import codecs

sql = mdb.connect('localhost','admin','ugo&( ','music_vibration', charset='utf8')

with codecs.open('config/index/'+index, 'r', 'utf8') as findex:
    # ... (the lines between these statements are missing from this copy)
    cursor.execute('SELECT COUNT(id) AS nbr FROM artistes WHERE nom=%s', (artiste,))
    cursor.execute('INSERT INTO artistes(nom,status,path) VALUES(%s, 99, %s)', (artiste, artiste + u'/'))

You may want to brush up on Unicode and UTF-8 and encodings. A good starting point is The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky.

Check out this thread for more information: What is the fool proof way to convert some string (utf-8 or else) to a simple ASCII string in python
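The advice above is written for Python 2. As a rough sketch of how it might carry over to Python 3 (an assumption, since the post itself does not cover Python 3): text is str, encoded data is bytes, and bytes has no encode() method at all, so an accidental re-encode shows up as an AttributeError rather than a UnicodeDecodeError. The built-in open() with encoding='utf-8' takes over the role of codecs.open(). The helper names encode_once and roundtrip_through_file and the file name example.txt are made up for illustration.

```python
import os
import tempfile


def encode_once(text):
    """Return UTF-8 bytes, encoding only if the value is still text.

    Hypothetical helper: it guards against encoding data twice.
    """
    if isinstance(text, bytes):
        # Already encoded: hand it back unchanged instead of re-encoding.
        return text
    return text.encode("utf-8")


def roundtrip_through_file(text, path):
    """Write text as UTF-8 and read it back as str.

    Hypothetical helper: open() does the encoding and decoding, so the
    caller only ever handles text.
    """
    # encoding="utf-8" writes no BOM; "utf-8-sig" would prepend one.
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()


data = "\u00c3"                # text (str)
encoded = encode_once(data)    # UTF-8 bytes
same = encode_once(encoded)    # second call leaves the bytes untouched

# bytes objects have decode() but no encode() in Python 3.
has_encode = hasattr(encoded, "encode")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    restored = roundtrip_through_file(data, path)
    # Read the raw bytes to see what actually landed on disk.
    with open(path, "rb") as handle:
        raw = handle.read()
```

The design choice here is to keep everything as text inside the program and let the file object encode at the boundary, which is the same idea as using codecs.open() in the Python 2 code.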
