Latin-1 vs UTF-8

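Latin-1 (ISO-8859-1) is a single-byte encoding: every character takes exactly one byte, so it can represent at most 256 distinct characters.

So characters outside that set (Cyrillic, Greek, CJK and so on) cannot be stored in it, and text written in another encoding shows up garbled if you display it as Latin-1.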
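As a rough illustration of both failure modes, here is a small Python sketch. The helper names `to_latin1` and `misread_as_latin1` are made up for this example and are not part of any library.

```python
# Illustrative only: what happens when text leaves the Latin-1 range.

def to_latin1(text):
    """Return the Latin-1 bytes for text, or None if it cannot be represented."""
    try:
        return text.encode("latin-1")
    except UnicodeEncodeError:
        return None


def misread_as_latin1(text):
    """Encode text as UTF-8, then (wrongly) decode those bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


print(to_latin1("café"))        # b'caf\xe9' : fits, one byte per character
print(to_latin1("Ж"))           # None : Cyrillic has no Latin-1 byte
print(misread_as_latin1("é"))   # Ã© : two UTF-8 bytes shown as two Latin-1 characters
```

The last line is the classic symptom: accented letters turning into pairs like "Ã©" usually means UTF-8 bytes are being read as Latin-1 somewhere along the way.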

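UTF-8 is a variable-width encoding that uses 1 to 4 bytes per character and can encode every Unicode character. Note that MySQL’s utf8 character set stores at most 3 bytes per character, which covers only the Basic Multilingual Plane; the full 4-byte range needs utf8mb4 (available from MySQL 5.5.3).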
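To see how the byte count varies, this short Python sketch measures a few sample characters. The helper `utf8_len` is a name chosen for this example. The last character needs 4 bytes, which is more than MySQL's 3-byte utf8 character set can store.

```python
def utf8_len(ch):
    """Number of bytes UTF-8 uses to encode the given string."""
    return len(ch.encode("utf-8"))


# ASCII letter, Latin-1 accented letter, euro sign, emoji
for ch in ["A", "é", "€", "😀"]:
    print(repr(ch), utf8_len(ch))   # 1, 2, 3 and 4 bytes respectively
```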

Here’s a Unicode/UTF-8 character table:

http://www.utf8-chartable.de/unicode-utf8-table.pl?start=256&unicodeinhtml=dec

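If you’re using MySQL, here’s how to ALTER a table to convert a column to the UTF-8 character set. There’s a subtlety here: because MySQL’s utf8 can need up to 3 bytes per character, converting a whole table with CONVERT TO CHARACTER SET may change a column to the next larger type (for example TEXT to MEDIUMTEXT) so that it can still hold as many characters as before. Modifying the column and stating its type explicitly lets you choose the type yourself. E.g. for VARCHAR: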

ALTER TABLE table_name MODIFY column_name VARCHAR(255) CHARACTER SET utf8;
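The size arithmetic behind that subtlety can be sketched in a few lines of Python. The per-character maxima and the 65,535-byte TEXT limit are MySQL's documented values; the names `MAX_BYTES_PER_CHAR`, `TEXT_LIMIT` and `worst_case_bytes` are invented for this example.

```python
# Worst-case bytes per character for each MySQL character set.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}

TEXT_LIMIT = 65535  # maximum size of a MySQL TEXT value, in bytes


def worst_case_bytes(char_count, charset):
    """Bytes needed if every one of char_count characters is as wide as possible."""
    return char_count * MAX_BYTES_PER_CHAR[charset]


print(worst_case_bytes(255, "latin1"))   # 255
print(worst_case_bytes(255, "utf8"))     # 765

# A latin1 TEXT column can hold 65535 characters. The same number of
# characters in utf8 may not fit in TEXT any more, hence the larger type.
print(worst_case_bytes(65535, "utf8") > TEXT_LIMIT)   # True
```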

http://dev.mysql.com/doc/refman/5.0/en/alter-table.html
