
Multilingual Support - Unicode UTF-8 and MySQL Character Sets

Adrian Singer, 09-16-2008
If you're going to be supporting multiple languages on your website, there are a few easy tips you should follow when it comes to designing your HTML pages, storing information in the database and retrieving it.

Too many engineers spend too much time figuring out the basics of multilingual support. It can be easy once you know the rules. And so, I present to you:

The Ultimate Guide to Multilingual Support with PHP / MySQL

1. HTML Pages

All of your HTML pages should be designed for UTF-8. There is absolutely no reason whatsoever to EVER use any encoding other than UTF-8 for your HTML pages, and the files themselves should be saved as UTF-8 so that their bytes match what you declare.

Declare this content type in ALL your HTML pages:


<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
</head>
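The meta tag is not the only place a charset gets declared. When the server also sends one in the HTTP Content-Type header, browsers use the header. Below is a minimal Python sketch of keeping the two in agreement and encoding the page bytes to match. `build_utf8_page` is a hypothetical helper, not part of PHP or any framework.

```python
import html


# Hypothetical helper: builds a page whose declared charset (HTTP header
# and meta tag) and actual byte encoding are all UTF-8.
def build_utf8_page(title, body):
    # The header wins over the meta tag, so both must say the same thing.
    headers = {"Content-Type": "text/html; charset=utf-8"}
    page = (
        "<html>\n<head>\n"
        '<meta http-equiv="content-type" content="text/html; charset=utf-8" />\n'
        "<title>" + html.escape(title) + "</title>\n"
        "</head>\n"
        "<body>" + html.escape(body) + "</body>\n"
        "</html>\n"
    )
    # The bytes actually sent must match the declaration.
    return headers, page.encode("utf-8")


headers, payload = build_utf8_page("Привет", "Café & 日本語")
print(headers["Content-Type"])
print(payload.decode("utf-8"))
```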

2. MySQL Database Setup

Whenever connecting to the database, immediately set the character set to UTF8.

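To do this, update your database connection function and add this call right after you connect to the database:

// Tell MySQL that this connection sends and receives UTF-8
mysql_query("SET NAMES 'utf8'");

SET NAMES sets the client, connection and result character sets in a single statement, so you do not need a separate SET CHARACTER SET call. If your PHP version provides mysql_set_charset(), calling mysql_set_charset('utf8') instead is even better: it sets the connection character set and also makes mysql_real_escape_string() aware of it.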
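The connection statements change a handful of session variables, which can be modelled in a few lines of Python. This is a simplified sketch based on the MySQL manual's description, not real MySQL code. According to the manual, SET NAMES sets character_set_client, character_set_connection and character_set_results. SET CHARACTER SET sets the client and results variables but resets character_set_connection to the database default. On a database that still defaults to latin1, running SET CHARACTER SET after SET NAMES therefore leaves the connection in latin1. String literals in your queries are then converted to latin1, and characters latin1 cannot represent are lost.

```python
# Simplified model (an illustration, not MySQL code) of the session
# variables each statement sets, as described in the MySQL manual.
def set_names(session, charset):
    # SET NAMES x: client, connection and results all become x.
    for var in ("character_set_client", "character_set_connection",
                "character_set_results"):
        session[var] = charset


def set_character_set(session, charset):
    # SET CHARACTER SET x: client and results become x, but the
    # connection charset falls back to the database default.
    session["character_set_client"] = charset
    session["character_set_results"] = charset
    session["character_set_connection"] = session["character_set_database"]


# A database that has not been converted yet (still latin1).
session = {"character_set_database": "latin1"}
set_names(session, "utf8")
set_character_set(session, "utf8")
print(session["character_set_connection"])  # latin1
```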

Update all existing databases and tables to use utf8.

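To update a database, run:

mysql_query("ALTER DATABASE DBNAME CHARACTER SET utf8");

This changes only the default for tables created later. Existing tables and columns keep their current character set, so for each character column where you are going to store multilingual text, also run:

mysql_query("ALTER TABLE MYTABLE MODIFY FIELDNAME TEXT CHARACTER SET utf8");

Replace TEXT with the column's current type (for example VARCHAR(255)) and repeat any attributes such as NOT NULL or DEFAULT after CHARACTER SET utf8. MODIFY redefines the whole column, so anything you leave out is dropped. To convert every character column of a table at once, use ALTER TABLE MYTABLE CONVERT TO CHARACTER SET utf8.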
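If you have many tables, writing these statements by hand gets tedious. Below is a hypothetical Python helper that generates them from a column list you write yourself. The database, table and column names (`mysite`, `posts`, `title`, `body`) are only illustrative. The helper uses MODIFY, which does the same as CHANGE FIELDNAME FIELDNAME when the name stays the same. Either form redefines the whole column, so the sketch takes each column's type and its remaining attributes separately and places CHARACTER SET utf8 between them. It assumes the existing data really is stored in each column's declared character set. It also does not escape backticks inside identifiers.

```python
# Hypothetical helper: builds conversion statements from a hand-written
# list of (table, column, type, other attributes) entries.
def utf8_migration_sql(database, columns):
    statements = ["ALTER DATABASE `%s` CHARACTER SET utf8;" % database]
    for table, column, col_type, attributes in columns:
        # CHARACTER SET belongs right after the type; NOT NULL, DEFAULT,
        # etc. must be repeated or MODIFY will drop them.
        parts = [col_type, "CHARACTER SET utf8"]
        if attributes:
            parts.append(attributes)
        statements.append(
            "ALTER TABLE `%s` MODIFY `%s` %s;" % (table, column, " ".join(parts))
        )
    return statements


statements = utf8_migration_sql("mysite", [
    ("posts", "title", "VARCHAR(255)", "NOT NULL"),
    ("posts", "body", "TEXT", ""),
])
for sql in statements:
    print(sql)
```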

3. Storing information in the database

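MySQL stores text in the character set of the target column, converting incoming data from the connection's character set on the way in. If your pages and your connection are both UTF-8 (steps 1 and 2), the data already arrives as UTF-8.

You can still make the target character set explicit with CONVERT(... USING utf8). Here's an INSERT that does so, escaping the input first:

// $input is a placeholder for the submitted text
$body = mysql_real_escape_string($input);
mysql_query("INSERT INTO MYTABLE (body) VALUES (CONVERT('$body' USING utf8))");

When the input is already UTF-8, this conversion changes nothing. It only matters when the source data uses a different character set, and then MySQL must be told what that source character set is.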
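The effect of CONVERT(... USING utf8) on a value can be mimicked in Python: decode the bytes from the source character set, then encode them as UTF-8. This is an analogy, not MySQL's implementation, and `convert_using_utf8` is a hypothetical name. It also shows why converting data that is already UTF-8 does not double-encode it.

```python
# Python analogue of CONVERT(value USING utf8): interpret the bytes in
# the source charset, then re-encode them as UTF-8.
def convert_using_utf8(raw, source_charset):
    return raw.decode(source_charset).encode("utf-8")


utf8_input = "Café".encode("utf-8")     # what a UTF-8 form submits
cp1252_input = "Café".encode("cp1252")  # same text from a Windows-1252 source

# Already UTF-8: the bytes come back unchanged, so nothing is double encoded.
print(convert_using_utf8(utf8_input, "utf-8") == utf8_input)  # True
# A genuinely different source: the bytes really are transcoded.
print(cp1252_input, convert_using_utf8(cp1252_input, "cp1252"))
```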

4. Retrieving information from the database

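If you've followed all steps up to this point, your HTML pages declare a UTF-8 content type and data goes into your database as UTF-8.

Retrieving information from the database is the easy part. Use plain mysql_fetch_array(), mysql_fetch_row() and so on, and do NOT encode or decode the output in any way, shape or form (no utf8_encode(), utf8_decode() or iconv() on data that is already UTF-8).

Be wary of string functions that are not Unicode-aware. Search your code for functions such as htmlspecialchars() and htmlentities() and pass 'UTF-8' as the charset argument, e.g. htmlspecialchars($text, ENT_QUOTES, 'UTF-8'). Likewise, strlen() and substr() count bytes rather than characters; use mb_strlen() and mb_substr() with 'UTF-8' when you need character counts.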
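The difference between byte-based and character-based string functions is easy to see in Python. There, len() on a str counts characters, like mb_strlen($s, 'UTF-8'), and len() on the encoded bytes counts bytes, like strlen(). The sketch below is a Python analogue, not PHP. `truncate_utf8` is a hypothetical helper that does roughly what mb_strcut() does, and html.escape() stands in for htmlspecialchars().

```python
import html

text = "日本語 & café"

byte_length = len(text.encode("utf-8"))  # like strlen(): counts bytes
char_length = len(text)                  # like mb_strlen(): counts characters


# Hypothetical helper: keep at most max_bytes of UTF-8 without leaving
# half a multi-byte character at the end (the partial bytes are dropped).
def truncate_utf8(s, max_bytes):
    return s.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")


print(byte_length, char_length)
print(truncate_utf8(text, 4))
print(html.escape(text))  # escapes &, <, >, " and ' without touching the rest
```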

---

Followed these 4 steps and you're still running into problems? Seeing question marks or funny-looking characters instead of your precious data?

If that's the case, you must be manipulating the data somewhere in the store-retrieve-display loop. Comb your code and verify you're following the steps I described to a tee. If you still can't figure it out, feel free to contact us for assistance or post a comment on this thread.
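The two symptoms usually have different causes, and a short Python sketch reproduces both. Python's latin-1 codec approximates MySQL's latin1 here. MySQL's latin1 is really closer to Windows-1252, but the two agree for these characters.

```python
original = "café"

# Funny-looking characters: UTF-8 bytes read back as latin1.
garbled = original.encode("utf-8").decode("latin-1")
print(garbled)  # cafÃ©

# Question marks: text forced into a charset that cannot represent it.
lossy = "привет".encode("latin-1", errors="replace")
print(lossy)  # b'??????'

# The first kind can be undone by reversing the wrong step exactly;
# the second cannot, because the original characters are gone.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)  # café
```

Seeing Ã© where é should be is the classic sign of UTF-8 bytes passing through a latin1 step somewhere. A row of ? usually means text was converted into a character set that has no room for it.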

Adrian Singer, 09-16-2008
Important:

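If you're doing any conversions to utf8 in MySQL, you MUST know the character set of the source data.

The correct way to convert into utf8 is to declare the source character set first, so MySQL interprets the incoming string correctly, and then switch back:

SET NAMES 'old_char_set';
UPDATE MYTABLE SET FIELDNAME = CONVERT('string' USING utf8);
SET NAMES 'utf8';

Here 'string' is text in old_char_set and FIELDNAME is a utf8 column. Without a WHERE clause, this UPDATE writes the same value to every row.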
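Knowing the source character set matters because the same bytes decode without any error under the wrong charset; they just mean something else. In this Python sketch, Cyrillic text from a hypothetical Windows-1251 application is read once with the right charset and once as latin-1.

```python
# "привет" as stored by a (hypothetical) legacy Windows-1251 application.
legacy_bytes = "привет".encode("cp1251")

# Declaring the right source charset recovers the text...
right = legacy_bytes.decode("cp1251")
# ...while guessing latin-1 also "works", but produces different characters.
wrong = legacy_bytes.decode("latin-1")

print(right)  # привет
print(wrong)  # ïðèâåò
```

Both results are perfectly valid UTF-8 once re-encoded, which is why a wrong guess is easy to miss until someone reads the data.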

Slavi, 09-23-2008
Hello,

What if we use accept-charset="UTF-8" in the form declaration?
e.g.
<form id="profile_form" method="POST" accept-charset="UTF-8" onsubmit="" action="/member/register">

Do we still have to use: ".... values (convert("string" using utf8))");"

If yes, will the data be double UTF-8 encoded?

Slavi

Adrian Singer, 09-25-2008
Slavi -

If your HTML content-type is set to utf-8 and your form charset is set to utf-8, then you don't need the additional convert using utf8.

You only need that when you are converting from a different charset.

Stick to utf-8 on your HTML pages and your MySQL will play along nicely.

Mac, 03-19-2009
Hi!
I have one question regarding multilingual support.
I am a MySQL DBA. I want my database to support multiple languages. Can I do it just by setting the database character set to UTF8? But UTF-8 encodes Unicode characters in 1-4 bytes, while UCS2 is 16-bit.
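For reference on the byte sizes mentioned in this question, here is a quick Python check. UTF-8 uses 1 to 4 bytes per character. UTF-16 uses 2 bytes for characters in the Basic Multilingual Plane and 4 bytes (a surrogate pair) beyond it, while UCS-2 covers only the 2-byte case. MySQL's utf8 character set stores at most 3 bytes per character, so 4-byte characters such as emoji need the utf8mb4 character set that later MySQL versions added.

```python
samples = ["A", "é", "日", "😀"]

# (UTF-8 bytes, UTF-16 bytes) for each character; utf-16-le avoids a BOM.
lengths = {}
for ch in samples:
    lengths[ch] = (len(ch.encode("utf-8")), len(ch.encode("utf-16-le")))
    print(ch, "U+%04X" % ord(ch), lengths[ch])

# MySQL's 3-byte utf8 can only hold characters that need <= 3 UTF-8 bytes.
fits_mysql_utf8 = [ch for ch in samples if lengths[ch][0] <= 3]
print(fits_mysql_utf8)
```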