Mon 20 Nov 2006
UTF-8, a Unicode encoding, is probably already the most used character encoding for new web applications, except maybe for Asia. The most popular open source database is MySQL. (But don’t miss the most advanced open source database, which I prefer.)
What do you need to do to have your database and web application be all UTF-8? MySQL offers a lot of places to configure character set (character encoding) and character collation (used for sorting and comparing text).
The most important rule is: Don’t rely on server configuration – you may not control it on the server you application will be running. When you write a web application that needs to run on a variety of operating systems and Linux distributions, all with their own default database configuration, you must make as little assumptions as possible about the system it will run on and its configuration. You may have configured your MySQL server to run your application correctly, but you may not have permission or the opportunity to reconfigure the MySQL server your application will run on. Fortunately you can specify all character set configuration in MySQL in places that you control: in the SQL scripts and the source code of your web application.
Stored data
One side of the problem is the data in the tables. If you control the CREATE DATABASE statement, you should specify the character set there:
CREATE DATABASE webapp
DEFAULT CHARACTER SET utf8;
On some web hosts, web applications have to use the one database that came with the hosting. The database has already been created for you. In that case you have to specify the character set in the individual CREATE TABLE statements:
CREATE TABLE gadgets (
name VARCHAR(255) PRIMARY KEY,
rating INT
) DEFAULT CHARACTER SET utf8;
It doesn’t hurt to specify the character set in both locations. (Except you’re violating the DRY Principle.) Now MySQL knows that the data in your tables is in UTF-8.
Communication with the database
The other side of the problem is the data that comes from and gets sent to the client. MySQL offers a lot of features here; you can have different character sets at almost every stage of data processing. To be all UTF-8, issue the following statement just after you’ve made the connection to the database server:
SET NAMES utf8;
This sets the character_set_client, character_set_connection and character_set_results variables to utf8. See below for the meaning of each of these variables.
Communication with the database also concerns SQL files you read with the MySQL command line client, or upload with phpMyAdmin. Put the statement at the top of every SQL file, like this:
SET NAMES utf8;
INSERT INTO TABLE gadgets (name, rating) VALUES ('iPod', 45);
If you’re talking to mysql from a command line that doesn’t understand UTF-8 (likely in Windows and older Linuxes), use the following statement to tell MySQL which character set you’re using on the client side:
SET CHARACTER SET cp1250;
This sets the character_set_client and character_set_results variables to cp1250. Upon arriving on the server, your data will be converted from CP1250 to UTF-8. Results returned to you will be converted from UTF-8 to CP1250.
Which character set do you need for the command line?
- utf8
- Modern Linux (Fedora, Ubuntu since 5.04, recent SuSE, recent Mandriva) for all languages. No need to issue a SET CHARACTER SET command here.
- latin1
- Western European Windows and Linux. English, Spanish, German, French, …
- cp1250
- Central European Windows. Polish, Czech, Slovak, Hungarian, Slovene, Croatian, Romanian, Albanian.
- latin2
- Central European Linux. Polish, Czech, Slovak, Hungarian, Slovene, Croatian, Romanian, Albanian.
- cp1251
- Cyrillic Windows. Russian, Ukrainian, Bulgarian, Belorussian, …
- koi8r
- Russian Linux.
- koi8u
- Ukrainian Linux.
- cp1256
- Arabic Windows.
- cp1257
- Baltic Windows. Estonian, Latvian, Lithuanian.
- latin7
- Baltic Linux. Estonian, Latvian, Lithuanian.
- latin5
- Turkish Linux.
Character set and collation variables
What do these character_set_* variables mean?
Variables concerning communication
- character_set_client
- This informs MySQL about the character set data from the client is encoded in.
- character_set_connection
- This is the character set MySQL converts incoming data to. It converts from character_set_client.
- character_set_result
- This is the character set MySQL converts outgoing data to. It converts from the character set specified for the data in the database / tables / columns.
Variables concerning data storage
- character_set_server
- The character set used for new databases if none is specified in the CREATE DATABASE statement. It can be set from the server configuration file, on the command line for mysqld and interactively in a database session.
- character_set_database
- The character set used for new tables if none is specified in the CREATE TABLE statement. It is set to character_set_server by default.
- character_set_system
- The character set for meta-data: database, table and column names. Its value is always utf8.
Feedback
Please leave a comment if this information was helpful to you. Also don’t hesitate to ask questions concerning the above. I’m not the best writer and some things are probably phrased incomprehensibly.
Last but not least I’m always interested in other topics concerning web programming that I could write about. What are you most interested in? What information do you need most?
December 17th, 2006 at 7:54
After years of having the charset issues with PHP und MySQL, your article (and especially the part with “SET NAMES utf8;”) solved my problems. Thanks a lot!
- Jérôme
December 27th, 2006 at 3:42
Great, it worked like a charm. Thank you for this informative post. As the title says, no more question marks! This post should be the first result in search engines for utf8 MySQL queries.
Thank you,
Asem Daaboul
January 24th, 2007 at 4:14
Test for TR characters.
ğüşiçöı
ĞÜŞİÇÖı
February 7th, 2007 at 22:39
I’m a novice mysql user and wondering about how can i change the Collation to UTF8 on all of the tables on each databases ( have 35 db running on my linux ) websserver with not much work. At the moment i have to change this by hand by phpmyadmin. But there must be a much simple way to do this i guess.
March 14th, 2007 at 23:24
FYI: You must be using MySQL 4.1 or higher: set names ‘utf8′
October 16th, 2007 at 14:15
Hi,
I have 5.0.27-standard MySQL Community Edition – Standard (GPL)on linux machine.
By default the character_set_client,connection,result are set to utf8 and am creating my database with default caharacter set utf8.
But it is not supporting the greek characters, can any one please help me.?
October 16th, 2007 at 19:45
Prakash, please be more specific: what does “it is not supportin greek characters” mean?
October 18th, 2007 at 17:16
Hi Markus thanks for the reply,
Its,when i insert the greek text into the database and if i see the database it shows some unknown characters(like ??????), but the samething works fine in my windows machine with same MySQL setup.
October 19th, 2007 at 16:29
Prakash,
what do you use to look at the database? How is that application set up in regard to character sets and encodings?
October 23rd, 2007 at 12:04
Hi Markus,
Am using netbeans..
I have a java servlet application for updating greek news. When i insert some text, it displays some unknown characters, am suspecting the issue is with MySQL installed on linux since it works fine on my windows machine.
Following is the SQL Syntax am using:
CREATE DATABASE `DBName` DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;
CREATE TABLE `TName` (
`PK` INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
`DATE` DATE NOT NULL ,
`TYPE` VARCHAR( 10 ) NOT NULL ,
`TITLE` VARCHAR( 500 ) NOT NULL ,
`TEXT` VARCHAR( 1000 ) NOT NULL
) DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci ENGINE = innodb;
November 25th, 2007 at 19:58
Thanks for the article – it inspired a solution. I had an encoding problem with PHP & MySQL on my production server. Cyrillic symbols from database were displayed as question marks in a PHP-driven web page. There were no problems on test server. After a couple of hours of changing settings in MySQL, I discovered this line added right after mysql_connect solved the problem:
mysql_query(’SET CHARACTER SET utf8;’);
December 29th, 2007 at 21:51
Latin2 test: Árvíztűrő tükörfúrógép
February 24th, 2008 at 23:22
Thanks for the article. Everything that one have had to look up through the whole manual to collect all the pieces is now presented in one article. Great work!
I’ve noticed that specifying this (I’m using cp1251 user interface and UTF-8 database):
init_connect = ‘SET CHARACTER SET cp1251′
in my.ini file doesn’t work for me.
Where as executing ‘SET CHARACTER SET cp1251′ query at the beginning of SQL session works fine.
March 21st, 2008 at 0:55
more advanced latin2 test:
lowercase:
csíkcsőszopó pürégöbös hűsítőváza
uppercase:
SZOPÓCSÉSZECSŰRŐ CSOMÓKURVAMEGSTRICIVANAWESTENDBEN BÜDÖSTAHÓBUNKÓSEGGNYŰVELŐSZOPÓFEJŰ
BUZIÁLLATJOBBOSKÖCSÖGÖKKELVANTELEAZORSZÁG
DELESZAROMMERTMÁRELHÚZTAMONNANHAHAHA
March 23rd, 2008 at 10:00
does anyone knows if there is any other information about this subject in other languages?
March 24th, 2008 at 12:28
hello All,
I importing CSV data files which having huge data in unicode text format.
Data goes properly into mysql. It show me in phpmyadmin. when i choose to edit it returns MySQL returned an empty result set (i.e. zero rows).
SELECT *
FROM `cosmos`.`tbl_hotel`
WHERE CONVERT( `tbl_hotel`.`hotel_id`
USING utf8 ) = ‘996′
AND CONVERT( `tbl_hotel`.`hotel_name`
USING utf8 ) = ‘Lampang River Lodge’
AND CONVERT( `tbl_hotel`.`hotel_address_i`
USING utf8 ) = ‘”330 moo 11 chompoo district, amphur muang”‘
AND CONVERT( `tbl_hotel`.`hotel_address_ii`
USING utf8 ) = ”
AND CONVERT( `tbl_hotel`.`city_name`
USING utf8 ) = ‘Lampang’
AND CONVERT( `tbl_hotel`.`country_name`
USING utf8 ) = ‘Thailand’
AND CONVERT( `tbl_hotel`.`min_show_rate`
USING utf8 ) = ‘55′
AND CONVERT( `tbl_hotel`.`linkurl`
USING utf8 ) = ‘http://www.agoda.com/asia/thailand/lampang/lampang_river_lodge.html’
AND CONVERT( `tbl_hotel`.`hotel_description`
USING utf8 ) = ‘”The Lampang River Lodge is convenient to everything Lampang has to offer, and is only 10 km (6 miles) away from Lampang Intern’
AND CONVERT( `tbl_hotel`.`star_rating`
USING utf8 ) = ‘3′
AND CONVERT( `tbl_hotel`.`hotel_images`
USING utf8 ) = ‘http://images.agoda.com/hotels/996/main/996.jpg\r’;
please tell me what i do for editing.
also i want search operation on same table.
when i get text from text box & retrives data from table & fire like query it show nothing (returns empty);
Please give me solution for same
Thanks in advance
April 10th, 2008 at 20:21
Cheers from Copenhagen.
Excellent article – you just helped me a lot! Thank you very much!
April 15th, 2008 at 21:05
I’ve been looking at the MySQL manual and various PHP articles for about an hour now. Thank you for posting this.
I can now get back to programming making my clients happy!!!
Much respect,
A. Sagarwala (Toronto, Canada)
May 12th, 2008 at 20:29
thank you for your posting. it helped me a lot.
i am using PHP Version 5.2.3-1ubuntu6.3 and mysql 5. i am able to save charecters like solo porque sí and ğüşiçöı. i can see in my phpmyadmin but when i try to see my own page i see ?�?i��? this for ğüşiçöı. i have used both in php file header(’Content-Type: text/html; charset=utf-8′ ); and html file
can you help meplz?
May 14th, 2008 at 21:37
thanks a lot! question marks have been haunting me for a while, not anymore!
àòçèéùì!
May 29th, 2008 at 17:57
Excellent guide. Moreover, you can set the init_connect system variable in my.cnf file to ‘SET NAMES utf8′. For each client that connects, server executes this statement automatically and you don’t have to execute this statement by yourself. Set it up in the [mysqld] section:
.
.
[mysqld]
init_connect=’SET NAMES utf8′
.
.
The init_connect statement is not executed for super-user logins.
July 14th, 2008 at 10:17
Thanks for the great article, Markus. It answered a number of questions for me!
July 23rd, 2008 at 0:00
Your article solved my problems.
Thanks a lot 1000 times :D
September 10th, 2008 at 7:01
[...] Resulta que en una codificación he utilizado un MySQL con el charset en UTF8. Todo funcionaba bien si la carga de los datos se realizaba mediante la interfaz web pero en el momento se insertaba algún registro mediante phpMyAdmin con algún caracter especial, como un acento, este aparecía representado con caracteres extraños en la interfaz web. El problema es bastante sencillo, MySQL permite definir el charset a varios niveles de comunicación de forma que podemos enviar los datos en UTF8 pero recibirlos en latin1, por ejemplo. Podéis ver más información aquí [...]
November 20th, 2008 at 5:35
Thanks a lot. Finally a tip that solve it all. Thank you very much!!
Al.
February 10th, 2009 at 23:40
I am trying to write persian/farsi texts into the database using MYSQL, but it doesnt seems to be working?
any ideas and suggestions would be appreciated.. thanks alot in advance
March 19th, 2009 at 21:34
Thanks a bunch, this really helped out.
March 30th, 2009 at 22:45
Best from all articles!
post it in earth knowledge base & translate to all languages :)
May 9th, 2010 at 15:28
THANKS MAN THIS WORKED AFTER YEARS OF ENCODING PROBLEMS.
May 20th, 2010 at 1:43
thanks a lot!!
i wasted my time like an hour
your post saved my time!!
thanks again!
June 28th, 2010 at 23:32
Thanks, after searching for 20 minutes, the solution on this page fixed the problem I was having!
January 18th, 2011 at 23:06
I solved my problems just modifying the MYSQL server variable:
[mysqld]
…
character-set-server=utf8
…
Thanks for your precious post!
September 1st, 2011 at 22:52
Brilliant!!! Thanks a lot for this helpful information :) I was already moving website to production and saw only question marks…
This helped me a lot!
mysql_query(’SET CHARACTER SET utf8;’);