38 votes

My page often shows things like ë, Ã, ì, ù, à in place of normal characters.

I use UTF-8 for both the page header and the MySQL encoding. Why does this happen?

You need to add more context. Where do these characters show up, what encoding are your tables in, what does the code look like to retrieve the data.... – Pekka 웃 Feb 26 '11 at 15:28
These are UTF-8 sequences when displayed on a Latin-1 charset website. The best option is to add <meta charset="UTF-8"> to your pages, or use header("Content-Type: text/html; charset=utf-8"); on top of your PHP scripts. I assume this isn't actually the case yet. – mario Feb 26 '11 at 15:37

3 Answers

44 votes

These are UTF-8 encoded characters being displayed as ISO-8859-1. Use utf8_decode() to convert them to ISO-8859-1 characters.
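As a rough illustration of what this conversion does to the bytes, here is a minimal sketch. It assumes the mbstring extension is available and uses mb_convert_encoding() as a stand-in for utf8_decode(), which is deprecated as of PHP 8.2. The helper name toLatin1 is made up for this example.

```php
<?php
// Hypothetical helper: the same UTF-8 -> ISO-8859-1 conversion that
// utf8_decode() performs, done with mbstring instead.
function toLatin1(string $utf8): string
{
    return mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
}

// ë in UTF-8 is the two bytes C3 AB; in ISO-8859-1 it is the single byte EB.
$utf8 = "\xC3\xAB";
echo bin2hex(toLatin1($utf8)), "\n"; // eb

// If the text was encoded to UTF-8 twice, one conversion only gets back
// to plain UTF-8 (C3 AB), so a second pass would be needed for ISO-8859-1.
$doubleEncoded = "\xC3\x83\xC2\xAB";
echo bin2hex(toLatin1($doubleEncoded)), "\n"; // c3ab
```

The byte strings are written as escapes so the result does not depend on the encoding of the script file itself.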

This may happen to fix the problem at hand, but it is much, much better to get all encodings in the process right in the first place. – Pekka 웃 Feb 26 '11 at 15:30
I always use utf8_encode() (and mysql_real_escape_string of course) when sending a string to the database. On the output page I use utf8_decode(). But you say that's wrong, I didn't know that, how would you deal with this? – Ray Feb 26 '11 at 15:33
utf8_encode() and utf8_decode() convert data from and to ISO-8859-1. In a modern web site setup where the database, the database connection, and the output page encoding are UTF-8, it will not be necessary to do those conversions any more. That is the recommended way when building PHP projects from scratch. While it would probably fix the problem the OP shows, fixing the problem at its root (if possible) is much preferable. – Pekka 웃 Feb 26 '11 at 15:44
@Pekka Thanks, I'll keep it in mind! – Ray Feb 26 '11 at 15:46
And you may even need to use it twice – vivoconunxino Feb 24 '15 at 10:58
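The all-UTF-8 setup described in the comments above could look roughly like the sketch below. It assumes the mysqli extension and PHP 7.1 or later. The function names, the connection parameters, and the choice of utf8mb4 (MySQL's full UTF-8 character set; the thread only says "utf8") are assumptions for illustration, not something the thread prescribes.

```php
<?php
// Hypothetical helper: declare the response encoding before any output is sent.
function sendUtf8Header(): void
{
    header('Content-Type: text/html; charset=utf-8');
}

// Hypothetical helper: open a connection and switch it to UTF-8.
// Host, credentials, and database name are placeholders supplied by the caller.
function connectUtf8(string $host, string $user, string $pass, string $dbName): mysqli
{
    $db = new mysqli($host, $user, $pass, $dbName);
    // Sets the character set used for data sent to and from the server.
    $db->set_charset('utf8mb4');
    return $db;
}

// Hypothetical helper: escape text for HTML output. With every layer in
// UTF-8, no utf8_encode()/utf8_decode() call is involved.
function e(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}
```

The tables and columns themselves also need a UTF-8 character set for this to work end to end.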
23 votes

If you see those characters, you probably just didn't specify the character encoding properly: they are what you get when a UTF-8 multi-byte string is interpreted with a single-byte encoding such as ISO 8859-1 or Windows-1252.

In this case, ë is most likely the byte sequence 0xC3 0xAB, which is the UTF-8 encoding of the Unicode character ë (U+00EB).
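The misinterpretation can be reproduced with a short sketch. It assumes the mbstring extension is available; the variable names are chosen for this example only.

```php
<?php
// The UTF-8 bytes for ë (U+00EB), written as escapes so the encoding
// of this script file does not matter.
$utf8 = "\xC3\xAB";
echo bin2hex($utf8), "\n"; // c3ab

// Simulate a reader that wrongly treats those bytes as ISO-8859-1:
// each byte becomes its own character (0xC3 = Ã, 0xAB = «).
// The result is re-encoded as UTF-8 so it can be displayed.
$misread = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo $misread, "\n";

// One character went in, two characters come out.
echo mb_strlen($utf8, 'UTF-8'), "\n";    // 1
echo mb_strlen($misread, 'UTF-8'), "\n"; // 2
```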

How is it encoded with 0xC3 0xAB, which represents the Unicode character ë (U+00EB) in UTF-8? – Leonardo Apr 5 '11 at 21:24
The character ë has the code point 0xEB in the Unicode character set and is encoded as the two bytes 0xC3 0xAB in UTF-8. But this byte sequence represents something different when interpreted with a different character encoding. For example, in ISO 8859-1 and Windows-1252 it represents the two characters Ã (0xC3) and « (0xAB). – Gumbo Apr 6 '11 at 8:09
7 votes

Even though utf8_decode is a useful solution, I prefer to correct the encoding errors in the table itself. In my opinion it is better to correct the bad characters themselves than to add "hacks" to the code. Simply do a replace on the field in the table. To correct the badly encoded characters from the OP:

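update <table> set <field> = replace(<field>, "ë", "ë");
update <table> set <field> = replace(<field>, "ì", "ì");
update <table> set <field> = replace(<field>, "ù", "ù");
-- The garbled form of à is "Ã" followed by a non-breaking space (U+00A0). Matching "Ã" alone would also hit every other garbled character, so keep the trailing U+00A0 in the search string.
update <table> set <field> = replace(<field>, "Ã ", "à");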

Here <table> is the name of the MySQL table and <field> is the name of the column in the table. A very good checklist for these typical Windows-1252-to-UTF-8 mix-ups is the "Debugging Chart Mapping Windows-1252 Characters to UTF-8 Bytes to Latin-1 Characters".

Remember to back up your table before trying to replace any characters with SQL!
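The search strings for such replacements can also be generated rather than typed, which avoids losing invisible characters along the way. The following is only a sketch: it assumes the mbstring extension and PHP 7 or later, and the helper name garble is invented for this example.

```php
<?php
// Hypothetical helper: returns how a correct UTF-8 string looks after its
// bytes were misread as Windows-1252 and then stored as UTF-8 again.
function garble(string $utf8): string
{
    return mb_convert_encoding($utf8, 'UTF-8', 'Windows-1252');
}

// Build a garbled => correct map for the characters from the question:
// ë (U+00EB), à (U+00E0), ì (U+00EC), ù (U+00F9).
$pairs = [];
foreach (["\u{00EB}", "\u{00E0}", "\u{00EC}", "\u{00F9}"] as $char) {
    $pairs[garble($char)] = $char;
}

// Print the bytes of each search string next to its replacement.
foreach ($pairs as $bad => $good) {
    echo bin2hex($bad), ' => ', $good, "\n";
}

// The same map can repair a string in PHP before it is written back.
$text = 'No' . garble("\u{00EB}") . 'l';
echo strtr($text, $pairs), "\n";
```

The hex output makes the second byte pair of the garbled à visible (c2a0, a non-breaking space), which is hard to see in a pasted SQL string.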

[I know this is an answer to a very old question, but I was facing the issue once again. Some old Windows machine didn't encode the text correctly before inserting it into the utf8_general_ci collated table.]
