How to json_encode utf8?

Francisco is an engineer focused on cross-platform apps (Ionic/Cordova) and specialized in hardware-software technology integration.

Read the Spanish version of this article translated by Marisela Ordaz

As a MySQL or PHP developer, once you step beyond the comfortable confines of English-only character sets, you quickly find yourself entangled in the wonderfully wacky world of UTF-8 encoding.

Unicode is a widely-used computing industry standard that defines a comprehensive mapping of unique numeric code values to the characters in most of today’s written character sets to aid with system interoperability and data interchange.

UTF-8 is a variable-width encoding that can represent every character in the Unicode character set. It was designed for backward compatibility with ASCII and to avoid the complications of endianness and byte order marks in UTF-16 and UTF-32. UTF-8 has become the dominant character encoding for the World Wide Web, accounting for more than half of all Web pages.

UTF-8 encodes each character using one to four bytes. The first 128 characters of Unicode correspond one-to-one with ASCII, making valid ASCII text also valid UTF-8-encoded text. It is for this reason that systems that are limited to use of the English character set are insulated from the complexities that can otherwise arise with UTF-8.

For example, the Unicode hexadecimal code for the letter A is U+0041, which in UTF-8 is simply encoded with the single byte 41. In comparison, the Unicode hexadecimal code for the character 𣎴 is U+233B4, which in UTF-8 is encoded with the four bytes F0 A3 8E B4.
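To make the one-to-four-byte scheme concrete, here is a minimal sketch that encodes a single code point by hand. The function name encodeCodePoint is a hypothetical helper introduced for illustration, not something from the article or from PHP itself; in real code a built-in such as mb_chr() would normally be used instead.

```php
<?php
// Hypothetical helper (not part of PHP): encode one Unicode code point as UTF-8.
function encodeCodePoint(int $cp): string
{
    // Reject values outside the Unicode range and the UTF-16 surrogate block.
    if ($cp < 0 || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
        throw new InvalidArgumentException('Not a valid Unicode scalar value');
    }
    if ($cp < 0x80) {        // 1 byte:  0xxxxxxx (identical to ASCII)
        return chr($cp);
    }
    if ($cp < 0x800) {       // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {     // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

echo strtoupper(bin2hex(encodeCodePoint(0x41))), "\n";    // 41
echo strtoupper(bin2hex(encodeCodePoint(0x233B4))), "\n"; // F0A38EB4
```

The two printed values match the byte sequences quoted above for U+0041 and U+233B4.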

On a previous job, we began running into data encoding issues when displaying bios of artists from all over the world. It soon became apparent that there were problems with the stored data, as sometimes the data was correctly encoded and sometimes it was not.

This led programmers to implement a hodge-podge of patches, sometimes with JavaScript, sometimes with HTML charset meta tags, sometimes with PHP, and so on. Soon, we ended up with a list of 600,000 artist bios with double- or triple-encoded information, with data being stored in different ways depending on who programmed the feature or implemented the patch. A classical technical rat’s nest.

Indeed, navigating through UTF-8 data encoding issues can be a frustrating and hair-pulling experience. This post provides a concise cookbook for addressing these UTF-8 issues when working with PHP and MySQL in particular, based on practical experience and lessons learned (and with thanks, in part, to information discovered here and here along the way).

Data encoding with UTF-8 unicode for PHP and MySQL makes complex languages simple.

Specifically, we’ll cover the following in this post:

Mods you’ll need to make to your php.ini file and PHP code
Mods you’ll need to make to your my.ini file and other MySQL-related issues to be aware of (including config mods needed if you’re using Sphinx)

PHP UTF-8 Encoding – modifications to your php.ini file:

The first thing you need to do is to modify your php.ini file to use UTF-8 as the default character set:

default_charset = "utf-8"

(Note: You can subsequently use phpinfo() to verify that this has been set properly.)

OK cool, so now PHP and UTF-8 should work just fine together. Right?

Well, not exactly. In fact, not even close.

While this change will ensure that PHP always outputs UTF-8 as the character encoding (in browser response Content-type headers), you still need to make a number of modifications to your PHP code to make sure that it properly processes and generates UTF-8 characters.

PHP UTF-8 Encoding – modifications to your code:

To be sure that your PHP code plays well in the UTF-8 data encoding sandbox, here are the things you need to do:

Set UTF-8 as the character set for all headers output by your PHP code

In every PHP output header, specify UTF-8 as the encoding:

header("Content-Type: text/html; charset=utf-8");

Specify UTF-8 as the encoding type for XML

Specify UTF-8 as the character set for all HTML content

For HTML content, specify UTF-8 as the encoding:

In HTML forms, specify UTF-8 as the encoding:
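The snippets that originally accompanied the HTML steps are not present in this copy, so the following is only a sketch of what such declarations commonly look like, assuming an HTML5 meta charset tag and the accept-charset form attribute. The function renderUtf8Page, the field name artist_name, and the /save.php target are hypothetical names chosen for illustration.

```php
<?php
// Hypothetical helper: builds a minimal HTML page that declares UTF-8
// in the document head and on the form.
function renderUtf8Page(string $title, string $formAction): string
{
    // Escape user-visible values, telling htmlspecialchars the input is UTF-8.
    $title  = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
    $action = htmlspecialchars($formAction, ENT_QUOTES, 'UTF-8');

    return "<!DOCTYPE html>\n"
         . "<html>\n<head>\n"
         . "  <meta charset=\"utf-8\">\n"
         . "  <title>{$title}</title>\n"
         . "</head>\n<body>\n"
         . "  <form action=\"{$action}\" method=\"post\" accept-charset=\"utf-8\">\n"
         . "    <input type=\"text\" name=\"artist_name\">\n"
         . "  </form>\n"
         . "</body>\n</html>\n";
}

// Send the HTTP header before any output (skipped on the command line
// or when output has already started).
if (PHP_SAPI !== 'cli' && !headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

echo renderUtf8Page('Artist bios', '/save.php');
```

Declaring the encoding in the HTTP header, the document head, and the form keeps the browser from guessing at any of the three points where text crosses between client and server.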

Set UTF-8 as the default character set for all MySQL connections

Specify UTF-8 as the default character set to use when exchanging data with the MySQL database using mysql_set_charset:

$link = mysql_connect("localhost", "user", "password");
mysql_set_charset("utf8", $link);

Note that, as of PHP 5.5.0, mysql_set_charset is deprecated (the mysql extension was removed entirely in PHP 7.0), and mysqli::set_charset should be used instead:

$mysqli = new mysqli("localhost", "my_user", "my_password", "test");

/* check connection */
if (mysqli_connect_errno()) {
    printf("Connect failed: %s\n", mysqli_connect_error());
    exit();
}

/* change character set to utf8 */
if (!$mysqli->set_charset("utf8")) {
    printf("Error loading character set utf8: %s\n", $mysqli->error);
} else {
    printf("Current character set: %s\n", $mysqli->character_set_name());
}

$mysqli->close();

Always use UTF-8 compatible versions of string manipulation functions

There are several PHP functions that will fail, or at least not behave as expected, if the character representation needs more than 1 byte (as UTF-8 does). An example is the strlen function, which returns the number of bytes rather than the number of characters.

Two options are available for dealing with this:
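The list of options is missing from this copy of the article. As an illustration of the underlying problem, here is a minimal sketch comparing byte-oriented functions with their multibyte-aware counterparts, assuming the mbstring and iconv extensions are loaded (both ship with PHP but may need to be enabled). The sample strings are illustrative and are written as byte escapes so the result does not depend on how this file is saved.

```php
<?php
$word = "na\xC3\xAFve";        // "naïve": ï is U+00EF, two bytes in UTF-8
$han  = "\xF0\xA3\x8E\xB4";    // U+233B4, four bytes in UTF-8

echo strlen($word), "\n";                   // 6 (bytes)
echo mb_strlen($word, 'UTF-8'), "\n";       // 5 (characters)
echo iconv_strlen($word, 'UTF-8'), "\n";    // 5 (characters)
echo strlen($han), ' ', mb_strlen($han, 'UTF-8'), "\n"; // 4 1

// Byte-oriented functions can also cut a character in half:
$cut = substr($word, 0, 3);                 // "na" plus only the first byte of ï
var_dump(mb_check_encoding($cut, 'UTF-8')); // bool(false)

// The multibyte-aware version counts characters, so the result stays valid:
$safe = mb_substr($word, 0, 3, 'UTF-8');    // "naï"
var_dump(mb_check_encoding($safe, 'UTF-8')); // bool(true)
```

Passing the encoding explicitly to each mb_ and iconv_ call avoids depending on the configured internal encoding.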

MySQL UTF-8 Encoding – modifications to your my.ini file:

On the MySQL/UTF-8 side of things, modifications to the my.ini file are required as follows. Note that MySQL’s own name for this character set is utf8, not UTF-8.

Set the following config parameters after each corresponding tag:

[client]
default-character-set=utf8

[mysql]
default-character-set=utf8

[mysqld]
character-set-client-handshake = false #force encoding to utf8
character-set-server=utf8
collation-server=utf8_general_ci

[mysqld_safe]
default-character-set=utf8

After making the above changes to your my.ini file, restart your MySQL daemon.

To verify that everything has properly been set to use the UTF-8 encoding, execute the following query:

mysql> show variables like "char%";

The output should look something like:

| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |

If you instead see latin1 listed for any of these, double-check your configuration and make sure you’ve properly restarted your mysql daemon.

MySQL UTF-8 Encoding – other things to consider:

If the connecting client has no way to specify the encoding for its communication with MySQL, after the connection is established you may have to run the following command/query:

set names utf8;

When determining the size of varchar fields when modeling the database, don’t forget that UTF-8 characters may require as many as 4 bytes per character. Note that MySQL’s utf8 character set stores at most 3 bytes per character; the utf8mb4 character set is needed to store 4-byte characters such as U+233B4.

MySQL UTF-8 Encoding – if you use Sphinx:

In your Sphinx configuration file (i.e., sphinx.conf):

Set your index definition to have:

charset_type = utf-8

Add the following to your source definition:

Migrating database data that is already encoded in latin1 to UTF-8

If you have an existing MySQL database that is already encoded in latin1, here’s how to convert the latin1 to UTF-8:

1. Make sure you’ve made all the modifications to the configuration settings in your my.ini file, as described above.

2. Execute the following command:

ALTER SCHEMA `your-db-name` DEFAULT CHARACTER SET utf8;

3. Via command line, verify that everything is properly set to UTF-8:

mysql> show variables like "char%";

4. Create a dump file with latin1 encoding for the table you want to convert:

mysqldump -u USERNAME -pDB_PASSWORD --opt --skip-set-charset --default-character-set=latin1 --skip-extended-insert DATABASENAME --tables TABLENAME > DUMP_FILE_TABLE.sql

e.g.:

mysqldump -u root --opt --skip-set-charset --default-character-set=latin1 --skip-extended-insert artists-database --tables tbl_artist > tbl_artist.sql

5. Do a global search and replace of the charset in the dumpfile from latin1 to utf8:

e.g., using Perl:

perl -i -pe "s/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8/" DUMP_FILE_TABLE.sql

Note to Windows users: This charset string replacement (from latin1 to utf8) can also be done using find-and-replace in WordPad (or some other text editor, such as vim). Be sure to save the file just as it is though (don’t save it as unicode txt file!).

6. From this point, we will start messing with the database data, so it would probably be prudent to back up the database if you haven’t already done so. Then, restore the dump into the database:

mysql> source "DUMP_FILE_TABLE.sql";

7. Search for any records that may not have converted properly and correct them. Since non-ASCII characters are multi-byte by design, we can find them by comparing the byte length to the character length (i.e., to identify rows that may hold double-encoded UTF-8 characters that need to be fixed).

a. See if there are any records with multi-byte characters (if this query returns zero, then there don’t appear to be any records with multi-byte characters in your table and you can proceed to Step 8).

mysql> select count(*) from MY_TABLE where LENGTH(MY_FIELD) != CHAR_LENGTH(MY_FIELD);

b. Copy rows with multi-byte characters into a temporary table:

create table temptable ( select * from MY_TABLE where LENGTH(MY_FIELD) != CHAR_LENGTH(MY_FIELD));

c. Convert double-encoded UTF-8 characters to proper UTF-8 characters

This is actually a bit tricky. A double-encoded string is one that was properly encoded as UTF-8. However, MySQL then did us the erroneous favor of converting it (from what it thought was latin1) to UTF-8 again, when we set the column to UTF-8 encoding. Resolving this therefore requires a two-step process through which we “trick” MySQL in order to preclude it from doing us this “favor”.

First, we set the encoding type for the column back to latin1, thereby removing the double encoding:

e.g.:

alter table temptable modify temptable.ArtistName varchar(128) character set latin1;

Note: Be sure to use the correct field type for your table. In the example above, for our table, the correct field type for ‘ArtistName’ was varchar(128), but the field in your table could be text or any other type. Be sure to specify it properly!

The problem is that now, if we set the column encoding back to UTF-8, MySQL will run the latin1 to UTF-8 data encoding for us again and we’ll be back to where we started. To avoid this, we change the column type to blob and THEN we set it to UTF-8. This exploits the fact that MySQL will not attempt to encode a blob. We are thereby able to “fool” the MySQL charset conversion to avoid the double encoding issue.

e.g.:

alter table temptable modify temptable.ArtistName blob;
alter table temptable modify temptable.ArtistName varchar(128) character set utf8;

(Again, as noted above, be sure to use the proper field type for your table.)

d. Remove rows with only single-byte characters from the temporary table:

delete from temptable where LENGTH(MY_FIELD) = CHAR_LENGTH(MY_FIELD);

e. Re-insert fixed rows back into the original table (before doing this, you may want to run some selects on the temptable to verify that it appears to be properly corrected, just as a sanity check).

replace into MY_TABLE (select * from temptable);

8. Verify the remaining data and, if necessary, repeat the process in step 7 (this could be necessary, for example, if the data was triple encoded). Further errors, if any, may be easiest to resolve manually.
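The double-encoding problem that the migration steps repair can be reproduced outside the database, which may help when inspecting suspicious values. The sketch below assumes the mbstring extension; the function names doubleEncode and undoDoubleEncoding and the sample values are hypothetical. ISO-8859-1 stands in for MySQL’s latin1 here, which is an approximation: MySQL’s latin1 is closer to Windows-1252 and differs in the 0x80-0x9F byte range.

```php
<?php
// Hypothetical helper: reproduce the mistake by misreading UTF-8 bytes
// as latin1 and converting them to UTF-8 a second time.
function doubleEncode(string $utf8): string
{
    return mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
}

// Hypothetical helper: reverse one round of the mistaken conversion.
function undoDoubleEncoding(string $doubleEncoded): string
{
    return mb_convert_encoding($doubleEncoded, 'ISO-8859-1', 'UTF-8');
}

$original = "caf\xC3\xA9";            // "café" as correct UTF-8
$broken   = doubleEncode($original);  // é (C3 A9) becomes C3 83 C2 A9

echo bin2hex("\xC3\xA9"), ' -> ', bin2hex(doubleEncode("\xC3\xA9")), "\n"; // c3a9 -> c383c2a9
var_dump(undoDoubleEncoding($broken) === $original);                        // bool(true)

// Same idea as the LENGTH() != CHAR_LENGTH() queries: a value contains
// multi-byte characters when its byte length and character length differ.
$rows      = ['Prince', $original];
$multiByte = array_filter($rows, fn($s) => strlen($s) !== mb_strlen($s, 'UTF-8'));
```

For triple-encoded data, undoDoubleEncoding would need to be applied twice, which mirrors repeating the repair step in the database.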

Source code and resource files

One other thing to remember and verify is that your source code files, resource files, and so on, are all being saved properly with UTF-8 data encoding. Otherwise, any “special” characters in these files may not be handled correctly.

In Netbeans, for example, you can right-click on your project, choose properties and then in “Sources” you will find the data encoding option (it usually defaults to UTF-8, but it’s worth checking).

Or in Windows Notepad, use the “Save As…” option in the File menu, and select the UTF-8 encoding option at the bottom of the dialog. (Note that the “Unicode” option that Notepad provides is actually UTF-16, so that’s not what you want.)
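Editor settings can also be double-checked from PHP itself. The following is a minimal sketch, assuming the mbstring extension; inspectEncoding is a hypothetical helper, and the byte order mark check is an addition not discussed in the article (some editors prepend a BOM when saving as UTF-8, which can interfere with sending headers).

```php
<?php
// Hypothetical helper: reports whether a string is valid UTF-8 and
// whether it starts with the UTF-8 byte order mark (EF BB BF).
function inspectEncoding(string $contents): array
{
    return [
        'valid_utf8' => mb_check_encoding($contents, 'UTF-8'),
        'has_bom'    => strncmp($contents, "\xEF\xBB\xBF", 3) === 0,
    ];
}

// Example: inspect this script's own source file.
print_r(inspectEncoding(file_get_contents(__FILE__)));
```

A file saved as latin1 that contains accented characters would typically report valid_utf8 as false, since a lone byte such as E9 is not a complete UTF-8 sequence.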

Wrap-up

Although it can be somewhat tedious, taking the time to go through these steps to systematically address your MySQL and PHP UTF-8 data encoding issues can ultimately save you a great deal of time and grief. In the long run, this type of methodical approach is far superior to the all-too-common tendency to just keep patching the system.

This guide hopefully emphasizes the importance of taking the charset definition into consideration when setting up a project environment in the first place and working in a software project environment that properly accounts for character encoding in its manipulation of text and strings.