Perl, MySQL and UTF-8
Posted in perl, mysql, unicode, orm Mon, 02 Oct 2006 15:35:00 GMT
One of the mysteries of Perl to me is why, as of yet, there is no UTF-8 support in DBD::mysql, even though the issue has been discussed on the msql-mysql-modules list since at least 2003 (according to the MARC archives) and MySQL itself does have UTF-8 support.
When I first looked into this, I found a few articles on the subject:
- utf-8 and DBD::mysql by Pedro Melo
- Movable Type, MySQL, Perl, Unicode by Zakaria "Zack" Ajmal: provides a patch for Movable Type 3.2
Pedro's article mentions that the reason this hasn't been done for DBD::mysql is that the DBI and DBD::mysql folks cannot decide where to put the UTF-8 implementation, i.e. in DBI itself or in the DBD drivers. As a result, there is still no built-in support. To get around this, numerous patches have been produced. Andrew Forrest even put together UTF-8 versions of DBI and CGI.pm (link seems broken atm). However, some of these patches seem to have problems and are non-standard.
If you prefer to use an ORM, DBIx::Class and Class::DBI get around this by implementing UTF-8 support in their own libraries with DBIx::Class::UTF8Columns and Class::DBI::utf8 respectively. I'd recommend DBIx::Class over Class::DBI since it has more functionality (e.g. built-in JOIN support) and is supposed to generate more efficient SQL.
The interesting thing is that DBD::Pg for PostgreSQL has had built-in UTF-8 support for some time. While not an issue specific to the MySQL database, the UTF-8 Perl driver issue is something to consider when choosing between MySQL and PostgreSQL.
Update: Thanks to Dominic Mitchell for mentioning that the latest developer release, DBD::mysql 3.0007_1, released on 8 Sep 2006, has integrated UTF-8 support. It's a developer release but good things are finally happening!
Wait no longer! DBD::mysql 3.007_01 has now got the same level of utf8 support as DBD::Pg. I sent the patch to Patrick Galbraith and it got in! That version is lacking tests, but they should be in the next version.
It’s basically the same idea as in DBD::Pg. You set $dbh->{mysql_enable_utf8} and any text columns come back as UTF8 if they’re valid.
I also managed to get the support into DBD::Pg a couple of years back. This was quite lucky as the patches are very similar. :-)
Wow, that’s exciting news! Thanks for posting about the developer release. Looks like I may try it out soon.
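For anyone wanting to try the `mysql_enable_utf8` attribute described in the comment above, here is a minimal sketch. It assumes a DBD::mysql release that supports the attribute and a connection that already talks UTF-8 to the server. The `first_title` helper, the `articles` table and the `title` column are made-up names for illustration, and the connection details are read from the standard `DBI_DSN`, `DBI_USER` and `DBI_PASS` environment variables.

```perl
use strict;
use warnings;
use DBI;

# Fetch one text value with the driver's UTF-8 handling switched on.
# `first_title`, `articles` and `title` are hypothetical names.
sub first_title {
    my ($dsn, $user, $pass) = @_;
    my $dbh = DBI->connect($dsn, $user, $pass, { RaiseError => 1 });

    # The attribute named in the comment above; it needs a DBD::mysql
    # release that supports it.
    $dbh->{mysql_enable_utf8} = 1;

    my ($title) = $dbh->selectrow_array('SELECT title FROM articles LIMIT 1');
    $dbh->disconnect;
    return $title;
}

# Only touch a database when connection details are supplied.
if ($ENV{DBI_DSN}) {
    my $title = first_title($ENV{DBI_DSN}, $ENV{DBI_USER}, $ENV{DBI_PASS});

    # utf8::is_utf8 reports whether Perl's internal UTF-8 flag is set.
    print utf8::is_utf8($title) ? "character string\n" : "byte string\n";
}
```

Checking the flag with `utf8::is_utf8` is only a quick way to see whether the driver is handing back character strings; application code normally should not need to care about it.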
Just a small correction: MySQL 5.0 does have UTF8 support. I’m not sure, but I think 4.1 also does.
Robbie: I think you misread the article since it says “MySQL does have UTF-8 support” with the issue being UTF-8 support in the DBD::mysql driver, not the MySQL database.
In case you need utf8 support in DBD::mysql before the aforementioned 3.007 release, you can use
$dbh->do("SET NAMES 'utf8'");
after getting the db handle.
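A rough sketch of the `SET NAMES` workaround follows. As far as I know, `SET NAMES` only changes the connection character set, so without driver support the values still arrive in Perl as UTF-8 encoded bytes and have to be decoded by hand. The `decode_row` helper, the `articles` table and the `title` column are made-up names, and the sketch assumes every fetched column holds UTF-8 text (binary columns would have to be skipped).

```perl
use strict;
use warnings;
use DBI;
use Encode qw(decode);

# Turn the raw bytes returned by the driver into Perl character strings.
# Hypothetical helper: assumes every defined value is UTF-8 text.
sub decode_row {
    my @row = @_;
    return map { defined $_ ? decode('UTF-8', $_) : undef } @row;
}

# Only touch a database when connection details are supplied.
if ($ENV{DBI_DSN}) {
    my $dbh = DBI->connect($ENV{DBI_DSN}, $ENV{DBI_USER}, $ENV{DBI_PASS},
                           { RaiseError => 1 });

    # Ask the server to send and receive UTF-8 on this connection.
    $dbh->do("SET NAMES 'utf8'");

    my ($title) = decode_row(
        $dbh->selectrow_array('SELECT title FROM articles LIMIT 1')
    );
    print "title has ", length($title // ''), " characters\n";
    $dbh->disconnect;
}
```

Decoding in one small helper keeps the workaround in a single place, so it is easy to drop once the driver handles UTF-8 itself.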