Posted in perl, mysql, unicode, orm
Mon, 02 Oct 2006 15:35:00 GMT
One of the mysteries of Perl to me is why DBD::mysql still has no UTF-8 support, even though the issue has been discussed on the msql-mysql-modules list since at least 2003 (according to the MARC archives) and MySQL itself does support UTF-8.
8 comments
Posted in postgresql, perl, unicode
Fri, 29 Sep 2006 18:21:00 GMT
Perl has two UTF-8 encodings: utf8, which is Perl's liberal version, and UTF-8, which is a strict interpretation, aka utf-8-strict. The liberal version allows encoded code points that strict UTF-8 rejects, such as surrogates and values above U+10FFFF. That can cause problems when interoperating with applications that expect utf-8-strict, such as PostgreSQL. Here's a function I wrote to strictify utf8 to UTF-8 using the Encode core module:
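    use Encode;

    sub strictify_utf8 {
        my $data = shift;
        # Act only on strings that carry the UTF-8 flag but fail the well-formedness check.
        if (Encode::is_utf8($data) && !Encode::is_utf8($data, 1)) {
            Encode::_utf8_off($data);                 # treat the string as raw octets
            Encode::from_to($data, 'utf8', 'UTF-8');  # re-encode in place, lax to strict
            Encode::_utf8_on($data);                  # restore the UTF-8 flag
        }
        return $data;
    }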
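To see the difference between the two encodings in practice, here is a minimal sketch that checks whether a string of octets would pass as strict UTF-8 before it is sent to something like PostgreSQL. The helper name `is_strict_utf8` is hypothetical and not part of Encode; the sketch only assumes that decoding with the strict 'UTF-8' encoding and `FB_CROAK` dies on input that the lax 'utf8' encoding would accept.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper (not part of Encode): returns 1 if the octets
# decode cleanly as strict UTF-8, 0 otherwise.
sub is_strict_utf8 {
    my ($octets) = @_;
    # FB_CROAK makes decode die on invalid input; LEAVE_SRC keeps $octets intact.
    my $ok = eval { decode('UTF-8', $octets, FB_CROAK | LEAVE_SRC); 1 };
    return $ok ? 1 : 0;
}

my $katakana  = "\xE3\x82\xAB";  # octets for U+30AB, valid in both encodings
my $surrogate = "\xED\xA0\x80";  # octets for U+D800, accepted only by lax utf8

print is_strict_utf8($katakana)  ? "strict\n" : "not strict\n";
print is_strict_utf8($surrogate) ? "strict\n" : "not strict\n";
```

This only detects the problem; it does not repair the data the way strictify_utf8 above tries to.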
no comments
Posted in perl, unicode
Fri, 29 Sep 2006 18:00:00 GMT
I recently responded to someone on DevShed Forums who asked how to get a Unicode hex codepoint from a Unicode literal. Since I think it may be more generally useful, here's my solution. The following function takes a Unicode literal, converts it to a decimal representation using unpack, and then converts it to hex using sprintf:
    sub codepoint_hex {
        if (my $char = shift) {
            # unpack yields the decimal codepoint; sprintf renders it as lowercase hex
            return sprintf '%2.2x', unpack('U0U*', $char);
        }
    }

    my $cp = codepoint_hex('カ'); # eq '30ab'
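For strings with more than one character, a variant along the following lines may be handy. It is only a sketch, not the function from the forum reply: the name `codepoints_hex` is hypothetical, it assumes the input is UTF-8 encoded octets, and it swaps unpack for Encode's decode plus ord.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: takes UTF-8 encoded octets and returns one
# zero-padded lowercase hex codepoint per character.
sub codepoints_hex {
    my ($octets) = @_;
    my $chars = decode('UTF-8', $octets);   # octets -> Perl character string
    return map { sprintf '%04x', ord($_) } split //, $chars;
}

# "\xE3\x82\xAB\xE3\x83\x8A" are the UTF-8 octets for the two katakana カナ.
my @cps = codepoints_hex("\xE3\x82\xAB\xE3\x83\x8A");
print join(' ', @cps), "\n";   # prints "30ab 30ca"
```

Passing the octets explicitly sidesteps the question of whether the source file is read under `use utf8`.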
2 comments