Perl - Strictify utf8 to UTF-8
Posted in postgresql, perl, unicode Fri, 29 Sep 2006 18:21:00 GMT
Perl has two UTF-8 encodings: utf8, which is Perl's liberal version, and UTF-8, which is a strict interpretation, also known as utf-8-strict. The liberal version accepts values that strict UTF-8 forbids, such as code points above U+10FFFF. That can cause problems when interoperating with applications that expect strict UTF-8, such as PostgreSQL. Here's a function I wrote to strictify utf8 to UTF-8 using the Encode core module:
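use strict;
use warnings;
use Encode;

sub strictify_utf8 {
    my $data = shift;
    # Only touch strings flagged as UTF-8 whose internal buffer
    # fails Encode's well-formedness check.
    if (Encode::is_utf8($data) && !Encode::is_utf8($data, 1)) {
        Encode::_utf8_off($data);    # treat the buffer as raw octets
        # Decode as liberal utf8 and re-encode as strict UTF-8, in place;
        # anything strict UTF-8 cannot represent is substituted.
        Encode::from_to($data, 'utf8', 'UTF-8');
        Encode::_utf8_on($data);     # flag the result as characters again
    }
    return $data;
}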
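To see the difference between the two encodings the function relies on, here is a minimal sketch that encodes a single code point just past Unicode's upper limit (U+10FFFF) with each of them. The choice of U+110000 and of the FB_CROAK | LEAVE_SRC check flags is for illustration only and is not part of the function above; the behavior follows Encode's documented rule that utf8 accepts such code points while UTF-8 croaks on them.

```perl
use strict;
use warnings;
use Encode ();

# Illustrative input: the first code point beyond U+10FFFF.
my $char = chr(0x110000);

# Die on unmappable characters instead of substituting them,
# and leave $char unmodified (CHECK would otherwise alter the source).
my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();

# Liberal utf8: Perl trusts its internal representation and copies the octets.
my $lax = Encode::encode('utf8', $char, $check);
printf "utf8:  %s\n", join ' ', map { sprintf '%02X', ord } split //, $lax;
# prints "utf8:  F4 90 80 80"

# Strict UTF-8: the same character is rejected, so eval returns undef.
my $strict = eval { Encode::encode('UTF-8', $char, $check) };
print defined $strict ? "UTF-8: accepted\n" : "UTF-8: rejected\n";
# prints "UTF-8: rejected"
```

Bytes like these are what a strict consumer such as PostgreSQL would refuse, which is why the function re-encodes through UTF-8 before the data leaves Perl.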