<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="/stylesheets/rss.css" type="text/css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Dev411 Blog: Perl - Strictify utf8 to UTF-8</title>
    <link>http://www.dev411.com/blog/2006/09/29/perl-strictify-utf8-to-UTF-8</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>John Wang on Technology</description>
    <item>
      <title>Perl - Strictify utf8 to UTF-8</title>
      <description>&lt;p&gt;Perl has two UTF-8 encodings: &lt;span class="fix"&gt;utf8&lt;/span&gt;, Perl's liberal version, and &lt;span class="fix"&gt;UTF-8&lt;/span&gt;, a strict interpretation also known as &lt;span class="fix"&gt;utf-8-strict&lt;/span&gt;. The liberal version accepts encoded code points that strict UTF-8 does not allow, such as surrogates and values above U+10FFFF, so you can run into problems when interoperating with applications that expect &lt;span class="fix"&gt;utf-8-strict&lt;/span&gt;, such as PostgreSQL. Here's a function I wrote to strictify &lt;span class="fix"&gt;utf8&lt;/span&gt; to &lt;span class="fix"&gt;UTF-8&lt;/span&gt; using the core Encode module:&lt;/p&gt;

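&lt;pre&gt;use Encode;

# Convert a string from Perl's lax utf8 to strict UTF-8 (utf-8-strict).
sub strictify_utf8 {
    my $data = shift;
    # Act only when the UTF8 flag is on but the well-formedness check
    # (second argument true) fails.
    if (Encode::is_utf8($data) &amp;&amp; !Encode::is_utf8($data, 1)) {
        Encode::_utf8_off($data);                 # treat the string as raw octets
        Encode::from_to($data, 'utf8', 'UTF-8');  # re-encode the octets in place
        Encode::_utf8_on($data);                  # mark the result as characters again
    }
    return $data;
}&lt;/pre&gt;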
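
&lt;p&gt;To see the difference between the two encodings on its own, here is a minimal sketch that is separate from the function above. It assumes a reasonably recent Encode and uses &lt;span class="fix"&gt;chr(0x110000)&lt;/span&gt;, one past the last Unicode code point, purely as an illustrative test value. The lax encoder should accept it, while the strict encoder, when asked to croak on errors, should reject it:&lt;/p&gt;

&lt;pre&gt;use strict;
use warnings;
no warnings 'utf8';   # keep the demo quiet about the non-Unicode code point
use Encode qw(encode find_encoding);

# Canonical names of the two encodings
print find_encoding('utf8')-&gt;name, "\n";    # utf8
print find_encoding('UTF-8')-&gt;name, "\n";   # utf-8-strict

# Illustrative value only: one past the last Unicode code point
my $char = chr(0x110000);

# Lax encoding: report how many bytes were produced
my $lax = encode('utf8', $char);
printf "utf8 produced %d bytes\n", length $lax;

# Strict encoding with FB_CROAK dies on anything it cannot represent;
# LEAVE_SRC keeps $char from being modified by the call
my $strict = eval {
    encode('UTF-8', $char, Encode::FB_CROAK | Encode::LEAVE_SRC);
};
print defined $strict ? "UTF-8 accepted it\n" : "UTF-8 rejected it\n";&lt;/pre&gt;

&lt;p&gt;Without a CHECK argument such as &lt;span class="fix"&gt;FB_CROAK&lt;/span&gt;, Encode substitutes a replacement character instead of dying, which is the behavior &lt;span class="fix"&gt;from_to&lt;/span&gt; relies on by default.&lt;/p&gt;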

</description>
      <pubDate>Fri, 29 Sep 2006 13:21:00 -0500</pubDate>
      <guid isPermaLink="false">urn:uuid:6b24eecf-4fda-43d5-94da-7497d3ed337c</guid>
      <author>John Wang</author>
      <link>http://www.dev411.com/blog/2006/09/29/perl-strictify-utf8-to-UTF-8</link>
      <category>postgresql</category>
      <category>perl</category>
      <category>unicode</category>
    </item>
  </channel>
</rss>
