
Perl - Getting a Unicode Character's Hex Codepoint

Posted Fri, 29 Sep 2006 18:00:00 GMT

I recently responded to someone on DevShed Forums who asked how to get the hex codepoint of a Unicode literal. Since the answer may be more generally useful, here's my solution. The following function takes a Unicode literal, converts it to its decimal codepoint using unpack, and then formats that number as hex using sprintf:

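sub codepoint_hex {
    my $char = shift;
    # Test length rather than truth, so the character '0' is not skipped.
    return unless defined $char && length $char;
    # unpack yields the decimal codepoint; sprintf formats it as hex.
    return sprintf '%2.2x', unpack('U0U*', $char);
}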

my $cp = codepoint_hex('カ'); # eq '30ab'
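
To make the sprintf half of that return line concrete, here is a minimal sketch of the formatting step on its own. It assumes the decimal codepoint has already been extracted; the helper name to_hex is hypothetical and not part of the original function.

```perl
use strict;
use warnings;

# Hypothetical helper: format a decimal codepoint the way codepoint_hex does.
# '%2.2x' means lowercase hex, minimum field width 2, at least 2 digits.
sub to_hex {
    my ($codepoint) = @_;
    return sprintf '%2.2x', $codepoint;
}

print to_hex(12459), "\n";   # 30ab (KATAKANA LETTER KA)
print to_hex(0x41), "\n";    # 41
print to_hex(0x0a), "\n";    # 0a (padded to two digits)

# hex() goes the other way, from the hex string back to the number.
print hex('30ab'), "\n";     # 12459
```

The format only sets a minimum of two digits, so codepoints above 0xff simply print with as many digits as they need.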

Using unpack is a nice solution here because the U0 option requires the literal to be strict UTF-8 (utf-8-strict) rather than Perl's more liberal utf8, and it throws a warning if the literal is not valid UTF-8. The other nice thing about unpack is that it works whether or not the UTF-8 flag is on, which removes one check.
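
The strictness check can also be sketched without unpack. This matters because the meaning of U0 and C0 in unpack templates was reversed in perl 5.10 relative to 5.8 (see perl5100delta), so the template above may behave differently on newer Perls. The sketch below uses Encode instead; the helper name is_strict_utf8 is hypothetical, and it assumes the input is a string of raw octets.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: true if $bytes is well-formed strict UTF-8.
sub is_strict_utf8 {
    my ($bytes) = @_;    # work on a copy; decode may modify its input when CHECK is set
    return eval { Encode::decode('UTF-8', $bytes, Encode::FB_CROAK()); 1 } ? 1 : 0;
}

# "UTF-8" names the strict encoding, "utf8" Perl's liberal one.
print Encode::find_encoding('UTF-8')->name, "\n";   # utf-8-strict
print Encode::find_encoding('utf8')->name, "\n";    # utf8

print is_strict_utf8("\xe3\x82\xab"), "\n";   # 1: the three octets of U+30AB
print is_strict_utf8("\xff"), "\n";           # 0: 0xFF never occurs in UTF-8
```

Unlike the unpack version, this dies inside the eval rather than warning, so the caller gets a plain true or false answer.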

An alternative way to get the decimal representation from the literal is to use ord instead of unpack, but ord requires the UTF-8 flag to be on. The flag can be turned on with Encode::_utf8_on, but that only sets the flag without checking for strict UTF-8, so you would need a separate check for that.
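
A small sketch of why the flag matters to ord: the same data gives different answers depending on whether it is held as raw octets or as a decoded character string. The variable names are illustrative, and the octets are written out explicitly so the result does not depend on the source file's encoding.

```perl
use strict;
use warnings;
use Encode ();

my $bytes = "\xe3\x82\xab";                     # UTF-8 octets of U+30AB, flag off
my $chars = Encode::decode('UTF-8', $bytes);    # one character, flag on

printf "%x\n", ord($bytes);    # e3: ord sees only the first octet
printf "%x\n", ord($chars);    # 30ab: ord sees the whole character

print length($bytes), "\n";    # 3
print length($chars), "\n";    # 1
print utf8::is_utf8($chars) ? "flag on\n" : "flag off\n";   # flag on
```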

Update: miyagawa just showed me how to use ord with Encode:

use Encode;

sub codepoint_hex {
    sprintf "%04x", ord Encode::decode("UTF-8", shift);
}
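
Both versions report only the first character of their argument. As a sketch of one way to extend the Encode approach to a whole string, the hypothetical helper codepoints_hex below decodes once and formats every character; it assumes the input is a string of UTF-8 octets.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: hex codepoints for every character in a UTF-8 octet string.
sub codepoints_hex {
    my ($bytes) = @_;
    my $chars = Encode::decode('UTF-8', $bytes);
    return map { sprintf('%04x', ord($_)) } split(//, $chars);
}

my @cps = codepoints_hex("\xe3\x82\xab\x41");   # U+30AB followed by 'A'
print join(' ', @cps), "\n";                     # 30ab 0041
```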

2 comments

Comments

  1. Dominic Mitchell said about 3 hours later:

    If you need the character names from the code number, look at the charnames module.

  2. Krish said about 1 year later:

    Thanks for the codepoint_hex function, that works gr8 4 me

    Thanksx1000

    But I need the explanation for the line

    sprintf '%2.2x', unpack('U0U*', $char);

    im a new bie plz explain 2 me
