Aug 13 2012
 

I had to edit a Silverlight manifest file, an XML file that the usual utilities could not edit because it was encoded as UTF-16 and so, as far as they were concerned, was not plain text. Not having had much experience with file encodings, I had to do some research. Fortunately, I found this rather good article, which explains the whole Unicode character set thing.

I had to edit multiple such files, so a script was in order to automate the process, but Unicode was something I’d never dealt with before. Here’s how I got around it.

First, here’s how to discern what character set you’re dealing with:

  # file -bi filename
application/xml; charset=utf-16le
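If file is not available, or you want to double-check its guess, the first bytes of the file usually give the encoding away, because UTF-16 files commonly start with a byte order mark (BOM). Here is a minimal sketch of that check. The detect_bom helper name is my own invention, and it assumes head -c is available and that UTF-32 is not in play (a UTF-32LE BOM also starts with ff fe).

```shell
#!/bin/sh
# detect_bom FILE: guess the encoding from the byte order mark, if any.
# detect_bom is a hypothetical helper name, not a standard utility.
detect_bom() {
    # Dump the first three bytes as hex and strip the whitespace od adds.
    bom=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    case "$bom" in
        fffe*)   echo "utf-16le" ;;          # FF FE: little-endian UTF-16
        feff*)   echo "utf-16be" ;;          # FE FF: big-endian UTF-16
        efbbbf*) echo "utf-8" ;;             # EF BB BF: UTF-8 with a BOM
        *)       echo "unknown (no BOM)" ;;  # plain ASCII/UTF-8 or something else
    esac
}
```

Calling detect_bom on the manifest should agree with what file reports. A file without a BOM simply comes back as unknown, so this complements file rather than replacing it.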

It turns out you can convert between encodings using iconv, but I was concerned that data could be lost in the conversion, since characters with no equivalent in the target encoding would have to be dropped. That concern grew when I got errors like this:

  # iconv -f UTF16LE -t ascii tmp
iconv: illegal input sequence at position 0
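One likely cause of that error, and this is my assumption rather than something I confirmed, is that the file contains characters ASCII cannot represent, possibly just the byte order mark at position 0. Converting to UTF-8 instead avoids the data-loss worry, since UTF-8 can represent every Unicode character. The sketch below round-trips a file through UTF-8 so that sed can edit it. The edit_utf16le name is hypothetical, and the spelling of encoding names can vary between iconv implementations.

```shell
#!/bin/sh
# edit_utf16le FILE SED_EXPR: apply a sed expression to a UTF-16LE file
# by converting to UTF-8, editing, and converting back.
# edit_utf16le is a hypothetical helper name used for illustration.
edit_utf16le() {
    file=$1
    sed_expr=$2
    work=$(mktemp -d) || return 1
    # Each step must succeed before the original file is touched.
    iconv -f UTF-16LE -t UTF-8 "$file" > "$work/utf8" &&
        sed "$sed_expr" "$work/utf8" > "$work/edited" &&
        iconv -f UTF-8 -t UTF-16LE "$work/edited" > "$work/utf16" &&
        cat "$work/utf16" > "$file"   # overwrite in place, keeping permissions
    status=$?
    rm -rf "$work"
    return "$status"
}
```

Usage would look something like edit_utf16le AppManifest.xaml 's/old/new/', where the file name and the sed expression are placeholders. Looping over several files is then just a for loop around the call.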

After much Googling, I found this Perl code, written by Enrique Nell, with which to process the Silverlight XML manifest.

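I needed to import the Perl Encode module to use the encode and decode functions, and to keep the string that decode returns:

  use Encode qw(decode encode);

  open(my $in, '<:raw', $in_path) || die "Couldn't open $in_path: $!";
  my $raw = do { local $/; <$in> };       # slurp the whole file as raw bytes
  close($in);
  my $text = decode('UTF-16LE', $raw);    # bytes -> Perl character string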

Otherwise, the code referenced above is all that was needed.


Matt Parsons is a freelance Linux specialist who has designed, built and supported Unix and Linux systems in the finance, telecommunications and media industries.

He lives and works in London.
