How to check if a string is valid UTF-8

At least once a day I run into the same problem: how do I check whether a string is valid UTF-8?
So I wrote a little C program and put it on my GitHub. Just be aware that pure ASCII is valid UTF-8, and that’s not a bug: my program checks whether a string is valid UTF-8, not whether it was actually written as UTF-8.
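
To give an idea of what such a check involves, here is a minimal sketch. It is not the program mentioned above: the names `utf8_is_valid` and `is_cont` are made up for illustration, and it assumes the strict rules of RFC 3629 (no overlong forms, no surrogates, nothing above U+10FFFF).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if b is a continuation byte (10xxxxxx). */
static bool is_cont(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Hypothetical function: true if the len bytes at s are well-formed UTF-8. */
bool utf8_is_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char b = s[i];
        size_t n;          /* continuation bytes expected after b */
        unsigned long min; /* smallest code point allowed for this length */
        unsigned long cp;  /* decoded code point */

        if (b < 0x80) {    /* plain ASCII is always valid */
            i++;
            continue;
        } else if ((b & 0xE0) == 0xC0) {
            n = 1; min = 0x80;    cp = b & 0x1F;
        } else if ((b & 0xF0) == 0xE0) {
            n = 2; min = 0x800;   cp = b & 0x0F;
        } else if ((b & 0xF8) == 0xF0) {
            n = 3; min = 0x10000; cp = b & 0x07;
        } else {
            return false;  /* stray continuation byte or invalid lead byte */
        }

        if (len - i <= n)
            return false;  /* sequence is cut off by the end of the input */

        for (size_t k = 1; k <= n; k++) {
            if (!is_cont(s[i + k]))
                return false;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        if (cp < min)                     /* overlong encoding */
            return false;
        if (cp > 0x10FFFF)                /* beyond the Unicode range */
            return false;
        if (cp >= 0xD800 && cp <= 0xDFFF) /* UTF-16 surrogate */
            return false;

        i += n + 1;
    }
    return true;
}
```

With this sketch, a pure ASCII buffer takes the first branch for every byte and is reported as valid, which is exactly the "not a bug" behaviour described above.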

Enjoy :-)


3 Responses to How to check if a string is valid UTF-8

  1. Matthias says:

    Thanks for sharing the snippet. I copied it into my project (adapting the code style to my style guide and so on). Since no license file ships with the code, I hope you’re fine with that.

    Best regards.

    • You’re right, the project was missing a license.

      I added the Simplified BSD License to it, so it’s totally OK to use my code, modified or not, in any kind of project.

      UTF-8 is cool! (UTF-16, UTF-32, latin-*, er… every other encoding except ASCII makes me sick ^-^)

  2. Jia says:

    (str[i] >= 0xC0 /*11000000*/ && str[i] <= 0xDF /*11011111*/)

    In the above condition, should 0xC0 actually be 0xC2, since U+0080 is encoded as C2 80?
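
The arithmetic behind this question can be checked with a small sketch. The helper names `decode2` and `is_two_byte_lead` are hypothetical and not taken from the project. Decoding a two-byte sequence with a 0xC0 or 0xC1 lead byte can only produce a code point below U+0080, which already has a one-byte form, so RFC 3629 treats those lead bytes as invalid and the strict range starts at 0xC2.

```c
#include <stdbool.h>

/* Hypothetical helper: decode a two-byte sequence (110xxxxx 10xxxxxx)
 * without validating it, to show which code point it would stand for. */
unsigned decode2(unsigned char lead, unsigned char cont)
{
    return ((unsigned)(lead & 0x1F) << 6) | (unsigned)(cont & 0x3F);
}

/* Hypothetical check: strict lead-byte range for two-byte sequences.
 * 0xC0 and 0xC1 are excluded because they can only start overlong forms. */
bool is_two_byte_lead(unsigned char b)
{
    return b >= 0xC2 && b <= 0xDF;
}
```

Here `decode2(0xC1, 0xBF)` gives 0x7F, the largest value reachable with a 0xC1 lead byte, while `decode2(0xC2, 0x80)` gives 0x80, the first code point that really needs two bytes.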
