If you've been in computer science for any amount of time, you're likely familiar with Joel Spolsky, his blog Joel on Software, or perhaps one of his books.
A couple of years ago, I read an article called The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).
I'm not ashamed to admit that, at the time, it wasn't very applicable to me. Yes, it was interesting, and yes, I cared, but I had no practical way to apply it because nothing I was working on warranted the information in the article.
But here was one of my biggest takeaways:
If you completely forget everything I just explained, please remember one extremely important fact. It does not make sense to have a string without knowing what encoding it uses. You can no longer stick your head in the sand and pretend that “plain” text is ASCII.
Fast forward a couple of years and I was working at a place where every piece of application code we rolled out had to be internationalized because it was used in countries all across the world. Now it was more practical (and it's not much different than WordPress, huh?).
And now, I'm finding myself working with Unicode characters in WordPress more than I ever have before.
Here's the thing that few people talk about: sites, themes, or HTML in general specify a character set, and that choice can drastically affect how the content on your page is rendered.
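To make that concrete, here's a minimal sketch in Python (not WordPress code). The sample string and the `decode_as` helper are my own illustrative choices, not anything from Joel's article. It stores one string as UTF-8 bytes and then reads those same bytes back under two different character sets, which is roughly what happens when a page declares one charset but its content was saved in another.

```python
def decode_as(raw: bytes, charset: str) -> str:
    """Interpret raw bytes using the given character set (hypothetical helper)."""
    return raw.decode(charset)


text = "café"

# The bytes that would be stored or sent over the wire, encoded as UTF-8.
utf8_bytes = text.encode("utf-8")

# Read back with the matching charset: the original text survives.
as_utf8 = decode_as(utf8_bytes, "utf-8")

# Read back as Latin-1 (ISO-8859-1): the two bytes that make up "é"
# are each treated as a separate character, so the text is garbled.
as_latin1 = decode_as(utf8_bytes, "latin-1")

print(as_utf8)    # café
print(as_latin1)  # cafÃ©
```

The bytes never change between the two calls; only the assumed encoding does. That's Joel's point in miniature: without knowing the encoding, you can't know what the string says.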