Converting from char to int in Java

Someone in a newsgroup asked how to convert a character into an integer in Java. For example, they'd want to get the value 5 when passed the character '5'. My first instinct was to just use Character.getNumericValue(), because I'm lazy and this seemed like the easiest way to do it, but then someone suggested that the following approach might be more efficient, since it avoids a method call:

char a = '5'; int x = a - '0';
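To make the trade-off concrete, here is a minimal sketch of the subtraction approach with a range check bolted on. The class and method names (AsciiDigit, toInt) are made up for illustration, and the check is my own addition rather than part of the newsgroup suggestion: without it, any character outside '0' to '9' silently produces a meaningless number.

```java
class AsciiDigit {
    // Hypothetical helper: maps '0'..'9' to 0..9 and returns -1 for anything else.
    static int toInt(char c) {
        if (c < '0' || c > '9') {
            return -1; // not an ASCII digit
        }
        return c - '0'; // works because '0'..'9' occupy consecutive code points
    }

    public static void main(String[] args) {
        char a = '5';
        System.out.println(toInt(a));   // 5
        System.out.println(toInt('x')); // -1
        System.out.println('x' - '0');  // 72, which is what the unchecked subtraction gives
    }
}
```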

Then someone pointed out that this algorithm is too North America-centric:

I think the problems with that method include that other locales will have different characters that represent digits and it will fail for those locales.

If you don't care about internationalization or you know that you have one of the Arabic numerals 0 through 9 then go for it.

Someone then asked whether there are any locales where the set of digits is not contiguous. It turns out that yes, there is one: Japanese.

The code point for 一 (which represents the value 1) is U+4E00. The code point for 二 (which represents the value 2) is U+4E8C, which is 140 (in decimal) code points away. What character immediately follows 一? 丁, at U+4E01 (which apparently has a ton of meanings including leaf, cake, even number, 4th in rank, male adult, robust, etc.). In fact, you can see the whole list of Unicode characters from U+4E00 to U+9FBF (5MB PDF).

Apparently these characters are arranged by their radicals, which is why 一 is not adjacent to 二.
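The gap is easy to check from Java itself. This is just a sketch; KanjiGap and distance are names I invented for the example, and the characters are written as Unicode escapes so the source stays plain ASCII.

```java
class KanjiGap {
    // Hypothetical helper: how many code points separate two characters.
    static int distance(char from, char to) {
        return to - from;
    }

    public static void main(String[] args) {
        char one = '\u4E00'; // the kanji for 1
        char two = '\u4E8C'; // the kanji for 2

        System.out.println(distance(one, two)); // 140
        System.out.println(distance('1', '2')); // 1, the ASCII digits are adjacent

        // The code point right after the kanji for 1 is U+4E01, not the kanji for 2.
        System.out.printf("U+%04X%n", one + 1); // U+4E01
    }
}
```

So subtracting U+4E00 from a kanji numeral tells you nothing useful about its value.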

Back on the topic of Character.getNumericValue(), this method is actually Unicode-aware. For example, if you pass the character 'Ⅷ', it'll return 8. Amazing! Unfortunately, getNumericValue() only returns integers, so '⅚' might not return what you expect. From the Javadoc:

If the character does not have a numeric value, then -1 is returned. If the character has a numeric value that cannot be represented as a nonnegative integer (for example, a fractional value), then -2 is returned.
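Here is a small sketch of those return values in practice. NumericValueDemo and decimalDigit are hypothetical names of mine. The last part uses Character.digit(), which did not come up in the newsgroup thread; I include it only because it rejects anything that is not a decimal digit, whereas getNumericValue() also assigns the letters a through z the values 10 through 35.

```java
class NumericValueDemo {
    // Hypothetical helper: value of a decimal digit in any script, or -1 otherwise.
    static int decimalDigit(char c) {
        return Character.digit(c, 10);
    }

    public static void main(String[] args) {
        System.out.println(Character.getNumericValue('5'));      // 5
        System.out.println(Character.getNumericValue('\u2167')); // 8, U+2167 is ROMAN NUMERAL EIGHT
        System.out.println(Character.getNumericValue('\u215A')); // -2, U+215A is the fraction five sixths
        System.out.println(Character.getNumericValue('?'));      // -1, no numeric value
        System.out.println(Character.getNumericValue('a'));      // 10, letters count as 10..35

        System.out.println(decimalDigit('a'));      // -1, not a decimal digit
        System.out.println(decimalDigit('\u0663')); // 3, U+0663 is ARABIC-INDIC DIGIT THREE
    }
}
```

If you only want to accept single decimal digits, the second form is probably closer to what the original question was after.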

Anyway, I guess the conclusion is that if you're not used to it, it's difficult to make your program locale-independent.

 