
Java internal String representation: is it UTF-16?

Francis Galiegue
Mar 08, 2015
<p>You seem to be misunderstanding something.</p>
<p>As far as the system is concerned, and most of the time the developer too, <code>char</code>s might as well be carrier pigeons, and <code>String</code>s sequences of said carrier pigeons. Yes, internally, strings are sequences of <code>char</code>s (more precisely, UTF-16 code units), but that is not the problem at hand here.</p>
<p>You don't write <code>char</code>s into files, nor do you read <code>char</code>s from files. You write, and read, bytes.</p>
<p>To read a sequence of bytes as a sequence of chars/carrier pigeons, you need a <em>decoder</em>; similarly (and this is what you do here), to turn chars/carrier pigeons into bytes, you need an <em>encoder</em>. In Java, both are available from a <a href="http://docs.oracle.com/javase/8/docs/api/java/nio/charset/Charset.html" rel="nofollow"><code>Charset</code></a>.</p>
<p><code>String.getBytes()</code> simply uses an encoder for the platform's default character encoding (obtained using <code>Charset.defaultCharset()</code>), and it so happens that for your input string <code>"ABC"</code> and your JRE implementation, the sequence of bytes generated is 65, 66, 67. Hence the result.</p>
<p>Now try <code>"ABC".getBytes(Charset.forName("UTF-32LE"))</code>, and you'll get a different result: 12 bytes (65, 0, 0, 0, 66, 0, 0, 0, 67, 0, 0, 0) instead of 3.</p>
<p>This tip was originally posted on <a href="http://stackoverflow.com/questions/28808198/Java%20internal%20String%20representation:%20is%20it%20UTF-16?/28808287">Stack Overflow</a>.</p>
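<p>The encoder/decoder point in the answer can be illustrated with a minimal sketch. The class name <code>EncodingDemo</code> and the helper <code>encode</code> are hypothetical names chosen for this example, and it assumes the JRE provides the <code>UTF-32LE</code> charset, which is common but not among the charsets every Java platform is required to support.</p>

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingDemo {

    // Hypothetical helper: encodes the text with the given charset
    // and returns the resulting bytes as a printable list.
    static String encode(String text, Charset charset) {
        return Arrays.toString(text.getBytes(charset));
    }

    public static void main(String[] args) {
        String text = "ABC";

        // No-argument getBytes() uses the platform default charset,
        // so this line's output depends on the environment.
        System.out.println("default (" + Charset.defaultCharset() + "): "
                + Arrays.toString(text.getBytes()));

        // Same chars, different encoders, different bytes.
        System.out.println("US-ASCII: " + encode(text, StandardCharsets.US_ASCII));
        // [65, 66, 67]
        System.out.println("UTF-16BE: " + encode(text, StandardCharsets.UTF_16BE));
        // [0, 65, 0, 66, 0, 67]

        // Assumption: UTF-32LE is available on this JRE.
        Charset utf32le = Charset.forName("UTF-32LE");
        System.out.println("UTF-32LE: " + encode(text, utf32le));
        // [65, 0, 0, 0, 66, 0, 0, 0, 67, 0, 0, 0]

        // Decoding with the matching charset recovers the original string.
        byte[] bytes = text.getBytes(utf32le);
        System.out.println(new String(bytes, utf32le)); // ABC
    }
}
```

<p>Passing the charset explicitly, rather than relying on <code>Charset.defaultCharset()</code>, keeps the byte output independent of the machine the code runs on.</p>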