Reads in a string that has been encoded using a
<a href="#modified-utf-8">modified UTF-8</a>
format.
The general contract of {@code readUTF}
is that it reads a representation of a Unicode
character string encoded in modified
UTF-8 format; this string of characters
is then returned as a {@code String}.
<p>
First, two bytes are read and used to
construct an unsigned 16-bit integer in
exactly the manner of the {@code readUnsignedShort}
method. This integer value is called the
<i>UTF length</i> and specifies the number
of additional bytes to be read. These bytes
are then converted to characters by considering
them in groups. The length of each group
is computed from the value of the first
byte of the group. The byte following a
group, if any, is the first byte of the
next group.
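<p>
As a rough illustration of the length prefix described above, the sketch below
combines two bytes into the UTF length. The class {@code UtfLengthSketch} and
its helper {@code utfLength} are hypothetical names introduced only for this
example; the big-endian combination is assumed to mirror what
{@code readUnsignedShort} does.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

class UtfLengthSketch {
    // Hypothetical helper: combines two bytes, high byte first,
    // into an unsigned 16-bit value in the range 0..65535.
    static int utfLength(int high, int low) {
        return ((high & 0xFF) << 8) | (low & 0xFF);
    }

    public static void main(String[] args) throws IOException {
        // Two length bytes (0x00, 0x03) followed by three payload bytes.
        byte[] encoded = {0x00, 0x03, 'a', 'b', 'c'};
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded));
        int len = in.readUnsignedShort();
        System.out.println(len);                                // prints 3
        System.out.println(utfLength(encoded[0], encoded[1]));  // prints 3
    }
}
```

Because the length is an unsigned 16-bit value, at most 65535 bytes can follow
the prefix.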
<p>
If the first byte of a group
matches the bit pattern {@code 0xxxxxxx}
(where {@code x} means "may be {@code 0}
or {@code 1}"), then the group consists
of just that byte. The byte is zero-extended
to form a character.
<p>
If the first byte
of a group matches the bit pattern {@code 110xxxxx},
then the group consists of that byte {@code a}
and a second byte {@code b}. If there
is no byte {@code b} (because byte
{@code a} was the last of the bytes
to be read), or if byte {@code b} does
not match the bit pattern {@code 10xxxxxx},
then a {@code UTFDataFormatException}
is thrown. Otherwise, the group is converted
to the character:
<pre>{@code
(char)(((a & 0x1F) << 6) | (b & 0x3F))
}</pre>
<p>
If the first byte of a group
matches the bit pattern {@code 1110xxxx},
then the group consists of that byte {@code a}
and two more bytes {@code b} and {@code c}.
If there is no byte {@code c} (because
byte {@code a} was one of the last
two of the bytes to be read), or if either
byte {@code b} or byte {@code c}
does not match the bit pattern {@code 10xxxxxx},
then a {@code UTFDataFormatException}
is thrown. Otherwise, the group is converted
to the character:
<pre>{@code
(char)(((a & 0x0F) << 12) | ((b & 0x3F) << 6) | (c & 0x3F))
}</pre>
<p>
If the first byte of a group matches the
pattern {@code 1111xxxx} or the pattern
{@code 10xxxxxx}, then a {@code UTFDataFormatException}
is thrown.
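<p>
The group rules above can be sketched as a single loop over the bytes that
follow the UTF length. This is a minimal illustration, not the implementation
used by any particular class: {@code ModifiedUtf8GroupSketch} and
{@code decode} are hypothetical names, and the input array is assumed to hold
exactly the bytes counted by the UTF length.

```java
import java.io.UTFDataFormatException;

class ModifiedUtf8GroupSketch {
    // Hypothetical helper: decodes the bytes that follow the two-byte UTF length.
    static String decode(byte[] bytes) throws UTFDataFormatException {
        StringBuilder sb = new StringBuilder(bytes.length);
        int i = 0;
        while (i < bytes.length) {
            int a = bytes[i] & 0xFF;
            if ((a & 0x80) == 0x00) {
                // 0xxxxxxx: one-byte group, zero-extended to a char.
                sb.append((char) a);
                i += 1;
            } else if ((a & 0xE0) == 0xC0) {
                // 110xxxxx: two-byte group; byte b must exist and match 10xxxxxx.
                if (i + 1 >= bytes.length) {
                    throw new UTFDataFormatException("two-byte group is truncated");
                }
                int b = bytes[i + 1] & 0xFF;
                if ((b & 0xC0) != 0x80) {
                    throw new UTFDataFormatException("bad continuation byte at " + (i + 1));
                }
                sb.append((char) (((a & 0x1F) << 6) | (b & 0x3F)));
                i += 2;
            } else if ((a & 0xF0) == 0xE0) {
                // 1110xxxx: three-byte group; bytes b and c must exist and match 10xxxxxx.
                if (i + 2 >= bytes.length) {
                    throw new UTFDataFormatException("three-byte group is truncated");
                }
                int b = bytes[i + 1] & 0xFF;
                int c = bytes[i + 2] & 0xFF;
                if ((b & 0xC0) != 0x80 || (c & 0xC0) != 0x80) {
                    throw new UTFDataFormatException("bad continuation byte near " + (i + 1));
                }
                sb.append((char) (((a & 0x0F) << 12) | ((b & 0x3F) << 6) | (c & 0x3F)));
                i += 3;
            } else {
                // 10xxxxxx or 1111xxxx can never start a group.
                throw new UTFDataFormatException("bad first byte at " + i);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws UTFDataFormatException {
        // 0x41 is 'A'; 0xC3 0xA9 is U+00E9; 0xE2 0x82 0xAC is U+20AC.
        byte[] data = {0x41, (byte) 0xC3, (byte) 0xA9, (byte) 0xE2, (byte) 0x82, (byte) 0xAC};
        System.out.println(decode(data).equals("A\u00E9\u20AC")); // prints true
    }
}
```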
<p>
If end of file is encountered
at any time during this entire process,
then an {@code EOFException} is thrown.
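<p>
The end-of-file case can be observed with {@code DataInputStream}, one
implementation of this interface, by declaring a UTF length larger than the
number of bytes actually available. The class {@code ReadUtfEofSketch} and its
helper {@code throwsEof} are hypothetical names used only for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

class ReadUtfEofSketch {
    // Hypothetical helper: reports whether readUTF hits end of file on this input.
    static boolean throwsEof(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            in.readUTF();
            return false;
        } catch (EOFException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        // UTF length says 5 bytes follow, but only 2 are present.
        System.out.println(throwsEof(new byte[] {0x00, 0x05, 'a', 'b'})); // prints true
        // UTF length says 2 bytes follow, and both are present.
        System.out.println(throwsEof(new byte[] {0x00, 0x02, 'a', 'b'})); // prints false
    }
}
```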
<p>
After every group has been converted to
a character by this process, the characters
are gathered, in the same order in which
their corresponding groups were read from
the input stream, to form a {@code String},
which is returned.
<p>
The {@code writeUTF}
method of interface {@code DataOutput}
may be used to write data that is suitable
for reading by this method.
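<p>
A small round trip may help show how the two methods pair up. The sketch
below uses {@code DataOutputStream} and {@code DataInputStream} over an
in-memory buffer; the class {@code ReadUtfRoundTripSketch} and its helper
{@code roundTrip} are hypothetical names chosen for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class ReadUtfRoundTripSketch {
    // Hypothetical helper: writes s with writeUTF, then reads it back with readUTF.
    static String roundTrip(String s) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        }
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            return in.readUTF();
        }
    }

    public static void main(String[] args) throws IOException {
        String original = "h\u00E9llo \u20AC";
        System.out.println(original.equals(roundTrip(original))); // prints true
    }
}
```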
@return a Unicode string.
@exception EOFException if this stream reaches the end
before reading all the bytes.
@exception IOException if an I/O error occurs.
@exception UTFDataFormatException if the bytes do not represent a
valid modified UTF-8 encoding of a string.