@InterfaceAudience.Private @InterfaceStability.Evolving public class ResettableFileInputStream extends ResettableInputStream implements RemoteMarkable, LengthMeasurable
This class makes the following assumptions:
The ability to reset() is dependent on the underlying PositionTracker instance's durability semantics.
A note on surrogate pairs:
The logic for decoding surrogate pairs is as follows:
If no character has been decoded by a "normal" pass, and the buffer still has remaining bytes, then an attempt is made to read 2 characters in one pass.
If it succeeds, then the first char (high surrogate) is returned; the second char (low surrogate) is recorded internally, and is returned at the next call to readChar().
If it fails, then it is assumed that EOF has been reached.
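
The two-pass idea can be sketched with the JDK's own CharsetDecoder. This is a minimal illustration, assuming UTF-8 input; it is not this class's actual code, and the names TwoCharPassSketch, Outcome and decodePair are hypothetical. A pass with room for one char decodes nothing when the next bytes encode a supplementary character, so a second pass with room for two chars is attempted.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not the code of ResettableFileInputStream.
class TwoCharPassSketch {

  /** What the two decoding attempts produced. */
  static final class Outcome {
    int charsAfterNormalPass; // chars decoded when only one char of room was offered
    int bytesConsumed;        // bytes consumed once decoding finished
    char high;                // first char of the pair, if a pair was decoded
    char low;                 // second char of the pair, if a pair was decoded
  }

  static Outcome decodePair(byte[] utf8) {
    CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
    ByteBuffer in = ByteBuffer.wrap(utf8);
    CharBuffer out = CharBuffer.allocate(2);
    Outcome o = new Outcome();

    // "Normal" pass: offer room for a single char only.
    out.limit(1);
    decoder.decode(in, out, true);
    o.charsAfterNormalPass = out.position();

    // Nothing decoded but bytes remain: retry with room for two chars.
    if (out.position() == 0 && in.hasRemaining()) {
      out.clear(); // position 0, limit 2
      decoder.decode(in, out, true);
      out.flip();
      if (out.remaining() == 2) {
        o.high = out.get();
        o.low = out.get();
      }
    }
    o.bytesConsumed = in.position();
    return o;
  }

  public static void main(String[] args) {
    // U+1F600 is encoded as four bytes in UTF-8 and as two chars in UTF-16.
    Outcome o = decodePair("\uD83D\uDE00".getBytes(StandardCharsets.UTF_8));
    System.out.println("chars after normal pass: " + o.charsAfterNormalPass);
    System.out.println("bytes consumed: " + o.bytesConsumed);
    System.out.println("high surrogate: " + Character.isHighSurrogate(o.high));
    System.out.println("low surrogate: " + Character.isLowSurrogate(o.low));
  }
}
```

In this sketch the caller would return `high` immediately and keep `low` for the next read, which mirrors the description above.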
Impacts on position, mark and reset: when a surrogate pair is decoded, the position is incremented by the number of bytes taken to decode the entire pair (usually, 4). This is the most reasonable choice since it would not be advisable to reset a stream to a position pointing to the second char in a pair of surrogates: such a dangling surrogate would not be properly decoded without its counterpart.
Thus the behaviour of mark and reset is as follows:
If mark() is called after a high surrogate has been returned by readChar(), the marked position will be that of the character following the low surrogate, not that of the low surrogate itself.
If reset() is called after a high surrogate has been returned by readChar(), the low surrogate is always returned by the next call to readChar(), before the stream is actually reset to the last marked position.
This ensures that no dangling high surrogate could ever be read as long as the same instance is used to read the whole pair. However, if reset() is called after a high surrogate has been returned by readChar(), and a new instance of ResettableFileInputStream is used to resume reading, then the low surrogate char will be lost, resulting in a corrupted sequence of characters (dangling high surrogate).
This situation is hopefully extremely unlikely to happen in real life.
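
The mark and reset rules above can be modelled in a few lines. This is a simplified, hypothetical model for illustration only, not the class's implementation: the class name SurrogateMarkResetSketch is made up, it works on an already-decoded String instead of a file, and it assumes well-formed text whose positions are counted in UTF-8 bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified model of the documented behaviour; not Flume's code.
class SurrogateMarkResetSketch {
  private final String text;     // already-decoded content, to keep the model small
  private int charIndex = 0;     // next UTF-16 unit to decode
  private long bytePos = 0;      // position measured in UTF-8 bytes
  private int markCharIndex = 0; // char index saved by mark()
  private long markPos = 0;      // byte position saved by mark()
  private int pendingLow = -1;   // low surrogate waiting to be returned, or -1

  SurrogateMarkResetSketch(String text) {
    this.text = text;
  }

  /** Returns the next UTF-16 unit, or -1 at end of input. */
  int readChar() {
    if (pendingLow != -1) {
      int low = pendingLow;
      pendingLow = -1;
      return low;
    }
    if (charIndex >= text.length()) {
      return -1;
    }
    int cp = text.codePointAt(charIndex);
    int units = Character.charCount(cp);
    // Advance past the whole code point, so the position never points inside a pair.
    bytePos += text.substring(charIndex, charIndex + units)
        .getBytes(StandardCharsets.UTF_8).length;
    charIndex += units;
    if (units == 2) {
      pendingLow = Character.lowSurrogate(cp);
      return Character.highSurrogate(cp);
    }
    return cp;
  }

  void mark() {
    markPos = bytePos;
    markCharIndex = charIndex;
  }

  void reset() {
    // The pending low surrogate is deliberately kept: it is returned first.
    bytePos = markPos;
    charIndex = markCharIndex;
  }

  long tell() {
    return bytePos;
  }

  public static void main(String[] args) {
    SurrogateMarkResetSketch in = new SurrogateMarkResetSketch("a\uD83D\uDE00b");
    in.readChar(); // 'a'
    in.mark();     // marks byte position 1
    in.readChar(); // high surrogate; position jumps past the whole pair
    in.reset();
    System.out.println(Integer.toHexString(in.readChar())); // the pending low surrogate
    System.out.println(Integer.toHexString(in.readChar())); // the high surrogate, re-read after reset
  }
}
```

A fresh instance created after reset() would start with no pending low surrogate, which is the dangling high surrogate case described above.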
Modifier and Type | Field | Description |
---|---|---|
static int | DEFAULT_BUF_SIZE | |
static int | MIN_BUF_SIZE | The minimum acceptable buffer size to store bytes read from the underlying file. |
Constructor and Description |
---|
ResettableFileInputStream(File file, PositionTracker tracker) |
ResettableFileInputStream(File file, PositionTracker tracker, int bufSize, Charset charset, DecodeErrorPolicy decodeErrorPolicy) |
Modifier and Type | Method | Description |
---|---|---|
void | close() | |
long | getMarkPosition() | Return the saved mark position without moving the mark pointer. |
long | length() | Returns the total length of the stream or file. |
void | mark() | Marks the current position in this input stream. |
void | markPosition(long position) | Indicate that the specified position should be returned to in the case of Resettable.reset() being called. |
int | read() | Read a single byte of data from the stream. |
int | read(byte[] b, int off, int len) | Read multiple bytes of data from the stream. |
int | readChar() | Read a single character. |
void | reset() | Reset stream position to that set by ResettableInputStream.mark(). |
void | seek(long newPos) | Seek to the specified byte position in the stream. |
long | tell() | Tell the current byte position. |
public static final int DEFAULT_BUF_SIZE
public static final int MIN_BUF_SIZE
public ResettableFileInputStream(File file, PositionTracker tracker) throws IOException
Parameters:
file - File to read
tracker - PositionTracker implementation to make offset position durable
Throws:
FileNotFoundException - If the file to read does not exist
IOException - If the position reported by the tracker cannot be sought
public ResettableFileInputStream(File file, PositionTracker tracker, int bufSize, Charset charset, DecodeErrorPolicy decodeErrorPolicy) throws IOException
Parameters:
file - File to read
tracker - PositionTracker implementation to make offset position durable
bufSize - Size of the underlying buffer used for input. If less than MIN_BUF_SIZE, a buffer of length MIN_BUF_SIZE will be created instead.
charset - Character set used for decoding text, as necessary
decodeErrorPolicy - A DecodeErrorPolicy instance that determines how the decoder should behave in case of malformed input and/or unmappable characters.
Throws:
FileNotFoundException - If the file to read does not exist
IOException - If the position reported by the tracker cannot be sought
public int read() throws IOException
Description copied from class: ResettableInputStream
Read a single byte of data from the stream.
read in class ResettableInputStream
Returns:
-1 if the end of the stream has been reached.
Throws:
IOException
public int read(byte[] b, int off, int len) throws IOException
Description copied from class: ResettableInputStream
Read multiple bytes of data from the stream.
read in class ResettableInputStream
Parameters:
b - the buffer into which the data is read.
off - Offset into the array b at which the data is written.
len - the maximum number of bytes to read.
Returns:
-1 if the end of the stream has been reached.
Throws:
IOException
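
A typical caller loops on read(b, off, len) until it returns -1. The sketch below shows that loop using java.io.ByteArrayInputStream as a stand-in, on the assumption that it follows the same -1 convention; constructing a real ResettableFileInputStream needs a file and a PositionTracker, which are out of scope here. The names ReadLoopSketch and drain are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch; a JDK stream stands in for ResettableFileInputStream.
class ReadLoopSketch {

  /** Reads everything from the stream in chunks of at most chunkSize bytes. */
  static byte[] drain(InputStream in, int chunkSize) throws IOException {
    ByteArrayOutputStream collected = new ByteArrayOutputStream();
    byte[] b = new byte[chunkSize];
    int n;
    // A return value of -1 signals that the end of the stream has been reached.
    while ((n = in.read(b, 0, b.length)) != -1) {
      collected.write(b, 0, n); // only the first n bytes of b are valid
    }
    return collected.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] data = {1, 2, 3, 4, 5};
    byte[] copy = drain(new ByteArrayInputStream(data), 2);
    System.out.println("bytes read: " + copy.length);
  }
}
```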
public int readChar() throws IOException
Description copied from class: ResettableInputStream
Read a single character.
Note that this may lead to returning only one character in a 2-char surrogate pair sequence. When this happens, the underlying implementation should never persist a mark between two chars of a two-char surrogate pair sequence.
readChar in class ResettableInputStream
Throws:
IOException
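
Because readChar() hands back one UTF-16 unit at a time, a caller that appends every returned char in order reassembles surrogate pairs without special handling. The sketch below illustrates this with java.io.StringReader as a stand-in, whose read() likewise returns one char per call; it assumes the end of input is signalled by -1. The names ReadCharLoopSketch and readAll are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical sketch; a JDK Reader stands in for the stream's readChar().
class ReadCharLoopSketch {

  /** Appends each char in the order returned, until -1 is seen. */
  static String readAll(Reader in) throws IOException {
    StringBuilder sb = new StringBuilder();
    int c;
    while ((c = in.read()) != -1) {
      // High and low surrogates arrive in separate calls but stay adjacent.
      sb.append((char) c);
    }
    return sb.toString();
  }

  public static void main(String[] args) throws IOException {
    String text = readAll(new StringReader("a\uD83D\uDE00b"));
    System.out.println("chars: " + text.length());
    System.out.println("code points: " + text.codePointCount(0, text.length()));
  }
}
```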
public void mark() throws IOException
Description copied from class: ResettableInputStream
Marks the current position in this input stream. The reset method repositions this stream at the last marked position so that subsequent reads re-read the same bytes.
Marking a closed stream should not have any effect on the stream.
mark in interface Resettable
mark in class ResettableInputStream
Throws:
IOException - If there is an error while setting the mark position.
See Also:
InputStream.mark(int), InputStream.reset()
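public void markPosition(long position) throws IOException
Description copied from interface: RemoteMarkable
Indicate that the specified position should be returned to in the case of Resettable.reset() being called.
markPosition in interface RemoteMarkable
Throws:
IOException
public long getMarkPosition() throws IOException
Description copied from interface: RemoteMarkable
Return the saved mark position without moving the mark pointer.
getMarkPosition in interface RemoteMarkable
Throws:
IOException
public void reset() throws IOException
Description copied from class: ResettableInputStream
Reset stream position to that set by ResettableInputStream.mark().
reset in interface Resettable
reset in class ResettableInputStream
Throws:
IOException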
public long length() throws IOException
Description copied from interface: LengthMeasurable
Returns the total length of the stream or file.
length in interface LengthMeasurable
Throws:
IOException
public long tell() throws IOException
Description copied from class: ResettableInputStream
Tell the current byte position.
tell in interface Seekable
tell in class ResettableInputStream
Throws:
IOException
public void seek(long newPos) throws IOException
Description copied from class: ResettableInputStream
Seek to the specified byte position in the stream.
seek in interface Seekable
seek in class ResettableInputStream
Parameters:
newPos - Absolute byte offset to seek to
Throws:
IOException
public void close() throws IOException
close in interface Closeable
close in interface AutoCloseable
close in class ResettableInputStream
Throws:
IOException
Copyright © 2009-2016 Apache Software Foundation. All Rights Reserved.