org.apache.flume.sink
Class AvroSink

java.lang.Object
  extended by org.apache.flume.sink.AbstractSink
      extended by org.apache.flume.sink.AvroSink
All Implemented Interfaces:
Configurable, LifecycleAware, NamedComponent, Sink

public class AvroSink
extends AbstractSink
implements Configurable

A Sink implementation that can send events to an RPC server (such as Flume's AvroSource).

This sink forms one half of Flume's tiered collection support. Events sent to this sink are transported over the network to the configured hostname / port pair using the RPC implementation encapsulated in RpcClient. The destination is typically an instance of Flume's AvroSource, which allows Flume agents to forward to other Flume agents, forming a tiered collection infrastructure. Nothing prevents one from using this sink to speak to other custom-built infrastructure that implements the same RPC protocol.

Events are taken from the configured Channel in batches of the configured batch-size. The batch size has no theoretical limit, although all events in a batch must fit in memory. Larger batches are generally far more efficient, but introduce a slight delay (measured in milliseconds) in delivery. Batch underruns (i.e. batches smaller than the configured batch size) are possible; this is a compromise made to keep the latency of event delivery low. If the channel returns a null event, meaning it is empty, the batch is sent immediately, regardless of size. Batch underruns are tracked in the metrics. Empty batches do not incur an RPC round trip.
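The batching rules above can be illustrated with a minimal sketch. It is not the AvroSink source: the class BatchSketch, its methods, and the use of a java.util.Queue as a stand-in for the Channel are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical model of the batching behavior; not part of Flume.
final class BatchSketch {

    // Takes up to batchSize events. A null from poll() stands in for the
    // channel returning a null event, which ends the batch early (underrun).
    static List<String> takeBatch(Queue<String> channel, int batchSize) {
        List<String> batch = new ArrayList<>();
        for (int i = 0; i < batchSize; i++) {
            String event = channel.poll();
            if (event == null) {
                break; // channel is empty: send what we have instead of waiting
            }
            batch.add(event);
        }
        return batch;
    }

    // An empty batch is not sent, so it costs no RPC round trip.
    static boolean shouldSend(List<String> batch) {
        return !batch.isEmpty();
    }
}
```

With three queued events and a batch size of 2, the first call yields a full batch, the second an underrun of one event, and the third an empty batch that is not sent.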

Configuration options

Parameter        Description                                                         Unit (data type)         Default
hostname         The hostname to which events should be sent.                        Hostname or IP (String)  none (required)
port             The port to which events should be sent on hostname.                TCP port (int)           none (required)
batch-size       The maximum number of events to send per RPC.                       events (int)             100
connect-timeout  Maximum time to wait for the first Avro handshake and RPC request.  milliseconds (long)      20000
request-timeout  Maximum time to wait for RPC requests after the first.              milliseconds (long)      20000
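As a rough illustration of how the required parameters and defaults in the table fit together, the sketch below reads them from a plain Map. The class AvroSinkSettings and its from method are hypothetical and are not part of Flume; only the parameter names and default values come from the table.

```java
import java.util.Map;

// Hypothetical holder for the documented parameters; not a Flume class.
final class AvroSinkSettings {
    final String hostname;
    final int port;
    final int batchSize;
    final long connectTimeoutMs;
    final long requestTimeoutMs;

    private AvroSinkSettings(String hostname, int port, int batchSize,
                             long connectTimeoutMs, long requestTimeoutMs) {
        this.hostname = hostname;
        this.port = port;
        this.batchSize = batchSize;
        this.connectTimeoutMs = connectTimeoutMs;
        this.requestTimeoutMs = requestTimeoutMs;
    }

    // hostname and port have no default, so their absence is an error.
    static AvroSinkSettings from(Map<String, String> props) {
        String hostname = props.get("hostname");
        String port = props.get("port");
        if (hostname == null || port == null) {
            throw new IllegalArgumentException("hostname and port are required");
        }
        return new AvroSinkSettings(
                hostname,
                Integer.parseInt(port),
                Integer.parseInt(props.getOrDefault("batch-size", "100")),
                Long.parseLong(props.getOrDefault("connect-timeout", "20000")),
                Long.parseLong(props.getOrDefault("request-timeout", "20000")));
    }
}
```

Passing only hostname and port yields a batch size of 100 and both timeouts at 20000 milliseconds; omitting either required key raises IllegalArgumentException.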

Metrics

TODO


Nested Class Summary
 
Nested classes/interfaces inherited from interface org.apache.flume.Sink
Sink.Status
 
Constructor Summary
AvroSink()
           
 
Method Summary
 void configure(Context context)
           Request the implementing class to (re)configure itself.
 Sink.Status process()
          Requests the sink to attempt to consume data from the attached channel.
 void start()
          The start() of AvroSink is more of an optimization that allows the connection to be created before the process() loop is started.
 void stop()
           Stops a service or component.
 String toString()
           
 
Methods inherited from class org.apache.flume.sink.AbstractSink
getChannel, getLifecycleState, getName, setChannel, setName
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

AvroSink

public AvroSink()

Method Detail

configure

public void configure(Context context)
Description copied from interface: Configurable

Request the implementing class to (re)configure itself.

When configuration parameters are changed, they must be reflected by the component as soon as possible.

There are no thread safety guarantees on when configure might be called.

Specified by:
configure in interface Configurable

start

public void start()
The start() of AvroSink is more of an optimization that allows the connection to be created before the process() loop is started. If start() fails, the process() loop will itself attempt to reconnect as necessary. This is the expected behavior, since the downstream source may become unavailable in the middle of the process loop, in which case the sink has to retry the connection.

Specified by:
start in interface LifecycleAware
Overrides:
start in class AbstractSink
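
The relationship between start() and process() described above can be sketched as follows. This is a simplified model under stated assumptions, not the AvroSink implementation: the class LazyConnectSketch, the Supplier used to open a connection, and the boolean return value are all hypothetical.

```java
import java.util.function.Supplier;

// Hypothetical model of "connect eagerly in start(), retry lazily in process()".
final class LazyConnectSketch {
    private final Supplier<Object> connector; // throws RuntimeException on failure
    private Object client;                    // null while disconnected

    LazyConnectSketch(Supplier<Object> connector) {
        this.connector = connector;
    }

    // Best effort only: a failure here is swallowed so process() can retry.
    void start() {
        try {
            client = connector.get();
        } catch (RuntimeException e) {
            client = null;
        }
    }

    // Reconnects first if there is no live connection. A connection failure
    // here propagates to the caller. Returns true once connected.
    boolean process() {
        if (client == null) {
            client = connector.get();
        }
        return client != null;
    }

    boolean isConnected() {
        return client != null;
    }
}
```

If the connector fails during start() and succeeds on the next attempt, the first process() call establishes the connection without start() being called again.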

stop

public void stop()
Description copied from interface: LifecycleAware

Stops a service or component.

Implementations should determine the result of any stop logic and update the return value of LifecycleAware.getLifecycleState() accordingly.

Specified by:
stop in interface LifecycleAware
Overrides:
stop in class AbstractSink

toString

public String toString()
Overrides:
toString in class AbstractSink

process

public Sink.Status process()
                    throws EventDeliveryException
Description copied from interface: Sink

Requests the sink to attempt to consume data from the attached channel.

Note: This method should be consuming from the channel within the bounds of a Transaction. On successful delivery, the transaction should be committed, and on failure it should be rolled back.

Specified by:
process in interface Sink
Returns:
READY if 1 or more Events were successfully delivered, BACKOFF if no data could be retrieved from the channel feeding this sink
Throws:
EventDeliveryException - In case of any kind of failure to deliver data to the next hop destination.
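
The transactional contract of process() can be modeled with a short sketch. It does not use the Flume API: the class ProcessSketch, its Status enum, the DeliveryException type, and the Deque standing in for the channel are hypothetical, and rollback is simulated by returning events to the head of the queue.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical model of the process() contract; names do not come from Flume.
final class ProcessSketch {
    enum Status { READY, BACKOFF }

    static final class DeliveryException extends Exception {
        DeliveryException(Throwable cause) {
            super(cause);
        }
    }

    // Takes up to batchSize events and hands them to the sender.
    // On success the events stay removed (commit); on failure they are
    // restored in their original order (rollback) and the error is reported.
    static Status process(Deque<String> channel, int batchSize,
                          Consumer<List<String>> sender) throws DeliveryException {
        List<String> taken = new ArrayList<>();
        while (taken.size() < batchSize && !channel.isEmpty()) {
            taken.add(channel.pollFirst());
        }
        if (taken.isEmpty()) {
            return Status.BACKOFF; // no data could be retrieved
        }
        try {
            sender.accept(taken);
            return Status.READY; // one or more events delivered
        } catch (RuntimeException e) {
            for (int i = taken.size() - 1; i >= 0; i--) {
                channel.addFirst(taken.get(i)); // undo the take
            }
            throw new DeliveryException(e);
        }
    }
}
```

A caller would typically invoke process again immediately on READY and wait before retrying on BACKOFF; after a DeliveryException the events remain in the channel for a later attempt.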


Copyright © 2009-2012 Apache Software Foundation. All Rights Reserved.