org.apache.flume.sink.hbase
Class AsyncHBaseSink

java.lang.Object
  extended by org.apache.flume.sink.AbstractSink
      extended by org.apache.flume.sink.hbase.AsyncHBaseSink
All Implemented Interfaces:
Configurable, LifecycleAware, NamedComponent, Sink

public class AsyncHBaseSink
extends AbstractSink
implements Configurable

A simple sink that reads events from a channel and writes them to HBase. This sink uses an asynchronous API internally and is therefore likely to perform better than a synchronous equivalent. The HBase configuration is picked up from the first hbase-site.xml encountered in the classpath. The sink reads events from the channel in batches and writes them to HBase in batches, to minimize the number of flushes on the HBase tables. To use this sink, the following mandatory parameters must be configured:

table: The name of the table in HBase to write to.

columnFamily: The column family in HBase to write to.

Other optional parameters are:

serializer: A class implementing AsyncHbaseEventSerializer. An instance of this class is used to serialize the events that are written to HBase.

serializer.*: Passed to the serializer's configure() method as a Context object.

batchSize: The batch size used by the client, that is, the maximum number of events the sink will commit per transaction. The default batch size is 100 events.

timeout: The length of time in milliseconds the sink waits for callbacks from HBase for all events in a transaction. If no timeout is specified, the sink waits indefinitely.
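
As an illustration of the parameters above, the following sketch parses a Flume-style properties fragment with java.util.Properties and checks that the two mandatory keys are present. It is not how the sink itself validates its configuration. The agent, sink, channel, table, and column family names (a1, k1, c1, events, cf) and the timeout value are hypothetical, and the asynchbase type alias and key layout are taken from the Flume user guide rather than from this page.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

final class AsyncHBaseSinkConfigSketch {
    // Hypothetical names: agent "a1", sink "k1", channel "c1",
    // table "events", column family "cf".
    static final String PREFIX = "a1.sinks.k1.";

    // Example agent configuration fragment; the timeout value is illustrative.
    static final String EXAMPLE =
          "a1.sinks.k1.type = asynchbase\n"
        + "a1.sinks.k1.channel = c1\n"
        + "a1.sinks.k1.table = events\n"
        + "a1.sinks.k1.columnFamily = cf\n"
        + "a1.sinks.k1.batchSize = 100\n"
        + "a1.sinks.k1.timeout = 60000\n";

    // Parses properties-format text into a Properties object.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns the mandatory parameter names that are absent or blank.
    static List<String> missingMandatory(Properties props, String prefix) {
        List<String> missing = new ArrayList<>();
        for (String key : new String[] {"table", "columnFamily"}) {
            String value = props.getProperty(prefix + key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Properties props = parse(EXAMPLE);
        System.out.println("missing: " + missingMandatory(props, PREFIX));
    }
}
```

Dropping the table or columnFamily line from the fragment makes missingMandatory report the corresponding key, which mirrors the requirement that both be set before the sink is used.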

Note: HBase does not guarantee atomic commits across multiple rows. If HBase fails after writing only a subset of the events in a batch to disk, the Flume transaction is rolled back and Flume writes every event in the transaction again, which produces duplicates. The serializer is expected to handle such duplicates. HBase also does not support batch increments, so if the serializer returns multiple increments, an HBase failure causes them to be rewritten once HBase comes back up.
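
One way a serializer might cope with replayed batches is to derive the row key deterministically from the event, so that a rewritten event overwrites its earlier copy instead of adding a new row. The sketch below only illustrates that idea under stated assumptions: a TreeMap stands in for an HBase table, rowKey and writeBatch are hypothetical helpers, and nothing here uses the Flume or HBase APIs.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class IdempotentRowKeySketch {
    // Hypothetical helper: hex-encoded SHA-256 of the event body as row key.
    static String rowKey(byte[] body) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(body);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Writes each event under its derived key; the map stands in for a table.
    static void writeBatch(Map<String, String> table, List<String> events) {
        for (String event : events) {
            table.put(rowKey(event.getBytes(StandardCharsets.UTF_8)), event);
        }
    }

    public static void main(String[] args) {
        Map<String, String> table = new TreeMap<>();
        List<String> batch = Arrays.asList("login alice", "login bob", "logout alice");

        // Simulate a failure after two of three events were written.
        writeBatch(table, batch.subList(0, 2));
        // Simulate the replay of the whole batch after the rollback.
        writeBatch(table, batch);

        System.out.println(table.size() + " rows");
    }
}
```

A content-derived key also collapses events that are genuinely identical, so a real serializer would likely mix in a timestamp or sequence header that survives the replay. The approach does not help with increments, which are not idempotent.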


Nested Class Summary
 
Nested classes/interfaces inherited from interface org.apache.flume.Sink
Sink.Status
 
Constructor Summary
AsyncHBaseSink()
           
AsyncHBaseSink(org.apache.hadoop.conf.Configuration conf)
           
 
Method Summary
 void configure(Context context)
           Request the implementing class to (re)configure itself.
 Sink.Status process()
          Requests the sink to attempt to consume data from the attached channel.
 void start()
           Starts a service or component.
 void stop()
           Stops a service or component.
 
Methods inherited from class org.apache.flume.sink.AbstractSink
getChannel, getLifecycleState, getName, setChannel, setName, toString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

AsyncHBaseSink

public AsyncHBaseSink()

AsyncHBaseSink

public AsyncHBaseSink(org.apache.hadoop.conf.Configuration conf)
Method Detail

process

public Sink.Status process()
                    throws EventDeliveryException
Description copied from interface: Sink

Requests the sink to attempt to consume data from the attached channel.

Note: This method should be consuming from the channel within the bounds of a Transaction. On successful delivery, the transaction should be committed, and on failure it should be rolled back.

Specified by:
process in interface Sink
Returns:
READY if 1 or more Events were successfully delivered, BACKOFF if no data could be retrieved from the channel feeding this sink
Throws:
EventDeliveryException - In case of any kind of failure to deliver data to the next hop destination.
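
The contract described above (take events inside a transaction, commit on success, roll back on failure, return READY or BACKOFF) can be sketched with in-memory stand-ins. SketchChannel, Status, and the writer callback below are hypothetical simplifications and are not the Flume Channel, Transaction, or Sink.Status types; the sketch rethrows the writer's RuntimeException where a real sink would throw EventDeliveryException.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

final class ProcessLoopSketch {
    enum Status { READY, BACKOFF }

    // In-memory stand-in for a channel with one open transaction at a time.
    static final class SketchChannel {
        private final Deque<String> queue = new ArrayDeque<>();
        private final List<String> taken = new ArrayList<>();

        void put(String event) { queue.addLast(event); }

        // Removes the next event and remembers it until commit or rollback.
        String take() {
            String event = queue.pollFirst();
            if (event != null) {
                taken.add(event);
            }
            return event;
        }

        void commit() { taken.clear(); }

        // Returns taken events to the head of the queue in original order.
        void rollback() {
            for (int i = taken.size() - 1; i >= 0; i--) {
                queue.addFirst(taken.get(i));
            }
            taken.clear();
        }

        int size() { return queue.size(); }
    }

    // Takes up to batchSize events and hands them to the writer in one batch.
    static Status process(SketchChannel channel, int batchSize,
                          Consumer<List<String>> writer) {
        List<String> batch = new ArrayList<>();
        try {
            for (int i = 0; i < batchSize; i++) {
                String event = channel.take();
                if (event == null) {
                    break; // channel is drained
                }
                batch.add(event);
            }
            if (batch.isEmpty()) {
                channel.commit();
                return Status.BACKOFF;
            }
            writer.accept(batch); // may throw to signal a delivery failure
            channel.commit();
            return Status.READY;
        } catch (RuntimeException e) {
            channel.rollback(); // events stay in the channel for a retry
            throw e;
        }
    }
}
```

Because the rollback returns every taken event to the channel, a failure after a partial write leads to the whole batch being delivered again on the next call, which is the source of the duplicates mentioned in the class description.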

configure

public void configure(Context context)
Description copied from interface: Configurable

Request the implementing class to (re)configure itself.

When configuration parameters are changed, the component must reflect them as soon as possible.

There are no thread safety guarantees on when configure might be called.

Specified by:
configure in interface Configurable

start

public void start()
Description copied from interface: LifecycleAware

Starts a service or component.

Implementations should determine the result of any start logic and affect the return value of LifecycleAware.getLifecycleState() accordingly.

Specified by:
start in interface LifecycleAware
Overrides:
start in class AbstractSink

stop

public void stop()
Description copied from interface: LifecycleAware

Stops a service or component.

Implementations should determine the result of any stop logic and affect the return value of LifecycleAware.getLifecycleState() accordingly.

Specified by:
stop in interface LifecycleAware
Overrides:
stop in class AbstractSink


Copyright © 2009-2012 Apache Software Foundation. All Rights Reserved.