org.apache.flume.sink.hbase
Class HBaseSink

java.lang.Object
  extended by org.apache.flume.sink.AbstractSink
      extended by org.apache.flume.sink.hbase.HBaseSink
All Implemented Interfaces:
Configurable, LifecycleAware, NamedComponent, Sink

public class HBaseSink
extends AbstractSink
implements Configurable

A simple sink which reads events from a channel and writes them to HBase. The HBase configuration is picked up from the first hbase-site.xml encountered in the classpath. This sink supports batch reading of events from the channel and writing them to HBase, to minimize the number of flushes on the HBase tables. To use this sink, it must be configured with the following mandatory parameters:

table: The name of the table in HBase to write to.

columnFamily: The column family in HBase to write to.

This sink will commit each transaction if the table's write buffer size is reached or if the number of events in the current transaction reaches the batch size, whichever comes first.

Other optional parameters are:

serializer: A class implementing HbaseEventSerializer. An instance of this class will be used to write out events to HBase.

serializer.*: Properties with this prefix are passed to the serializer's configure() method as a Context object.

batchSize: The batch size used by the client, i.e. the maximum number of events the sink will commit per transaction. The default batch size is 100 events.

Note: While this sink flushes all events in a transaction to HBase in one shot, HBase does not guarantee atomic commits across multiple rows. If HBase fails after persisting only a subset of the events in a batch, the Flume transaction is rolled back and Flume re-delivers every event in the transaction, which produces duplicates. The serializer is expected to handle such duplicates, for example by producing deterministic row keys so that a replay overwrites rather than duplicates data; a sketch of a serializer along these lines follows this description. HBase also does not support batch increments, so if the serializer returns multiple increments, an HBase failure will cause them to be re-applied when HBase comes back up.
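For illustration, below is a minimal sketch of a custom serializer, assuming the HbaseEventSerializer interface of this package (configure, initialize, getActions, getIncrements, close) and the pre-1.0 HBase client API (Put.add; newer clients use addColumn). The class name, the "column" serializer property, and the "rowKey" event header are hypothetical; the stock SimpleHbaseEventSerializer is usually a better starting point.

    import java.util.Collections;
    import java.util.List;

    import org.apache.flume.Context;
    import org.apache.flume.Event;
    import org.apache.flume.conf.ComponentConfiguration;
    import org.apache.flume.sink.hbase.HbaseEventSerializer;
    import org.apache.hadoop.hbase.client.Increment;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Row;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BodyOnlySerializer implements HbaseEventSerializer {
      private byte[] columnFamily;
      private byte[] column;
      private Event currentEvent;

      @Override
      public void configure(Context context) {
        // serializer.* properties from the sink configuration arrive here;
        // "column" is a property of this sketch, not of the sink itself.
        column = Bytes.toBytes(context.getString("column", "payload"));
      }

      @Override
      public void configure(ComponentConfiguration conf) {
        // not used in this sketch
      }

      @Override
      public void initialize(Event event, byte[] columnFamily) {
        this.currentEvent = event;
        this.columnFamily = columnFamily;
      }

      @Override
      public List<Row> getActions() {
        // A deterministic row key (here taken from a hypothetical "rowKey"
        // header) makes replays overwrite the same cell instead of adding rows.
        byte[] rowKey = Bytes.toBytes(currentEvent.getHeaders().get("rowKey"));
        Put put = new Put(rowKey);
        put.add(columnFamily, column, currentEvent.getBody());
        return Collections.<Row>singletonList(put);
      }

      @Override
      public List<Increment> getIncrements() {
        // No counters in this sketch; increments are not replay-safe if HBase
        // fails mid-batch, as noted above.
        return Collections.<Increment>emptyList();
      }

      @Override
      public void close() {
        currentEvent = null;
      }
    }

Because the row key is derived deterministically from the event, a replayed transaction overwrites the same cells instead of adding new rows, which is one way to satisfy the duplicate-handling expectation above.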


Nested Class Summary
 
Nested classes/interfaces inherited from interface org.apache.flume.Sink
Sink.Status
 
Constructor Summary
HBaseSink()
           
HBaseSink(org.apache.hadoop.conf.Configuration conf)
           
 
Method Summary
 void configure(Context context)
           Request the implementing class to (re)configure itself.
 org.apache.hadoop.conf.Configuration getConfig()
           
 Sink.Status process()
          Requests the sink to attempt to consume data from the attached channel.
 void start()
           Starts a service or component.
 void stop()
           Stops a service or component.
 
Methods inherited from class org.apache.flume.sink.AbstractSink
getChannel, getLifecycleState, getName, setChannel, setName, toString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

HBaseSink

public HBaseSink()

HBaseSink

public HBaseSink(org.apache.hadoop.conf.Configuration conf)
Method Detail

start

public void start()
Description copied from interface: LifecycleAware

Starts a service or component.

Implementations should determine the result of any start logic and effect the return value of LifecycleAware.getLifecycleState() accordingly.

Specified by:
start in interface LifecycleAware
Overrides:
start in class AbstractSink

stop

public void stop()
Description copied from interface: LifecycleAware

Stops a service or component.

Implementations should determine the result of any stop logic and effect the return value of LifecycleAware.getLifecycleState() accordingly.

Specified by:
stop in interface LifecycleAware
Overrides:
stop in class AbstractSink

configure

public void configure(Context context)
Description copied from interface: Configurable

Request the implementing class to (re)configure itself.

When configuration parameters are changed, they must be reflected by the component as soon as possible.

There are no thread safety guarantees on when configure might be called.

Specified by:
configure in interface Configurable
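As a sketch of how the parameters described above reach this method, the snippet below builds a Context programmatically and passes it to configure(). In a running agent the framework builds this Context from the agent's configuration file; the table, column family, and column names here are placeholders.

    import org.apache.flume.Context;
    import org.apache.flume.sink.hbase.HBaseSink;

    public class ConfigureSketch {
      public static void main(String[] args) {
        Context ctx = new Context();
        ctx.put("table", "mytable");        // mandatory; placeholder table name
        ctx.put("columnFamily", "cf");      // mandatory; placeholder column family
        ctx.put("batchSize", "100");        // optional; 100 is the documented default
        ctx.put("serializer",
            "org.apache.flume.sink.hbase.SimpleHbaseEventSerializer");
        ctx.put("serializer.payloadColumn", "pCol"); // serializer.* property

        HBaseSink sink = new HBaseSink();
        sink.configure(ctx);   // reads table, columnFamily, batchSize, serializer.*
      }
    }

The serializer.payloadColumn entry is an example of a serializer.* property: it is forwarded to the serializer's own configure() call rather than consumed by the sink itself.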

getConfig

public org.apache.hadoop.conf.Configuration getConfig()

process

public Sink.Status process()
                    throws EventDeliveryException
Description copied from interface: Sink

Requests the sink to attempt to consume data from the attached channel.

Note: This method should be consuming from the channel within the bounds of a Transaction. On successful delivery, the transaction should be committed, and on failure it should be rolled back.

Specified by:
process in interface Sink
Returns:
READY if 1 or more Events were successfully delivered, BACKOFF if no data could be retrieved from the channel feeding this sink
Throws:
EventDeliveryException - In case of any kind of failure to deliver data to the next hop destination.
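A rough sketch of this contract, assuming a Channel has already been created and configured elsewhere: attach the channel, start the sink, call process() until it reports BACKOFF, then stop. In a real deployment the agent's SinkRunner drives this loop rather than user code.

    import org.apache.flume.Channel;
    import org.apache.flume.EventDeliveryException;
    import org.apache.flume.Sink;
    import org.apache.flume.sink.hbase.HBaseSink;

    public class DrainSketch {
      static void drain(HBaseSink sink, Channel channel) throws EventDeliveryException {
        sink.setChannel(channel);   // a channel must be attached before start()
        sink.start();
        try {
          Sink.Status status;
          do {
            // Each call consumes up to batchSize events in one channel
            // transaction and flushes them to HBase.
            status = sink.process();
          } while (status == Sink.Status.READY);  // BACKOFF: channel had no data
        } finally {
          sink.stop();
        }
      }
    }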


Copyright © 2009-2014 Apache Software Foundation. All Rights Reserved.