Apache Flume Apache Software Foundation

Version 1.2.0ΒΆ

Status of this release

This release is the first release of Apache Flume as an Apache top level project and is also the first release that is considered ready for production use. As compared to the previous releases, it has substantially more features and functionality along with bug fixes and other enhancements.

Release Documentation

Changes

** New Feature
  • [FLUME-896] - Implement a Durable Memory Channel
  • [FLUME-971] - Create developer guide for Flume NG
  • [FLUME-988] - Client SDK
  • [FLUME-1085] - Implement a durable FileChannel
  • [FLUME-1157] - Implement Interceptors (previously known as Decorators) for Flume 1.x
  • [FLUME-1183] - Implement an HBase Sink which supports table level access
  • [FLUME-1215] - Implement Timestamp Interceptor
  • [FLUME-1252] - Asynchronous HBase Sink
** Improvement
  • [FLUME-828] - LoggerSink representation of the event’s body isn’t too useful
  • [FLUME-881] - Would be nice if HDFS Sink would automatically choose best writableFormat based on fileType
  • [FLUME-979] - ExecSource should optionally restart the command when it exits
  • [FLUME-985] - All HDFS Operations in HDFSEventSink should have a timeout
  • [FLUME-1001] - Support custom processors
  • [FLUME-1011] - AvroSource should have a configurable max thread count
  • [FLUME-1020] - Implement Kerberos security for HDFS Sink
  • [FLUME-1030] - Retry logic for failover sink processor to handle downstream exceptions in a predictable manner.
  • [FLUME-1032] - Fix / clean up Flume NG build
  • [FLUME-1043] - SDK should mark slf4j deps as optional
  • [FLUME-1048] - speed up mvn package: stop building .zip packages
  • [FLUME-1049] - Use hadoop-1.0.0 as basis for default Flume build instead of 0.20.205
  • [FLUME-1078] - flume-ng script has no way to add, not replace, classpath
  • [FLUME-1090] - JDBC Channel: Minimize logging under nominal conditions
  • [FLUME-1117] - Support output to files in Avro container format
  • [FLUME-1122] - Flume documentation layout should be refactored
  • [FLUME-1126] - Support RFC 3164 and 5424 syslog format timestamp parsing
  • [FLUME-1127] - Add configuration support to AbstractAvroEventSerializer for Avro sync interval and compression support
  • [FLUME-1132] - HDFSEventSink has spurious and verbose log message
  • [FLUME-1140] - Adding Xms value in flume-env.sh
  • [FLUME-1160] - ComponentConfigurationFactory catches NullPointerException
  • [FLUME-1196] - Allow different HDFS Sinks within the same agent to write to HDFS as different users
  • [FLUME-1198] - Implement a load-balancing sink processor
  • [FLUME-1212] - Flume should pick HBase jars from HBASE_HOME
  • [FLUME-1238] - Support active rolling of files created by HDFS Event Sink
  • [FLUME-1242] - Make flume user & dev guides easily editable
  • [FLUME-1275] - Add Regex Serializer for HBaseSink
  • [FLUME-1287] - Add Standalone Example to Docs
  • [FLUME-1330] - Avro Source should not use Fixed thread pool for boss threads when pool size is specified
  • [FLUME-1338] - Produce helpful error message in case that timestamp header is missing when time based bucketing is in use
  • [FLUME-1343] - Improve user guide
  • [FLUME-1345] - Use apache-flume for the artifact instead of flume-ng-dist
  • [FLUME-1351] - Add release version to Flume documentation
** Bug
  • [FLUME-862] - AvroSource breaks when config properties changes different service
  • [FLUME-1002] - FailoverSinkProcessor replaces sinks with same priority
  • [FLUME-1017] - syslog classes missing
  • [FLUME-1026] - Document Thread Safety Guarantees
  • [FLUME-1027] - Missing log4j library in Flume distribution
  • [FLUME-1031] - Depricate code generated by Thrift and Avro OG sources that is under com.cloudera package
  • [FLUME-1035] - slf4j error in flume sdk unit tests
  • [FLUME-1036] - Reconfiguration of AVRO or NETCAT source causes port bind exception
  • [FLUME-1037] - NETCAT handler theads terminate under stress test
  • [FLUME-1040] - Release-Notes says Apache Ivy instead of Apache Flume
  • [FLUME-1041] - maven warns of duplicate dependencies
  • [FLUME-1046] - invoking flume-ng script from bin directory fails
  • [FLUME-1047] - Client SDK has dependency on apache commons
  • [FLUME-1070] - Fix javadoc for configuring hdfsSink
  • [FLUME-1074] - AvroSink if any non-caught exception is thrown, an exception is thrown in finally clause
  • [FLUME-1075] - HDFSEventSink begin is called when transaction opened due to other error
  • [FLUME-1079] - Flume agent reconfiguration enters permanent bad state
  • [FLUME-1080] - Issue with HDFSEventSink for append support
  • [FLUME-1083] - Why does flume binary archive produces the following empty directories: bin/{ia64,amd64} ?
  • [FLUME-1087] - Restore Client API compat with v1.1.0
  • [FLUME-1088] - TestWAL.testThreadedAppend fails on jenkins build server
  • [FLUME-1094] - hadoop.profile=23 build is broken by slf4j-jcl dependencies
  • [FLUME-1096] - Add support to pass headers through AvroCLIClient
  • [FLUME-1098] - Hadoop jars from compilation step included in assembly build
  • [FLUME-1099] - copy-paste issue with RecoverableMemoryChannel
  • [FLUME-1102] - HDFSEventSink rollInterval is broken
  • [FLUME-1104] - HDFS rolls the first file incorrectly
  • [FLUME-1108] - FILE_ROLL sink doesn’t accept value 0 for unlimited wait time before rolling file
  • [FLUME-1109] - Syslog sources need to be refactored
  • [FLUME-1112] - HDFSCompressedDataStream append does not work
  • [FLUME-1114] - Syslog Sources does not implement maxsize
  • [FLUME-1116] - Extra event created for max payload size of 2500 bytes in Flume syslogtcp source
  • [FLUME-1119] - Remove default ports for syslog sources
  • [FLUME-1121] - Recoverable Memory Channel cannot recover data
  • [FLUME-1124] - Lifecycle supervisor can cause thread contention, sometimes causing components to not startup.
  • [FLUME-1125] - flume-ng script allows flume-env.sh to clobber some command-line arguments
  • [FLUME-1128] - Conf poller should use schedule with fixed delay
  • [FLUME-1129] - change foo to agent in sample config
  • [FLUME-1130] - flume-ng script bad ordering on FLUME_HOME var
  • [FLUME-1135] - flume-docs exclude is not sufficient for rat
  • [FLUME-1136] - Remove from executor service does not always remove the runnables from the queue
  • [FLUME-1142] - Seq source fails with multiplexing channel selector
  • [FLUME-1148] - Refactor logging
  • [FLUME-1149] - All sources get same channel list even if configuration is different.
  • [FLUME-1154] - flume-ng script should first try finding java from PATH and then try using bigtop, instead of vice-versa
  • [FLUME-1156] - If config file has empty sources, then throws NPE
  • [FLUME-1163] - HDFSEventSink leaves .tmp files in place when Flume is stopped
  • [FLUME-1164] - Configure should be called after stopping all events.
  • [FLUME-1177] - Maven deps on flume-ng-configuration module are brought in transitively instead of directly
  • [FLUME-1180] - ChannelSelectorFactory creates incorrect selector for multiplexing selector type
  • [FLUME-1181] - Context must enforce dot-separated prefix for sub-properties.
  • [FLUME-1182] - Syslog source cannot read format correctly from configuration
  • [FLUME-1184] - TestFileChannel.testThreaded fails sometimes
  • [FLUME-1188] - TestRecoverableMemoryChannel.testThreaded can fail sometimes
  • [FLUME-1190] - DurableFileChannel requires FILE enum definition in ChannelConfigurationType
  • [FLUME-1194] - RecoverableMemoryChannel prop misspelled – “rentention” should be “retention”
  • [FLUME-1200] - HDFSEventSink causes *.snappy file to be created in HDFS even when snappy isn’t used (due to missing lib)
  • [FLUME-1202] - Too many approved licenses
  • [FLUME-1204] - Add more unit tests for hbase sink
  • [FLUME-1205] - NPE related to checkpointing when using FileChannel
  • [FLUME-1213] - HDFS sink should allow bucketpath rounding down.
  • [FLUME-1216] - Need useful error message when keytab does not exist
  • [FLUME-1217] - HDFS Event Sink generates warnings due to recent change
  • [FLUME-1219] - Race conditions in BucketWriter / HDFSEventSink
  • [FLUME-1220] - Load balancing channel selector needs to be in the configuration type
  • [FLUME-1221] - ThriftLegacySource doesn’t handle fields -> headers conversions for bytebuffers
  • [FLUME-1226] - FailoverRpcClient should check for NULL batch-size property
  • [FLUME-1229] - System.nanoTime incorrectly used in filename for HDFS file rolling
  • [FLUME-1230] - Sink gets initialized even when not active
  • [FLUME-1231] - Deadlock between BucketWriter and LeaseChecker on shutdown
  • [FLUME-1232] - Cannot start agent a 3rd time when using FileChannel
  • [FLUME-1234] - Can’t use %P escape sequence for bucket path of HDFS sink
  • [FLUME-1236] - File channel has a race condition between start and create transaction method
  • [FLUME-1240] - Add version info to Flume NG
  • [FLUME-1241] - Flume dist should include the flume-ng-doc directory
  • [FLUME-1244] - Implement a load-balancing RpcClient with round/robin and random distribution capabilties.
  • [FLUME-1245] - HDFSCompressedDataStream calls finish() on sync instead of flush()
  • [FLUME-1246] - FileChannel hangs silently when Hadoop libs not found
  • [FLUME-1248] - flume-ng script gets broken when it tried to load hbase classpath
  • [FLUME-1253] - Support for running integration tests
  • [FLUME-1254] - RpcClient can hang when communication is broken with the source.
  • [FLUME-1270] - Incorrect default hdfs.callTimeout and hdfs.fileType of HDFSEventSink in FlumeUserGuide.rst
  • [FLUME-1271] - Incorrect configuration causes NPE
  • [FLUME-1280] - Make all config properties of Hbase sinks public constants
  • [FLUME-1284] - Need host interceptor for hdfs bucket path escape sequence
  • [FLUME-1288] - Async hbase sink should throw exception when hbase reports failure and check hbase table correctness
  • [FLUME-1290] - HDFS Sink should accept fileType parameters of arbitrary case
  • [FLUME-1297] - Flume tests should wait for a few seconds for agent to start.
  • [FLUME-1301] - HDFSCompressedDataStream can lose data
  • [FLUME-1303] - java.library.path value is being truncated at first ‘n’ char
  • [FLUME-1304] - Allow for faster allocation of checkpoint file.
  • [FLUME-1306] - LoadBalancingRpcClient should catch exception for invalid RpcClient and failover to valid one
  • [FLUME-1309] - Integration tests not included in assembly build artifacts
  • [FLUME-1312] - Host interceptor should support custom headers
  • [FLUME-1314] - File channel log file can grow beyond max size which causes startup failure
  • [FLUME-1315] - Null sink should support batching
  • [FLUME-1316] - AvroSink should be configurable for connect-timeout and request-timeout
  • [FLUME-1317] - Assembly build pulls in target folder from flume-ng-tests
  • [FLUME-1319] - File Channel optimize replay of logs when a checkpoint is present
  • [FLUME-1320] - Add safeguard for checkpoint corruption detection
  • [FLUME-1322] - ChannelProcessor should catch Throwable to work around close() clobbering uncaught Exceptions
  • [FLUME-1323] - Remove shutdown hook from FileChannel
  • [FLUME-1324] - File Channel Log can contain unassigned blocks
  • [FLUME-1325] - Components should be stopped in the reverse order that they were started
  • [FLUME-1327] - File Channel can deadlock in when checkpoint happens in between a put/take/commit
  • [FLUME-1329] - AvroSink can hang during Avro RPC handshake
  • [FLUME-1331] - Start method of components throwing NoClassDefFoundError are not caught
  • [FLUME-1333] - Disable running of saveVersion.sh on Windows
  • [FLUME-1341] - Build fails on jenkins because a file exists in the environment
  • [FLUME-1344] - AvroSink JMX does not report connection created count accurately
  • [FLUME-1346] - Build warning from missing maven-sphinx version in reporting section
  • [FLUME-1347] - Deprecate RecoverableMemoryChannel
  • [FLUME-1348] - Update the documentation, correcting links and removing incubation.
  • [FLUME-1349] - Document Hbase sinks.
  • [FLUME-1352] - Add documentation for HDFS path rounddown.
  • [FLUME-1355] - Improve user guide section about sink processors
  • [FLUME-1356] - Document interceptors
** Task
  • [FLUME-840] - Update project committers in pom file
  • [FLUME-991] - Make flume configuration validation component specific at time rather than at runtime
  • [FLUME-1028] - Fix jenkins build after addition of submodule
  • [FLUME-1050] - Update version of surefire plugin
  • [FLUME-1073] - Default Log4j configuration file should have a rolling policy
  • [FLUME-1082] - Add User and dev guide to Flume site
  • [FLUME-1151] - Exclude docs directory from rat
  • [FLUME-1189] - Test ReoverableMemoryChannel throughput versus FileChannel
  • [FLUME-1300] - Update user guide for File Channel
  • [FLUME-1353] - Ensure license headers are consistent
** Sub-task
  • [FLUME-748] - Create metric collection infrastructure
  • [FLUME-962] - Failover capability for Client SDK
  • [FLUME-992] - Create configuration stubs for sources, channels, sinks etc
  • [FLUME-999] - Updating init scripts and variables to fit the term agent
  • [FLUME-1052] - Core configuration component
  • [FLUME-1053] - Basic SourceConfiguration
  • [FLUME-1054] - Basic ChannelConfiguration
  • [FLUME-1055] - Basic SinkConfiguration
  • [FLUME-1105] - Allow the optional disabling of foreign keys
  • [FLUME-1107] - Configuration keys for JDBC channel contain redundant prefix.
  • [FLUME-1113] - JDBC Channel invokes size query on every put