Apache Flume Apache Software Foundation

Version 1.4.0

Status of this release

Apache Flume 1.4.0 is the fourth release of Flume as an Apache top-level project (TLP) and is production-ready software.

Release Documentation

Changes

Release Notes - Flume - Version v1.4.0

  • New Feature
    • [FLUME-924] - Implement a JMS source for Flume NG
    • [FLUME-997] - Support secure transport mechanism
    • [FLUME-1502] - Support for running simple configurations embedded in host process
    • [FLUME-1516] - FileChannel Write Dual Checkpoints to avoid replays
    • [FLUME-1632] - Persist progress on each file in file spooling client/source
    • [FLUME-1735] - Add support for a plugins.d directory
    • [FLUME-1894] - Implement Thrift RPC
    • [FLUME-1917] - FileChannel group commit (coalesce fsync)
    • [FLUME-2010] - Support Avro records in Log4jAppender and the HDFS Sink
    • [FLUME-2048] - Avro container file deserializer
    • [FLUME-2070] - Add a Flume Morphline Solr Sink
  • Improvement
    • [FLUME-1076] - Sink batch sizes vary wildly
    • [FLUME-1100] - HDFSWriterFactory and HDFSFormatterFactory should allow extension
    • [FLUME-1571] - Channels should check for positive capacity and transaction capacity values
    • [FLUME-1586] - File Channel should support verifying integrity of individual events.
    • [FLUME-1652] - Logutils.getLogs could NPE
    • [FLUME-1661] - ExecSource cannot execute complex Unix commands
    • [FLUME-1677] - Add File-channel dependency to flume-ng-node’s pom.xml
    • [FLUME-1699] - Make the rename of the meta file platform neutral
    • [FLUME-1702] - HDFSEventSink should write to a hidden file as opposed to a .tmp file
    • [FLUME-1740] - Remove contrib/ directory from Flume NG
    • [FLUME-1745] - FlumeConfiguration Eats Exceptions
    • [FLUME-1756] - Avro client should be able to use load balancing RPC
    • [FLUME-1757] - Improve configuration of hbase serializers
    • [FLUME-1762] - File Channel should recover automatically if the checkpoint is incomplete or bad by deleting the contents of the checkpoint directory
    • [FLUME-1768] - Multiplexing channel selector should allow optional-only channels
    • [FLUME-1769] - Replicating channel selector should support optional channels
    • [FLUME-1770] - Flume should have a serializer which supports serializing the headers to a simple string
    • [FLUME-1777] - AbstractSource does not provide enough implementation for sub-classes
    • [FLUME-1790] - Commands in EncryptionTestUtils comments require high encryption pack to be installed
    • [FLUME-1794] - FileChannel check for full disks in the background
    • [FLUME-1800] - Docs for spooling source durability changes
    • [FLUME-1808] - ElasticSearchSink is missing log4j.properties
    • [FLUME-1821] - Support configuration of hbase instances to be used in AsyncHBaseSink from flume config
    • [FLUME-1847] - NPE in SourceConfiguration
    • [FLUME-1848] - HDFSDataStream logger is actually for a sequence file
    • [FLUME-1855] - Sequence gen source should be able to stop after a fixed number of events
    • [FLUME-1864] - Allow hdfs idle callback to clean up closed bucket writers
    • [FLUME-1874] - Ship with log4j.properties file that has a reliable time based rolling policy
    • [FLUME-1876] - Document hadoop dependency of FileChannel when used with EmbeddedAgent
    • [FLUME-1878] - FileChannel replay should print status every 10000 events
    • [FLUME-1886] - Add a JMS enum type to SourceType so that users don’t need to enter FQCN for JMSSource
    • [FLUME-1889] - Add HBASE and ASYNC_HBASE enum types to SinkType so that users don’t need to enter FQCNs
    • [FLUME-1906] - Ability to disable WAL for put operation in HBaseSink
    • [FLUME-1915] - Enhance NettyAvroRpcClient and the use of NettyServer to optionally use compression
    • [FLUME-1926] - Optionally timeout Avro Sink Rpc Clients to avoid stickiness
    • [FLUME-1940] - Log a snapshot of Flume metrics on shutdown
    • [FLUME-1945] - HBase Serializer allow key from regular expression group
    • [FLUME-1976] - JMS Source document should provide instruction on JMS implementation jars
    • [FLUME-1977] - JMS Source connectionFactory property is not documented
    • [FLUME-1992] - ElasticSearch dependency is marked optional
    • [FLUME-1994] - Add ELASTICSEARCH enum type to SinkType to eliminate need for FQCN in agent configuration files
    • [FLUME-2004] - Need to capture metrics on the Flume exec source such as events received, rejected, etc.
    • [FLUME-2005] - Minor improvements to Flume assembly config
    • [FLUME-2008] - it would be very convenient to have a fat jar of flume-ng-log4jappender
    • [FLUME-2009] - Flume project throws error when imported into Eclipse IDE (Juno)
    • [FLUME-2013] - Parametrize java source and target version in the main pom file
    • [FLUME-2015] - ElasticSearchSink: need access to IndexRequestBuilder instance during flume event processing
    • [FLUME-2046] - Typo in HBaseSink java doc
    • [FLUME-2049] - Compile ElasticSearchSink with elasticsearch 0.90
    • [FLUME-2062] - make it possible for HBase sink to deposit event headers into corresponding column qualifiers
    • [FLUME-2063] - Add Configurable charset to RegexHbaseEventSerializer
    • [FLUME-2076] - JMX metrics support for HTTP Source
    • [FLUME-2093] - binary tarball that is created by flume’s assembly shouldn’t contain sources
    • [FLUME-2100] - Increase default batchSize of Morphline Solr Sink
    • [FLUME-2105] - Add docs for MorphlineSolrSink
  • Bug
    • [FLUME-1110] - HDFS Sink throws IllegalStateException when flume-daemon shuts down
    • [FLUME-1153] - flume-ng script is missing some agent options in help output
    • [FLUME-1175] - RollingFileSink complains of Bad File Descriptor upon a reconfig event
    • [FLUME-1262] - Move doc generation to a different profile
    • [FLUME-1285] - FileChannel has a dependency on Hadoop IO classes
    • [FLUME-1296] - Lifecycle supervisor should check if the monitor service is still running before supervising
    • [FLUME-1511] - Scribe-source doesn’t handle zero message request correctly.
    • [FLUME-1676] - ExecSource should provide a configurable charset
    • [FLUME-1688] - Bump AsyncHBase version to 1.4.1
    • [FLUME-1709] - HDFS CompressedDataStream doesn’t support serializer parameter
    • [FLUME-1720] - LICENSE file contains an entry for protobuf-<version>.jar, however the proper artifact name is protobuf-java-<version>.jar
    • [FLUME-1731] - SpoolableDirectorySource should have configurable support for deleting files it has already completed instead of renaming
    • [FLUME-1741] - ElasticSearch tests leave directory data/elasticsearch/nodes/ lying around
    • [FLUME-1748] - HDFS Sink should check if the thread is interrupted before performing any HDFS operations
    • [FLUME-1755] - Load balancing RPC client has issues with downed hosts
    • [FLUME-1766] - AvroSource throws confusing exception when configured without a port
    • [FLUME-1772] - AbstractConfigurationProvider should remove component which throws exception from configure method.
    • [FLUME-1773] - File Channel worker thread should not be daemon
    • [FLUME-1774] - EventBackingStoreFactory error message asks user to delete checkpoint which is now done automatically
    • [FLUME-1775] - FileChannel Log Background worker should catch Throwable
    • [FLUME-1776] - Several modules require commons-lang but do not declare this in the pom
    • [FLUME-1778] - Upgrade Flume to use Avro 1.7.3
    • [FLUME-1784] - JMSSource: fix minor documentation problem and parameter name
    • [FLUME-1788] - Flume Thrift source can fail intermittently because of a race condition in Thrift server implementation on some Linux systems
    • [FLUME-1789] - Unit tests TestJCEFileKeyProvider and TestFileChannelEncryption fail with IBM JDK and flume-1.3.0
    • [FLUME-1795] - Flume thrift legacy source does not have proper logging configured
    • [FLUME-1797] - TestFlumeConfiguration is in com.apache.flume.conf namespace.
    • [FLUME-1799] - Generated source tarball is missing flume-ng-embedded-agent
    • [FLUME-1802] - Missing parameter --conf in example of the Flume User Guide
    • [FLUME-1803] - Generated dist tarball is missing flume-ng-embedded-agent
    • [FLUME-1804] - JMS source not included in binary dist
    • [FLUME-1805] - Embedded agent deps should be specified in dependencyManagement section of pom
    • [FLUME-1818] - Support various layouts in log4jappender
    • [FLUME-1819] - ExecSource doesn’t flush the cache if there are no input entries
    • [FLUME-1820] - Should not be possible for RPC client to block indefinitely on close()
    • [FLUME-1822] - Update javadoc for FlumeConfiguration
    • [FLUME-1823] - LoadBalancingRpcClient method must throw exception if it is called after close is called.
    • [FLUME-1824] - Inflights can complete successfully even if checkpoint fails
    • [FLUME-1828] - ResettableInputStream should support seek()
    • [FLUME-1834] - Userguide on trunk is missing some memory channel props
    • [FLUME-1835] - Flume User Guide has wrong prop in Load Balancing Sink Selector
    • [FLUME-1844] - HDFSEventSink should have option to use RawLocalFileSystem
    • [FLUME-1845] - Document plugins.d directory structure
    • [FLUME-1849] - Embedded Agent doesn’t shutdown supervisor
    • [FLUME-1852] - Issues with EmbeddedAgentConfiguration
    • [FLUME-1854] - Application class can deadlock if stopped immediately after start
    • [FLUME-1863] - EmbeddedAgent pom must pull in file channel
    • [FLUME-1865] - Rename the Sequence File formatters to Serializer to be consistent with the rest of Flume
    • [FLUME-1866] - ChannelProcessor is not logging ChannelExceptions.
    • [FLUME-1867] - There’s no option to set hostname for HTTPSource
    • [FLUME-1868] - FlumeUserGuide mentions wrong FQCN for JSONHandler
    • [FLUME-1869] - Request to add “HTTP” source type to SourceType.java
    • [FLUME-1870] - Flume sends non-numeric values with type as float to Ganglia causing ganglia to crash
    • [FLUME-1872] - SpoolingDirectorySource doesn’t delete tracker file when deletePolicy is “immediate”
    • [FLUME-1879] - Secure HBase documentation
    • [FLUME-1880] - Double-logging of created HDFS files
    • [FLUME-1882] - Allow case-insensitive deserializer value for SpoolDirectorySource
    • [FLUME-1890] - Flume should set the hbase keytab and principal in HBase conf object.
    • [FLUME-1891] - Fast replay runs even when checkpoint exists.
    • [FLUME-1893] - File Channel could miss possible checkpoint corruption
    • [FLUME-1911] - Add deprecation back to the legacy thrift code
    • [FLUME-1916] - HDFS sink should poll for # of active replicas. If less than required, roll the file.
    • [FLUME-1918] - File Channel cannot handle capacity of more than 500 Million events
    • [FLUME-1922] - HDFS Sink should optionally insert the timestamp at the sink
    • [FLUME-1924] - Bug in serializer context parsing in RollingFileSink
    • [FLUME-1925] - HDFS timeouts should not starve other threads
    • [FLUME-1929] - CheckpointRebuilder main method does not work
    • [FLUME-1930] - Inflights should clean up executors on close.
    • [FLUME-1931] - HDFS Sink has a commons-lang dependency which is missing in pom
    • [FLUME-1932] - no-reload-conf command line param does not work
    • [FLUME-1937] - Issue with maxUnderReplication in HDFS sink
    • [FLUME-1939] - FlumeEventQueue must check if file is open before setting the length of the file
    • [FLUME-1943] - ExecSource tests failing on Jenkins
    • [FLUME-1948] - plugins.d directory(ies) should be separately overridable, independent of FLUME_HOME
    • [FLUME-1949] - Documentation for sink processor lists incorrect default
    • [FLUME-1955] - fileSuffix does not work with compressed streams
    • [FLUME-1958] - Remove atlassian-ide-plugin.xml from the repo
    • [FLUME-1964] - hdfs sink depends on commons-io but does not specify it in the pom
    • [FLUME-1965] - Thrift sink alias doesn’t exist
    • [FLUME-1969] - Update user Guide to explain the purpose of minimumRequiredSpace setting for FileChannel
    • [FLUME-1974] - Thrift compatibility issue with hbase-0.92
    • [FLUME-1975] - Use TThreadedSelectServer in ThriftSource if it is available
    • [FLUME-1980] - Log4jAppender should optionally drop events if append fails
    • [FLUME-1981] - Rpc client expiration can be done in a more thread-safe way
    • [FLUME-1986] - doTestInflightCorrupts should not commit transactions
    • [FLUME-1993] - On Windows, when using the spooling directory source, there is a file sharing violation when trying to delete tracker file
    • [FLUME-2002] - Flume RPC Client creates 2 threads per each log attempt if the remote flume agent goes down
    • [FLUME-2011] - “mvn test” fails
    • [FLUME-2012] - Two tests fail on Mac OS (saying they fail to load native library) with Java 7
    • [FLUME-2014] - Race condition when using local timestamp with BucketPath
    • [FLUME-2023] - Flume must login to secure HBase before creating the HTable instance
    • [FLUME-2025] - ThriftSource throws NPE in stop() if start() failed because socket open failed or if thrift server instance creation threw.
    • [FLUME-2026] - TestHTTPSource should use any available port rather than a hardcoded port number
    • [FLUME-2027] - Check for default replication fails on federated cluster in hdfs sink
    • [FLUME-2032] - HDFSEventSink doesn’t work in Windows
    • [FLUME-2036] - Make hostname optional for HTTPSource
    • [FLUME-2042] - log4jappender timeout should be configurable
    • [FLUME-2043] - JMS Source removed on failure to create configuration
    • [FLUME-2044] - HDFS Sink impersonation fails after the first file
    • [FLUME-2051] - Surefire 2.12 cannot run a single test on Windows. Upgrade to 2.12.3
    • [FLUME-2054] - Support Version Info on Windows and fix failure of TestVersionInfo
    • [FLUME-2057] - Failures in FileChannel’s TestEventQueueBackingStoreFactory on Windows
    • [FLUME-2060] - Failure in TestLog.testReplaySucceedsWithUnusedEmptyLogMetaDataFastReplay test on Windows
    • [FLUME-2072] - JMX metrics support for HBase Sink
    • [FLUME-2081] - JMX metrics support for SpoolDir
    • [FLUME-2082] - JMX support for Seq Generator Source
    • [FLUME-2083] - Avro Source should not start if SSL is enabled and keystore cannot be opened
    • [FLUME-2098] - Make Solr sink depend on the CDK version of morphlines
  • Documentation
    • [FLUME-1621] - Document new MemoryChannel parameters in Flume User Guide
    • [FLUME-1910] - Add thrift RPC documentation
    • [FLUME-1953] - Fix dev guide error that says sink can read from multiple channels
    • [FLUME-1962] - Document proper specification of lzo codec as lzop in Flume User Guide
    • [FLUME-1979] - Wrong propname for connection reset interval in avro sink
    • [FLUME-2030] - Documentation of Configuration Changes JMSSource, HBaseSink, AsyncHBaseSink and ElasticSearchSink
  • Task
    • [FLUME-1686] - Exclude target directories & Eclipse files from rat checks
    • [FLUME-2094] - Remove the deprecated Recoverable Memory Channel
  • Sub-task