Description |
1 online resource |
|
text file rda |
Series |
Community experience distilled |
|
Summary |
A starter guide that covers Apache Flume in detail. Apache Flume: Distributed Log Collection for Hadoop is intended for people responsible for moving datasets into Hadoop in a timely and reliable manner, such as software engineers, database administrators, and data warehouse administrators. |
Contents |
Cover; Copyright; Credits; About the Author; About the Reviewers; www.PacktPub.com; Table of Contents; Preface; Chapter 1: Overview and Architecture; Flume 0.9; Flume 1.X (Flume-NG); The problem with HDFS and streaming data/logs; Sources, channels, and sinks; Flume events; Interceptors, channel selectors, and sink processors; Tiered data collection (multiple flows and/or agents); Chapter 2: Flume Quick Start; Downloading Flume; Flume in Hadoop distributions; Flume configuration file overview; Starting up with Hello World; Summary; Chapter 3: Channels; Memory channel; File channel; Summary. |
|
Chapter 4: Sinks and Sink Processors; HDFS sink; Path and filename; File rotation; Compression codecs; Event serializers; Text output; Text with headers; Apache Avro; File type; Sequence file; Data stream; Compressed stream; Timeouts and workers; Sink groups; Load balancing; Failover; Summary; Chapter 5: Sources and Channel Selectors; The problem with using tail; The exec source; The spooling directory source; Syslog sources; The syslog UDP source; The syslog TCP source; The multiport syslog TCP source; Channel selectors; Replicating; Multiplexing; Summary. |
|
Chapter 6: Interceptors, ETL, and Routing; Interceptors; Timestamp; Host; Static; Regular expression filtering; Regular expression extractor; Custom interceptors; Tiering data flows; Avro Source/Sink; Command-line Avro; Log4J Appender; The Load Balancing Log4J Appender; Routing; Summary; Chapter 7: Monitoring Flume; Monitoring the agent process; Monit; Nagios; Monitoring performance metrics; Ganglia; The internal HTTP server; Custom monitoring hooks; Summary; Chapter 8: There Is No Spoon -- The Realities of Real-time Distributed Data Collection; Transport time versus log time. |
|
Time zones are evil; Capacity planning; Considerations for multiple data centers; Compliance and data expiry; Summary; Index. |
Subject |
Apache Hadoop.
|
|
File organization (Computer science)
|
|
Electronic data processing -- Distributed processing.
|
|
Fichiers (Informatique) -- Organisation. |
|
Traitement réparti. |
Genre |
Electronic books. |
Other Form: |
Print version: 9781299735149 |
ISBN |
1299735142 (ebk) |
|
9781299735149 (ebk) |
|
9781782167921 |
|
1782167927 |
|
1782167919 |
|
9781782167914 |
|