Title Apache Flume : Distributed Log Collection for Hadoop. [O'Reilly electronic resource]

Imprint

Packt Publishing, 2013.

To Access:
Available on O’Reilly for Public Libraries

Bookmark link: https://library.naperville-lib.org:444/record=b3163711~S1

Description	1 online resource
	text file rda
Series	Community experience distilled
Summary	A starter guide that covers Apache Flume in detail. Apache Flume: Distributed Log Collection for Hadoop is intended for people responsible for moving datasets into Hadoop in a timely and reliable manner, such as software engineers, database administrators, and data warehouse administrators.
Contents	Cover; Copyright; Credits; About the Author; About the Reviewers; www.PacktPub.com; Table of Contents; Preface; Chapter 1: Overview and Architecture; Flume 0.9; Flume 1.X (Flume-NG); The problem with HDFS and streaming data/logs; Sources, channels, and sinks; Flume events; Interceptors, channel selectors, and sink processors; Tiered data collection (multiple flows and/or agents); Chapter 2: Flume Quick Start; Downloading Flume; Flume in Hadoop distributions; Flume configuration file overview; Starting up with Hello World; Summary; Chapter 3: Channels; Memory channel; File channel; Summary.
	Chapter 4: Sinks and Sink Processors; HDFS sink; Path and filename; File rotation; Compression codecs; Event serializers; Text output; Text with headers; Apache Avro; File type; Sequence file; Data stream; Compressed stream; Timeouts and workers; Sink groups; Load balancing; Failover; Summary; Chapter 5: Sources and Channel Selectors; The problem with using tail; The exec source; The spooling directory source; Syslog sources; The syslog UDP source; The syslog TCP source; The multiport syslog TCP source; Channel selectors; Replicating; Multiplexing; Summary.
	Chapter 6: Interceptors, ETL, and Routing; Interceptors; Timestamp; Host; Static; Regular expression filtering; Regular expression extractor; Custom interceptors; Tiering data flows; Avro Source/Sink; Command-line Avro; Log4J Appender; The Load Balancing Log4J Appender; Routing; Summary; Chapter 7: Monitoring Flume; Monitoring the agent process; Monit; Nagios; Monitoring performance metrics; Ganglia; The internal HTTP server; Custom monitoring hooks; Summary; Chapter 8: There Is No Spoon -- The Realities of Real-time Distributed Data Collection; Transport time versus log time.
	Time zones are evil; Capacity planning; Considerations for multiple data centers; Compliance and data expiry; Summary; Index.
Subject	Apache Hadoop.
	File organization (Computer science)
	Electronic data processing -- Distributed processing.
	Fichiers (Informatique) -- Organisation.
	Traitement réparti.
Genre	Electronic books.
Other Form:	Print version: 9781299735149
ISBN	1299735142 (ebk)
	9781299735149 (ebk)
	9781782167921
	1782167927
	1782167919
	9781782167914
