Naperville Public Library / All Locations

LEADER 00000cam a2200805 i 4500 
003    OCoLC 
005    20240129213017.0 
006    m     o  d         
007    cr cnu---unuuu 
008    180702s2018    cau     ob    000 0 eng d 
015    GBB8M4619|2bnb 
016 7  019140156|2Uk 
019    1047669376|a1055400060|a1066575257|a1081290112|a1082143752
       |a1086447627|a1113621640 
020    9781484235225|q(electronic bk.) 
020    1484235223|q(electronic bk.) 
020    1484235215 
020    9781484235218 
024 7  10.1007/978-1-4842-3522-5|2doi 
029 1  AU@|b000063679146 
029 1  AU@|b000067503457 
029 1  CHNEW|b001063578 
029 1  CHVBK|b575141417 
029 1  UKMGB|b019140156 
035    (OCoLC)1042329316|z(OCoLC)1047669376|z(OCoLC)1055400060
       |z(OCoLC)1066575257|z(OCoLC)1081290112|z(OCoLC)1082143752
       |z(OCoLC)1086447627|z(OCoLC)1113621640 
037    com.springer.onix.9781484235225|bSpringer Nature 
040    N$T|beng|erda|epn|cN$T|dN$T|dEBLCP|dGW5XE|dUAB|dUPM|dOCLCF
       |dOCLCQ|dVT2|dWYU|dOTZ|dLVT|dUKMGB|dU3W|dUMI|dG3B|dCAUOI
       |dSTF|dSNK|dYOU|dK6U|dMERER|dOCLCQ|dCOO|dOCLCQ|dUHL|dUKAHL
       |dOCLCQ|dBRF|dOCLCQ|dOCLCO|dCOM|dOCLCQ|dYDX|dOCLCQ|dOCLCO 
049    INap 
082 04 004.36 
082 04 004.36|223 
099    eBook O'Reilly for Public Libraries 
100 1  Gupta, Saurabh,|eauthor. 
245 10 Practical Enterprise Data Lake Insights :|bhandle data-
       driven challenges in an Enterprise Big Data Lake /
       |cSaurabh Gupta, Venkata Giri.|h[O'Reilly electronic 
       resource] 
264  1 [Berkeley, CA] :|bApress,|c2018. 
300    1 online resource 
336    text|btxt|2rdacontent 
337    computer|bc|2rdamedia 
338    online resource|bcr|2rdacarrier 
347    text file 
347    |bPDF 
504    Includes bibliographical references. 
505 0  Intro; Table of Contents; About the Authors; About the 
       Technical Reviewer; Acknowledgments; Foreword; Chapter 1: 
       Introduction to Enterprise Data Lakes; Data explosion: the
       beginning; Big data ecosystem; Hadoop and MapReduce -- 
       Early days; Evolution of Hadoop; History of Data Lake; 
       Data Lake: the concept; Data lake architecture; Why Data 
       Lake?; Data Lake Characteristics; Data lake vs. Data 
       warehouse; How to achieve success with Data Lake?; Data 
       governance and data operations; Data democratization with 
       data lake; Fast Data -- Life beyond Big Data; Conclusion. 
505 8  Chapter 2: Data lake ingestion strategies; What is data 
       ingestion?; Understand the data sources; Structured vs. 
       Semi-structured vs. Unstructured data; Data ingestion 
       framework parameters; ETL vs. ELT; Big Data Integration 
       with Data Lake; Hadoop Distributed File System (HDFS); 
       Copy files directly into HDFS; Batched data ingestion; 
       Challenges and design considerations; Design 
       considerations; Commercial ETL tools; Real-time ingestion;
       CDC design considerations; Example of CDC pipeline: 
       Databus, LinkedIn's open-source solution; Apache Sqoop; 
       Sqoop 1; Sqoop 2; How Sqoop works? 
505 8  Sqoop design considerations; Native ingestion utilities; 
       Oracle copyToBDA; Greenplum gphdfs utility; Data transfer 
       from Greenplum to using gpfdist; Ingest unstructured data 
       into Hadoop; Apache Flume; Tiered architecture for 
       convergent flow of events; Features and design 
       considerations; Conclusion; Chapter 3: Capture Streaming 
       Data with Change-Data-Capture; Change Data Capture 
       Concepts; Strategies for Data Capture; Retention and 
       Replay; Retention Period; Types of CDC; Incremental; Bulk;
       Hybrid; CDC -- Trade-offs; CDC Tools; Challenges; 
       Downstream Propagation; Use Case. 
505 8  Centralization of Change Data; Analyzing a Centralized Data 
       Store; Metadata: Data about Data; Structure of Data; 
       Privacy/Sensitivity Information; Special Fields; Data 
       Formats; Delimited Format; Avro File Format; Consumption 
       and Checkpointing; Simple Checkpoint Mechanism; 
       Parallelism; Merging and Consolidation; Design 
       Considerations for Merge and Consolidate; Data Quality; 
       Challenges; Design Aspects; Operational Aspects; 
       Publishing to Kafka; Schema and Data; Sample Schema; 
       Schema Repository; Multiple Topics and Partitioning; 
       Sizing and Scaling; Tools; Conclusion. 
505 8  Chapter 4: Data Processing Strategies in Data 
       Lakes; MapReduce Processing Framework; Motivation: Why 
       MapReduce?; MapReduce V1 Refresher and Design 
       Considerations; Yet Another Resource Negotiator -- YARN; 
       YARN concepts; Hive; Hive -- Quick Refresher; Hive 
       Components; Hive Metastore (a.k.a. HCatalog); Hive -- 
       Design Considerations; Hive LLAP; Apache Pig; Pig 
       Execution Architecture; Apache Spark; Why Spark?; 
       Resilient Distributed Datasets (RDD); RDD Runtime 
       Components; RDD Composition; Datasets and DataFrames; 
       Bucketing, Sorting, and Partitioning; Deployment Modes of 
       Spark Application. 
520    Use this practical guide to successfully handle the 
       challenges encountered when designing an enterprise data 
       lake and learn industry best practices to resolve issues. 
       When designing an enterprise data lake you often hit a 
       roadblock when you must leave the comfort of the 
       relational world and learn the nuances of handling non-
       relational data. Starting from sourcing data into the 
       Hadoop ecosystem, you will go through stages that can 
       bring up tough questions such as data processing, data 
       querying, and security. Concepts such as change data 
       capture and data streaming are covered. The book takes an 
       end-to-end solution approach in a data lake environment 
       that includes data security, high availability, data 
       processing, data streaming, and more. Each chapter 
       includes application of a concept, code snippets, and use 
       case demonstrations to provide you with a practical 
       approach. You will learn the concept, scope, application, 
       and starting point. What You'll Learn: Get to know data 
       lake architecture and design principles; implement data 
       capture and streaming strategies; implement data 
       processing strategies in Hadoop; understand the data lake
       security framework and availability model. 
588 0  Online resource; title from PDF title page (EBSCO, viewed 
       July 5, 2018). 
590    O'Reilly|bO'Reilly Online Learning: Academic/Public 
       Library Edition 
650  0 Electronic data processing|xDistributed processing
       |xManagement. 
650  0 Big data. 
650  0 Information storage and retrieval systems. 
650  2 Information Systems 
650  6 Données volumineuses. 
650  6 Systèmes d'information. 
650  7 Information technology: general issues.|2bicssc 
650  7 Business mathematics & systems.|2bicssc 
650  7 Databases.|2bicssc 
650  7 Big data|2fast 
650  7 Electronic data processing|xDistributed processing
       |xManagement|2fast 
650  7 Information storage and retrieval systems|2fast 
700 1  Giri, Venkata,|eauthor. 
776 08 |iPrinted edition:|z9781484235218 
856 40 |uhttps://ezproxy.naperville-lib.org/login?url=https://
       learning.oreilly.com/library/view/~/9781484235225/?ar
       |zAvailable on O'Reilly for Public Libraries 
938    YBP Library Services|bYANK|n15575404 
938    Askews and Holts Library Services|bASKH|nAH35093466 
938    ProQuest Ebook Central|bEBLB|nEBL5438674 
938    EBSCOhost|bEBSC|n1840106 
994    92|bJFN