LEADER 00000cam a2200625Ii 4500 
001    976408019 
003    OCoLC 
005    20240129213017.0 
006    m     o  d         
007    cr unu|||||||| 
008    170317s2017    enkab   o     001 0 eng d 
019    1081417339 
020    9781786466259|q(electronic bk.) 
020    1786466252|q(electronic bk.) 
029 1  GBVCP|b897169743 
035    (OCoLC)976408019|z(OCoLC)1081417339 
037    CL0500000840|bSafari Books Online 
037    978A042E-251E-4460-88A6-41FFF582EF91|bOverDrive, Inc.
       |nhttp://www.overdrive.com 
040    UMI|beng|erda|epn|cUMI|dTEFOD|dOCLCF|dIDEBK|dSTF|dTOH
       |dOCLCQ|dN$T|dCOO|dUOK|dCEF|dKSU|dDEBBG|dUAB|dYDX|dMOF
       |dAU@|dOCLCO|dOCLCQ|dOCLCO 
049    INap 
082 04 005.7|223 
099    eBook O'Reilly for Public Libraries 
100 1  Drabas, Tomasz,|eauthor. 
245 10 Learning PySpark :|bbuild data-intensive applications 
       locally and deploy at scale using the combined powers of 
       Python and Spark 2.0 /|cTomasz Drabas, Denny Lee ; 
       foreword by Holden Karau.|h[O'Reilly electronic resource] 
264  1 Birmingham, UK :|bPackt Publishing,|c2017. 
300    1 online resource (1 volume) :|billustrations, maps 
336    text|btxt|2rdacontent 
337    computer|bc|2rdamedia 
338    online resource|bcr|2rdacarrier 
500    Includes index. 
505 0  Cover -- Copyright -- Credits -- Foreword -- About the 
       Authors -- About the Reviewer -- www.PacktPub.com -- 
       Customer Feedback -- Table of Contents -- Preface -- 
       Chapter 1: Understanding Spark -- What is Apache Spark? --
       Spark Jobs and APIs -- Execution process -- Resilient 
       Distributed Dataset -- DataFrames -- Datasets -- Catalyst 
       Optimizer -- Project Tungsten -- Spark 2.0 architecture --
       Unifying Datasets and DataFrames -- Introducing 
       SparkSession -- Tungsten phase 2 -- Structured streaming 
       -- Continuous applications -- Summary -- Chapter 2: 
       Resilient Distributed Datasets -- Internal workings of an 
       RDD -- Creating RDDs -- Schema -- Reading from files -- 
       Lambda expressions -- Global versus local scope -- 
       Transformations -- The .map(...) transformation -- The 
       .filter(...) transformation -- The .flatMap(...) 
       transformation -- The .distinct(...) transformation -- The
       .sample(...) transformation -- The .leftOuterJoin(...) 
       transformation -- The .repartition(...) transformation -- 
       Actions -- The .take(...) method -- The .collect(...) 
       method -- The .reduce(...) method -- The .count(...) 
       method -- The .saveAsTextFile(...) method -- The 
       .foreach(...) method -- Summary -- Chapter 3: DataFrames 
       -- Python to RDD communications -- Catalyst Optimizer 
       refresh -- Speeding up PySpark with DataFrames -- Creating
       DataFrames -- Generating our own JSON data -- Creating a 
       DataFrame -- Creating a temporary table -- Simple 
       DataFrame queries -- DataFrame API query -- SQL query -- 
       Interoperating with RDDs -- Inferring the schema using 
       reflection -- Programmatically specifying the schema -- 
       Querying with the DataFrame API -- Number of rows -- 
       Running filter statements -- Querying with SQL -- Number 
       of rows -- Running filter statements using the where 
       clauses -- DataFrame scenario - on-time flight 
       performance -- Preparing the source datasets. 
505 8  Joining flight performance and airports -- Visualizing our
       flight-performance data -- Spark Dataset API -- Summary --
       Chapter 4: Prepare Data for Modeling -- Checking for 
       duplicates, missing observations, and outliers -- 
       Duplicates -- Missing observations -- Outliers -- Getting 
       familiar with your data -- Descriptive statistics -- 
       Correlations -- Visualization -- Histograms -- 
       Interactions between features -- Summary -- Chapter 5: 
       Introducing MLlib -- Overview of the package -- Loading 
       and transforming the data -- Getting to know your data -- 
       Descriptive statistics -- Correlations -- Statistical 
       testing -- Creating the final dataset -- Creating an RDD 
       of LabeledPoints -- Splitting into training and testing --
       Predicting infant survival -- Logistic regression in MLlib
       -- Selecting only the most predictable features -- Random 
       forest in MLlib -- Summary -- Chapter 6: Introducing the 
       ML Package -- Overview of the package -- Transformer -- 
       Estimators -- Classification -- Regression -- Clustering 
       -- Pipeline -- Predicting the chances of infant survival 
       with ML -- Loading the data -- Creating transformers -- 
       Creating an estimator -- Creating a pipeline -- Fitting 
       the model -- Evaluating the performance of the model -- 
       Saving the model -- Parameter hyper-tuning -- Grid search 
       -- Train-validation splitting -- Other features of PySpark
       ML in action -- Feature extraction -- NLP-related 
       feature extractors -- Discretizing continuous variables --
       Standardizing continuous variables -- Classification -- 
       Clustering -- Finding clusters in the births dataset -- 
       Topic mining -- Regression -- Summary -- Chapter 7: 
       GraphFrames -- Introducing GraphFrames -- Installing 
       GraphFrames -- Creating a library -- Preparing your 
       flights dataset -- Building the graph -- Executing simple 
       queries -- Determining the number of airports and trips. 
505 8  Determining the longest delay in this dataset -- 
       Determining the number of delayed versus on-time/early 
       flights -- What flights departing Seattle are most likely 
       to have significant delays? -- What states tend to have 
       significant delays departing from Seattle? -- 
       Understanding vertex degrees -- Determining the top 
       transfer airports -- Understanding motifs -- Determining 
       airport ranking using PageRank -- Determining the most 
       popular non-stop flights -- Using Breadth-First Search -- 
       Visualizing flights using D3 -- Summary -- Chapter 8: 
       TensorFrames -- What is Deep Learning? -- The need for 
       neural networks and Deep Learning -- What is feature 
       engineering? -- Bridging the data and algorithm -- What is
       TensorFlow? -- Installing Pip -- Installing TensorFlow -- 
       Matrix multiplication using constants -- Matrix 
       multiplication using placeholders -- Running the model -- 
       Running another model -- Discussion -- Introducing 
       TensorFrames -- TensorFrames - quick start -- 
       Configuration and setup -- Launching a Spark cluster -- 
       Creating a TensorFrames library -- Installing TensorFlow 
       on your cluster -- Using TensorFlow to add a constant to 
       an existing column -- Executing the Tensor graph -- 
       Blockwise reducing operations example -- Building a 
       DataFrame of vectors -- Analysing the DataFrame -- 
       Computing elementwise sum and min of all vectors -- 
       Summary -- Chapter 9: Polyglot Persistence with Blaze -- 
       Installing Blaze -- Polyglot persistence -- Abstracting 
       data -- Working with NumPy arrays -- Working with pandas' 
       DataFrame -- Working with files -- Working with databases 
       -- Interacting with relational databases -- Interacting 
       with the MongoDB database -- Data operations -- Accessing 
       columns -- Symbolic transformations -- Operations on 
       columns -- Reducing data -- Joins -- Summary -- Chapter 
       10: Structured Streaming -- What is Spark Streaming? 
505 8  Why do we need Spark Streaming? -- What is the Spark 
       Streaming application data flow? -- Simple streaming 
       application using DStreams -- A quick primer on global 
       aggregations -- Introducing Structured Streaming -- 
       Summary -- Chapter 11: Packaging Spark Applications -- The
       spark-submit command -- Command line parameters -- 
       Deploying the app programmatically -- Configuring your 
       SparkSession -- Creating SparkSession -- Modularizing code
       -- Structure of the module -- Calculating the distance 
       between two points -- Converting distance units -- 
       Building an egg -- User defined functions in Spark -- 
       Submitting a job -- Monitoring execution -- Databricks 
       Jobs -- Summary -- Index. 
520 8  Annotation|bBuild data-intensive applications locally and 
       deploy at scale using the combined powers of Python and 
       Spark 2.0 About This Book - Learn why and how you can 
       efficiently use Python to process data and build machine 
       learning models in Apache Spark 2.0 - Develop and deploy 
       efficient, scalable real-time Spark solutions - Take your 
       understanding of using Spark with Python to the next level
       with this jump start guide Who This Book Is For If you are
       a Python developer who wants to learn about the Apache 
       Spark 2.0 ecosystem, this book is for you. A firm 
       understanding of Python is expected to get the best out of
       the book. Familiarity with Spark would be useful, but is 
       not mandatory. What You Will Learn - Learn about Apache 
       Spark and the Spark 2.0 architecture - Build and interact 
       with Spark DataFrames using Spark SQL - Learn how to solve
       graph and deep learning problems using GraphFrames and 
       TensorFrames respectively - Read, transform, and 
       understand data and use it to train machine learning 
       models - Build machine learning models with MLlib and ML -
       Learn how to submit your applications programmatically 
       using spark-submit - Deploy locally built applications to 
       a cluster In Detail Apache Spark is an open source 
       framework for efficient cluster computing with a strong 
       interface for data parallelism and fault tolerance. This 
       book will show you how to leverage the power of Python and
       put it to use in the Spark ecosystem. You will start by 
       getting a firm understanding of the Spark 2.0 architecture
       and how to set up a Python environment for Spark. You will
       get familiar with the modules available in PySpark. You 
       will learn how to abstract data with RDDs and DataFrames 
       and understand the streaming capabilities of PySpark. Also,
       you will get a thorough overview of machine learning 
       capabilities of PySpark using ML and MLlib, graph 
       processing using GraphFrames, and polyglot persistence 
       using Blaze. Finally, you will learn how to deploy your 
       applications to the cloud using the spark-submit command. 
       By the end of this book, you will have established a firm 
       understanding of the Spark Python API and how it can be 
       used to build data-intensive applications. Style and 
       approach This book takes a very comprehensive, step-by-
       step approach so you understand how the Spark ecosystem 
       can be used with Python to develop efficient, scalable 
       solutions. Every chapter is standalone and written in a 
       very easy-to-understand manner, with a focus on both the 
       hows and the whys of each concept. 
588    Description based on online resource; title from title 
       page (viewed March 17, 2017). 
590    O'Reilly|bO'Reilly Online Learning: Academic/Public 
       Library Edition 
650  0 Application software|xDevelopment. 
650  0 Python (Computer program language) 
650  0 SPARK (Computer program language) 
650  6 Logiciels d'application|xDéveloppement. 
650  6 Python (Langage de programmation) 
650  7 Application software|xDevelopment|2fast 
650  7 Python (Computer program language)|2fast 
650  7 SPARK (Computer program language)|2fast 
700 1  Lee, Denny,|eauthor. 
700 1  Karau, Holden,|ewriter of foreword. 
776 08 |iPrint version:|aDrabas, Tomasz.|tLearning PySpark.
       |dBirmingham : Packt Publishing, ©2017 
856 40 |uhttps://ezproxy.naperville-lib.org/login?url=https://learning.oreilly.com/library/view/~/9781786463708/?ar
       |zAvailable on O'Reilly for Public Libraries 
938    ProQuest MyiLibrary Digital eBook Collection|bIDEB
       |ncis35945158 
938    EBSCOhost|bEBSC|n1477650 
938    YBP Library Services|bYANK|n13522893 
994    92|bJFN