Author Pandey, Brij Kishore, author.

Title Building ETL Pipelines with Python [electronic resource] : Create and Deploy Enterprise-Ready ETL Pipelines by Employing Modern Methods / Brij Kishore Pandey, Emily Ro Schoof. [O'Reilly electronic resource]

Edition 1st edition.
Imprint Birmingham : Packt Publishing, Limited, 2023.
Description 1 online resource (246 p.)
Note Description based upon print version of record.
Includes index.
Contents Cover -- Title Page -- Copyright -- Dedication -- Contributors -- Table of Contents -- Preface -- Part 1: Introduction to ETL, Data Pipelines, and Design Principles -- Chapter 1: A Primer on Python and the Development Environment -- Introducing Python fundamentals -- An overview of Python data structures -- Python if...else conditions or conditional statements -- Python looping techniques -- Python functions -- Object-oriented programming with Python -- Working with files in Python -- Establishing a development environment -- Version control with Git tracking
Documenting environment dependencies with requirements.txt -- Utilizing module management systems (MMSs) -- Configuring a Pipenv environment in PyCharm -- Summary -- Chapter 2: Understanding the ETL Process and Data Pipelines -- What is a data pipeline? -- How do we create a robust pipeline? -- Pre-work -- understanding your data -- Design planning -- planning your workflow -- Architecture development -- developing your resources -- Putting it all together -- project diagrams -- What is an ETL data pipeline? -- Batch processing -- Streaming method -- Cloud-native -- Automating ETL pipelines
Exploring use cases for ETL pipelines -- Summary -- References -- Chapter 3: Design Principles for Creating Scalable and Resilient Pipelines -- Technical requirements -- Understanding the design patterns for ETL -- Basic ETL design pattern -- ETL-P design pattern -- ETL-VP design pattern -- ELT two-phase pattern -- Preparing your local environment for installations -- Open source Python libraries for ETL pipelines -- Pandas -- NumPy -- Scaling for big data packages -- Dask -- Numba -- Summary -- References -- Part 2: Designing ETL Pipelines with Python
Chapter 4: Sourcing Insightful Data and Data Extraction Strategies -- Technical requirements -- What is data sourcing? -- Accessibility to data -- Types of data sources -- Getting started with data extraction -- CSV and Excel data files -- Parquet data files -- API connections -- Databases -- Data from web pages -- Creating a data extraction pipeline using Python -- Data extraction -- Logging -- Summary -- References -- Chapter 5: Data Cleansing and Transformation -- Technical requirements -- Scrubbing your data -- Data transformation -- Data cleansing and transformation in ETL pipelines
Understanding the downstream applications of your data -- Strategies for data cleansing and transformation in Python -- Preliminary tasks -- the importance of staging data -- Transformation activities in Python -- Creating data pipeline activity in Python -- Summary -- Chapter 6: Loading Transformed Data -- Technical requirements -- Introduction to data loading -- Choosing the load destination -- Types of load destinations -- Best practices for data loading -- Optimizing data loading activities by controlling the data import method -- Creating demo data -- Full data loads -- Incremental data loads
Note Precautions to consider
Summary Develop production-ready ETL pipelines by leveraging Python libraries and deploying them for suitable use cases.
Key Features:
- Understand how to set up a Python virtual environment with PyCharm
- Learn functional and object-oriented approaches to create ETL pipelines
- Create robust CI/CD processes for ETL pipelines
- Purchase of the print or Kindle book includes a free PDF eBook
Book Description: Modern extract, transform, and load (ETL) pipelines for data engineering have favored the Python language for its broad range of uses and its large assortment of tools, applications, and open source components. With its simplicity and extensive library support, Python has emerged as the undisputed choice for data processing. In this book, you'll walk through the end-to-end process of ETL data pipeline development, starting with an introduction to the fundamentals of data pipelines and establishing a Python development environment to create pipelines. Once you've explored the ETL pipeline design principles and the ETL development process, you'll be equipped to design custom ETL pipelines. Next, you'll get to grips with the steps in the ETL process, which involve extracting valuable data; performing transformations through cleaning, manipulation, and ensuring data integrity; and ultimately loading the processed data into storage systems. You'll also review several ETL modules in Python, comparing their pros and cons when building data pipelines, and leverage cloud tools, such as AWS, to create scalable data pipelines. Lastly, you'll learn about the concept of test-driven development for ETL pipelines to ensure safe deployments. By the end of this book, you'll have worked on several hands-on examples to create high-performance ETL pipelines and develop robust, scalable, and resilient environments using Python.
What you will learn:
- Explore the available libraries and tools to create ETL pipelines using Python
- Write clean and resilient ETL code in Python that can be extended and easily scaled
- Understand the best practices and design principles for creating ETL pipelines
- Orchestrate the ETL process and scale the ETL pipeline effectively
- Discover tools and services available in AWS for ETL pipelines
- Understand different testing strategies and implement them with the ETL process
Who this book is for: If you are a data engineer or software professional looking to create enterprise-level ETL pipelines using Python, this book is for you. Fundamental knowledge of Python is a prerequisite.
Subject Data mining.
Python (Computer program language)
Big data.
Electronic data processing.
Exploration de données (Informatique)
Python (Langage de programmation)
Données volumineuses.
Big data
Data mining
Electronic data processing
Python (Computer program language)
Added Author Schoof, Emily Ro, author.
Other Form: Print version: Pandey, Brij Kishore. Building ETL Pipelines with Python. Birmingham : Packt Publishing, Limited, c2023
ISBN 9781804615539
1804615536