LEADER 00000cam a2200709 i 4500 
003    OCoLC 
005    20240129213017.0 
006    m     o  d         
007    cr nn||||mamaa 
008    201112s2020    caua    o     001 0 eng d 
019    1220827695|a1223089267|a1225934204|a1226591554|a1228545474
       |a1237481601 
020    9781484265765|q(electronic bk.) 
020    1484265769|q(electronic bk.) 
020    1484265750 
020    9781484265758 
024 7  10.1007/978-1-4842-6576-5|2doi 
029 1  AU@|b000068389360 
029 1  AU@|b000068747767 
029 1  AU@|b000068846530 
035    (OCoLC)1226259870|z(OCoLC)1220827695|z(OCoLC)1223089267
       |z(OCoLC)1225934204|z(OCoLC)1226591554|z(OCoLC)1228545474
       |z(OCoLC)1237481601 
037    58DF8702-00B8-4EF9-AABC-CE52863E1C24|bOverDrive, Inc.
       |nhttp://www.overdrive.com 
040    S2H|beng|erda|epn|cS2H|dEBLCP|dLEATE|dUKAHL|dOCLCO|dYDXIT
       |dOCLCF|dGW5XE|dVT2|dUPM|dYDX|dOCL|dOCLCO|dTEFOD|dOCLCO
       |dOCLCQ|dCOM|dN$T|dTOH|dOCLCQ|dOCLCO 
049    INap 
082 04 005.7 
082 04 005.7|223 
099    eBook O'Reilly for Public Libraries 
100 1  Patel, Jay M.,|eauthor. 
245 10 Getting structured data from the Internet :|brunning web 
       crawlers/scrapers on a big data production scale /|cJay M.
       Patel.|h[O'Reilly electronic resource] 
264  1 [Berkeley, CA] :|bApress,|c[2020] 
300    1 online resource (xix, 397 pages) :|billustrations 
336    text|btxt|2rdacontent 
337    computer|bc|2rdamedia 
338    online resource|bcr|2rdacarrier 
347    text file|2rdaft|0http://rdaregistry.info/termList/
       fileType/1002 
347    |bPDF 
500    Includes index. 
505 0  Chapter 1: Introduction to Web Scraping -- Chapter 2: Web 
       Scraping in Python Using Beautiful Soup Library -- Chapter
       3: Introduction to Cloud Computing and Amazon Web Services
       (AWS) -- Chapter 4: Natural Language Processing (NLP) and 
       Text Analytics -- Chapter 5: Relational Databases and SQL 
       Language -- Chapter 6: Introduction to Common Crawl 
       Datasets -- Chapter 7: Web Crawl Processing on Big Data 
       Scale -- Chapter 8: Advanced Web Crawlers -- 
520    Utilize web scraping at scale to quickly get unlimited 
       amounts of free data available on the web into a 
       structured format. This book teaches you to use Python 
       scripts to crawl through websites at scale and scrape data
       from HTML and JavaScript-enabled pages and convert it into
       structured data formats such as CSV, Excel, JSON, or load 
       it into a SQL database of your choice. This book goes 
       beyond the basics of web scraping and covers advanced 
       topics such as natural language processing (NLP) and text 
       analytics to extract names of people, places, email 
       addresses, contact details, etc., from a page at 
       production scale using distributed big data techniques on 
       an Amazon Web Services (AWS)-based cloud infrastructure. 
       It covers developing a robust data processing and 
       ingestion pipeline on the Common Crawl corpus, containing 
       petabytes of data publicly available and a web crawl data 
       set available on AWS's registry of open data. Getting 
       Structured Data from the Internet also includes a step-by-
       step tutorial on deploying your own crawlers using a 
       production web scraping framework (such as Scrapy) and 
       dealing with real-world issues (such as breaking Captcha, 
       proxy IP rotation, and more). Code used in the book is 
       provided to help you understand the concepts in practice 
       and write your own web crawler to power your business 
       ideas. You will: Understand web scraping, its applications
       and uses, and how to avoid web scraping by hitting 
       publicly available REST API endpoints to get data 
       directly. Develop a web scraper and crawler from scratch 
       using the lxml and BeautifulSoup libraries, and learn 
       about scraping from JavaScript-enabled pages using 
       Selenium. Use AWS-based cloud computing with EC2, S3, 
       Athena, SQS, and SNS to analyze, extract, and store useful
       insights from crawled pages. Use SQL on PostgreSQL running
       on Amazon Relational Database Service (RDS) and SQLite 
       using SQLAlchemy. Review scikit-learn, Gensim, and spaCy 
       to perform NLP tasks on scraped web pages such as named 
       entity recognition, topic clustering (K-means, 
       agglomerative clustering), topic modeling (LDA, NMF, LSI),
       topic classification (naive Bayes, gradient boosting 
       classifier), and text similarity (cosine distance-based 
       nearest neighbors). Handle web archival file formats and 
       explore Common Crawl open data on AWS. Illustrate 
       practical applications for web crawl data by building a 
       similar-website tool and a technology profiler similar to 
       builtwith.com. Write scripts to create a web-scale 
       backlinks database similar to Ahrefs.com, Moz.com, 
       Majestic.com, etc., for search engine optimization (SEO), 
       competitor research, and determining website domain 
       authority and ranking. Use web crawl data to build a news 
       sentiment analysis system or alternative financial 
       analysis covering stock market trading signals. Write a 
       production-ready crawler in Python using the Scrapy 
       framework and deal with practical workarounds for 
       Captchas, IP rotation, and more. 
590    O'Reilly|bO'Reilly Online Learning: Academic/Public 
       Library Edition 
650  0 Big data. 
650  0 Programming languages (Electronic computers) 
650  0 Data mining. 
650  0 Automatic data collection systems. 
650  2 Data Mining 
650  6 Données volumineuses. 
650  6 Exploration de données (Informatique) 
650  6 Collecte automatique des données. 
650  7 Data mining|2fast 
650  7 Automatic data collection systems|2fast 
650  7 Big data|2fast 
650  7 Programming languages (Electronic computers)|2fast 
776 08 |iPrint version:|z9781484265758 
776 08 |iPrint version:|z9781484265772 
856 40 |uhttps://ezproxy.naperville-lib.org/login?url=https://
       learning.oreilly.com/library/view/~/9781484265765/?ar
       |zAvailable on O'Reilly for Public Libraries 
938    Askews and Holts Library Services|bASKH|nAH37890098 
938    ProQuest Ebook Central|bEBLB|nEBL6395781 
938    YBP Library Services|bYANK|n301736281 
938    EBSCOhost|bEBSC|n2678338 
994    92|bJFN