Last updated: 2021-06-10 19:35:21
Cover Page
Title Page
Dedication
Packt Upsell
Why subscribe?
Packt.com
Contributors
About the author
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Introduction to Web Scraping
Learning about data on the internet
Introduction to XPath (XML Path)
Data extraction systems
Web scraping techniques
Traditional copy and paste
Text grabbing and regular expression
Document Object Model (DOM)
Semantic annotation recognition
Web scraping tools
JavaScript tools
Web crawling frameworks
Web crawling environment in R
Summary
XML Path Language and Regular Expression Language
XML Path (XPath)
Nodes
Relationships between nodes
Parent
Child
Sibling
Ancestor
Descendant
Predicates
Selecting unknown nodes
Selecting several paths
Regular expression language (Regex)
How to match a single character
How to match the characters of a set
How to match words
Exercises on RegEx and XPath
RegEx exercises
XPath exercises
Web Scraping with rvest
Introducing rvest
Step-by-step web scraping with rvest
Writing XPath rules
Writing your first scraping script
Playing with data
Web Scraping with RSelenium
Advantages and disadvantages of using Selenium for web scraping
RSelenium
Step-by-step web scraping with RSelenium
Collecting data with RSelenium
Storing Data and Creating Cronjob
Cloud engine models
Infrastructure as a service (IaaS)
Platform as a service (PaaS)
Software as a service (SaaS)
Mobile backend as a service (MBaaS)
Function as a service (FaaS)
Some of the cloud services
Amazon Web Services (AWS)
Google Cloud
Cronjob
Storing data and creating schedule jobs for web scraping
Creating an AWS RDS Instance
Connecting to the PostgreSQL database on AWS
Creating cronjob
Other Books You May Enjoy
Leave a review - let other readers know what you think