This post looks at how to speed up a Python web scraping and crawling script with multithreading via the concurrent.futures module.
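The post itself is not reproduced here, so the following is only a minimal sketch of the general idea, assuming the scraper spends most of its time waiting on network I/O. It uses the real `concurrent.futures.ThreadPoolExecutor` and `concurrent.futures.as_completed` APIs. The helpers `fetch_page` and `scrape_all`, the `max_workers=8` default, and the `example.com` URLs are hypothetical illustrations, not the post's code.

```python
import concurrent.futures
import time
import urllib.request
from typing import Callable, Dict, Iterable


def fetch_page(url: str, timeout: float = 10.0) -> str:
    """Hypothetical helper: download a page and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def scrape_all(
    urls: Iterable[str],
    fetch: Callable[[str], str] = fetch_page,
    max_workers: int = 8,  # assumed pool size; tune for the target site
) -> Dict[str, str]:
    """Hypothetical helper: fetch every URL on a thread pool, keyed by URL."""
    results: Dict[str, str] = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submit all downloads up front; each runs on a worker thread.
        future_to_url = {executor.submit(fetch, url): url for url in urls}
        # Collect results in completion order, not submission order.
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            results[url] = future.result()  # re-raises any exception from fetch
    return results


if __name__ == "__main__":
    def fake_fetch(url: str) -> str:
        time.sleep(0.2)  # stand-in for network latency, so no real requests are made
        return f"<html>{url}</html>"

    urls = [f"https://example.com/page/{i}" for i in range(8)]
    start = time.perf_counter()
    pages = scrape_all(urls, fetch=fake_fetch)
    print(f"fetched {len(pages)} pages in {time.perf_counter() - start:.2f}s")
```

Threads fit this case because blocking socket reads release the GIL, so eight simulated 0.2 s downloads finish in roughly 0.2 s rather than 1.6 s. CPU-heavy parsing would not see the same gain from threads.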
Web scraping is the process of downloading and extracting structured data from the web with a program or algorithm. It’s a useful skill when you need data from a website that does not offer a public API.
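As a rough illustration of the "extracting structured data" step, here is a minimal sketch using only the standard library's `html.parser.HTMLParser`. It turns raw HTML into a list of `(href, link text)` pairs. The class `LinkExtractor` and the function `extract_links` are hypothetical names chosen for this example; real scrapers often use a dedicated parsing library instead.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class LinkExtractor(HTMLParser):
    """Hypothetical parser that collects (href, link text) pairs from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None  # href of the <a> currently open, if any
        self._text: List[str] = []        # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; <a> without href is ignored.
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html: str) -> List[Tuple[str, str]]:
    """Parse an HTML string and return its links as structured records."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links
```

For example, `extract_links('<a href="/posts/one/">Post One</a>')` returns `[("/posts/one/", "Post One")]`. In a real scraper, the HTML string would come from a download step such as `urllib.request.urlopen`.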
The tutorials and articles on TestDriven teach how to use parallelism and concurrency to speed up web scrapers that collect large amounts of data.
Latest Posts (2)
This post details how to run a Python and Selenium-based web scraper in parallel with Selenium Grid and Docker Swarm.