Hi everyone, my name is Thomas Laetsch. I’m currently a data scientist working in the Center for Data Science at New York University.
In this course, Web Scraping with Python, you will learn some of the fundamental techniques in computational web scraping. That is, you will learn to create software to automate data extraction from online sources.
Before moving to specifics and technicalities, let me convince you that these techniques can be a valuable addition to your data-science know-how, and that this course will be the perfect place to start or strengthen the foundational pieces of this skill set.
You might ask yourself why businesses employ people with web-scraping experience. What can businesses gain from web scraping? Well, they can scrape competitor sites to gather prices for similar products or services, then compare and adjust their own price set-points. They can scrape online reviews of their products or services, and gather public opinion around the company in general. They can scrape social media sites or other public forums for contact details or other information about clients or potential clients, to meaningfully direct resources towards this group of possible customers. And this was just a short list!
Here are a few fun things you can do by scraping the web. You could search for your favorite memes from your favorite sites. You can scour through classified ads, looking for your favorite things. You can look for trending topics on social media sites. You can look for recipes you might be interested in on cooking blogs. In fact, there’s a whole lot you can do!
To better visualize the focus of what you will learn in these lectures and exercises, let’s roughly break down the web-scraping pipeline into three pieces.
The first piece is the setup, that is, defining the goal or task and identifying the online sources which you believe will help you achieve the desired end result.
The second is the acquisition of these online data. This includes accessing the data, parsing this information, and extracting these data into meaningful and useful data structures.
The third is the processing phase, where you run these downloaded data through whatever analyses or processes are needed to achieve the desired goal.
This course focuses on the acquisition phase. To accomplish this, we will be using Python and the web-crawling framework Scrapy. We chose Scrapy since we can jump in quickly, and easily scale to large scraping projects. However, even if you aren’t sold on using Scrapy or Python, you will still build techniques and intuition that will be valuable in any computational web-scraping environment you enjoy!
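To make the acquisition phase a little more concrete, here is a minimal sketch of its parsing and extracting steps. It is only an illustration, not the course's method: it uses Python's standard-library `html.parser` rather than Scrapy so that it runs with nothing installed, and the sample page, the class names `product`, `name`, and `price`, and the helper `extract_products` are all made up for this example. The accessing step (downloading the page) is replaced by a hard-coded string.

```python
from html.parser import HTMLParser

# Hypothetical page content, standing in for the body of a downloaded page.
SAMPLE_HTML = """
<html><body>
  <div class="product"><span class="name">Widget</span><span class="price">9.99</span></div>
  <div class="product"><span class="name">Gadget</span><span class="price">24.50</span></div>
</body></html>
"""


class ProductParser(HTMLParser):
    """Collects the text of name/price spans found inside product divs."""

    def __init__(self):
        super().__init__()
        self.products = []   # one dict per product div, in page order
        self._field = None   # which field ("name" or "price") is being read

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "div" and css_class == "product":
            self.products.append({})
        elif tag == "span" and css_class in ("name", "price") and self.products:
            self._field = css_class

    def handle_data(self, data):
        # Append text to the field currently being read, if any.
        if self._field is not None:
            current = self.products[-1]
            current[self._field] = current.get(self._field, "") + data

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def extract_products(html):
    """Parse the HTML and return a list of {'name': str, 'price': float} dicts."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return [
        {"name": p["name"].strip(), "price": float(p["price"])}
        for p in parser.products
    ]


products = extract_products(SAMPLE_HTML)
print(products)
```

The result is a plain list of dictionaries, which is the kind of "meaningful and useful data structure" the processing phase can then work with. A framework such as Scrapy handles the downloading and offers more convenient ways to select elements than this hand-written parser.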
So, I hope you’re as thrilled as I am to take part in this course, and gain the skills to start scraping the web for whatever reasons excite you!