A Python Web Scraping Weekend
What is Web Scraping?
Web scraping is, fundamentally, the extraction of large amounts of data from websites. I built a Python program that grabs data from a website’s HTML tags using some neat search functions from a Python library called BeautifulSoup.
For example, with this web scraper I could grab price data for a given item (or all items, if I wanted) on my favorite shopping website and set up a notification system to alert me when the price of that item drops below a certain threshold. Or you could go the more hilarious route and do what I did: monitor the social habits of users who comment on adult video websites.
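To give a feel for those search functions, here is a minimal sketch of tag-based extraction with BeautifulSoup. It is not the code from my project: the sample markup, the `comment` and `user` class names, and the `extract_comments` helper are all made up for illustration, and it assumes the `beautifulsoup4` package is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a page that has already been downloaded.
SAMPLE_HTML = """
<html><body>
  <div class="comment"><span class="user">alice</span><p>First comment</p></div>
  <div class="comment"><span class="user">bob</span><p>Second comment</p></div>
  <div class="ad"><p>Not a comment</p></div>
</body></html>
"""


def extract_comments(html):
    """Return (user, text) pairs for every <div class="comment"> in the page."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # find_all searches the whole tree for tags matching the name and CSS class.
    for block in soup.find_all("div", class_="comment"):
        user = block.find("span", class_="user")
        body = block.find("p")
        if user is None or body is None:
            continue  # skip blocks that do not have the expected shape
        results.append((user.get_text(strip=True), body.get_text(strip=True)))
    return results


comments = extract_comments(SAMPLE_HTML)
for user, text in comments:
    print(f"{user}: {text}")
```

On a real site you would fetch the HTML first and adjust the tag and class names to whatever that site's markup actually uses.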
- Node.js for the web server
- ExpressJS to send data from my Node server to the client on a data request
- Python-Shell to run Python scripts on a Node server
- Node-Schedule to run my Python scripts at scheduled intervals
This app used a Python script to scrape user comments from a random video on an adult entertainment site, then placed the comments over inspirational wallpapers from Reddit. Although the project seems foolish, I learned a ton.
This project could’ve been more functional had I written the tools in a way that allowed for better communication between the modules.
Had I built the entire project with Django, the Python script wouldn’t need to output CSV for my Node server to read in. Alternatively, had I written the web scraper in PHP, I could have scheduled the scraping requests in PHP as well, with something like Goutte.
The general flow of the app is as follows:
Startup (app.js). Server starts, Python begins first scrape.
Upon Python scrape completion, result data is output in CSV.
At scheduled intervals or when the page is refreshed (should normalize this somehow), the Python script is rerun.
Node grabs a “random” entry from the comment data read in through the CSV file and sends it to the client via the API I created with ExpressJS routing.
User gets potentially comedic wallpaper.
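The CSV hand-off in the flow above can be sketched roughly as follows. In the actual app the reading side lived in Node; both halves are shown in Python here only to keep the sketch in one language. The column names and the helper functions `write_comments_csv` and `pick_random_comment` are assumptions for illustration, not the project's real code.

```python
import csv
import io
import random


def write_comments_csv(rows, handle):
    """Scraper side: write (user, comment) pairs as CSV with a header row."""
    writer = csv.writer(handle)
    writer.writerow(["user", "comment"])
    writer.writerows(rows)


def pick_random_comment(handle, rng=random):
    """Server side: read the CSV back and return one entry, or None if empty."""
    entries = list(csv.DictReader(handle))
    if not entries:
        return None
    return rng.choice(entries)


# An in-memory buffer stands in for the CSV file on disk.
buffer = io.StringIO()
write_comments_csv(
    [("alice", "First comment"), ("bob", "Second, with a comma")],
    buffer,
)
buffer.seek(0)

chosen = pick_random_comment(buffer, random.Random(0))
print(chosen["user"], "->", chosen["comment"])
```

Using the `csv` module rather than joining strings by hand matters for comment text, since user comments routinely contain commas and quotes that would otherwise break the columns.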
Here’s the source code on GitHub if you’d like to see it!
After finishing this project I posted it on a Facebook group called Hackathon Hackers just to see if anyone would give it a look and maybe get a chuckle out of my weekend expedition. To my surprise (and delight), the project was met with positivity from the community. I received comments and messages pointing out bugs to fix and suggesting ways to improve the app, along with notes from people who just wanted to tell me I made them laugh and that they couldn’t wait to show it to their friends.
I’m glad I built something fun while learning new technology and in the end got to show it to people who appreciated it. The “Juxtaposition Generator” was one of my simpler app endeavors, but extremely enjoyable nonetheless.
Thanks for reading.
Hopefully this post has helped you in some way. If you’d like more resources on the subject or would like clarification, feel free to post a comment below.