How and why I built a simple web-scraping script to notify us about our favourite food
The problem I wanted to solve
A restaurant next to our office has a special meal called "borzaska". It's nothing super fancy, there are dozens of places that prepare such food in our city, but the way this place prepares it is really awesome. It is really popular in the neighbourhood.
The only thing is, it is totally random when they have it on the menu. They have a website for their daily menu that gets regularly updated, but we often forget to check it. And as this food is very popular, if we don't go get it when the restaurant opens, there's a great chance we will stand in a queue for 15-20 minutes, and it's also possible it will be sold out by the time we get there.
So to solve this problem, we needed something that checks the menu of the restaurant every day, and notifies us if "borzaska" is on the menu that day.
Solution: a simple web-scraping script that notifies us about our favourite food
I built a 37-line Python script that goes to the restaurant's website every day at 9am and looks for the word "borzaska". If it finds it, it sends an e-mail alert to me and my colleagues.
I used Python 3.6 and the following modules:
- requests for downloading the menu
- BeautifulSoup for parsing the contents of the downloaded menu/site
- smtplib for sending e-mail alerts
The process of building the web-scraping script
1. Installing Python and the required modules
I headed to Python downloads and got the latest version of Python 3.6.x
To install the required modules, I used the pip tool from the command line. On Windows this means using the pip command; on macOS / Linux it's pip3.
Since I was about to use this script on my Windows laptop, these were the commands I executed one by one in cmd (smtplib is part of Python's standard library, so it does not need to be installed):
pip install beautifulsoup4
pip install requests
As my intention was to use this script on Windows, this guide will focus on that platform. If you need help implementing a similar script on macOS or Linux, then don't hesitate to contact me.
2. Creating a new Python file, importing the modules
Since this is a very simple script, it was more than adequate to use IDLE (the built-in editor that comes with Python) to create it. I simply added a .py file, and started to add code to it. No need to install a complex IDE like PyCharm.
OK, it was time to start writing code!
The first thing to do was to import the necessary external modules with this line:
import bs4, requests, smtplib
One thing to notice here is that the library for beautifulsoup4 is called bs4.
3. Using requests to download the menu and checking for errors
To download the menu, a new object has to be created:
getPage = requests.get('http://www.somesite.com/menu')
The getPage object will contain all the data downloaded from the website.
To check that the download happened without any issues, the raise_for_status() method can be used:
getPage.raise_for_status()
This raises an exception if the server responded with an error status, which stops your script and displays an error message.
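If you would rather print a friendly message than let the script crash, the download can be wrapped in a try/except block. This is only a minimal sketch, not part of the original script: the download_menu function name and the 10-second timeout are assumptions made for illustration.

```python
import requests


def download_menu(url, timeout=10):
    """Return the page HTML as text, or None if the download failed.

    download_menu is a hypothetical helper; the timeout value is an assumption.
    """
    try:
        response = requests.get(url, timeout=timeout)
        # Raises requests.exceptions.HTTPError for 4xx / 5xx responses
        response.raise_for_status()
    except requests.exceptions.RequestException as error:
        # RequestException is the base class of the errors requests raises
        print(f'Could not download the menu: {error}')
        return None
    return response.text
```

Called as download_menu('http://www.somesite.com/menu'), it returns the page text on success, so the rest of the script can skip parsing when it gets None back.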
4. Using BeautifulSoup to parse the text
The getPage object has all the raw data downloaded from the website, but we need something that parses it and returns just the text of the foods on the menu. This is what BeautifulSoup is for.
First, we need to create a new object that will contain all the text:
menu = bs4.BeautifulSoup(getPage.text, 'html.parser')
Awesome, but this is still too much information. There's all kinds of other text on the menu page that we are not interested in. How could we get just a list of foods?
For this, we need a basic understanding of HTML. Using Chrome's inspector we need to find out which HTML tag is used for listing the foods. In my example it was easy: the page for daily menus was built in a way that all the listed foods were in div elements with a class called "foodname".
Once you have the id or class name to look for, use the select method and create yet another object. In my example it looked like this:
foods = menu.select('.foodname')
As you can see, you can select classes by putting a dot before the name. In my case it was easy because no other HTML element uses this class, so selecting it only selected foods.
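To see what select returns, here is a small sketch that runs on a made-up HTML snippet instead of the real menu page. The sample HTML, the "price" class and the food_names variable are assumptions for illustration only.

```python
import bs4

# Made-up page fragment standing in for the restaurant's real menu page
sample_html = """
<html><body>
  <h1>Daily menu</h1>
  <div class="foodname">Gulyasleves</div>
  <div class="foodname">Borzaska</div>
  <div class="price">1500 Ft</div>
</body></html>
"""

menu = bs4.BeautifulSoup(sample_html, 'html.parser')

# '.foodname' is a CSS selector: every element whose class is "foodname"
foods = menu.select('.foodname')

# Each item is a Tag object; .text gives the text inside it
food_names = [food.text.strip() for food in foods]
print(food_names)  # prints ['Gulyasleves', 'Borzaska']
```

The heading and the price are left out because they do not carry the foodname class.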
To recap, this is what my code looked like at this stage:
import bs4, requests, smtplib

# Download page
getPage = requests.get('http://www.somesite.com/menu')
getPage.raise_for_status()  # if error it will stop the program

# Parse text for foods
menu = bs4.BeautifulSoup(getPage.text, 'html.parser')
foods = menu.select('.foodname')
5. Creating the logic to know if we have the desired food on the menu
OK, at this point we have a list of foods. How do we know if the one we're looking for is in the list? For this, I added some variables and a for loop:
the_one = 'borzaska'
flength = len(the_one)
available = False

for food in foods:
    for i in range(len(food.text)):
        chunk = food.text[i:i+flength].lower()
        if chunk == the_one:
            available = True
I'll try to explain what is happening here:
- I created a variable called the_one to store the name of the food we are looking for.
- created another variable called flength (as in food length), to store the number of characters my desired food consists of. This can be calculated using the len() function.
- added a final variable, available, that will change to True if we identify our desired food on the list. Its initial value should be False.
- The for loop works the following way: foods is a list of all the meals found on the webpage. Stating for food in foods: will create a new variable called food, cycle through all the items in the foods list one by one, and as it does this it will assign each item to the food variable. Furthermore, it will execute everything indented after the colon once for each item.
- A second for loop will run inside our initial one, which will cycle through the food name:
for i in range(len(food.text)):
    chunk = food.text[i:i+flength].lower()
    if chunk == the_one:
        available = True
And as it's cycling through, it examines chunks of text that are exactly the same length as the food we are looking for. If it finds our food, it will change the available variable to True.
Also, notice the lower() method I used. This makes all capital letters non-capital. And as you can see, I added the desired food using all small letters. Pay attention to this, as string comparison in Python is case sensitive! This means if you are looking for cheese, but the website has Cheese on the menu, the above logic would not find it without converting Cheese to cheese! For this reason I always tend to convert everything to non-capital letters.
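The chunk-by-chunk comparison above can also be written with Python's in operator, which checks whether one string appears anywhere inside another. This is an alternative sketch, not the code from the original script. The is_on_menu function and the sample menu are assumptions for illustration.

```python
def is_on_menu(food_names, wanted):
    """Return True if `wanted` appears in any of the food names, ignoring case."""
    wanted = wanted.lower()
    # 'in' does the same substring search as the sliding chunk comparison
    return any(wanted in name.lower() for name in food_names)


# Made-up menu; in the script the names would come from the scraped page
todays_menu = ['Gulyasleves', 'Rantott BORZASKA tejfollel']
print(is_on_menu(todays_menu, 'borzaska'))  # prints True
```

With the scraped list from the previous step, the call would look like is_on_menu([food.text for food in foods], 'borzaska').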
All right! At this point, we have a script that:
- downloads a page from the web
- parses all the text, then creates a list by selecting the HTML elements for foods and assigns this list to an object.
- cycles through the food list, and if it finds the one we are looking for, it changes the value of the available variable to True. If it doesn't find it, the value remains False.
6. Send an e-mail alert if the food is available
The next thing to do is to check the value of our available variable. If it's True then we should send an alert via e-mail; if it's False then the script should do nothing, or maybe just print to console and exit.
This can be done with simple if-else logic:
if available:
    # send e-mail alert
    # print to console e-mail addresses where the alert was sent.
    pass
else:
    # print to console that the food is not available.
    pass
To achieve this, I used the smtplib module and a Gmail account. smtplib can be used with other services as well, such as Outlook, but my code example will detail how to set up e-mail sending via Gmail. The code is the following:
if available:
    conn = smtplib.SMTP('smtp.gmail.com', 587)  # smtp address and port
    conn.ehlo()  # call this to start the connection
    conn.starttls()  # starts tls encryption. When we send our password it will be encrypted.
    conn.login('email@example.com', 'appkey')
    conn.sendmail('firstname.lastname@example.org', toAddress, 'Subject: Borzaska Alert!\n\nAttention!\n\nYour favourite food is available today!\n\nBon appetit!:\nFood Notifier V1.0')
    conn.quit()
    print('Sent notification e-mails for the following recipients:\n')
    for address in toAddress:
        print(address)
    print('')
else:
    print('Your favourite food is not available today.')
- the beginning of the code is straightforward and explained by comments.
- conn.login('email@example.com', 'appspecificpassw') - here we need to add our e-mail address and Gmail's application-specific password.
- conn.sendmail() should contain the e-mail address we're using, the addresses we want to send to (toAddress is a list of e-mail addresses that we will create soon), and the subject & body of the e-mail. Notice that \n can be used to add new lines.
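An alternative to writing the subject and body as one long string is the standard library's email.message.EmailMessage class, which builds the headers for you and also copes with accented characters. This is a sketch under assumptions, not the code used in the original script: build_alert and send_alert are hypothetical helper names, and the message wording is made up.

```python
import smtplib
from email.message import EmailMessage


def build_alert(sender, recipients, food):
    """Build the alert e-mail (hypothetical helper, wording is illustrative)."""
    message = EmailMessage()
    message['Subject'] = f'{food.capitalize()} Alert!'
    message['From'] = sender
    message['To'] = ', '.join(recipients)
    message.set_content(f'Attention!\n\n{food.capitalize()} is available today!\n')
    return message


def send_alert(message, password, host='smtp.gmail.com', port=587):
    """Send the message over an encrypted connection (hypothetical helper)."""
    # The with block closes the connection even if sending fails
    with smtplib.SMTP(host, port) as conn:
        conn.starttls()  # switch to TLS before the password is sent
        conn.login(message['From'], password)
        conn.send_message(message)
```

A call would look like send_alert(build_alert('email@example.com', toAddress, 'borzaska'), 'appkey'), where the password is the same application-specific password as before.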
7. Adding e-mails to the toAddress variable
Instead of adding e-mail addresses to the if/else logic, it makes more sense to use a variable there, and declare its contents at the top of the script. This is why I added toAddress to the logic, and added the following just under the import line:
# ------------------- E-mail list ------------------------
toAddress = ['firstname.lastname@example.org','email@example.com']
# --------------------------------------------------------
And we are set! If you run this script from IDLE, it will scrape the web page, and send an e-mail to the defined addresses if the desired food is on the menu.
8. Scheduling the script to run every day automatically on Windows
The final touch was to automate running the script. The reason to create the script was that we kept forgetting to check the menu every day. If I need to run the script manually every day, there's an equal chance that I will forget to run it. And actually I might just check the menu on the website instead, as that's no more effort.
So running the script automatically is crucial; there's no point in it otherwise.
To be able to run it on Windows automatically, the following has to be done:
- Put a so-called shebang line on the first line of the script. This tells the Python launcher for Windows (py.exe) which Python version to use when the script is called outside of IDLE.
- Create an executable file that will run the script. On Windows this means creating a .bat file.
- Schedule the .bat file to run every morning.
Adding the shebang line is easy. Just add #! python3 as the first line of your file.
Creating the .bat file: create a new file called start_notifier.bat (you can call it whatever you like), and add this as content:
@py.exe "C:\Users\username\Documents\Yourscript.py" %*
@pause
After @py.exe, add the full path to your script.
@pause will keep the terminal window open until you press a button.
To schedule the .bat file, use the built-in Task Scheduler on Windows. Here's a guide that will walk you through the process. Don't worry, it's from 2012, but the scheduler works the same way today on Windows 10.
I scheduled my .bat file to run every morning, at 9am.
Final thoughts and next steps
This is a very simple script that could be improved in various ways. My aim here was to help beginners explore the ways Python can be used, and I think this is a very good example. If I wanted to make this script better, I would definitely consider the following as next steps:
- it's not ideal to schedule your script to run on your computer. There are tons of reasons why you might not be using your laptop/computer every day at 9am (or at the scheduled time), and in that case the script will not run. It's worth digging into how you can run it on a web server, or on your smartphone.
- it would make sense to collect the e-mails you want to send the alert to in a separate text file. The script could open and read it. This way you could easily add and remove people, just by editing the text file. No need to edit the code.
- finally, the script could also be upgraded to send notifications to your favourite chat service via APIs. E-mails are so last decade!
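The second idea, reading the recipients from a text file, could be sketched like this. It assumes a file with one e-mail address per line. The recipients.txt file name, the # comment convention and both function names are assumptions for illustration.

```python
def parse_recipients(lines):
    """Return the e-mail addresses found in an iterable of text lines."""
    recipients = []
    for line in lines:
        address = line.strip()
        # Skip blank lines and lines starting with '#' (assumed comment marker)
        if address and not address.startswith('#'):
            recipients.append(address)
    return recipients


def load_recipients(path):
    """Read the recipient list from a text file, one address per line."""
    with open(path, encoding='utf-8') as recipient_file:
        return parse_recipients(recipient_file)
```

The script could then start with toAddress = load_recipients('recipients.txt') instead of the hard-coded list, so adding or removing a colleague only means editing the text file.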
Hope you enjoyed! You can find the final code on GitHub