A Practical Guide to Parsing Recipe Ingredients with Python (Flask)
Learn How to Extract Quantities and Ingredients from Cooking Recipes with Python
Processing a list of ingredients into a dictionary
When I started learning to code in 2020, I was only interested in front-end development. Once I started working on projects, I got frustrated that I didn't know how to work with databases and the backend.
Eventually, I went back to study the foundations of computer science, working my way up to the backend and databases (C -> Python) with CS50. Coming from JavaScript, I was blown away by Python.
Introduction
In this tutorial, I'll walk you through a Python program I'm working on, briefly covering the data types from my notes and a few programming concepts and techniques used in practice.
There are many Python frameworks for web development; I'm going to use Flask, which is minimal and beginner-friendly.
Here is a pretty good introduction to getting started with Flask by Olalekan Temitayo, if you want to check it out.
Datatypes
Before we start, here is a list of common Python data types (some of which I hadn't used or even heard of before).
Integer (int)
whole numbers without fractions
example: 0, 10, -30, etc
Float (float)
numbers with fractions
example: 4.35, -1.6, 5.25563, etc
String (str)
a sequence of characters, enclosed in quotes (either ' ' or " ")
example: "hello world", "567", etc
Boolean (bool)
represents either True or False
used in conditional statements and logical operations
List (list):
an ordered collection of elements enclosed within square brackets []
lists are mutable (can be modified) and can hold elements of various data types
example: [1, 'hello', True]
Tuple (tuple):
similar to lists, but immutable (cannot be modified); elements are enclosed within parentheses ()
used for fixed collections of items
examples: (1, 2, 3), ('cold', 'hot')
Dictionary (dict):
a collection of key-value pairs enclosed within curly braces {}; since Python 3.7, dictionaries keep their insertion order
each key is associated with a value, and values are accessed by key instead of by index
example: {'name': 'Steve Jobs', 'company': 'Apple'}
Set (set):
an unordered collection of unique elements enclosed within curly braces {} (an empty {} creates a dictionary, so use set() for an empty set)
ensures each element appears only once
example: {'sodium', 'carbon', 'hydrogen'}
NoneType (None):
represents a null value, i.e. the absence of a value
often used as a placeholder when a variable does not have a meaningful value yet
example: None
Understanding the characteristics and typical usage of each type is essential for effective programming.
This barely scratches the surface, but it's enough to get started. For more details, check the Python documentation on data types (link below in additional resources).
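To see these types in action, here is a small snippet of my own (not part of the app) that checks each example value with the built-in type() function and shows a few of the behaviours described above: lists can change, tuples can't, sets drop duplicates, and an empty {} is a dictionary.

```python
# Example values from the list above, keyed by the name type() reports
examples = {
    "int": 10,
    "float": 4.35,
    "str": "hello world",
    "bool": True,
    "list": [1, "hello", True],
    "tuple": (1, 2, 3),
    "dict": {"name": "Steve Jobs", "company": "Apple"},
    "set": {"sodium", "carbon", "hydrogen"},
    "NoneType": None,
}

for name, value in examples.items():
    print(f"{name:>8}: {value!r}")
    assert type(value).__name__ == name

# Lists are mutable: items can be added or changed in place
ingredients = ["flour", "sugar"]
ingredients.append("butter")

# Tuples are immutable: assigning to an item raises TypeError
temperatures = ("cold", "hot")
try:
    temperatures[0] = "warm"
except TypeError as error:
    print("TypeError:", error)

# Sets keep only unique elements
elements = {"carbon", "carbon", "hydrogen"}
print(len(elements))  # 2

# Empty curly braces create a dict; use set() for an empty set
print(type({}).__name__, type(set()).__name__)  # dict set
```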
Let's get to the good part.
Concepts and Techniques Covered
These are some of the essential programming concepts and techniques covered in this tutorial.
String Manipulation
Regular Expressions
Dictionary Manipulation
Conditional Statements and Loops
Context
I'm building a web app for users to collect recipes, plan meals, and build a shopping list. The first core feature I'm building lets users create meals (or rather, copy-paste recipes from wherever) and save them.
Each meal is a collection of ingredients, each with a quantity in a standard unit of measure.
I'm going to split this into 3 parts:
Extract the quantity and ingredients from each line.
Restructure into a dictionary of ingredients with quantity and unit of measure.
Process user input and show output for the user to confirm ingredients and measures before saving it to the database.
Code Walkthrough
Frontend - user input
The HTML rendered to the user depends on the request method: a GET request shows a textarea input prompting the user for a list of ingredients with their details.
Here is the code for /new-meal.html, which gets the input from the user and POSTs the form to the same URL, where our function breaks it down into a dictionary.
```html
<h1>Add new meal</h1>
<form action="/new-meal" method="post">
    <textarea autofocus autocomplete="off" id="meal" name="meal" rows="5" cols="33"></textarea>
    <button type="submit">Continue</button>
</form>
```
Define route and backend
Let's start by defining the route and the backend code.
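```python
from flask import Flask, render_template, request

app = Flask(__name__)


@app.route("/new-meal", methods=["GET", "POST"])
def new_meal():
    """Create a new meal post for the current user."""
    if request.method == "POST":
        # Text from the <textarea name="meal">; default to "" if it is missing
        user_input = request.form.get("meal", "")
        result = restructure_input(user_input)  # defined below
        # Re-render the same page with the parsed ingredients for confirmation
        return render_template("new-meal.html", meal=result)
    return render_template("new-meal.html")
```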
This route accepts both GET and POST requests. POST has to be listed explicitly in methods; otherwise a Flask route only responds to GET (Flask adds HEAD and OPTIONS automatically).
When the request method is POST, we get() the textarea's value from request.form and pass it into restructure_input(user_input).
The result is rendered with the same HTML template so the user can confirm it.
We also need Python's built-in re module for regular expressions:
```python
import re
```
Function for restructuring
This function receives the user input and returns the same data as a dictionary.
The user input is a list of ingredients and their quantities, one per line, so the first step is to split() the text into individual lines.
```python
def restructure_input(input_text):
    """Restructures user input into a dictionary of ingredients and quantities."""
    ingredients_dict = {}
    lines = input_text.split('\n')
    # more code for looping through each line here
    return ingredients_dict
```
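A detail worth knowing about split('\n'): when a browser submits a textarea, it sends line breaks as \r\n (CRLF), so each line keeps a trailing \r. The code in this tutorial still works because the later strip() calls remove it, but str.splitlines() handles \n, \r\n and \r without leaving anything behind. The form body below is a hand-written example, not a captured request, and urllib.parse.parse_qs only stands in for the decoding Flask does for request.form.

```python
from urllib.parse import parse_qs

# Hand-written, URL-encoded form body: '+' is a space, %0D%0A is a CRLF line break
body = 'meal=2+cups+flour%0D%0A3+eggs%0D%0A'
user_input = parse_qs(body)['meal'][0]
print(repr(user_input))         # '2 cups flour\r\n3 eggs\r\n'

# split('\n') leaves a '\r' on each line plus an empty string at the end
print(user_input.split('\n'))   # ['2 cups flour\r', '3 eggs\r', '']

# splitlines() understands CRLF and drops the trailing line break
print(user_input.splitlines())  # ['2 cups flour', '3 eggs']
```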
Now we loop through each line and find the quantity, unit of measure, and ingredient.
The ingredient name will be the key, and the quantity together with its unit of measure will be the value.
The strip() method removes leading and trailing whitespace, so if line.strip(): skips blank lines.
We'll define another function, extract_quantity_and_ingredient(), to split each line into a quantity and an ingredient before adding it to the dictionary.
```python
for line in lines:
    if line.strip():  # skip blank lines
        quantity, ingredient = extract_quantity_and_ingredient(line)
        ingredients_dict[ingredient] = quantity
```
Here is the full code for restructure_input:
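```python
def restructure_input(input_text):
    """Restructures user input into a dictionary of ingredients and quantities."""
    ingredients_dict = {}
    lines = input_text.split('\n')
    for line in lines:
        if line.strip():  # skip blank lines
            quantity, ingredient = extract_quantity_and_ingredient(line)
            # extract_quantity_and_ingredient() returns (None, None) for a line
            # without a number; skip it instead of storing a None key
            if ingredient:
                ingredients_dict[ingredient] = quantity
    return ingredients_dict
```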
Function to extract items
To identify the unit of measurement, I created the following list to cross-check against inside our extract function.
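```python
# Note: the extract function lowercases the ingredient before checking this list,
# so the mixed-case entries ('mL', 'L', 'dL') are never matched; their lowercase
# forms cover them. 'g' also appears twice (after 'gallon' and before 'gram').
units_of_measurement = [
    'ml', 'mL', 'milliliter', 'millilitre', 'cc', 'l', 'L', 'liter', 'litre',
    'dl', 'dL', 'deciliter', 'decilitre', 'teaspoon', 'tsp', 't', 'tbl', 'tbs', 'tbsp', 'tbsps',
    'tablespoon', 'fluid ounce', 'fl oz', 'gill', 'cup', 'c', 'pint', 'p', 'pt', 'fl pt',
    'quart', 'q', 'qt', 'fl qt', 'gallon', 'g', 'gal', 'mg', 'milligram', 'milligramme',
    'g', 'gram', 'gramme', 'kg', 'kilogram', 'kilogramme', 'pound', 'lb', '#', 'ounce', 'oz'
]
```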
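The list above answers "is this word a unit?", but plurals and different spellings of the same unit stay distinct. For example, only the singular 'cup' is listed, so "2 cups flour" ends up with "cups flour" as the ingredient name. Since each meal is meant to use a standard unit of measure, one possible extension, sketched here under my own assumptions, is a hypothetical UNIT_ALIASES table that maps each spelling to one canonical unit, plus a normalize_unit() helper that also accepts a trailing "s" or ".". The table is deliberately partial and is not part of the app's code.

```python
# Hypothetical alias table (partial): each spelling maps to one canonical unit
UNIT_ALIASES = {
    'ml': 'ml', 'milliliter': 'ml', 'millilitre': 'ml', 'cc': 'ml',
    'l': 'l', 'liter': 'l', 'litre': 'l',
    'tsp': 'tsp', 'teaspoon': 'tsp', 't': 'tsp',
    'tbsp': 'tbsp', 'tbs': 'tbsp', 'tbl': 'tbsp', 'tablespoon': 'tbsp',
    'cup': 'cup', 'c': 'cup',
    'g': 'g', 'gram': 'g', 'gramme': 'g',
    'kg': 'kg', 'kilogram': 'kg', 'kilogramme': 'kg',
    'oz': 'oz', 'ounce': 'oz',
    'lb': 'lb', 'pound': 'lb',
}


def normalize_unit(token):
    """Return the canonical unit for token, or None if it isn't a known unit."""
    word = token.lower().rstrip('.')  # 'Tbsp.' -> 'tbsp'
    if word in UNIT_ALIASES:
        return UNIT_ALIASES[word]
    # Accept a simple plural: 'cups' -> 'cup', 'lbs' -> 'lb'
    if word.endswith('s') and word[:-1] in UNIT_ALIASES:
        return UNIT_ALIASES[word[:-1]]
    return None


for token in ['Tablespoons', 'cups', 'Tbsp.', 'lbs', 'salt']:
    print(token, '->', normalize_unit(token))
```

Exact matches are checked before the plural rule, so an entry like 'tbs' is not mistaken for a plural of 'tb'.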
Some ingredients list their quantity as a single fraction character, like this: ¼
I could not get the regex to match these (\d only matches decimal digits, and ¼ is not one), so we find and replace them with plain-text fractions using the function below.
```python
def preprocess_line(line):
    """Preprocesses the line to replace common fractions with their equivalent fractional representation."""
    fractions_mapping = {
        '½': '1/2',
        '¼': '1/4',
        '¾': '3/4',
        '⅓': '1/3',
        '⅔': '2/3',
        '⅕': '1/5',
        '⅖': '2/5',
        '⅗': '3/5',
        '⅘': '4/5',
    }
    for fraction, equivalent in fractions_mapping.items():
        line = line.replace(fraction, equivalent)
    return line
```
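One edge case with the mapping approach: a mixed number such as "1½" becomes "11/2", which the regex then reads as eleven halves. Below is a hypothetical alternative, preprocess_line_unicode(), that uses the standard library's unicodedata module to recognise any character whose Unicode name starts with "VULGAR FRACTION" (so ⅛ and ⅙ are covered too) and puts a space before each replacement. It's a sketch, not the app's code.

```python
import unicodedata
from fractions import Fraction


def preprocess_line_unicode(line):
    """Hypothetical variant of preprocess_line: '1½ cups' -> '1 1/2 cups'.

    The leading space keeps a mixed number from collapsing into '11/2'.
    """
    pieces = []
    for char in line:
        if unicodedata.name(char, '').startswith('VULGAR FRACTION'):
            # numeric('⅓') is 0.333...; limit_denominator() recovers 1/3
            value = Fraction(unicodedata.numeric(char)).limit_denominator(100)
            pieces.append(f' {value.numerator}/{value.denominator}')
        else:
            pieces.append(char)
    return ''.join(pieces)


print(repr(preprocess_line_unicode('1½ cups milk')))  # '1 1/2 cups milk'
print(repr(preprocess_line_unicode('⅛ tsp salt')))    # ' 1/8 tsp salt'
```

The extra leading space on a line like "⅛ tsp salt" is harmless, since the extract function strips whitespace later.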
Now let's define how we extract the ingredient and quantity. This function accepts a single line, preprocesses it, extracts the data, and returns two values: quantity and ingredient.
```python
def extract_quantity_and_ingredient(line):
    """Extracts quantity and ingredient from a line."""
    line = preprocess_line(line)
    # code for splitting up and extracting data here
    return quantity, ingredient
```
First, let's get the quantity by identifying the numeric parts. re is Python's built-in module for regular expressions (regex); re.findall() searches a string for a pattern and returns every non-overlapping match as a list of strings.
```python
numeric_parts = re.findall(r'\d+/\d+|\d+', line)
```
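Here is a breakdown of what each part does:
\d+ matches one or more digits. The + means "one or more of the preceding element".
/ matches the forward slash character exactly.
| is the OR operator: it matches either the pattern before it or the pattern after it. Alternatives are tried from left to right, which is why the fraction comes first; otherwise 1/4 would be matched as the two separate numbers 1 and 4.
Together, this matches any positive integer or a fraction like 1/4.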
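A quick way to check the pattern is to run re.findall() on a few sample lines. This is my own test, not part of the app; it also shows what happens when the alternatives are swapped, and that decimals such as 1.5 come back as two numbers. The decimal_pattern at the end is one possible extension that the tutorial's code does not use.

```python
import re

pattern = r'\d+/\d+|\d+'  # the pattern from the tutorial

print(re.findall(pattern, '2 1/2 cups flour'))  # ['2', '1/2']
print(re.findall(pattern, '200g butter'))       # ['200']

# Swapping the alternatives splits the fraction into two numbers
print(re.findall(r'\d+|\d+/\d+', '2 1/2 cups flour'))  # ['2', '1', '2']

# Decimals are not covered: '1.5' comes back as two separate integers
print(re.findall(pattern, '1.5 cups milk'))     # ['1', '5']

# One possible extension (not used in the tutorial): an optional decimal part
decimal_pattern = r'\d+/\d+|\d+(?:\.\d+)?'
print(re.findall(decimal_pattern, '1.5 cups milk'))  # ['1.5']
```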
Next, we join the numeric parts into the quantity string, then remove them from the line, along with any surrounding spaces, to get the ingredient name.
```python
quantity = ' '.join(numeric_parts)
ingredient = re.sub(r'-?\d+/\d+|-?\d+', '', line).strip()
```
We use the same regex with one addition, so that negative numbers are matched as well:
-? matches an optional minus sign. The ? means "zero or one of the preceding element".
re.sub() is part of the re module; it returns a string in which every match of the pattern is replaced with a replacement, in this case '' (an empty string) to remove it. strip() then removes leading and trailing whitespace.
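Here are the two steps together on a sample line, using the pattern from the full function below (the one with -?). The last two calls are my own observation about that optional minus sign: a dash written directly before a number, or a range like 1-2, produces a negative part. Recipe quantities are never negative, so dropping -? is one option if your input contains dash bullets or ranges.

```python
import re

# Pattern from the full function (optional minus sign on both alternatives)
NUMBER = r'-?\d+/\d+|-?\d+'

line = '2 1/2 cups flour'
quantity = ' '.join(re.findall(NUMBER, line))  # '2 1/2'
ingredient = re.sub(NUMBER, '', line).strip()   # 'cups flour'
print(repr(quantity), repr(ingredient))

# Side effect of -?: a dash glued to a number, or a range, gives a negative part
print(re.findall(NUMBER, '-2 eggs'))            # ['-2']
print(re.findall(NUMBER, '1-2 cloves garlic'))  # ['1', '-2']
```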
Next, we clean up the ingredient string by removing hyphens, anything in parentheses, and extra spaces, and by converting the string to lowercase.
Lines sometimes start with a bullet point or a dash, so we use replace() to find every - and remove it.
```python
ingredient = ingredient.replace('-', '')
ingredient = re.sub(r'\([^)]*\)', '', ingredient).strip()
ingredient = re.sub(r'\s+', ' ', ingredient).lower()
```
Here's what each line does:
the first line removes all hyphens or dashes from the string
the second line finds and removes any parentheses along with their contents (\( and \) match literal parentheses, and [^)]* matches any run of characters other than a closing parenthesis), then trims leading and trailing whitespace
the last line replaces any sequence of one or more whitespace characters with a single space, then converts everything to lowercase.
This helps to standardize the format of the ingredient name.
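Removing every "-" also affects hyphenated names: "all-purpose flour" becomes "allpurpose flour". The comparison below runs the tutorial's three cleanup lines (wrapped in a function, tutorial_cleanup, only so they can be tested) next to a hypothetical clean_ingredient() that strips only a leading bullet character ("-", "*" or "•"). Both receive the string that the earlier steps leave behind for the line "- 2 cups All-Purpose flour (sifted)".

```python
import re


def tutorial_cleanup(text):
    """The tutorial's three cleanup lines, unchanged."""
    text = text.replace('-', '')
    text = re.sub(r'\([^)]*\)', '', text).strip()
    return re.sub(r'\s+', ' ', text).lower()


def clean_ingredient(text):
    """Hypothetical variant: remove only a leading bullet, keep inner hyphens."""
    text = re.sub(r'^\s*[-*•]\s*', '', text)          # leading bullet only
    text = re.sub(r'\([^)]*\)', '', text)             # drop '(...)' notes
    return re.sub(r'\s+', ' ', text).strip().lower()  # collapse spaces


sample = '-  cups All-Purpose flour (sifted)'
print(tutorial_cleanup(sample))  # cups allpurpose flour
print(clean_ingredient(sample))  # cups all-purpose flour
```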
For the last part, here is a for loop that looks for units of measurement and extracts them.
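```python
for unit in units_of_measurement:
    if f' {unit} ' in f' {ingredient} ':
        # Remove only the standalone word; ingredient.replace(unit, '') would
        # also strip those letters from inside other words ('g' in 'garlic')
        ingredient = f' {ingredient} '.replace(f' {unit} ', ' ', 1).strip()
        quantity += ' ' + unit
        break
return quantity, ingredient
```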
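The if condition pads both the unit and the ingredient string with a space on each side, so a unit only matches as a whole word, even at the very start or end of the string. On a match, the unit is appended to the quantity and removed from the ingredient name, and break stops at the first match; because the list is checked in order, multi-word units such as 'fl oz' are matched before the shorter 'oz'.
If no unit matches, the quantity and ingredient are returned as they are.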
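A small check of the whole-word test, using the ingredient string that the earlier steps leave for a line like "2 g garlic". It also shows why the way the unit is removed matters: str.replace(unit, '') deletes every occurrence of those characters, not just the standalone word, while replacing the space-padded word once leaves the rest of the name alone. This is my own illustration.

```python
unit = 'g'
ingredient = 'g garlic'  # what the earlier steps leave for '2 g garlic'

# Whole-word test from the loop: pad both strings with spaces
print(f' {unit} ' in f' {ingredient} ')  # True
print(f' {unit} ' in ' garlic ')         # False: a 'g' inside a word is ignored

# str.replace removes every 'g', including the one inside 'garlic'
print(ingredient.replace(unit, '').strip())  # arlic

# Replacing the space-padded word once leaves the rest of the name intact
print(f' {ingredient} '.replace(f' {unit} ', ' ', 1).strip())  # garlic
```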
Here is the whole code for this function.
```python
def extract_quantity_and_ingredient(line):
    """Extracts quantity and ingredient from a line."""
    # Relies on re, preprocess_line() and units_of_measurement defined above
    # preprocess fractional values
    line = preprocess_line(line)
    # Find all numeric parts in the line
    numeric_parts = re.findall(r'-?\d+/\d+|-?\d+', line)
    if numeric_parts:
        # Combine numeric parts into a single quantity
        quantity = ' '.join(numeric_parts)
        # Remove numeric parts and any leading/trailing spaces to get the ingredient
        ingredient = re.sub(r'-?\d+/\d+|-?\d+', '', line).strip()
        # Remove dashes from the ingredient
        ingredient = ingredient.replace('-', '')
        # Remove any parentheses and extra spaces from the ingredient
        ingredient = re.sub(r'\([^)]*\)', '', ingredient).strip()
        ingredient = re.sub(r'\s+', ' ', ingredient).lower()
        # Look for units of measurement in the ingredient and extract them
        for unit in units_of_measurement:
            if f' {unit} ' in f' {ingredient} ':
                # Remove the unit as a whole word only (a plain replace would
                # also delete e.g. the 'g' inside 'garlic')
                ingredient = f' {ingredient} '.replace(f' {unit} ', ' ', 1).strip()
                quantity += ' ' + unit
                break
        return quantity, ingredient
    # No number found on this line
    return None, None
```
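To see the output format, here is a condensed, self-contained copy of the pipeline run on a hand-written sample (not real user data). A few details are my own choices: lines are split with splitlines(), the unit is removed as a whole word (replacing the space-padded unit once), lines without a number are skipped rather than stored under None, and the fractions mapping is trimmed to five entries to keep the block short.

```python
import re

# Condensed copy of the tutorial's pipeline so this block runs on its own
units_of_measurement = [
    'ml', 'mL', 'milliliter', 'millilitre', 'cc', 'l', 'L', 'liter', 'litre',
    'dl', 'dL', 'deciliter', 'decilitre', 'teaspoon', 'tsp', 't', 'tbl', 'tbs', 'tbsp', 'tbsps',
    'tablespoon', 'fluid ounce', 'fl oz', 'gill', 'cup', 'c', 'pint', 'p', 'pt', 'fl pt',
    'quart', 'q', 'qt', 'fl qt', 'gallon', 'g', 'gal', 'mg', 'milligram', 'milligramme',
    'g', 'gram', 'gramme', 'kg', 'kilogram', 'kilogramme', 'pound', 'lb', '#', 'ounce', 'oz'
]
fractions_mapping = {'½': '1/2', '¼': '1/4', '¾': '3/4', '⅓': '1/3', '⅔': '2/3'}
NUMBER = r'-?\d+/\d+|-?\d+'


def extract_quantity_and_ingredient(line):
    """Split one line into (quantity, ingredient); (None, None) if no number."""
    for fraction, equivalent in fractions_mapping.items():
        line = line.replace(fraction, equivalent)
    numeric_parts = re.findall(NUMBER, line)
    if not numeric_parts:
        return None, None
    quantity = ' '.join(numeric_parts)
    ingredient = re.sub(NUMBER, '', line).strip().replace('-', '')
    ingredient = re.sub(r'\([^)]*\)', '', ingredient).strip()
    ingredient = re.sub(r'\s+', ' ', ingredient).lower()
    for unit in units_of_measurement:
        if f' {unit} ' in f' {ingredient} ':
            # assumption: remove the unit as a whole word only
            ingredient = f' {ingredient} '.replace(f' {unit} ', ' ', 1).strip()
            quantity += ' ' + unit
            break
    return quantity, ingredient


def restructure_input(input_text):
    """Map each ingredient name to its quantity string."""
    ingredients_dict = {}
    for line in input_text.splitlines():  # assumption: splitlines() handles CRLF
        if line.strip():
            quantity, ingredient = extract_quantity_and_ingredient(line)
            if ingredient:  # skip lines without a number
                ingredients_dict[ingredient] = quantity
    return ingredients_dict


# Hand-written sample with the CRLF line breaks a browser form would send
recipe = "2 cup flour\r\n¼ tsp salt (fine)\r\n\r\n3 eggs\r\n200 g butter"
print(restructure_input(recipe))
# {'flour': '2 cup', 'salt': '1/4 tsp', 'eggs': '3', 'butter': '200 g'}
```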
For a finishing touch, we add the following to the HTML to show the result.
```html
{# meal is only passed after a POST, so guard against it being undefined on GET #}
{% if meal %}
<div>
    <form action="/confirm-meal" method="post">
        {% for ingredient, quantity in meal.items() %}
        <div>
            <label for="{{ ingredient }}">{{ ingredient }}</label>
            <input type="text" id="{{ ingredient }}" name="{{ ingredient }}" value="{{ quantity }}">
        </div>
        {% endfor %}
        <button type="submit">Confirm</button>
    </form>
</div>
{% endif %}
```
This shows each ingredient and its quantity in an editable input field before confirmation. The /confirm-meal route will then save the recipe for the user.
Usage
Here is a quick recording of this feature producing output with the code above.
Conclusion
This was a good exercise to get familiar with string manipulation and using regex to filter and clean the input from users.