A Practical Guide to Parsing Recipe Ingredients with Python (Flask)

Learn How to Extract Quantities and Ingredients from Cooking Recipes with Python

Processing a list of ingredients into a dictionary

When I started learning to code in 2020, I was only interested in front-end development. Once I started working on projects, I got frustrated that I didn't know how to work with databases and the backend.

After that, I went back to study the foundations of computer science, then the backend and databases (C -> Python), with CS50. Coming from JavaScript, I was blown away by Python.

Introduction

In this tutorial, I'll walk you through a Python program I'm working on, briefly covering the data types from my notes and a few programming concepts and techniques used in practice.

There are many Python frameworks for web development. I'm going to use Flask, which is minimal and beginner-friendly.

Here is a pretty good introduction to getting started with Flask by Olalekan Temitayo if you want to check it out.

Datatypes

Before we start, here is a list of common data types (some of which I've not used or even heard of before) useful for Python.

  1. Integer (int)

    • whole numbers without fractions

    • example: 0, 10, -30, etc

  2. Float (float)

    • numbers with fractions

    • example: 4.35, -1.6, 5.25563, etc

  3. String (str)

    • a sequence of characters, enclosed in quotes (either ' ' or " ")

    • example: "hello world", "567", etc

  4. Boolean (bool)

    • Represents either: True or False

    • Used in conditional statements and logical operations

  5. List:

    • an ordered collection of elements enclosed within square brackets []

    • lists can be modified and can hold elements of various data types

    • example: [1, 'hello', True]

  6. Tuple:

    • similar to lists, but immutable (cannot be modified) where elements are enclosed within parentheses ()

    • used for fixed collections of items

    • examples: (1, 2, 3), ('cold', 'hot')

  7. Dictionary (dict):

    • a collection of key-value pairs enclosed within curly braces {} (it keeps insertion order since Python 3.7)

    • each key is associated with a value, accessed using keys instead of indexes

    • example: {'name': 'Steve Jobs', 'company': 'Apple'}

  8. Set:

    • an unordered collection of unique elements enclosed within curly braces {}

    • ensures each element appears only once

    • example: {'sodium', 'carbon', 'hydrogen'}

  9. NoneType (None):

    • a null value or rather the absence of value.

    • used when a variable is yet to be assigned a value.

    • example: None
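To make the list above concrete, here is a small sketch that builds one value of each type and checks it with type(). The variable names are only for illustration.

```python
# One value of each type from the list above, checked with type().
samples = [
    10,                       # int
    4.35,                     # float
    "hello world",            # str
    True,                     # bool
    [1, "hello", True],       # list
    ("cold", "hot"),          # tuple
    {"name": "Steve Jobs"},   # dict
    {"sodium", "carbon"},     # set
    None,                     # NoneType
]

type_names = [type(value).__name__ for value in samples]
print(type_names)

# Lists can be changed in place ...
shopping = ["flour", "sugar"]
shopping.append("butter")

# ... while tuples raise a TypeError when you try to assign to an item.
temperatures = ("cold", "hot")
try:
    temperatures[0] = "warm"
    tuple_is_immutable = False
except TypeError:
    tuple_is_immutable = True
```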

Understanding the characteristics and usage of these types is essential for effective programming.

This barely scratches the surface, but it is enough to get started. For more details, check the Python documentation on data types (linked below under Additional Resources).

Let's get to the good part.

Concepts and Techniques Covered

These are some of the essential programming concepts and techniques covered in this tutorial.

  • String Manipulation

  • Regular Expressions

  • Dictionary Manipulation

  • Conditional Statements and Loops

Context

I'm building a web app for users to collect recipes, plan meals and build a shopping list. The first core feature I'm building is creating (or rather copy-pasting recipes from wherever) and saving meals.

Each meal is a collection of ingredients with a specific quantity of a standard unit of measure.

I'm going to split this into 3 parts:

  1. Extract the quantity and ingredients from each line.

  2. Restructure into a dictionary of ingredients with quantity and unit of measure.

  3. Process user input and show output for the user to confirm ingredients and measures before saving it to the database.

Code Walkthrough

Frontend - user input

The HTML that gets rendered depends on the request method: a GET request shows the user a textarea in which to enter a list of ingredients with their details.

Here is the code for new-meal.html, which collects the user's input and POSTs the form to the same URL, where our function breaks it down into a dictionary.

<h1>Add new meal</h1>

<form action="/new-meal" method="post">
    <textarea autofocus autocomplete="off" id="meal" name="meal" rows="5" cols="33"></textarea>
    <button type="submit">Continue</button>
</form>

Define route and backend

Let's start with defining the route and backend code first.

from flask import Flask, render_template, request

app = Flask(__name__)


@app.route("/new-meal", methods=["GET", "POST"])
def new_meal():
    """Create a new meal post for the current user."""

    if request.method == "POST":
        # An empty default keeps restructure_input() from receiving None
        user_input = request.form.get("meal", "")
        result = restructure_input(user_input)
        return render_template("new-meal.html", meal=result)

    return render_template("new-meal.html")

This route accepts both GET and POST requests. POST has to be listed explicitly because, by default, a Flask route only answers GET requests.

When the request method is POST, we read the submitted text with request.form.get() and pass it to restructure_input(user_input).

The result is rendered with the same HTML template so the user can confirm it.

We also need Python's built-in re module, so we import it at the top of the file:

import re

Function for restructuring

This function will receive user input and return the same data in a dictionary format.

The user input is a list of ingredients and their quantities. We are going to use split() in this function to break the data up line by line.

def restructure_input(input_text):
    """Restructures user input into a dictionary of ingredients and quantities."""

    ingredients_dict = {}
    lines = input_text.split('\n')

    # more code for looping through each line here

    return ingredients_dict
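One detail worth knowing: browsers usually submit textarea line breaks as "\r\n", so split('\n') can leave a trailing "\r" on each line. The sketch below, with made-up sample text, compares it with str.splitlines(), which is one possible alternative.

```python
text = "2 cup flour\r\n\r\n1 tsp salt\r\n"

# split('\n') keeps the carriage return at the end of each line
with_split = text.split("\n")

# splitlines() understands \n, \r\n and \r, and drops the line endings
with_splitlines = text.splitlines()

# strip() returns '' for blank lines, which is falsy, so they are skipped
non_empty = [line.strip() for line in with_splitlines if line.strip()]
print(non_empty)  # ['2 cup flour', '1 tsp salt']
```

Either approach works here as long as each line is stripped before it is used.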

Now we are going to loop through each line and find the quantity, unit of measure, and ingredient.

The key will be the ingredient name, and the value will be the quantity together with the unit of measure.

The strip() method removes surrounding whitespace, and a line that is empty after stripping is skipped.

We will define another function to help us extract the data into quantity and ingredients before passing it into the dictionary.

    for line in lines:
        if line.strip():
            quantity, ingredient = extract_quantity_and_ingredient(line)
            # Lines without a number come back as (None, None); skip them
            if ingredient is not None:
                ingredients_dict[ingredient] = quantity

Here is the full code for restructure_input:

def restructure_input(input_text):
    """Restructures user input into a dictionary of ingredients and quantities."""

    ingredients_dict = {}
    lines = input_text.split('\n')

    for line in lines:
        if line.strip():
            quantity, ingredient = extract_quantity_and_ingredient(line)
            # Lines without a number come back as (None, None); skip them
            if ingredient is not None:
                ingredients_dict[ingredient] = quantity

    return ingredients_dict
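Because the ingredient name is the dictionary key, a recipe that lists the same ingredient twice keeps only the last quantity. The sketch below shows that effect and one possible alternative that collects quantities in a list. The helper add_ingredient and the sample pairs are hypothetical, not part of the app.

```python
def add_ingredient(ingredients, ingredient, quantity):
    """Store quantity under ingredient, keeping earlier quantities too."""
    ingredients.setdefault(ingredient, []).append(quantity)
    return ingredients


# Hypothetical, already-parsed (quantity, ingredient) pairs
parsed = [("2 cup", "flour"), ("1 tsp", "salt"), ("1 tbsp", "flour")]

overwriting = {}
collecting = {}
for quantity, ingredient in parsed:
    overwriting[ingredient] = quantity  # a later line replaces the earlier one
    add_ingredient(collecting, ingredient, quantity)

print(overwriting)  # {'flour': '1 tbsp', 'salt': '1 tsp'}
print(collecting)   # {'flour': ['2 cup', '1 tbsp'], 'salt': ['1 tsp']}
```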

Function to extract items

To identify the unit of measurement, I created the following list, which the extract function checks each line against.

# Multi-word units come before their shorter endings ('fl pt' before 'pt')
# so the longer one is matched first; 'g' is listed once, as gram.
units_of_measurement = [
    'ml', 'mL', 'milliliter', 'millilitre', 'cc', 'l', 'L', 'liter', 'litre',
    'dl', 'dL', 'deciliter', 'decilitre', 'teaspoon', 'tsp', 't', 'tbl', 'tbs', 'tbsp', 'tbsps',
    'tablespoon', 'fluid ounce', 'fl oz', 'gill', 'cup', 'c', 'pint', 'p', 'fl pt', 'pt',
    'quart', 'q', 'fl qt', 'qt', 'gallon', 'gal', 'mg', 'milligram', 'milligramme',
    'g', 'gram', 'gramme', 'kg', 'kilogram', 'kilogramme', 'pound', 'lb', '#', 'ounce', 'oz'
]
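Since each meal is meant to use a standard unit of measure, the different spellings could later be mapped to one canonical name. This is only a sketch: UNIT_ALIASES and canonical_unit are hypothetical names, the table covers just a few units, and the canonical spellings are my own choice.

```python
# Hypothetical alias table: each spelling maps to one canonical unit name.
UNIT_ALIASES = {
    "ml": "ml", "milliliter": "ml", "millilitre": "ml",
    "tsp": "tsp", "teaspoon": "tsp",
    "tbsp": "tbsp", "tbs": "tbsp", "tablespoon": "tbsp",
    "g": "g", "gram": "g", "gramme": "g",
    "lb": "lb", "pound": "lb",
}


def canonical_unit(word):
    """Return the canonical unit for a word, or None if it is not a known unit."""
    return UNIT_ALIASES.get(word.lower())


print(canonical_unit("Tablespoon"))  # tbsp
print(canonical_unit("pinch"))       # None
```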

Some ingredients come with the quantity written as a single fraction character, like ¼. I could not get the regex to match these, so we are going to find and replace all of them using the function below.

def preprocess_line(line):
    """Replaces common Unicode fraction characters with their ASCII equivalents."""
    fractions_mapping = {
        '½': '1/2',
        '¼': '1/4',
        '¾': '3/4',
        '⅓': '1/3',
        '⅔': '2/3',
        '⅕': '1/5',
        '⅖': '2/5',
        '⅗': '3/5',
        '⅘': '4/5',
    }
    for fraction, equivalent in fractions_mapping.items():
        # The leading space keeps mixed numbers apart: '1½' becomes '1 1/2', not '11/2'
        line = line.replace(fraction, f' {equivalent}')
    return line
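The mapping above covers the most common fraction characters. As an alternative sketch, the standard library's unicodedata module knows the numeric value of every such character, so characters like ⅛ work without being listed. The function names here are hypothetical, and the leading space in the result is my own choice to keep mixed numbers such as 1½ readable.

```python
import unicodedata
from fractions import Fraction


def vulgar_to_ascii(char):
    """Return ' n/d' for a single-character fraction such as '¼', else the char unchanged."""
    if unicodedata.category(char) != "No":  # "No" = Number, other (½, ¼, ⅓, ...)
        return char
    value = unicodedata.numeric(char, None)
    if value is None:
        return char
    fraction = Fraction(value).limit_denominator(100)
    if fraction.denominator == 1:
        return char  # whole values such as superscript digits are left alone
    return f" {fraction.numerator}/{fraction.denominator}"


def replace_fractions(line):
    """Apply vulgar_to_ascii to every character of the line."""
    return "".join(vulgar_to_ascii(char) for char in line)


print(replace_fractions("1½ cup milk"))         # 1 1/2 cup milk
print(replace_fractions("⅛ tsp salt").strip())  # 1/8 tsp salt
```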

Let's define how we are going to extract the ingredients and quantity. This function will accept each line and preprocess to extract the data and return the variables quantity and ingredient.

def extract_quantity_and_ingredient(line):
    """Extracts quantity and ingredient from a line."""
    line = preprocess_line(line)
    # code for extracting the data goes here
    return quantity, ingredient

First, let's get the quantity by identifying the numeric parts. re is a module in Python for regular expressions (regex). This will search for specified patterns and extract what we need.

numeric_parts = re.findall(r'\d+/\d+|\d+', line)

Here is a breakdown of what each part does:

  • \d+ matches one or more digits. The + means "one or more of the preceding element".

  • / matches the forward slash character exactly.

  • | is the OR operator: it matches either the pattern before it or the pattern after it.

This will match any positive integer or fraction like 1/4.
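Here is a quick way to try the pattern on a few sample lines. The last pattern is an assumed extension, not part of the program above, that also keeps decimals such as 1.5 together.

```python
import re

PATTERN = r'\d+/\d+|\d+'

print(re.findall(PATTERN, "2 cup flour"))      # ['2']
print(re.findall(PATTERN, "1 1/2 cup sugar"))  # ['1', '1/2']
print(re.findall(PATTERN, "1.5 cup milk"))     # ['1', '5'] (the decimal is split)

# Assumed extension: allow an optional decimal part after a whole number
DECIMAL_PATTERN = r'\d+/\d+|\d+(?:\.\d+)?'
print(re.findall(DECIMAL_PATTERN, "1.5 cup milk"))  # ['1.5']
```

Note that the fraction alternative comes first, so 1/2 is matched as a whole before the plain-digit alternative gets a chance to match the 1 on its own.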

Next, we will join the numeric parts into the quantity variable, then remove the numbers and any surrounding spaces from the line to get the ingredient name.

quantity = ' '.join(numeric_parts)
ingredient = re.sub(r'-?\d+/\d+|-?\d+', '', line).strip()

We use the same regex with one addition, so that a leading minus sign is removed together with the number.

  • -? matches an optional minus sign. The ? means "zero or one of the preceding element".

  • re.sub() returns a copy of the string with every match of the pattern replaced, in this case by '' to remove it. strip() then trims the whitespace from both ends.
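The two steps can be tried together on sample lines. This is a minimal sketch; split_line is a hypothetical helper used only for this demonstration.

```python
import re

NUMBER = r'-?\d+/\d+|-?\d+'


def split_line(line):
    """Return (quantity, rest) using the findall and sub steps described above."""
    numeric_parts = re.findall(r'\d+/\d+|\d+', line)
    quantity = ' '.join(numeric_parts)
    rest = re.sub(NUMBER, '', line).strip()
    return quantity, rest


print(split_line("2 cup flour"))      # ('2', 'cup flour')
print(split_line("1 1/2 cup sugar"))  # ('1 1/2', 'cup sugar')
print(split_line("-3 egg"))           # ('3', 'egg')
```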

Next, we will clean up the ingredient string by removing hyphens, anything in parentheses, and extra spaces, and converting the string to lowercase.

Sometimes a line contains bullet points or dashes. Using replace(), we look for any - and remove it.

        ingredient = ingredient.replace('-', '')
        ingredient = re.sub(r'\([^)]*\)', '', ingredient).strip()
        ingredient = re.sub(r'\s+', ' ', ingredient).lower()

Here's what each line does:

  • the first line removes all hyphens or dashes from the string

  • the second line finds and removes any parentheses together with their content, then trims leading and trailing whitespace

  • the last line replaces any run of whitespace characters with a single space, then converts everything to lowercase.

This helps to standardize the format of the ingredient name.
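The sketch below wraps the three cleanup lines in a function so they can be tried on sample text. It also shows a side effect: removing every hyphen changes names like "all-purpose". strip_leading_bullet is a hypothetical alternative that removes only a bullet at the start of the line.

```python
import re


def clean_ingredient(text):
    """Apply the three cleanup steps from above."""
    text = text.replace('-', '')
    text = re.sub(r'\([^)]*\)', '', text).strip()
    return re.sub(r'\s+', ' ', text).lower()


def strip_leading_bullet(text):
    """Hypothetical alternative: remove only a leading -, • or * bullet."""
    return re.sub(r'^\s*[-•*]\s*', '', text)


print(clean_ingredient("- cup  Butter (unsalted)"))    # cup butter
print(clean_ingredient("cup all-purpose flour"))       # cup allpurpose flour
print(strip_leading_bullet("- cup all-purpose flour")) # cup all-purpose flour
```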

For the last part, here is a for loop that looks for a unit of measurement and extracts it.

        for unit in units_of_measurement:
            if f' {unit} ' in f' {ingredient} ':
                # Remove only the whole-word match; a plain replace(unit, '') would
                # also strip the same letters inside other words ('l' from 'milk')
                ingredient = f' {ingredient} '.replace(f' {unit} ', ' ', 1).strip()
                quantity += ' ' + unit
                break
        return quantity, ingredient

The if condition checks whether {unit} appears in the string with a space on both sides. On a match, the unit is appended to the quantity and removed from the ingredient name.

If no match is found, the quantity and ingredient are returned as they are.
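The same whole-word check can also be written as a single regular expression. This is a sketch under assumptions: the unit list is shortened, and UNIT_RE and split_unit are hypothetical names. It also shows why removing the unit with a plain str.replace() is risky for one-letter units.

```python
import re

# A shortened unit list, just for this example
units = ['fl oz', 'oz', 'cup', 'g', 'l', 'tsp']

# Longest first so 'fl oz' wins over 'oz'; re.escape guards characters like '#'
alternation = '|'.join(re.escape(u) for u in sorted(units, key=len, reverse=True))
# (?<!\S) and (?!\S) require whitespace or the string edge on both sides
UNIT_RE = re.compile(rf'(?<!\S)(?:{alternation})(?!\S)')


def split_unit(ingredient):
    """Return (unit, rest); unit is None when no whole-word unit is present."""
    match = UNIT_RE.search(ingredient)
    if not match:
        return None, ingredient
    rest = ingredient[:match.start()] + ingredient[match.end():]
    return match.group(), ' '.join(rest.split())


print(split_unit("l milk"))         # ('l', 'milk')
print(split_unit("fl oz vanilla"))  # ('fl oz', 'vanilla')
print(split_unit("eggs"))           # (None, 'eggs')

# For comparison: a plain replace removes every 'l', including the one in 'milk'
print("l milk".replace('l', '').strip())  # mik
```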

Here is the whole code for this function.

def extract_quantity_and_ingredient(line):
    """Extracts quantity and ingredient from a line."""
    # Preprocess fractional values
    line = preprocess_line(line)
    # Find all numeric parts in the line (positive integers and fractions)
    numeric_parts = re.findall(r'\d+/\d+|\d+', line)
    if numeric_parts:
        # Combine numeric parts into a single quantity
        quantity = ' '.join(numeric_parts)
        # Remove numeric parts and any leading/trailing spaces to get the ingredient
        ingredient = re.sub(r'-?\d+/\d+|-?\d+', '', line).strip()
        # Remove dashes from the ingredient
        ingredient = ingredient.replace('-', '')
        # Remove any parentheses and extra spaces from the ingredient
        ingredient = re.sub(r'\([^)]*\)', '', ingredient).strip()
        ingredient = re.sub(r'\s+', ' ', ingredient).lower()
        # Look for units of measurement in the ingredient and extract them
        for unit in units_of_measurement:
            if f' {unit} ' in f' {ingredient} ':
                # Remove only the whole-word match, not letters inside other words
                ingredient = f' {ingredient} '.replace(f' {unit} ', ' ', 1).strip()
                quantity += ' ' + unit
                break
        return quantity, ingredient
    # No number was found on this line
    return None, None
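The quantity comes back as a string such as "1 1/2 cup". For a shopping list it may be useful to turn the numeric part into a number that can be added up. Here is one possible sketch using the standard library's fractions module. quantity_to_fraction is a hypothetical helper and not part of the app.

```python
from fractions import Fraction


def quantity_to_fraction(quantity):
    """Turn the numeric part of a quantity such as '1 1/2 cup' into a Fraction."""
    total = Fraction(0)
    for part in quantity.split():
        try:
            total += Fraction(part)
        except ValueError:
            break  # the first non-numeric token (the unit) ends the number
    return total


print(quantity_to_fraction("1 1/2 cup"))  # 3/2
print(quantity_to_fraction("3/4 tsp"))    # 3/4
print(quantity_to_fraction("2"))          # 2
```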

For a finishing touch, we add the following to the HTML to show the result.

    {% if meal %}
    <div>
        <form action="/confirm-meal" method="post">
            {% for ingredient, quantity in meal.items() %}
            <div>
                <label for="{{ ingredient }}">{{ ingredient }}</label>
                <input type="text" id="{{ ingredient }}" name="{{ ingredient }}" value="{{ quantity }}">
            </div>
            {% endfor %}
            <button type="submit">Confirm</button>
        </form>
    </div>
    {% endif %}

This shows each ingredient and quantity in an input field before confirmation. The {% if meal %} check skips the block on a GET request, when no meal has been passed to the template. /confirm-meal will save this recipe for the user.

Usage

Here is a quick recording of this feature producing the output with the code above.

Conclusion

This was a good exercise to get familiar with string manipulation and using regex to filter and clean the input from users.

Additional Resources

Get started with Flask

Python - Data types

Python - Built-in data types

Regex101
