How Codeflash sped up Pydantic code by 34% - Recursion to Stack Optimization

Saurabh Misra

March 17, 2025

With many complex functions, the most natural way to write them is with a recursive implementation. However, a recursive implementation can be slow in Python because recursive calls can be expensive. Is there a better way? Yes! We can remove the recursion and manage an explicit stack ourselves.

In this blog post, we will explain how the recursion to explicit stack optimization works, share an example of such an optimization merged into Pydantic, and discuss how we can automatically discover such optimizations for a codebase using Codeflash.

What makes recursion slow in Python?

Being a dynamic language, in Python a function call can be expensive. Taking a bird’s eye view, this is because the following steps occur while calling a function:

Prepare for the call
- Check if the function name is defined
- Fully evaluate the function arguments
- Ensure agreement between arguments and function parameters
Establish frame for the function call
- Set up space on stack labeled with the function name
- Write down the return address in the frame
- Pass the arguments to parameter variables
Execute the new function

Similarly while returning from a function, the following steps are involved:

Fully evaluate the return expression
Jump back to the origination of the call found in the return address
1. Evaluate the function call to the return value

An effective way of speeding up such functions is to remove these recursive function calls and manage the stack ourselves explicitly. Let's see how this optimization works by looking at some Pydantic code, exploring how we can find such optimizations on our codebases.

‍

Optimization on Pydantic code

Pydantic is one of the most popular and important libraries in Python when it comes to defining data classes that self-validate. Since regular objects in Python don’t check for the validity of data types and values, Pydantic’s validating data classes have become an important component of building reliable Python software.

‍

Internally, at a very high level, Pydantic does 2 things. The first step is to parse the given Pydantic dataclass definitions and convert them to a schema that can be validated later on. This code is written in Python. The second step involves using the schema definitions and validating the new Pydantic objects in runtime against that schema. This happens in Rust.

‍

This was a recursive code in Pydantic schema code which collected the string associated _$ref key in a deeply nested object. This is a very recursive code.
‍


def _get_all_json_refs(item: Any) -> set[JsonRef]:
     """Get all the definitions references from a JSON schema.""" 
    refs: set[JsonRef] = set() 
    if isinstance(item, dict): 
        for key, value in item.items(): 
            if key == '$ref' and isinstance(value, str): 
                # the isinstance check ensures that '$ref' isn't the name of a property, etc. 
                refs.add(JsonRef(value)) 
            elif isinstance(value, dict): 
                refs.update(_get_all_json_refs(value)) 
            elif isinstance(value, list): 
                for item in value: 
                    refs.update(_get_all_json_refs(item)) 
    elif isinstance(item, list): 
        for item in item: 
            refs.update(_get_all_json_refs(item)) 
    return refs

We used Codeflash to optimize this function. Codeflash works by using LLMs to generate possible optimizations for the code and then testing them for correctness and speedup. Here is the Pull Request that Codeflash created.

This is the optimized code that Codeflash found:
‍


def _get_all_json_refs(item: Any) -> set[JsonRef]:
     """Get all the definitions references from a JSON schema.""" 
    refs: set[JsonRef] = set() 
    if isinstance(item, dict): 
        for key, value in item.items(): 
            if key == '$ref' and isinstance(value, str): 
                # the isinstance check ensures that '$ref' isn't the name of a property, etc. 
                refs.add(JsonRef(value)) 
            elif isinstance(value, dict): 
                refs.update(_get_all_json_refs(value)) 
            elif isinstance(value, list): 
                for item in value: 
                    refs.update(_get_all_json_refs(item)) 
    elif isinstance(item, list): 
        for item in item: 
            refs.update(_get_all_json_refs(item)) 
    return refs

The main optimization is to introduce a _stack variable and replace all recursive calls by adding the function argument to the _stack. Moreover, for lists, instead of iterating through the list and calling a function recursively like this:
‍


 elif isinstance(item, list): 
        for item in item: 
            refs.update(_get_all_json_refs(item))

The optimization directly extends the stack like this:
‍


 elif isinstance(item, list): 
        for item in item: 
            refs.update(_get_all_json_refs(item))

This approach prevents several recursive calls in one statement!

‍

Compared with a benchmark that Codeflash generated, this new stack-based approach was 34% faster than the recursive approach! This optimization was merged into Pydantic mainline code, speeding up projects wherever Pydantic is used!

‍

You can re-create the same results by running _{codeflash —-file pydantic/json_schema.py —- function _get_all_json_refs} on a local fork of Pydantic codebase before the optimization was merged in. If you are feeling adventurous, you can also optimize the whole of Pydantic’s codebase by running _{codeflash --all}. Feel free to share the results with me or open a PR on the Pydantic project itself with the results you find!

‍

Tradeoffs and considerations

Converting every recursive function to an explicit-stack approach may or may not result in performance optimization. Explicit stack may be a good approach for optimization if the function:

Results in a lot of deep recursive calls.
Does minimal processing and has significant relative overhead with using recursion.

If the above conditions are not met then such an optimization may not be a good idea. I’d suggest you try this category of optimization for yourself to see if it makes your code faster. Of course, an easier approach would be to just let Codeflash do all the hard work to figure out the best way to optimize your code, including doing the kinds of optimizations discussed in this post.

We saw how this type of optimization significantly sped up a real Pydantic function. I encourage you to see if any such optimizations exist in your Python codebase by scanning it with Codeflash. Codeflash can also find all the other possible optimization types for your codebase. Getting started is super quick by running pip install codeflash on your project’s environment and then running codeflash init . More instructions to get started are here in our documentation.

‍

Let’s make the world’s code faster. Happy optimizing 🚀

Want more Codeflash content?

Join our newsletter and stay updated with fresh insights and exclusive content.

Thank you! We received your submission!

Oops! Something went wrong.

Table of contents

This is some text inside of a div block.

Stay in the Loop!

Join our newsletter and stay updated with the latest in performance optimization automation.

Thank you! We received your submission!

Oops! Something went wrong.

Before equalization

After equalization

Featured posts

How Roboflow Accelerated Computer Vision Model Performance by 25% with Codeflash

Saurabh Misra

April 30, 2025

How Roboflow Accelerated Computer Vision Model Performance by 25% with Codeflash

Saurabh Misra

April 7, 2025

LLMs struggle to write performant code

If you use AI coding assistants, you are unfortunately shipping a lot of slow, un-optimized code. In this blog, I will show why AI struggles with writing performant code, and show how you can always ship performant code even when using AI coding assistants.