Yield in Python – beyond data generation


Welcome to the next pikoTutorial!

yield is a well-known Python keyword that lets you optimize code by generating a data stream on the fly instead of producing all of the data at once. To get started, let’s look at a simple example where we need a data stream of the squares of the numbers from 0 to 99,999,999 (100 million values). The naive approach would be to just create a list of these numbers:

data = [i**2 for i in range(100_000_000)]

However, this approach ignores the fact that such a list has to be fully allocated in memory. That is a waste of resources, because we may end up not needing certain values, and we almost certainly won’t need them all at once. A better approach is to write a generator function using the yield keyword:

def generate_data():
    for i in range(100_000_000):
        yield i**2

Calling this function returns a generator that produces individual values on the fly: execution pauses at the yield keyword each time a value is handed to the caller and resumes from that point when the next value is requested. Thanks to this mechanism, there’s no need to allocate space for all 100 million values. We get each subsequent value one by one, and only when it is really needed.
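
To make the laziness visible, here is a minimal sketch. The helper name generate_squares and its limit parameter are assumptions made for illustration, not part of the original example. It takes only the first five values from the stream and compares the size of the generator object with the size of a small list (sizes as reported by sys.getsizeof on CPython):

```python
import sys
from itertools import islice


def generate_squares(limit):
    """Yield the squares of 0..limit-1 one at a time (hypothetical helper)."""
    for i in range(limit):
        yield i**2


# Only the first five squares are computed; the remaining values are never produced.
first_five = list(islice(generate_squares(100_000_000), 5))
print(first_five)  # [0, 1, 4, 9, 16]

# The generator object stays small no matter how many values it can produce,
# while a list grows with the number of elements it stores.
small_list = [i**2 for i in range(1_000)]
lazy = generate_squares(100_000_000)
print(sys.getsizeof(lazy) < sys.getsizeof(small_list))
```

Note that sys.getsizeof reports only the size of the container object itself, so the comparison is a rough illustration rather than a precise memory measurement.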

That’s the well-known application of the yield keyword, but what else can we do with it?

Yield from

Let’s look at an example in which we have two generator functions and we want to use the first one in the body of the second one:

def sub_generator():
    yield 1
    yield 2

def main_generator():
    for value in sub_generator():
        yield value
    print("Finally able to proceed to main generator logic!")
    yield 3

for output in main_generator():
    print(output)

This code prints:

1
2
Finally able to proceed to main generator logic!
3

In situations like this, where the main generator first has to pass along every value from the sub-generator, we can use the yield from expression to replace the explicit loop over the sub-generator:

def sub_generator():
    yield 1
    yield 2

def main_generator():
    yield from sub_generator()
    print("Finally able to proceed to main generator logic!")
    yield 3

for output in main_generator():
    print(output)
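
Besides replacing the loop, yield from also evaluates to whatever the sub-generator returns. The sketch below is a variation of the example above; the return statement and the result variable are additions made for illustration:

```python
def sub_generator():
    yield 1
    yield 2
    # The returned value becomes the result of the `yield from` expression.
    return "sub-generator finished"


def main_generator():
    result = yield from sub_generator()
    print(f"Sub-generator returned: {result}")
    yield 3


outputs = list(main_generator())
print(outputs)  # [1, 2, 3]
```

The returned string is not part of the yielded stream: the consumer still sees only 1, 2 and 3, while the main generator receives the return value once the sub-generator is exhausted.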

Coroutines

Until now, every example has used yield as a slightly different kind of return statement: we write yield on the left and the value to be returned on the right. This can, however, be reversed. yield is an expression, so its result can be assigned to a variable inside the function:

def echo_coroutine():
    while True:
        value = yield
        print(f"Received value = {value}")

coroutine = echo_coroutine()
# Prime the coroutine: run it up to the first yield so it can accept values
next(coroutine)

coroutine.send(12)
coroutine.send(24)

This code prints:

Received value = 12
Received value = 24

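You can read the value = yield line as “assign to the variable whatever is later sent here with the send() method”. The next(coroutine) call is needed first because a freshly created generator has not started running yet: it advances execution to the first yield, where the coroutine waits for a value. Calling send() with a non-None value before that raises a TypeError.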
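
A single yield can also work in both directions at once, handing a value out and receiving the next one. The following sketch assumes a hypothetical running_average coroutine that is not part of the original article:

```python
def running_average():
    """Receive numbers via send() and yield the average of everything seen so far."""
    total = 0.0
    count = 0
    average = None
    while True:
        # Hand the current average out, then wait for the next number to come in.
        value = yield average
        total += value
        count += 1
        average = total / count


averager = running_average()
next(averager)  # prime the coroutine; this first yield produces None

print(averager.send(10))  # 10.0
print(averager.send(20))  # 15.0
print(averager.send(30))  # 20.0
averager.close()  # stop the coroutine once it is no longer needed
```

Each send() call resumes the coroutine with the given value and returns whatever the next yield produces, so the caller gets the updated average immediately.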

State machines

Another interesting application of generator functions is implementing an FSM (finite state machine). If the states follow one another in a fixed cycle, a generator combined with the next() function gives a friendly and expressive interface for stepping through the repeating states:

def state_machine():
    while True:
        # perform steps to transition to State 1
        yield "Reached State 1"
        # perform steps to transition to State 2
        yield "Reached State 2"
        # perform steps to transition to State 3
        yield "Reached State 3"

state = state_machine()

for i in range(6):
    print(next(state))

This code prints:

Reached State 1
Reached State 2
Reached State 3
Reached State 1
Reached State 2
Reached State 3
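
One consequence worth illustrating: every call to the generator function creates a separate generator object with its own position in the cycle. The sketch below repeats the state machine from above with the placeholder comments removed; the variable names first and second are chosen only for this illustration:

```python
def state_machine():
    while True:
        yield "Reached State 1"
        yield "Reached State 2"
        yield "Reached State 3"


# Two independent machines, each remembering where it paused.
first = state_machine()
second = state_machine()

print(next(first))   # Reached State 1
print(next(first))   # Reached State 2
print(next(second))  # Reached State 1 (unaffected by `first`)
print(next(first))   # Reached State 3
```

This means several machines can run side by side without sharing any global state.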

Flattening nested structures

Earlier in this article I described how yield from can be used to delegate to sub-generators within generator functions. It’s worth noting that the role of the sub-generator can be played by the generator function itself! This allows for recursive generator invocation, which can be used, for example, to flatten nested lists:

def flatten_list(nested):  # named "nested" to avoid shadowing the built-in input()
    for element in nested:
        if isinstance(element, list):
            yield from flatten_list(element)
        else:
            yield element

data = [[1, 2, [3, 4]], [5, 6], 7]
flattened_data = list(flatten_list(data))
print(flattened_data)  # Output: [1, 2, 3, 4, 5, 6, 7]
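
The same recursive pattern can be extended beyond lists. The sketch below is one possible generalization, not the article's original function: the name flatten and the choice to treat str and bytes as single items are assumptions made for illustration.

```python
from collections.abc import Iterable


def flatten(items):
    """Recursively yield the non-iterable leaves of arbitrarily nested iterables."""
    for element in items:
        # Strings are iterable too, and each character is itself a string, so
        # recursing into them would end in a RecursionError. Here str and bytes
        # are treated as single items instead.
        if isinstance(element, Iterable) and not isinstance(element, (str, bytes)):
            yield from flatten(element)
        else:
            yield element


mixed = [(1, 2, [3, "four"]), range(5, 7), 7]
print(list(flatten(mixed)))  # [1, 2, 3, 'four', 5, 6, 7]
```

Because flatten is itself a generator, the nested structure is walked lazily: wrapping the call in list() is only needed when all values are required at once.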