Generator Functions

Grouping Lines

Let’s open the widgets.txt file and write a script to parse the data in it. Here’s what the beginning looks like before we do any work with it:

WIDGET READING 1
name = Donald
battery = 10.7
rpm = 8.7
temp = 13.8
WIDGET READING 2
name = William
battery = 2.5
rpm = 1.7
temp = 14.1

We want to group the lines for each widget reading separately. How could we do that?

One approach is to collect each reading’s lines into its own entry in a new list:

groups = []
with open('widgets.txt') as widget_file:
    for line in widget_file:
        if line.startswith('WIDGET'):
            groups.append("")
        groups[-1] += line

>>> groups[:2]
['WIDGET READING 1\nname = Donald\nbattery = 10.7\nrpm = 8.7\ntemp = 13.8\n', 'WIDGET READING 2\nname = William\nbattery = 2.5\nrpm = 1.7\ntemp = 14.1\n']

We’ve made line groups, but it’d be nicer to have a list of dictionaries holding the keys and values of each reading.

import re

groups = []
with open('widgets.txt') as widget_file:
    for line in widget_file:
        if line.startswith('WIDGET'):
            group = {}
            groups.append(group)
        else:
            key, value = re.split(r'\s*=\s*', line)
            group[key] = value

This lets us look up each value by its key, which is much more useful. Note that every value still ends with a newline, because we never strip the lines.

>>> groups[:2]
[{'name': 'Donald\n', 'battery': '10.7\n', 'rpm': '8.7\n', 'temp': '13.8\n'}, {'name': 'William\n', 'battery': '2.5\n', 'rpm': '1.7\n', 'temp': '14.1\n'}]
>>> groups[0]['battery']
'10.7\n'
>>> groups[0]['name']
'Donald\n'
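
If the trailing newlines get in the way, one option is to strip each line before splitting it. Here’s a minimal sketch of that idea. The sample_lines list is a made-up stand-in for the first few lines of widgets.txt, and passing maxsplit=1 is an extra precaution (not in the code above) in case a value ever contains an equals sign:

```python
import re

# Hypothetical stand-in for the first few lines of widgets.txt
sample_lines = [
    'WIDGET READING 1\n',
    'name = Donald\n',
    'battery = 10.7\n',
]

groups = []
for line in sample_lines:
    if line.startswith('WIDGET'):
        group = {}
        groups.append(group)
    else:
        # strip() removes the trailing newline before we split on "="
        key, value = re.split(r'\s*=\s*', line.strip(), maxsplit=1)
        group[key] = value

print(groups)
```

The values are still strings; converting them to numbers would be a separate step.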

Grouping Efficiently

Let’s move that code we wrote into a function, so we can more easily work with widget data:

import re

def get_widget_groups(widget_file):
    groups = []
    for line in widget_file:
        if line.startswith('WIDGET'):
            group = {}
            groups.append(group)
        else:
            key, value = re.split(r'\s*=\s*', line)
            group[key] = value
    return groups

If we use this function to process our file, there’s a delay before the loop starts, because every group has to be built and stored before we can loop over any of them:

with open('widgets.txt') as widget_file:
    for group in get_widget_groups(widget_file):
        print(group[4])

This raises a KeyError (our dictionaries have no key 4), but it takes quite a bit of time to do so. Why?

Our function builds the entire list of widget groups before returning it, so the KeyError can’t happen until every line in the file has been processed. Is there any way to make this function return a lazy iterable?

Generator Functions

Generators are lazy iterables. Let’s see how processing of this widget file would look if we used a generator function.

import re

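def get_widget_groups(widget_file):
    group = None
    for line in widget_file:
        if line.startswith('WIDGET'):
            # A new header means the previous group (if any) is complete
            if group is not None:
                yield group
            group = {}
        else:
            key, value = re.split(r'\s*=\s*', line)
            group[key] = value
    # The last group has no header after it, so yield it here
    # (and yield nothing at all if the file was empty)
    if group is not None:
        yield group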

The “magic” of a generator function is in the yield statement. Calling the function doesn’t run its body; it returns a generator object. The first time the generator object is looped over, the body runs from the top, as you would expect. When it reaches a yield, it hands the yielded value to the caller and pauses, remembering where it left off. On the next iteration it resumes immediately after that yield, rather than starting at the beginning again.

If we use this new generator function to process the lines in our file, we’ll see that the lines in our file start being processed as soon as we start looping:

with open('widgets.txt') as widget_file:
    for group in get_widget_groups(widget_file):
        print(group[4])

This raises the same KeyError as before, but does so immediately.
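
To see the laziness more directly, here’s a small sketch that records which lines the generator has pulled so far. The track helper, the consumed list, and the sample_lines data are all made up for this illustration, and the function is a variant of the one above that reads from any iterable of lines:

```python
import re

def get_widget_groups(lines):
    group = None
    for line in lines:
        if line.startswith('WIDGET'):
            if group is not None:
                yield group
            group = {}
        else:
            key, value = re.split(r'\s*=\s*', line)
            group[key] = value
    if group is not None:
        yield group

consumed = []  # every line the generator has asked for so far

def track(lines):
    """Hypothetical helper: pass lines through, recording each one."""
    for line in lines:
        consumed.append(line)
        yield line

sample_lines = [
    'WIDGET READING 1\n',
    'name = Donald\n',
    'battery = 10.7\n',
    'WIDGET READING 2\n',
    'name = William\n',
    'battery = 2.5\n',
]

groups = get_widget_groups(track(sample_lines))
lines_read_before = len(consumed)  # nothing has been read yet
first = next(groups)               # runs only until the first yield
lines_read_after = len(consumed)   # stopped at the second header

print(lines_read_before, first, lines_read_after)
```

No lines are read when the generator object is created, and asking for the first group reads only as far as the second header line.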

Generator Objects

Let’s make another generator function and see how it behaves.

>>> def count(n=0):
...     print("start")
...     while True:
...         yield n
...         n += 1
...         print("loop")
...     print("end")
...
>>> count()
<generator object count at 0x7f57a0436360>

Calling our generator function doesn’t seem to actually call the function. It seems to create a generator object.

Let’s see what this object has:

>>> c = count()
>>> c()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'generator' object is not callable
>>> c[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'generator' object is not subscriptable
>>> len(c)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'generator' has no len()

This object can’t be called, can’t be indexed, and doesn’t have a length.

What else could we try to do with this object?

Let’s try to loop over it:

>>> for x in c:
...     print(x)
...     if x > 3:
...         break
...
start
0
loop
1
loop
2
loop
3
loop
4

That’s interesting. When we start looping over it, the function is entered and our while loop starts executing.

But wait… “end” was never printed, so the function never returned!

Let’s loop over it again:

>>> for x in c:
...     print(x)
...     if x > 3:
...         break
...
loop
5

This time it started counting at 5! And it still hasn’t ended the function.

But if you think about it, it isn’t too strange. Our generator object c is essentially a single-use object that keeps yielding answers forever, due to the while True loop inside count(). There is no logic to end looping in the generator count(); the ending happens in the code that uses it.
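
A for loop with a break isn’t the only way to take a few items from an endless generator. As a rough sketch, the built-in next function pulls one value at a time, and itertools.islice takes a fixed number of values. This version of count leaves out the print calls to keep the output short:

```python
from itertools import islice

def count(n=0):
    while True:
        yield n
        n += 1

c = count()
first_three = list(islice(c, 3))  # take exactly three values, then stop
next_value = next(c)              # the generator resumes where it paused

print(first_three, next_value)
```

Because c remembers its position, the call to next continues counting after the values islice already took.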

Let’s create a get_big_primes generator and add print statements at the start and end of the function. This time the generator ends on its own.

import sys


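def is_prime(candidate):
    # Check every possible divisor from 2 up to and including half of
    # candidate (the + 1 matters for small numbers such as 4)
    for n in range(2, candidate // 2 + 1):
        if candidate % n == 0:
            return False
    return True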

def get_big_primes(limit):
    print("start")
    n = 999999
    count = 0
    while count < limit:
        if is_prime(n):
            print("found prime: {}".format(n))
            yield n
            count += 1
        n += 1
    print("end")

limit = int(sys.argv[1])
for prime in get_big_primes(limit):
    print(prime)

$ python primes.py 5
start
found prime: 1000003
1000003
found prime: 1000033
1000033
found prime: 1000037
1000037
found prime: 1000039
1000039
found prime: 1000081
1000081
end

Because our generator function get_big_primes ends itself based on the input limit, we do see the printing of the “end” message.
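
A for loop knows that a generator has finished because the generator raises StopIteration once its function returns. Here’s a rough sketch of what a for loop does behind the scenes, using a small hypothetical countdown generator (not part of the primes script):

```python
def countdown(n):
    while n > 0:
        yield n
        n -= 1
    # Falling off the end of the function ends the generator

gen = countdown(3)
seen = []
while True:
    try:
        value = next(gen)
    except StopIteration:
        # The generator function returned, so there is nothing left
        break
    seen.append(value)

print(seen)
```

A generator like count, which never returns, never raises StopIteration, so only the calling code can stop the loop.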

Note

Python 3.3 added a yield from syntax that makes it easier to yield every item in an iterable.

In Python 3, this:

def generatorify(iterable):
    yield from iterable

Is a shortcut for this:

def generatorify(iterable):
    for x in iterable:
        yield x
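
As a small sketch of where yield from comes in handy, here is a hypothetical chain_all function (the name is made up for this example) that yields every item from each iterable it is given, one iterable after another:

```python
def chain_all(*iterables):
    for iterable in iterables:
        # Yield each item of this iterable before moving to the next one
        yield from iterable

combined = list(chain_all([1, 2], 'ab', (3,)))
print(combined)
```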

Generator Expressions

We’ve talked about both generator expressions and generator functions. Let’s take a look at how generator expressions are related to generator functions.

You can think of a generator expression as a special-purpose generator function.

If you have a generator function that looks like this:

def do_something(elements):
    for item in elements:
        if condition(item):
            yield some_operation(item)

You could rewrite that as a generator expression like this:

def do_something(elements):
    return (
        some_operation(item)
        for item in elements
        if condition(item)
    )

If there’s no condition:

def do_something(elements):
    for item in elements:
        yield some_operation(item)

That’s equivalent to this:

def do_something(elements):
    return (
        some_operation(item)
        for item in elements
    )

For example, here’s a generator function that takes an iterable of lines (a file object, for example) and operates on it:

def trim_line_endings(lines):
    for line in lines:
        yield line.rstrip('\n')

Here’s an equivalent function with a generator expression:

def trim_line_endings(lines):
    return (
        line.rstrip('\n')
        for line in lines
    )

Example usage:

>>> lines = ['hello \n', 'there\n']
>>> list(trim_line_endings(lines))
['hello ', 'there']

Here’s the same generator expression outside of a function:

trimmed_lines = (
    line.rstrip('\n')
    for line in lines
)
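
Like the generator objects we saw earlier, the object a generator expression creates is lazy and single-use. A quick sketch, reusing the lines list from the example usage above:

```python
lines = ['hello \n', 'there\n']

trimmed_lines = (
    line.rstrip('\n')
    for line in lines
)

first_pass = list(trimmed_lines)   # consumes every item
second_pass = list(trimmed_lines)  # nothing is left the second time

print(first_pass, second_pass)
```

If the results are needed more than once, storing them in a list (or creating a fresh generator) is one way around this.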

Generator Function Exercises

These exercises are all in the functions.py file in the exercises directory. Edit the file to add the functions or fix the error(s) in the existing function(s).

To run the test: from the exercises folder, type python test.py <function_name>, like this:

$ python test.py unique

Unique

Edit the function unique to be a generator function that takes an iterable and yields the iterable elements in order, skipping duplicate values.

Example:

>>> from functions import unique
>>> list(unique([6, 7, 0, 9, 0, 1, 2, 7, 7, 9]))
[6, 7, 0, 9, 1, 2]
>>> list(unique([]))
[]
>>> ''.join(unique("hello there"))
'helo tr'

Float Range

Edit the float_range function so that it works like range except that the start, stop, and step can be fractional.

You can ignore negative step values (I’ll only ever provide positive step values to you).

>>> list(float_range(2.5, 5))
[2.5, 3.5, 4.5]
>>> list(float_range(2.5, 5, 0.5))
[2.5, 3.0, 3.5, 4.0, 4.5]
>>> list(float_range(2.5, 5, step=0.5))
[2.5, 3.0, 3.5, 4.0, 4.5]

Interleave

Edit the interleave function in generators.py so that it accepts two iterables and returns a generator object with each of the given items “interleaved” (e.g. first item from first iterable, first item from second, second item from first, second item from second, and so on). You may assume the input iterables have the same number of elements.

Try to use a generator function for this.

>>> list(interleave([1, 2, 3, 4], [5, 6, 7, 8]))
[1, 5, 2, 6, 3, 7, 4, 8]
>>> nums = [1, 2, 3, 4]
>>> list(interleave(nums, (n**2 for n in nums)))
[1, 1, 2, 4, 3, 9, 4, 16]

Pairwise

Edit the function pairwise to be a generator function that accepts an iterable and yields a tuple containing each item and the item following it. The last item should treat the item after it as None.

Example:

>>> from functions import pairwise
>>> list(pairwise([1, 2, 3]))
[(1, 2), (2, 3), (3, None)]
>>> list(pairwise([]))
[]
>>> list(pairwise("hey"))
[('h', 'e'), ('e', 'y'), ('y', None)]

Stop On

Edit the function stop_on to be a generator function that accepts an iterable and a value and yields items from the given iterable until the given value is reached. The given value itself should not be yielded.

Example:

>>> from functions import stop_on
>>> list(stop_on([1, 2, 3], 3))
[1, 2]
>>> next(stop_on([1, 2, 3], 1), 0)
0

Around

Edit the function around to be a generator function that accepts an iterable and yields a tuple containing the previous item, the current item, and the next item. The previous item should start at None and the next item should be None for the last item in the iterable.

Example:

>>> from functions import around
>>> list(around([1, 2, 3, 4]))
[(None, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, None)]
>>> list(around([]))
[]
>>> list(around("hey"))
[(None, 'h', 'e'), ('h', 'e', 'y'), ('e', 'y', None)]

Deep Flatten

Edit the function deep_flatten to be a generator function that “flattens” nested iterables. In other words, the function should accept an iterable of iterables and yield the non-iterable items in order.

Example:

>>> from functions import deep_flatten
>>> list(deep_flatten([0, [1, [2, 3]], [4]]))
[0, 1, 2, 3, 4]
>>> list(deep_flatten([[()]]))
[]

Big Primes

Edit the get_primes_over function to return a given number of primes above 1,000,000. Make it a generator.

It should work like this:

>>> from generators import get_primes_over
>>> primes = get_primes_over(5)
>>> next(primes)
1000003
>>> next(primes)
1000033
>>> next(primes)
1000037
>>> next(primes)
1000039
>>> next(primes)
1000081
>>> next(primes)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> list(get_primes_over(3))
[1000003, 1000033, 1000037]

You can use this function to determine whether a number is prime:

def is_prime(candidate):
    """Return True if candidate number is prime."""
    for n in range(2, candidate):
        if candidate % n == 0:
            return False
    return True