Generator Expressions

Sometimes we don’t really care if a list comprehension returns a list, or some other kind of iterable.

Remember when we used a list comprehension for the squares of a list of numbers? Let’s use a generator expression instead. We can make a generator expression like this:

>>> numbers = [1, 2, 3, 4]
>>> squares = (n ** 2 for n in numbers)
>>> squares
<generator object <genexpr> at 0x7f733d4f7e10>

Generators don’t behave like most other iterables because generators are iterators. We’ll talk more about what iterators are later, but let’s look at a few ways iterators differ from iterables such as lists, strings, or sets.

They can’t be indexed:

>>> squares[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'generator' object is not subscriptable

And they can’t tell us their length:

>>> len(squares)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'generator' has no len()

But you can loop over generators:

>>> for s in squares:
...     print(s)
...
1
4
9
16

But only once:

>>> for s in squares:
...     print(s)
...

The second loop prints nothing because generators are single-use iterables.
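If you need to loop more than once, one option is to store the items in a list, or to build a fresh generator each time. Here’s a minimal sketch of that idea; the variable names first_pass, second_pass, and fresh_pass are just for illustration:

```python
numbers = [1, 2, 3, 4]
squares = (n ** 2 for n in numbers)

# list() loops over the generator and consumes every item.
first_pass = list(squares)
# The generator is now empty, so a second pass finds nothing.
second_pass = list(squares)

# Writing the generator expression again gives us a brand new generator.
squares = (n ** 2 for n in numbers)
fresh_pass = list(squares)

print(first_pass)   # [1, 4, 9, 16]
print(second_pass)  # []
print(fresh_pass)   # [1, 4, 9, 16]
```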

Let’s look at how to loop over generators manually. We’ll use the built-in Python function next.

Each time we call next, it gives us the next item in the generator. Once the generator has run out of items, next raises a StopIteration exception.

>>> squares = (n ** 2 for n in numbers)
>>> next(squares)
1
>>> next(squares)
4
>>> next(squares)
9
>>> next(squares)
16
>>> next(squares)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

At this point the generator is exhausted. Its items have been consumed, and looping over it again gives us nothing.
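A for loop does roughly the same thing for us: it keeps asking for the next item and stops quietly when StopIteration shows up. The sketch below imitates that by hand. It’s a simplification, not how Python implements for loops internally, and the names collected and leftover are made up for this example:

```python
numbers = [1, 2, 3, 4]
squares = (n ** 2 for n in numbers)

collected = []
while True:
    try:
        s = next(squares)    # ask the generator for its next item
    except StopIteration:    # the generator has no items left
        break
    collected.append(s)

print(collected)  # [1, 4, 9, 16]

# next also accepts a default to return instead of raising StopIteration.
leftover = next(squares, None)
print(leftover)   # None
```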

So why are generator expressions called generator expressions? They look very similar to list comprehensions or dictionary comprehensions, so why aren’t they called generator comprehensions? I don’t know.

Calling them generator comprehensions is fine because people will know what you mean.

Piping Generators

Generators are single-use iterables. That means they can only be looped over once. This feature is what allows generators to be both lazy and memory efficient.
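To get a rough feel for the memory difference, we can compare the size of a list with the size of a generator object using sys.getsizeof. This is only a sketch: the exact byte counts depend on your Python version and platform, and getsizeof doesn’t count the numbers stored inside the list, so the real gap is even larger.

```python
import sys

numbers = range(100_000)

squares_list = [n ** 2 for n in numbers]  # all 100,000 squares exist right now
squares_gen = (n ** 2 for n in numbers)   # no squares have been computed yet

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)

# The generator object stays small no matter how many items it will produce.
print(gen_size < list_size)  # True
```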

Because generators are lazy, it’s not uncommon to see generators piped into other generators.

Let’s take another look at the SSH log parsing code we had before and see if we can turn any of the list comprehensions we used into generator expressions.

import gzip
import re

user_re = re.compile(r'^(.*) \S+ farnsworth .* session opened for user (\w+)')
with gzip.open('sshd.log.gz', mode='rt') as log_file:
    matches = [
        user_re.search(line)
        for line in log_file
    ]
    days = [
        (match.group(1), match.group(2))
        for match in matches
        if match
    ]
    seen = set()
    for day in days:
        if day not in seen:
            print(day[0], day[1])
        seen.add(day)

The list comprehensions matches and days are ideal candidates for generator expressions. Since each one feeds directly into the next, we only loop over each one once. Any time you create an iterable that you’ll only loop over once, it’s worth asking yourself whether a generator would be a better choice.

So we can refactor matches like this:

matches = (
    user_re.search(line)
    for line in log_file
)

And days like this:

days = (
    (match.group(1), match.group(2))
    for match in matches
    if match
)

Here’s the full result:

import gzip
import re

user_re = re.compile(r'^(.*) \S+ farnsworth .* session opened for user (\w+)')
with gzip.open('sshd.log.gz', mode='rt') as log_file:
    matches = (
        user_re.search(line)
        for line in log_file
    )
    days = (
        (match.group(1), match.group(2))
        for match in matches
        if match
    )
    seen = set()
    for day in days:
        if day not in seen:
            print(day[0], day[1])
        seen.add(day)

Notice that we’re making a lazy generator called matches, piping it into another generator called days, and then looping over days.

Neither of those generators does any work until we start looping over days, and even then they only compute one element at a time.
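We can watch this laziness happen with a small pipeline that records each call. This sketch doesn’t use the SSH log; traced_upper, log, and the other names are hypothetical helpers invented for the demonstration:

```python
log = []

def traced_upper(word):
    """Uppercase a word and record that the work actually happened."""
    log.append(f"upper({word})")
    return word.upper()

words = ["a", "b", "c"]
uppercased = (traced_upper(w) for w in words)
exclaimed = (u + "!" for u in uppercased)

# Two generators exist, but no word has been processed yet.
log_before = list(log)
print(log_before)       # []

# Asking for one item pulls exactly one word through both steps.
first = next(exclaimed)
log_after_first = list(log)
print(first)            # A!
print(log_after_first)  # ['upper(a)']

# Looping over the rest processes the remaining words one at a time.
rest = list(exclaimed)
print(rest)             # ['B!', 'C!']
print(log)              # ['upper(a)', 'upper(b)', 'upper(c)']
```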

Why Generators?

It’s quite common to filter and process data while looping over lines in a file.

Generator expressions allow us to break down many forms of filtering and data processing into multiple small steps.

Intermediary steps are important for understanding the end result. Important things deserve names and intermediary variables provide names for unnamed things.

Generator Exercises

These exercises are all in the generators.py file in the exercises directory. Edit the file to add the functions or fix the error(s) in the existing function(s).

To run the test: from the exercises folder, type python test.py <function_name>, like this:

$ python test.py all_together

Sum All

Edit the function sum_all so that it accepts a list of lists of numbers and returns the sum of all of the numbers. Use a generator expression.

>>> from generators import sum_all
>>> matrix = [[1, 2, 3], [4, 5, 6]]
>>> sum_all(matrix)
21
>>> sum_all([[0, 1], [4, 2], [3, 1]])
11

All Together

Edit the function all_together so that it takes any number of iterables and strings them together.

Make sure the return value of your function is a generator.

Example:

>>> from generators import all_together
>>> list(all_together([1, 2], (3, 4), "hello"))
[1, 2, 3, 4, 'h', 'e', 'l', 'l', 'o']
>>> nums = all_together([1, 2], (3, 4))
>>> list(all_together(nums, nums))
[1, 2, 3, 4]

Interleave

Edit the function interleave so that it accepts two iterables and returns a generator object with each of the given items “interleaved” (item 0 from iterable 1, then item 0 from iterable 2, then item 1 from iterable 1, and so on).

Hint

The built-in zip function will be useful for this.

Example:

>>> from generators import interleave
>>> list(interleave([1, 2, 3, 4], [5, 6, 7, 8]))
[1, 5, 2, 6, 3, 7, 4, 8]
>>> nums = [1, 2, 3, 4]
>>> list(interleave(nums, (n**2 for n in nums)))
[1, 1, 2, 4, 3, 9, 4, 16]

Deep Add

Edit the deep_add function so that it accepts an iterable of iterables of numbers of unknown depth and returns the sum of all the numbers.

Example:

>>> from generators import deep_add
>>> deep_add([1, 2, 3, 4])
10
>>> deep_add([(1, 2), [3, {4, 5}]])
15

Parse Number Ranges

Edit the parse_ranges function so that it accepts a string containing ranges of numbers and returns a list of the actual numbers contained in the ranges. The range numbers are inclusive.

It should work like this:

>>> from generators import parse_ranges
>>> parse_ranges('1-2,4-4,8-10')
[1, 2, 4, 8, 9, 10]
>>> parse_ranges('0-0, 4-8, 20-21, 43-45')
[0, 4, 5, 6, 7, 8, 20, 21, 43, 44, 45]

Is Prime

Rewrite the is_prime function in one expression.

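def is_prime(candidate):
    # Try every possible divisor from 2 up to and including candidate // 2.
    # (Stopping at candidate // 2 itself would wrongly report 4 as prime.)
    for n in range(2, candidate // 2 + 1):
        if candidate % n == 0:
            return False
    return True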

Hint

Use the any or all built-in functions and a generator expression.

It should work like this:

>>> from generators import is_prime
>>> is_prime(9)
False
>>> is_prime(11)
True
>>> is_prime(23)
True
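
If you haven’t used any or all with a generator expression before, here’s a small sketch on an unrelated problem. It shows that any stops pulling items from the generator as soon as it finds a truthy one. The names is_even, checked, has_even, and all_positive are made up for this illustration:

```python
checked = []

def is_even(n):
    """Record each number we test so we can see how far any() got."""
    checked.append(n)
    return n % 2 == 0

numbers = [1, 3, 4, 5, 7]

# any() stops at the first True result, so 5 and 7 are never tested.
has_even = any(is_even(n) for n in numbers)
print(has_even)      # True
print(checked)       # [1, 3, 4]

# all() is True only if every item produced by the generator is truthy.
all_positive = all(n > 0 for n in numbers)
print(all_positive)  # True
```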