Generator Expressions¶
Sometimes we don’t really care if a list comprehension returns a list, or some other kind of iterable.
Remember when we used a list comprehension for the squares of a list of numbers? Let’s use a generator expression instead. We can make a generator expression like this:
>>> numbers = [1, 2, 3, 4]
>>> squares = (n ** 2 for n in numbers)
>>> squares
<generator object <genexpr> at 0x7f733d4f7e10>
Generators behave differently from most other iterables because generators are iterators. We’ll talk more about what iterators are later, but for now let’s look at some important ways iterators differ from iterables such as lists, strings, or sets.
They can’t be indexed:
>>> squares[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'generator' object is not subscriptable
And they can’t tell us their length:
>>> len(squares)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'generator' has no len()
But you can loop over generators:
>>> for s in squares:
...     print(s)
...
1
4
9
16
But only once:
>>> for s in squares:
...     print(s)
...
Because generators are single-use iterables.
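The same single-use behavior shows up outside of a for loop. The following is a minimal sketch (the variable names first_pass, second_pass, and square_list are only illustrative) comparing a generator expression with the equivalent list comprehension:

```python
numbers = [1, 2, 3, 4]
squares = (n ** 2 for n in numbers)

first_pass = list(squares)   # consumes every item in the generator
second_pass = list(squares)  # nothing is left to consume

print(first_pass)   # [1, 4, 9, 16]
print(second_pass)  # []

# A list built from the same expression can be looped over repeatedly.
square_list = [n ** 2 for n in numbers]
print(sum(square_list), sum(square_list))  # 30 30
```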
Let’s look at how to loop over generators manually. We’ll use the built-in Python function next.
Each time we call next, it gives us the next item in the generator. When the generator runs out of items, next raises a StopIteration exception.
>>> squares = (n ** 2 for n in numbers)
>>> next(squares)
1
>>> next(squares)
4
>>> next(squares)
9
>>> next(squares)
16
>>> next(squares)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
At this point, we can no longer loop over the items. They are consumed—gone forever.
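As a rough sketch of what a for loop does on our behalf, we can call next repeatedly and stop when StopIteration is raised. The names collected and leftover are just for illustration:

```python
numbers = [1, 2, 3, 4]
squares = (n ** 2 for n in numbers)

collected = []
while True:
    try:
        item = next(squares)
    except StopIteration:
        break  # the generator is exhausted
    collected.append(item)

print(collected)  # [1, 4, 9, 16]

# next also accepts a default value to return instead of raising StopIteration.
leftover = next(squares, None)
print(leftover)  # None
```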
So why are generator expressions called generator expressions? They look very similar to list comprehensions or dictionary comprehensions, so why aren’t they called generator comprehensions? I don’t know.
Calling them generator comprehensions is fine because people will know what you mean.
Piping Generators¶
Generators are single-use iterables. That means they can only be looped over once. This feature is what allows generators to be both lazy and memory efficient.
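One rough way to see the memory difference is to compare the size of a list with the size of a generator built from the same expression. This is only a sketch: sys.getsizeof measures the container object itself, not the numbers inside it, and the exact byte counts vary between Python versions.

```python
import sys

numbers = range(100_000)

square_list = [n ** 2 for n in numbers]  # every square is computed up front
square_gen = (n ** 2 for n in numbers)   # nothing is computed yet

list_size = sys.getsizeof(square_list)
gen_size = sys.getsizeof(square_gen)

print(list_size > gen_size)  # True

# The generator still produces the same values when we finally loop over it.
total = sum(square_gen)
print(total == sum(square_list))  # True
```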
Because generators are lazy, it’s not uncommon to see generators piped into other generators.
Let’s take another look at the SSH log parsing code we had before and see if we can turn any of the list comprehensions we used into generator expressions.
import gzip
import re
user_re = re.compile(r'^(.*) \S+ farnsworth .* session opened for user (\w+)')
with gzip.open('sshd.log.gz', mode='rt') as log_file:
    matches = [
        user_re.search(line)
        for line in log_file
    ]
    days = [
        (match.group(1), match.group(2))
        for match in matches
        if match
    ]
    seen = set()
    for day in days:
        if day not in seen:
            print(day[0], day[1])
            seen.add(day)
The list comprehensions matches and days are ideal candidates for generator expressions. Since we’re piping each one directly into the next, we only loop over each one once. Any time you’re creating an iterable that you’ll only loop over once, it’s a good time to ask yourself if a generator would be a better choice.
So we can refactor matches like this:
matches = (
    user_re.search(line)
    for line in log_file
)
And days like this:
days = (
    (match.group(1), match.group(2))
    for match in matches
    if match
)
Here’s the full result:
import gzip
import re
user_re = re.compile(r'^(.*) \S+ farnsworth .* session opened for user (\w+)')
with gzip.open('sshd.log.gz', mode='rt') as log_file:
    matches = (
        user_re.search(line)
        for line in log_file
    )
    days = (
        (match.group(1), match.group(2))
        for match in matches
        if match
    )
    seen = set()
    # The loop must stay inside the with block: the generators read
    # from log_file lazily, so the file has to remain open.
    for day in days:
        if day not in seen:
            print(day[0], day[1])
            seen.add(day)
Notice that we’re making a lazy generator called matches, which we then pipe into a generator expression called days, which we then loop over.
Neither of those generators does any work until we start looping over days, and even then they only compute one element at a time.
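To make that laziness visible, here is a small self-contained sketch. It doesn't use the SSH log; instead it pipes a short list of strings through two generators, and a hypothetical helper called traced records each step as it actually runs:

```python
def traced(label, value, log):
    """Record that a step ran, then pass the value through unchanged."""
    log.append(label)
    return value

log = []
lines = ["alpha", "", "beta"]

stripped = (traced("strip", line.strip(), log) for line in lines)
non_empty = (traced("keep", line, log) for line in stripped if line)

log_before = list(log)
print(log_before)  # []  (creating the generators did no work)

first = next(non_empty)
print(first)  # alpha
log_after_first = list(log)
print(log_after_first)  # ['strip', 'keep']  (only one item was processed)

rest = list(non_empty)
print(rest)  # ['beta']
```

Asking for one item from non_empty pulls exactly one item through stripped, which is why only two steps are recorded after the first call to next.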
Why Generators?¶
It’s quite common to filter and process data while looping over lines in a file.
Generator expressions allow us to break down many forms of filtering and data processing into multiple small steps.
Intermediary steps are important for understanding the end result. Important things deserve names and intermediary variables provide names for unnamed things.
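As an illustration of naming each step, here is a sketch that filters and processes some made-up log lines (the log_lines data and the variable names are hypothetical, not part of the exercises):

```python
log_lines = [
    "INFO start\n",
    "ERROR disk full\n",
    "INFO heartbeat\n",
    "ERROR timeout\n",
]

# Each step gets a name that says what it produces.
stripped_lines = (line.rstrip("\n") for line in log_lines)
error_lines = (line for line in stripped_lines if line.startswith("ERROR"))
error_messages = (line.partition(" ")[2] for line in error_lines)

result = list(error_messages)
print(result)  # ['disk full', 'timeout']
```

The same logic could be written as one nested expression, but the named steps make it easier to see what each stage contributes.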
Generator Exercises¶
These exercises are all in the generators.py file in the exercises directory. Edit the file to add the functions or fix the error(s) in the existing function(s).
To run the test: from the exercises folder, type python test.py <function_name>, like this:
$ python test.py all_together
Sum All¶
Edit the function sum_all so that it accepts a list of lists of numbers and returns the sum of all of the numbers.
Use a generator expression.
>>> from generators import sum_all
>>> matrix = [[1, 2, 3], [4, 5, 6]]
>>> sum_all(matrix)
21
>>> sum_all([[0, 1], [4, 2], [3, 1]])
11
All Together¶
Edit the function all_together so that it takes any number of iterables and strings them together.
Make sure the return value of your function is a generator.
Example:
>>> from generators import all_together
>>> list(all_together([1, 2], (3, 4), "hello"))
[1, 2, 3, 4, 'h', 'e', 'l', 'l', 'o']
>>> nums = all_together([1, 2], (3, 4))
>>> list(all_together(nums, nums))
[1, 2, 3, 4]
Interleave¶
Edit the function interleave so that it accepts two iterables and returns a generator object with each of the given items “interleaved” (item 0 from iterable 1, then item 0 from iterable 2, then item 1 from iterable 1, and so on).
Hint
The built-in zip function will be useful for this.
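If zip is unfamiliar, this short sketch shows what it does on its own. It is not a solution to the exercise, and the sample data is made up:

```python
letters = ["a", "b", "c"]
numbers = [1, 2, 3]

# zip pairs up items that share the same position in each iterable.
pairs = list(zip(letters, numbers))
print(pairs)  # [('a', 1), ('b', 2), ('c', 3)]

# zip stops as soon as the shortest iterable runs out.
short = list(zip("xy", range(10)))
print(short)  # [('x', 0), ('y', 1)]
```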
Example:
>>> from generators import interleave
>>> list(interleave([1, 2, 3, 4], [5, 6, 7, 8]))
[1, 5, 2, 6, 3, 7, 4, 8]
>>> nums = [1, 2, 3, 4]
>>> list(interleave(nums, (n**2 for n in nums)))
[1, 1, 2, 4, 3, 9, 4, 16]
Deep Add¶
Edit the deep_add function so that it accepts an iterable of iterables of numbers of unknown depth and returns the sum of all the numbers.
Example:
>>> from generators import deep_add
>>> deep_add([1, 2, 3, 4])
10
>>> deep_add([(1, 2), [3, {4, 5}]])
15
Parse Number Ranges¶
Edit the parse_ranges function so that it accepts a string containing ranges of numbers and returns a list of the actual numbers contained in the ranges.
The range numbers are inclusive.
It should work like this:
>>> from generators import parse_ranges
>>> parse_ranges('1-2,4-4,8-10')
[1, 2, 4, 8, 9, 10]
>>> parse_ranges('0-0, 4-8, 20-21, 43-45')
[0, 4, 5, 6, 7, 8, 20, 21, 43, 44, 45]
Is Prime¶
Rewrite the is_prime function in one expression.
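def is_prime(candidate):
    # Check divisors from 2 through candidate // 2 inclusive; without
    # the + 1, 4 would never be tested against 2 and would pass as prime.
    for n in range(2, candidate // 2 + 1):
        if candidate % n == 0:
            return False
    return True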
Hint
Use the any or all built-in functions and a generator expression.
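Here is a sketch of how any and all combine with generator expressions, using an unrelated example so the exercise stays unsolved. The helper is_even and the list checked are hypothetical and exist only to show which items get examined:

```python
numbers = [2, 4, 6, 7]

has_odd = any(n % 2 == 1 for n in numbers)
all_positive = all(n > 0 for n in numbers)
print(has_odd)       # True
print(all_positive)  # True

# Both functions stop pulling items as soon as the answer is known.
checked = []

def is_even(n):
    checked.append(n)
    return n % 2 == 0

all_even = all(is_even(n) for n in [2, 3, 4])
print(all_even)  # False
print(checked)   # [2, 3]  (4 was never examined)
```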
It should work like this:
>>> from generators import is_prime
>>> is_prime(9)
False
>>> is_prime(11)
True
>>> is_prime(23)
True