Generator Functions¶
Grouping Lines¶
Let’s open the widgets.txt file and write a script to parse the data in it. Here’s what the beginning looks like before we do any work with it:
WIDGET READING 1
name = Donald
battery = 10.7
rpm = 8.7
temp = 13.8
WIDGET READING 2
name = William
battery = 2.5
rpm = 1.7
temp = 14.1
We want to group the lines for each widget reading separately. How could we do that?
One approach is to start a new entry in a list each time we see a line beginning with WIDGET, and append every line to the most recent entry:
groups = []
with open('widgets.txt') as widget_file:
    for line in widget_file:
        if line.startswith('WIDGET'):
            groups.append("")
        groups[-1] += line
>>> groups[:2]
['WIDGET READING 1\nname = Donald\nbattery = 10.7\nrpm = 8.7\ntemp = 13.8\n', 'WIDGET READING 2\nname = William\nbattery = 2.5\nrpm = 1.7\ntemp = 14.1\n']
We’ve made line groups, but it’d be nice if we instead had a list of dictionaries holding the keys and values of each reading.
import re
groups = []
with open('widgets.txt') as widget_file:
    for line in widget_file:
        if line.startswith('WIDGET'):
            group = {}
            groups.append(group)
        else:
            key, value = re.split(r'\s*=\s*', line)
            group[key] = value
This will allow us to use both the keys and values separately, which is much more useful.
>>> groups[:2]
[{'name': 'Donald\n', 'battery': '10.7\n', 'rpm': '8.7\n', 'temp': '13.8\n'}, {'name': 'William\n', 'battery': '2.5\n', 'rpm': '1.7\n', 'temp': '14.1\n'}]
>>> groups[0]['battery']
'10.7\n'
>>> groups[0]['name']
'Donald\n'
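The values above are all strings and still carry their trailing newline. One possible refinement is sketched below. The helper name parse_widget_groups is hypothetical, and the sketch assumes that every field other than name holds a number and that blank lines may be skipped; it reads the sample data from a string instead of widgets.txt so it can run on its own.

```python
import io

# The first two readings shown at the top of this page.
SAMPLE = """\
WIDGET READING 1
name = Donald
battery = 10.7
rpm = 8.7
temp = 13.8
WIDGET READING 2
name = William
battery = 2.5
rpm = 1.7
temp = 14.1
"""

def parse_widget_groups(lines):
    """Return a list of dicts with whitespace stripped and readings as floats."""
    groups = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # assumption: blank lines carry no data
        if line.startswith('WIDGET'):
            groups.append({})
        else:
            key, _, value = line.partition('=')
            key, value = key.strip(), value.strip()
            # Assumption: every field other than "name" holds a number.
            groups[-1][key] = value if key == 'name' else float(value)
    return groups

groups = parse_widget_groups(io.StringIO(SAMPLE))
print(groups[0])  # {'name': 'Donald', 'battery': 10.7, 'rpm': 8.7, 'temp': 13.8}
```

With numeric values, comparisons such as groups[0]['battery'] > 5 work directly instead of comparing strings.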
Grouping Efficiently¶
Let’s move that code we wrote into a function, so we can more easily work with widget data:
import re
def get_widget_groups(widget_file):
    groups = []
    for line in widget_file:
        if line.startswith('WIDGET'):
            group = {}
            groups.append(group)
        else:
            key, value = re.split(r'\s*=\s*', line)
            group[key] = value
    return groups
If we use this function to process the lines in our file, we’ll see that it takes a while for us to start looping because all of the items have to be stored before we can loop over them:
with open('widgets.txt') as widget_file:
    for group in get_widget_groups(widget_file):
        print(group[4])
This raises a KeyError (none of the dictionaries has a key 4), but it takes quite a bit of time to do so. Why?
Our function builds the complete list of widget groups before returning it, so the error only surfaces once the whole file has been read. Is there any way to make this function return a lazy iterable?
Generator Functions¶
Generators are lazy iterables. Let’s see how processing of this widget file would look if we used a generator function.
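import re
def get_widget_groups(widget_file):
    group = None
    for line in widget_file:
        if line.startswith('WIDGET'):
            # A new reading starts: hand back the previous one, if any.
            if group is not None:
                yield group
            group = {}
        else:
            key, value = re.split(r'\s*=\s*', line)
            group[key] = value
    # Yield the final reading (skipped when the file had no readings).
    if group is not None:
        yield group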
The “magic” of a generator function is in the yield statement. Calling the function doesn’t run its body; it returns a generator object. The first time the generator object is looped over, the body runs from the start of the function, as you would expect. When it reaches a yield, it hands the yielded value to the caller and remembers where it left off. On the next iteration it continues from immediately after that yield, rather than starting at the beginning again.
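The pause-and-resume behaviour can be watched one step at a time by calling the built-in next() on a generator object. This is a minimal sketch; the function two_steps is a made-up example and not part of the widget code.

```python
def two_steps():
    print("running up to the first yield")
    yield 1
    print("resumed after the first yield")
    yield 2
    print("resumed after the second yield")

gen = two_steps()             # nothing is printed yet: the body has not started
first = next(gen)             # prints the first message, pauses at "yield 1"
second = next(gen)            # prints the second message, pauses at "yield 2"
leftover = next(gen, "done")  # prints the third message; the body ends,
                              # so next() returns the default instead
print(first, second, leftover)  # 1 2 done
```

Each call to next() runs the body only as far as the following yield, which is exactly what a for loop does behind the scenes.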
If we use this new generator function to process the lines in our file, we’ll see that the lines in our file start being processed as soon as we start looping:
with open('widgets.txt') as widget_file:
    for group in get_widget_groups(widget_file):
        print(group[4])
This raises the same KeyError as before, but does so immediately.
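One way to convince yourself of the laziness is to count how many lines have been read by the time the first group arrives. The sketch below uses a copy of the generator function (with explicit None checks) and a hypothetical helper named tracked that records every line as it is pulled; an in-memory list of the first two readings stands in for the file.

```python
import re

def get_widget_groups(lines):
    """Yield one dict per widget reading, as each reading is completed."""
    group = None
    for line in lines:
        if line.startswith('WIDGET'):
            if group is not None:
                yield group
            group = {}
        else:
            key, value = re.split(r'\s*=\s*', line)
            group[key] = value
    if group is not None:
        yield group

consumed = []

def tracked(lines):
    """Hypothetical helper: record each line at the moment it is requested."""
    for line in lines:
        consumed.append(line)
        yield line

sample = [
    'WIDGET READING 1\n', 'name = Donald\n', 'battery = 10.7\n',
    'rpm = 8.7\n', 'temp = 13.8\n',
    'WIDGET READING 2\n', 'name = William\n', 'battery = 2.5\n',
    'rpm = 1.7\n', 'temp = 14.1\n',
]

groups = get_widget_groups(tracked(sample))
first = next(groups)
print(first['name'].strip(), len(consumed), 'of', len(sample), 'lines read')
# Donald 6 of 10 lines read
```

Only the five lines of the first reading plus the header of the second had been read when the first dictionary was produced; the rest of the input is untouched until the next group is requested.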
Generator Objects¶
Let’s make another generator function and see how it behaves.
>>> def count(n=0):
...     print("start")
...     while True:
...         yield n
...         n += 1
...         print("loop")
...     print("end")
...
>>> count()
<generator object count at 0x7f57a0436360>
Calling our generator function doesn’t seem to actually call the function. It seems to create a generator object.
Let’s see what this object has:
>>> c = count()
>>> c()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'generator' object is not callable
>>> c[0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'generator' object is not subscriptable
>>> len(c)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: object of type 'generator' has no len()
This object can’t be called, has no length, and can’t be indexed.
What else could we try to do with this object?
Let’s try to loop over it:
>>> for x in c:
...     print(x)
...     if x > 3:
...         break
...
start
0
loop
1
loop
2
loop
3
loop
4
That’s interesting. When we start looping over it, the function is entered and our while loop starts executing.
But wait… “end” was never printed, so the function never returned!
Let’s loop over it again:
>>> for x in c:
...     print(x)
...     if x > 3:
...         break
...
loop
5
This time it started counting at 5! And it still hasn’t ended the function.
But if you think about it, it isn’t too strange. Our generator object c is essentially a single-use object that keeps yielding answers forever, due to the while True loop inside count(). There is no logic to end looping in the generator count(); the ending happens in the code that uses it.
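Instead of breaking out of a for loop by hand, the consuming code can also use itertools.islice from the standard library to take a fixed number of items. This is a small sketch with a quieter version of count (the print calls are left out):

```python
from itertools import islice

def count(n=0):
    """Yield n, n + 1, n + 2, ... forever."""
    while True:
        yield n
        n += 1

c = count()
first_three = list(islice(c, 3))  # [0, 1, 2]
next_two = list(islice(c, 2))     # [3, 4]: the same object carries on
print(first_three, next_two)
```

As with the two for loops above, the second islice picks up where the first one stopped, because both are consuming the same generator object.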
Let’s create a get_big_primes generator and add print statements at the start and end of the function. This time we’ll see that it ends itself.
import sys
def is_prime(candidate):
    # Checking divisors up to and including candidate // 2 is enough;
    # the "+ 1" keeps 4 from being reported as prime.
    for n in range(2, candidate // 2 + 1):
        if candidate % n == 0:
            return False
    return True
def get_big_primes(limit):
    print("start")
    n = 999999
    count = 0
    while count < limit:
        if is_prime(n):
            print("found prime: {}".format(n))
            yield n
            count += 1
        n += 1
    print("end")
limit = int(sys.argv[1])
for prime in get_big_primes(limit):
    print(prime)
$ python primes.py 5
start
found prime: 1000003
1000003
found prime: 1000033
1000033
found prime: 1000037
1000037
found prime: 1000039
1000039
found prime: 1000081
1000081
end
Because our generator function get_big_primes ends itself based on the input limit, we do see the printing of the “end” message.
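What “ending itself” means for the consuming code can be sketched with a small, hypothetical countdown generator: when the function body finishes, the generator object raises StopIteration, which a for loop treats as its signal to stop.

```python
def countdown(start):
    """Yield start, start - 1, ..., 1 and then finish."""
    while start > 0:
        yield start
        start -= 1

gen = countdown(2)
values = [next(gen), next(gen)]  # [2, 1]

try:
    next(gen)                    # the function body has run to its end
    finished = False
except StopIteration:
    finished = True

again = list(gen)                # an exhausted generator stays empty
print(values, finished, again)   # [2, 1] True []
```

Once a generator has finished, looping over it again gives nothing; a fresh call to the generator function is needed to start over.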
Note
Python 3.3 added a yield from syntax to make it easier to yield every item in an iterable.
In Python 3.3 and later, this:
def generatorify(iterable):
    yield from iterable
Is a shortcut for this:
def generatorify(iterable):
    for x in iterable:
        yield x
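As a slightly larger illustration, yield from is handy when one generator needs to pass along the items of several iterables in turn. The function name concatenate is made up for this sketch:

```python
def concatenate(*iterables):
    """Yield every item of each iterable, one iterable after another."""
    for iterable in iterables:
        yield from iterable

combined = list(concatenate([1, 2], "ab", range(3)))
print(combined)  # [1, 2, 'a', 'b', 0, 1, 2]
```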
Generator Expressions¶
We’ve talked about both generator expressions and generator functions. Let’s take a look at how generator expressions are related to generator functions.
You can think of a generator expression as a special-purpose generator function.
If you have a generator function that looks like this:
def do_something(elements):
    for item in elements:
        if condition(item):
            yield some_operation(item)
You could rewrite that as a generator expression like this:
def do_something(elements):
    return (
        some_operation(item)
        for item in elements
        if condition(item)
    )
If there’s no condition:
def do_something(elements):
    for item in elements:
        yield some_operation(item)
That’s equivalent to this:
def do_something(elements):
    return (
        some_operation(item)
        for item in elements
    )
For example, here’s a generator function that takes an iterable of lines (a file object, for example) and operates on it:
def trim_line_endings(lines):
    for line in lines:
        yield line.rstrip('\n')
Here’s an equivalent function with a generator expression:
def trim_line_endings(lines):
    return (
        line.rstrip('\n')
        for line in lines
    )
Example usage:
>>> lines = ['hello \n', 'there\n']
>>> list(trim_line_endings(lines))
['hello ', 'there']
Here’s the same generator expression outside of a function:
trimmed_lines = (
    line.rstrip('\n')
    for line in lines
)
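Like the generator objects we looked at earlier, a generator expression is lazy and can only be consumed once. A short sketch using the same example lines:

```python
lines = ['hello \n', 'there\n']

trimmed_lines = (
    line.rstrip('\n')
    for line in lines
)

first_pass = list(trimmed_lines)   # ['hello ', 'there']
second_pass = list(trimmed_lines)  # []: the generator is already exhausted
print(first_pass, second_pass)
```

If the trimmed lines are needed more than once, store them in a list or create a new generator expression for each pass.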
Generator Function Exercises¶
These exercises are all in the functions.py file in the exercises directory. Edit the file to add the functions or fix the error(s) in the existing function(s).
To run the tests, type python test.py <function_name> from the exercises folder, like this:
$ python test.py unique
Unique¶
Edit the function unique to be a generator function that takes an iterable and yields the iterable elements in order, skipping duplicate values.
Example:
>>> from functions import unique
>>> list(unique([6, 7, 0, 9, 0, 1, 2, 7, 7, 9]))
[6, 7, 0, 9, 1, 2]
>>> list(unique([]))
[]
>>> ''.join(unique("hello there"))
'helo tr'
Float Range¶
Edit the float_range function so that it works like range except that the start, stop, and step can be fractional.
You can ignore negative step values (I’ll only ever provide positive step values to you).
>>> list(float_range(2.5, 5))
[2.5, 3.5, 4.5]
>>> list(float_range(2.5, 5, 0.5))
[2.5, 3.0, 3.5, 4.0, 4.5]
>>> list(float_range(2.5, 5, step=0.5))
[2.5, 3.0, 3.5, 4.0, 4.5]
Head¶
Edit the head function to lazily give the first n items of a given iterable.
Try to use a generator function for this.
>>> list(head([1, 2, 3, 4, 5], n=2))
[1, 2]
>>> first_4 = head([1, 2, 3, 4, 5], n=4)
>>> list(zip(first_4, first_4))
[(1, 2), (3, 4)]
Interleave¶
Edit the interleave function in generators.py so that it accepts two iterables and returns a generator object with each of the given items “interleaved” (e.g. first item from first iterable, first item from second, second item from first, second item from second, and so on).
You may assume the input iterables have the same number of elements.
Try to use a generator function for this.
>>> list(interleave([1, 2, 3, 4], [5, 6, 7, 8]))
[1, 5, 2, 6, 3, 7, 4, 8]
>>> nums = [1, 2, 3, 4]
>>> list(interleave(nums, (n**2 for n in nums)))
[1, 1, 2, 4, 3, 9, 4, 16]
Pairwise¶
Edit the function pairwise to be a generator function that accepts an iterable and yields a tuple containing each item and the item following it. The last item should treat the item after it as None.
Example:
>>> from functions import pairwise
>>> list(pairwise([1, 2, 3]))
[(1, 2), (2, 3), (3, None)]
>>> list(pairwise([]))
[]
>>> list(pairwise("hey"))
[('h', 'e'), ('e', 'y'), ('y', None)]
Stop On¶
Edit the function stop_on to be a generator function that accepts an iterable and a value and yields items from the given iterable, one after another, until the given value is reached.
Example:
>>> from functions import stop_on
>>> list(stop_on([1, 2, 3], 3))
[1, 2]
>>> next(stop_on([1, 2, 3], 1), 0)
0
Around¶
Edit the function around to be a generator function that accepts an iterable and yields a tuple containing the previous item, the current item, and the next item. The previous item should start at None and the next item should be None for the last item in the iterable.
Example:
>>> from functions import around
>>> list(around([1, 2, 3, 4]))
[(None, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, None)]
>>> list(around([]))
[]
>>> list(around("hey"))
[(None, 'h', 'e'), ('h', 'e', 'y'), ('e', 'y', None)]
Deep Flatten¶
Edit the function deep_flatten to be a generator function that “flattens” nested iterables. In other words, the function should accept an iterable of iterables and yield non-iterable items in order.
Example:
>>> from functions import deep_flatten
>>> list(deep_flatten([0, [1, [2, 3]], [4]]))
[0, 1, 2, 3, 4]
>>> list(deep_flatten([[()]]))
[]
Big Primes¶
Edit the get_primes_over function to return a given number of primes above 1,000,000. Make it a generator.
It should work like this:
>>> from generators import get_primes_over
>>> primes = get_primes_over(5)
>>> next(primes)
1000003
>>> next(primes)
1000033
>>> next(primes)
1000037
>>> next(primes)
1000039
>>> next(primes)
1000081
>>> next(primes)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
>>> list(get_primes_over(3))
[1000003, 1000033, 1000037]
You can use this function to determine whether a number is prime:
def is_prime(candidate):
    """Return True if candidate number is prime."""
    for n in range(2, candidate):
        if candidate % n == 0:
            return False
    return True