Advanced For Loops¶
We use loops often in Python because it’s common to need to go through all of the items in an iterable (a list, set, tuple, dictionary, or file, for example) and do something with each item.
It’s also common for our for loops to get out of control and become a sprawling mess.
Here’s an example:
import gzip
import re

user_re = re.compile(r'^(.*) \S+ farnsworth .* session opened for user (\w+)')

with gzip.open('sshd.log.gz', mode='rt') as log_file:
    seen = set()
    for line in log_file:
        match = user_re.search(line)
        if match:
            day = (match.group(1), match.group(2))
            if day not in seen:
                print(day[0], day[1])
                seen.add(day)
The program above opens a gzipped file, sshd.log.gz, and prints each day that each user successfully logged in, in chronological order (that is, the order in which the lines appear in the log).
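To see what the regular expression captures, here’s a small sketch that runs it against two log lines. The lines are invented for illustration (the real sshd.log.gz isn’t shown here), and extract_day_and_user is a hypothetical helper name, not part of the program above.

```python
import re

user_re = re.compile(r'^(.*) \S+ farnsworth .* session opened for user (\w+)')

# Invented log lines, for illustration only (not taken from a real log).
sample_lines = [
    "Jun 04 10:15:32 farnsworth sshd[1234]: pam_unix(sshd:session): "
    "session opened for user nancy by (uid=0)",
    "Jun 04 10:16:01 farnsworth sshd[1234]: Received disconnect from 10.0.0.5",
]


def extract_day_and_user(line):
    """Return (date, user) for a login line, or None if the line doesn't match."""
    match = user_re.search(line)
    if match:
        # Group 1 is everything before the time stamp; group 2 is the user name.
        return (match.group(1), match.group(2))
    return None


extracted = [extract_day_and_user(line) for line in sample_lines]
print(extracted)  # [('Jun 04', 'nancy'), None]
```

The first group stops before the time because the pattern requires a single run of non-space characters (the time) between it and the host name farnsworth.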
Chunked Looping¶
Let’s change that large for loop to a series of very small for loops. First we’ll get all the matches:
matches = []
for line in log_file:
    match = user_re.search(line)
    matches.append(match)
Then we can loop over all these matches, grabbing days from valid matches:
days = []
for match in matches:
    if match:
        day = (match.group(1), match.group(2))
        days.append(day)
Then we could print out the days uniquely (using a set to keep track of which days we’ve seen):
seen = set()
for day in days:
    if day not in seen:
        print(day[0], day[1])
        seen.add(day)
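The seen set is what keeps the output both unique and in order. Here’s a minimal sketch of that pattern on its own, using made-up (date, user) tuples and a hypothetical helper named unique_in_order that collects results instead of printing them. Calling set(days) directly would also remove duplicates, but a set doesn’t promise to keep the original order.

```python
# Made-up (date, user) pairs, for illustration only.
days = [
    ("Jun 04", "nancy"),
    ("Jun 04", "taylor"),
    ("Jun 04", "nancy"),   # duplicate of the first pair
    ("Jun 05", "nancy"),   # same user, different day: not a duplicate
]


def unique_in_order(items):
    """Return items with duplicates dropped, keeping first-seen order."""
    seen = set()      # fast membership checks (tuples are hashable)
    unique = []       # preserves the order in which items first appeared
    for item in items:
        if item not in seen:
            unique.append(item)
            seen.add(item)
    return unique


print(unique_in_order(days))
# [('Jun 04', 'nancy'), ('Jun 04', 'taylor'), ('Jun 05', 'nancy')]
```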
Let’s look at it all together.
import gzip
import re

user_re = re.compile(r'^(.*) \S+ farnsworth .* session opened for user (\w+)')

with gzip.open('sshd.log.gz', mode='rt') as log_file:

    # Find all matches
    matches = []
    for line in log_file:
        match = user_re.search(line)
        matches.append(match)

    # Make a list of days from matches
    days = []
    for match in matches:
        if match:
            day = (match.group(1), match.group(2))
            days.append(day)

    # Print out the days uniquely
    seen = set()
    for day in days:
        if day not in seen:
            print(day[0], day[1])
            seen.add(day)
We’re using more memory here, but this won’t be a problem for the reasonably-sized log file we’re working with.
Writing our code this way breaks down the problem more obviously for readers of our code.
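If memory did become a concern with a much larger log, one option (not what this section uses) would be to keep the same small steps but make them lazy with generator expressions, so that only one line is processed at a time. The sketch below assumes a hypothetical function name, unique_login_days, and uses invented log lines in place of the real file. The seen set still grows with the number of unique pairs.

```python
import re

user_re = re.compile(r'^(.*) \S+ farnsworth .* session opened for user (\w+)')


def unique_login_days(lines):
    """Lazily yield each (date, user) pair once, in first-seen order."""
    # Generator expressions: nothing is computed until we loop over them.
    matches = (user_re.search(line) for line in lines)
    days = (
        (match.group(1), match.group(2))
        for match in matches
        if match
    )
    seen = set()
    for day in days:
        if day not in seen:
            seen.add(day)
            yield day


# Invented log lines standing in for the real file (illustration only).
sample_lines = [
    "Jun 04 09:00:01 farnsworth sshd[101]: pam_unix(sshd:session): "
    "session opened for user nancy by (uid=0)",
    "Jun 04 09:30:12 farnsworth sshd[102]: pam_unix(sshd:session): "
    "session opened for user nancy by (uid=0)",
    "Jun 04 11:45:40 farnsworth sshd[103]: Connection closed by 10.0.0.5",
    "Jun 05 08:15:22 farnsworth sshd[104]: pam_unix(sshd:session): "
    "session opened for user virgil by (uid=0)",
]

result = list(unique_login_days(sample_lines))
for date, user in result:
    print(date, user)
```

Any iterable of lines works as the argument, including an open file object such as the one returned by gzip.open.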
Converting to Comprehensions¶
Those first two for loops can be copy-pasted into list comprehensions. Comprehensions tend to make code look less like looping and more like data processing.
We can make a list comprehension for matches like this:
matches = [
    user_re.search(line)
    for line in log_file
]
And then a list comprehension for days like this:
days = [
    (match.group(1), match.group(2))
    for match in matches
    if match
]
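The copy-paste works because each piece of the loop has a fixed place in the comprehension: the value passed to append goes first, then the for line, then the if condition. Here’s a small sketch of that mapping using made-up data (a list of words, unrelated to the log file), with both forms side by side.

```python
# Made-up data, for illustration only.
words = ["apple", "", "banana", "", "cherry"]

# Loop form: build a new list by appending inside a for loop.
lengths_loop = []
for word in words:
    if word:
        lengths_loop.append(len(word))

# Comprehension form: the same three pieces, rearranged.
lengths_comp = [
    len(word)           # what we were appending
    for word in words   # the for line, unchanged
    if word             # the condition, unchanged
]

print(lengths_comp)  # [5, 6, 6]
print(lengths_loop == lengths_comp)  # True
```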
This last loop has to remain a for loop because it’s printing while adding items to a set.
seen = set()
for day in days:
    if day not in seen:
        print(day[0], day[1])
        seen.add(day)
These variable names are fairly descriptive so we don’t really need comments in our code anymore.
import gzip
import re

user_re = re.compile(r'^(.*) \S+ farnsworth .* session opened for user (\w+)')

with gzip.open('sshd.log.gz', mode='rt') as log_file:
    matches = [
        user_re.search(line)
        for line in log_file
    ]
    days = [
        (match.group(1), match.group(2))
        for match in matches
        if match
    ]
    seen = set()
    for day in days:
        if day not in seen:
            print(day[0], day[1])
            seen.add(day)
Our output becomes:
Jun 04 nancy
Jun 04 taylor
Jun 04 virgil
Jun 05 virgil
Jun 05 vickie
Jun 07 wanda
Jun 07 juan