Advanced For Loops

We use loops often in Python because it’s common to need to go through all of the items in an iterable — a list, set, tuple, dictionary, or file for example — and do something with each item.

It’s also common for our for loops to get out of control and become a sprawling mess.

Here’s an example:

import gzip
import re


user_re = re.compile(r'^(.*) \S+ farnsworth .* session opened for user (\w+)')
with gzip.open('sshd.log.gz', mode='rt') as log_file:
    seen = set()
    for line in log_file:
        match = user_re.search(line)
        if match:
            day = (match.group(1), match.group(2))
            if day not in seen:
                print(day[0], day[1])
            seen.add(day)

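The program above opens the gzipped log file sshd.log.gz and prints each date on which each user successfully logged in. Each (date, user) pair is printed only once, in the order it first appears in the log, which is chronological as long as the log itself is.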
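To see what the regular expression captures, here is a minimal sketch that runs it against a single line. The log line is hypothetical, made up to resemble an sshd entry on a host named farnsworth; real logs may be formatted differently.

```python
import re

user_re = re.compile(r'^(.*) \S+ farnsworth .* session opened for user (\w+)')

# Hypothetical log line, invented for illustration (not from a real log).
sample_line = (
    "Jun 04 22:14:15 farnsworth sshd[1234]: "
    "pam_unix(sshd:session): session opened for user nancy by (uid=0)"
)

match = user_re.search(sample_line)
if match:
    # group(1) is the text before the time field (here, the date);
    # group(2) is the username that follows "session opened for user".
    print(match.group(1), match.group(2))
```

In this sample the `\S+` just before the hostname consumes the time field, which is why only the date ends up in the first group.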

Chunked Looping

Let’s change that large for loop to a series of very small for loops. First we’ll get all the matches:

matches = []
for line in log_file:
    match = user_re.search(line)
    matches.append(match)

Then we can loop over all these matches, grabbing days from valid matches:

days = []
for match in matches:
    if match:
        day = (match.group(1), match.group(2))
        days.append(day)

Then we could print out the days uniquely (using a set to keep track of which days we’ve seen):

seen = set()
for day in days:
    if day not in seen:
        print(day[0], day[1])
    seen.add(day)
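The seen-set pattern above can be tried on its own with a few in-memory values. This is a sketch only: the helper name `unique_in_order` and the sample days are hypothetical, and the helper collects results into a list instead of printing so they can be inspected.

```python
def unique_in_order(items):
    """Return items with duplicates dropped, keeping first-seen order."""
    seen = set()
    unique = []
    for item in items:
        if item not in seen:
            unique.append(item)
        seen.add(item)
    return unique


# Hypothetical (date, user) pairs; the third one repeats the first.
days = [
    ("Jun 04", "nancy"),
    ("Jun 04", "taylor"),
    ("Jun 04", "nancy"),
    ("Jun 05", "nancy"),
]

for day in unique_in_order(days):
    print(day[0], day[1])
```

Because the pairs are tuples, they are hashable and can be stored in a set, which makes each membership check fast.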

Let’s look at it all together.

import gzip
import re


user_re = re.compile(r'^(.*) \S+ farnsworth .* session opened for user (\w+)')
with gzip.open('sshd.log.gz', mode='rt') as log_file:

    # Find all matches
    matches = []
    for line in log_file:
        match = user_re.search(line)
        matches.append(match)

    # Make a list of days from matches
    days = []
    for match in matches:
        if match:
            day = (match.group(1), match.group(2))
            days.append(day)

    # Print out the days uniquely
    seen = set()
    for day in days:
        if day not in seen:
            print(day[0], day[1])
        seen.add(day)

We’re using more memory here, because each step now builds a complete list before the next step starts, but this won’t be a problem for the reasonably-sized log file we’re working with.

Writing our code this way breaks down the problem more obviously for readers of our code.

Converting to Comprehensions

Those first two for loops can be copy-pasted into list comprehensions. Comprehensions tend to make code look less like looping and more like data processing.

We can make a list comprehension for matches like this:

matches = [
    user_re.search(line)
    for line in log_file
]

And then a list comprehension for days like this:

days = [
    (match.group(1), match.group(2))
    for match in matches
    if match
]
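One way to convince yourself that the comprehensions do the same work as the loops is to run both on the same input and compare. The sketch below uses a hypothetical list of lines, `log_lines`, standing in for `log_file`; the log lines themselves are invented for illustration.

```python
import re

user_re = re.compile(r'^(.*) \S+ farnsworth .* session opened for user (\w+)')

# Hypothetical lines standing in for the opened log file.
log_lines = [
    "Jun 04 22:14:15 farnsworth sshd[1234]: "
    "pam_unix(sshd:session): session opened for user nancy by (uid=0)",
    "Jun 04 22:15:02 farnsworth sshd[1240]: "
    "Failed password for invalid user admin from 203.0.113.5 port 4242 ssh2",
    "Jun 05 08:01:44 farnsworth sshd[2001]: "
    "pam_unix(sshd:session): session opened for user virgil by (uid=0)",
]

# Loop version: search each line and keep a day for every successful match.
loop_days = []
for line in log_lines:
    match = user_re.search(line)
    if match:
        loop_days.append((match.group(1), match.group(2)))

# Comprehension version: the same two steps as data transformations.
matches = [user_re.search(line) for line in log_lines]
days = [
    (match.group(1), match.group(2))
    for match in matches
    if match
]

assert days == loop_days
print(days)
```

The line with the failed password produces `None` from `search`, so the `if match` condition filters it out in both versions.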

This last loop has to remain a for loop because it’s printing while adding items to a set.

seen = set()
for day in days:
    if day not in seen:
        print(day[0], day[1])
    seen.add(day)

These variable names are fairly descriptive so we don’t really need comments in our code anymore.

import gzip
import re

user_re = re.compile(r'^(.*) \S+ farnsworth .* session opened for user (\w+)')
with gzip.open('sshd.log.gz', mode='rt') as log_file:
    matches = [
        user_re.search(line)
        for line in log_file
    ]
    days = [
        (match.group(1), match.group(2))
        for match in matches
        if match
    ]
    seen = set()
    for day in days:
        if day not in seen:
            print(day[0], day[1])
        seen.add(day)

The output is the same as the original loop’s:

Jun 04 nancy
Jun 04 taylor
Jun 04 virgil
Jun 05 virgil
Jun 05 vickie
Jun 07 wanda
Jun 07 juan
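To try the finished program without a real sshd.log.gz, the sketch below writes a few hypothetical log lines to a temporary gzipped file and runs the same steps on it. The sample lines and the `unique_logins` function name are assumptions made for this example, and the function returns the pairs instead of printing them directly so they are easier to check.

```python
import gzip
import os
import re
import tempfile

user_re = re.compile(r'^(.*) \S+ farnsworth .* session opened for user (\w+)')

# Hypothetical log contents, invented for illustration.
sample_log = (
    "Jun 04 22:14:15 farnsworth sshd[1234]: "
    "pam_unix(sshd:session): session opened for user nancy by (uid=0)\n"
    "Jun 04 23:40:09 farnsworth sshd[1300]: "
    "pam_unix(sshd:session): session opened for user nancy by (uid=0)\n"
    "Jun 05 08:01:44 farnsworth sshd[2001]: "
    "pam_unix(sshd:session): session opened for user virgil by (uid=0)\n"
)


def unique_logins(path):
    """Return unique (date, user) pairs from a gzipped log, in file order."""
    with gzip.open(path, mode='rt') as log_file:
        matches = [user_re.search(line) for line in log_file]
    days = [
        (match.group(1), match.group(2))
        for match in matches
        if match
    ]
    seen = set()
    unique = []
    for day in days:
        if day not in seen:
            unique.append(day)
        seen.add(day)
    return unique


# Write the sample log to a temporary gzipped file, then read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sshd.log.gz')
    with gzip.open(path, mode='wt') as gz_file:
        gz_file.write(sample_log)
    logins = unique_logins(path)

for date, user in logins:
    print(date, user)
```

Here nancy logs in twice on Jun 04, so that pair should appear only once in the result.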