How to Traverse a Directory Tree in Python – Guide to os.walk

When you use a scripting language like Python, one task you will find yourself doing over and over again is walking a directory tree and processing the files in it. While there are many ways to do this, Python's standard library offers a function, os.walk, that makes this process a breeze.

Basic Python Directory Traversal

Here’s a really simple example that walks a directory tree, printing out the name of each directory and the files contained:

# Import the os module, for the os.walk function
import os
 
# Set the directory you want to start from
rootDir = '.'
for dirName, subdirList, fileList in os.walk(rootDir):
    print('Found directory: %s' % dirName)
    for fname in fileList:
        print('\t%s' % fname)

os.walk takes care of the details, and on every pass of the loop, it gives us three things:

  • dirName: The path of the directory currently being visited.
  • subdirList: A list of the names of the sub-directories in the current directory.
  • fileList: A list of the names of the files in the current directory.
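Because subdirList and fileList hold bare names rather than paths, a common next step is to join each name with dirName. The following is a minimal sketch of that idea; the function name list_file_paths is made up for illustration and is not part of the os module.

```python
import os

def list_file_paths(rootDir):
    """Return the path of every file found below rootDir (hypothetical helper)."""
    paths = []
    for dirName, subdirList, fileList in os.walk(rootDir):
        for fname in fileList:
            # fileList holds bare names, so join each one with its directory
            paths.append(os.path.join(dirName, fname))
    return paths
```

Calling list_file_paths('.') would return paths relative to the current directory, such as ./subdir1/file1a.txt on Linux or macOS.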

Let’s say we have a directory tree that looks like this (subdir2 is empty; file2a.jpeg and file2b.html sit in the top-level directory):

+--- test.py
|
+--- [subdir1]
|     |
|     +--- file1a.txt
|     +--- file1b.png
|
+--- [subdir2]
|
+--- file2a.jpeg
+--- file2b.html

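The code above will produce output like the following (the order of the entries within each directory depends on the file system and is not guaranteed):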

Found directory: .
    file2a.jpeg
    file2b.html
    test.py
Found directory: ./subdir1
    file1a.txt
    file1b.png
Found directory: ./subdir2
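
os.walk does not sort the names it reports, so the exact order can differ between machines. If a repeatable order matters, one option is to sort the lists yourself. The sketch below assumes the default top-down traversal; walk_sorted is a hypothetical helper name, not something os provides.

```python
import os

def walk_sorted(rootDir):
    """Yield (dirName, sorted file names) pairs in a repeatable order."""
    for dirName, subdirList, fileList in os.walk(rootDir):
        # Sorting the list in place also fixes the order in which
        # os.walk visits the sub-directories (top-down traversal only).
        subdirList.sort()
        yield dirName, sorted(fileList)
```

Sorting subdirList with .sort() rather than sorted() matters here, because os.walk only notices changes made to the list object it handed out.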

Changing the Way the Directory Tree is Traversed

By default, Python walks the directory tree top-down: each directory is passed to you for processing before Python descends into its sub-directories. We can see this behaviour in the output above; the parent directory (.) was printed first, then its two sub-directories.

Sometimes we want to traverse the directory tree bottom-up instead: the directories at the very bottom of the tree are processed first, and we then work our way up towards the root. We can tell os.walk to do this via the topdown parameter:

import os
 
rootDir = '.'
for dirName, subdirList, fileList in os.walk(rootDir, topdown=False):
    print('Found directory: %s' % dirName)
    for fname in fileList:
        print('\t%s' % fname)

Which gives us this output:

Found directory: ./subdir1
    file1a.txt
    file1b.png
Found directory: ./subdir2
Found directory: .
    file2a.jpeg
    file2b.html
    test.py

Now we get the files in the sub-directories first, then work our way up the directory tree.
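
Bottom-up order is handy whenever a directory's result depends on its children. As one illustration, here is a rough sketch that totals the size of everything below each directory. The name directory_sizes is hypothetical, and the sketch assumes every entry in fileList can be read with os.path.getsize (a broken symbolic link, for example, would raise OSError).

```python
import os

def directory_sizes(rootDir):
    """Map each directory path to the total size in bytes of the files below it."""
    sizes = {}
    for dirName, subdirList, fileList in os.walk(rootDir, topdown=False):
        # Sizes of the files directly inside this directory
        total = sum(os.path.getsize(os.path.join(dirName, f)) for f in fileList)
        for sub in subdirList:
            # Children were visited earlier, so their totals are already stored.
            # Sub-directories that were not walked (e.g. symbolic links) count as 0.
            total += sizes.get(os.path.join(dirName, sub), 0)
        sizes[dirName] = total
    return sizes
```

With a top-down traversal the totals for the sub-directories would not be available yet when the parent is visited, which is why topdown=False is used here.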

Selectively Recursing Into Sub-Directories

The examples so far have simply walked the entire directory tree, but os.walk allows us to selectively skip parts of the tree.

For each directory os.walk gives us, it also provides a list of sub-directories (in subdirList). If we modify this list, we can control which sub-directories os.walk will descend into. Let’s tweak our example above so that we skip the first sub-directory.

import os
 
rootDir = '.'
for dirName, subdirList, fileList in os.walk(rootDir):
    print('Found directory: %s' % dirName)
    for fname in fileList:
        print('\t%s' % fname)
    # Remove the first entry in the list of sub-directories
    # if there are any sub-directories present
    if len(subdirList) > 0:
        del subdirList[0]

This gives us the following output:

Found directory: .
    file2a.jpeg
    file2b.html
    test.py
Found directory: ./subdir2

We can see that the first sub-directory (subdir1) was indeed skipped.

This only works when the directory tree is traversed top-down. In a bottom-up traversal, sub-directories are processed before their parent directory, so by the time we could modify subdirList, the sub-directories would already have been processed and the change would have no effect.

It’s also important to modify subdirList in-place, so that os.walk will see the changes. If we did something like this:

subdirList = subdirList[1:]

… we would create a new list of sub-directories, one that os.walk wouldn’t know about.
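
One way to filter the list while keeping the same list object is slice assignment. The sketch below skips sub-directories by name; walk_skipping and skipNames are hypothetical names chosen for this example.

```python
import os

def walk_skipping(rootDir, skipNames):
    """Yield each directory path below rootDir without entering skipNames."""
    for dirName, subdirList, fileList in os.walk(rootDir):
        # Slice assignment replaces the contents of the very list object
        # that os.walk holds, so the pruning takes effect.
        subdirList[:] = [d for d in subdirList if d not in skipNames]
        yield dirName
```

For the example tree, list(walk_skipping('.', {'subdir1'})) would report . and ./subdir2 but never ./subdir1.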

For a more comprehensive tutorial on Python’s os.walk function, check out the recipe Recursive File and Directory Manipulation in Python. Or, to look at traversing directories in another way (using recursion), check out the recipe Recursive Directory Traversal in Python: Make a list of your movies!.
