Lesson 7, Bit 5: Filenames, Paths, Error Handling, and Debugging

Filenames and Paths

Files are organized into directories (also called "folders"). Every running program has a "current directory", which is the default directory for most operations. For example, when you open a file for reading, Python looks for it in the current directory.

The os module provides functions for working with files and directories ("os" stands for "operating system"). os.getcwd returns the name of the current directory:

Code:
import os

cwd = os.getcwd()
print(cwd)

Output:
C:\Python

cwd stands for "current working directory". The result in this example is C:\Python.

A string like 'C:\Python' that identifies a file or directory is called a path.

A simple filename, like mbox.txt, is also considered a path, but it is a relative path because it is interpreted relative to the current directory. If the current directory is C:\Python, the filename mbox.txt refers to C:\Python\mbox.txt.

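A path that does not depend on the current directory is called an absolute path. On Unix-like systems (Linux, macOS) an absolute path begins with /; on Windows it typically begins with a drive letter, as in C:\Python\mbox.txt. To find the absolute path to a file, you can use os.path.abspath: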

Code:
import os

path = os.path.abspath('mbox.txt')
print(path)

Output:
C:\Python\mbox.txt
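To see how a relative name is resolved, the following sketch (an illustration, not part of the original lesson) compares os.path.abspath with joining the current directory and the name by hand, and uses os.path.isabs to tell the two kinds of path apart. The file mbox.txt does not need to exist for this, because these functions only work with the path strings.

```python
import os

relative = 'mbox.txt'

# abspath prefixes the current working directory and normalizes the result.
absolute = os.path.abspath(relative)

# The same thing by hand: join the cwd and the name, then normalize.
by_hand = os.path.normpath(os.path.join(os.getcwd(), relative))

print(absolute)
print(absolute == by_hand)      # the two approaches agree
print(os.path.isabs(relative))  # a bare filename is relative
print(os.path.isabs(absolute))  # the result of abspath is absolute
```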

os.path provides other functions for working with filenames and paths. For example, os.path.exists checks whether a file or directory exists:

Code:
import os

os.path.exists('mbox.txt')

Result:
True

If it exists, os.path.isdir checks whether it's a directory:

Code                           Result
os.path.isdir('mbox.txt')      False
os.path.isdir('C:\\Python')    True

A quick note: notice that the backslash appears twice in the string 'C:\\Python' above. Why is that? Remember that a backslash begins an escape sequence, which is how special characters such as newlines (\n) and tabs (\t) are written. When Python sees a backslash in a string literal, it assumes an escape sequence is coming. As a result, you need two backslashes to get one literal backslash: the first starts the escape sequence, and the second is the character to include.
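Python also offers raw strings, written with an r prefix, in which a backslash is treated as an ordinary character rather than the start of an escape sequence. The short sketch below (not part of the original lesson) checks that a raw string and a doubled-backslash string spell the same path.

```python
# In a normal string literal, '\\' is an escape sequence for one backslash.
doubled = 'C:\\Python'

# In a raw string (r prefix), backslashes are kept exactly as typed.
raw = r'C:\Python'

print(doubled == raw)  # both spell the same path
print(len(doubled))    # the path contains only one backslash character
print(doubled)
```

One caveat: a raw string cannot end in a single backslash, so r'C:\' is a syntax error.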

Similarly, os.path.isfile checks whether it's a file.

Code                            Result
os.path.isfile('words.txt')     True
os.path.isfile('C:\\Python')    False
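Because exists, isdir, and isfile answer different questions, it can help to see them side by side. The sketch below is self-contained: rather than relying on mbox.txt or C:\Python, it creates a temporary directory with the standard tempfile module and a file named sample.txt inside it. The helper name describe and the file names are hypothetical, chosen only for illustration.

```python
import os
import tempfile

def describe(path):
    """Return a short description of what (if anything) is at path."""
    if not os.path.exists(path):
        return 'missing'
    if os.path.isdir(path):
        return 'directory'
    if os.path.isfile(path):
        return 'file'
    return 'other'  # e.g. a device or another special file

# Build a throwaway directory containing one file.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, 'sample.txt')
    with open(sample, 'w') as f:
        f.write('hello\n')

    results = {
        'directory': describe(tmp),
        'file': describe(sample),
        'missing': describe(os.path.join(tmp, 'no_such_file.txt')),
    }

print(results)
```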

os.listdir returns a list of the names of the files and subdirectories in the given directory:

Code:
os.listdir('C:\\Python')

Result:
['append.txt', 'DLLs', 'Doc', 'examples.py', 'include', 'labs.py', 'Lib', 'libs', 'LICENSE.txt', 'mbox.txt', 'NEWS.txt', 'output.txt', 'python.exe', 'python3.dll', 'python35.dll', 'pythonw.exe', 'README.txt', 'Scripts', 'tcl', 'Tools', 'vcruntime140.dll', 'words.txt']
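Note that os.listdir returns bare names, not full paths, and the Python documentation does not promise any particular order. The sketch below (using a temporary directory and made-up entry names rather than C:\Python) sorts the names and combines each one with its directory using os.path.join.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create two files and one subdirectory (the names are hypothetical).
    for filename in ['words.txt', 'append.txt']:
        with open(os.path.join(tmp, filename), 'w') as f:
            f.write('example\n')
    os.mkdir(os.path.join(tmp, 'Lib'))

    # listdir gives names only, in no guaranteed order, so sort them.
    names = sorted(os.listdir(tmp))

    # Join each name with its directory to get a usable path.
    full_paths = [os.path.join(tmp, name) for name in names]
    all_start_with_tmp = all(p.startswith(tmp) for p in full_paths)

print(names)
```

By default sorted puts uppercase names before lowercase ones; passing key=str.lower gives a case-insensitive order, similar to the listing above.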

To demonstrate these functions, the following example "walks" through a directory, prints the names of all the files, and calls itself recursively on all the directories.

# Import the os module.
import os

# Define the walk function. It has a single argument: dirname.
def walk(dirname):
    # os.listdir returns a list of the entries in dirname; loop over each one.
    for name in os.listdir(dirname):
        # os.path.join combines the directory and the entry name into a complete path.
        path = os.path.join(dirname, name)
        # Check whether path is a file using os.path.isfile.
        if os.path.isfile(path):
            # If it is a file, display the full path.
            print(path)
        # Otherwise, run this same function on that directory.
        else:
            walk(path)

Here is the output of this program when we call walk('C:\\Python'):

C:\Python\append.txt
C:\Python\DLLs\py.ico
C:\Python\DLLs\pyc.ico
C:\Python\DLLs\pyexpat.pyd
C:\Python\DLLs\select.pyd
C:\Python\DLLs\sqlite3.dll
(about 1,200 more lines)
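The standard library also provides os.walk, which does the directory recursion for you. The following sketch is an alternative to the hand-written walk function, not something the lesson prescribes. It collects file paths in a list instead of printing them, and it builds a small temporary tree (with hypothetical names) so it can run anywhere. The function name list_files is also hypothetical.

```python
import os
import tempfile

def list_files(dirname):
    """Return the full path of every file under dirname, using os.walk."""
    paths = []
    # os.walk yields (directory path, subdirectory names, file names) for
    # dirname and for every directory beneath it.
    for dirpath, dirnames, filenames in os.walk(dirname):
        for filename in filenames:
            paths.append(os.path.join(dirpath, filename))
    return paths

with tempfile.TemporaryDirectory() as tmp:
    # Build tmp/top.txt and tmp/sub/inner.txt.
    os.mkdir(os.path.join(tmp, 'sub'))
    for relative in ['top.txt', os.path.join('sub', 'inner.txt')]:
        with open(os.path.join(tmp, relative), 'w') as f:
            f.write('x\n')

    # Report paths relative to tmp so the result does not depend on where tmp is.
    found = sorted(os.path.relpath(p, tmp) for p in list_files(tmp))

print(found)
```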

Catching Exceptions with Files

A lot of things can go wrong when you try to read and write files. If you try to open a file that doesn't exist, you get a FileNotFoundError:

Code:
fin = open('bad_file')

Result:
FileNotFoundError: [Errno 2] No such file or directory: 'bad_file'

If you don't have permission to access a file, you get a PermissionError:

Code:
fout = open('/etc/passwd', 'w')

Result:
PermissionError: [Errno 13] Permission denied: '/etc/passwd'

And if you try to open a directory for reading (this example is from a Unix-like system such as Linux), you get an IsADirectoryError:

Code:
fin = open('/home')

Result:
IsADirectoryError: [Errno 21] Is a directory: '/home'

To avoid these errors, you could use functions like os.path.exists and os.path.isfile, but it would take a lot of time and code to check all the possibilities (if "Errno 21" is any indication, there are at least 21 things that can go wrong).

It is better to go ahead and try, and deal with problems if they happen, which is exactly what the try statement does. Here is how you might use it when opening a file:

try:
    fin = open('bad_file')

except:
    print('Something went wrong.')

Python starts by executing the try clause. If all goes well, it skips the except clause and proceeds. If an exception occurs, it jumps out of the try clause and runs the except clause.

Handling an exception with a try statement is called catching an exception. In this example, the except clause prints an error message that is not very helpful. In general, catching an exception gives you a chance to fix the problem, or try again, or at least end the program gracefully.
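A bare except catches every kind of exception, including ones you did not anticipate. A common refinement, sketched below as an illustration rather than as the lesson's required style, is to name the exceptions you expect. FileNotFoundError, PermissionError, and IsADirectoryError (the errors shown earlier) are all subclasses of OSError, so catching OSError handles them together. The function name try_open is hypothetical, and the demo assumes no file named bad_file exists in the current directory.

```python
def try_open(file_name):
    """Try to open file_name; return (file object, None) or (None, message)."""
    try:
        fin = open(file_name)
    except FileNotFoundError:
        # The most specific case is listed first.
        return None, 'No such file: ' + file_name
    except OSError as err:
        # PermissionError, IsADirectoryError, etc. are all OSError subclasses.
        return None, 'Cannot open ' + file_name + ': ' + type(err).__name__
    return fin, None

fin, message = try_open('bad_file')
print(message)
```

When try_open succeeds, the caller is responsible for closing the returned file.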


Going back to our example, we need to assume that the open call might fail and add recovery code when the open fails as follows:

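file_name = input('Enter the file name: ')

try:
    fin = open(file_name)

# OSError covers FileNotFoundError, PermissionError, and similar failures.
except OSError:
    print("File cannot be opened:", file_name)

# The else clause runs only if the try clause raised no exception.
else:
    count = 0

    for line in fin:
        line = line.strip()

        # Count only the lines that start with 'Subject:'.
        if line.startswith('Subject:'):
            count += 1

    fin.close()
    print("There were", count, "subject lines in", file_name)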

Now when our user (or QA team) types in silliness or bad file names, we "catch" them and recover gracefully:

Enter the file name: mbox.txt
There were 1797 subject lines in mbox.txt
Enter the file name: na na boo boo
File cannot be opened: na na boo boo
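A common variation, shown here as a sketch rather than the lesson's own code, wraps the counting in a function and opens the file with a with statement, which closes the file automatically when the block ends, even if an error occurs inside it. The function name count_subject_lines and the choice to return None on failure are assumptions for illustration; the demo writes a tiny temporary file instead of using mbox.txt.

```python
import os
import tempfile

def count_subject_lines(file_name):
    """Return the number of lines starting with 'Subject:', or None if the file can't be opened."""
    try:
        # The with statement closes the file automatically when the block ends.
        with open(file_name) as fin:
            count = 0
            for line in fin:
                if line.strip().startswith('Subject:'):
                    count += 1
            return count
    except OSError:
        return None

with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, 'sample_mbox.txt')
    with open(sample, 'w') as f:
        f.write('From: alice\nSubject: hello\nbody text\n  Subject: indented\n')
    found = count_subject_lines(sample)
    missing = count_subject_lines(os.path.join(tmp, 'na na boo boo'))

print(found, missing)
```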

Protecting the open call is a good example of the proper use of try and except in a Python program. We use the term "Pythonic" when we are doing something the "Python way". We might say that the above example is the Pythonic way to open a file.

Once you become more skilled in Python, you can engage in repartee with other Python programmers to decide which of two equivalent solutions to a problem is "more Pythonic". The goal to be "more Pythonic" captures the notion that programming is part engineering and part art. We are not always interested in just making something work; we also want our solution to be elegant and to be appreciated as elegant by our peers.

Debugging

When you are reading and writing files, you might run into problems with whitespace. These errors can be hard to debug because spaces, tabs, and newlines are normally invisible:

Code:
s = '1 2\t 3\n 4'
print(s)

Output:
1 2      3
 4

The built-in function repr can help. It takes any object as an argument and returns a string representation of the object. For strings, it represents whitespace characters with backslash sequences:

Code:
s = '1 2\t 3\n 4'
print(repr(s))

Output:
'1 2\t 3\n 4'

This can be helpful for debugging.
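One place repr is especially handy is when looking at lines read from a file, because each line normally keeps its newline character, and trailing spaces are easy to miss. The sketch below uses io.StringIO from the standard library to stand in for an open file, so no real file is needed; the text itself is made up.

```python
import io

# io.StringIO behaves like a text file opened for reading.
fin = io.StringIO('Subject: hello  \nFrom: bob\n')

first = fin.readline()
print(first)                # trailing spaces and the newline are invisible here
print(repr(first))          # repr makes them visible
print(repr(first.strip()))  # strip removes the surrounding whitespace
```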

One other problem you might run into is that different systems use different characters to indicate the end of a line. Unix-like systems, including Linux and macOS, use a newline, represented \n. Classic Mac OS used a carriage return, represented \r. Windows uses both, \r\n. If you move files between different systems, these inconsistencies might cause problems.
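You can see which convention a file uses by opening it in binary mode ('rb'), which returns the bytes exactly as stored. In text mode, Python's default universal-newlines behavior translates \r\n and \r to \n as it reads. The sketch below writes a small temporary file with mixed line endings (made-up contents) to show both views.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'mixed.txt')
    # Write raw bytes: one Windows-style, one old-Mac-style, one Unix-style ending.
    with open(path, 'wb') as f:
        f.write(b'one\r\ntwo\rthree\n')

    # Binary mode: the bytes exactly as stored on disk.
    with open(path, 'rb') as f:
        raw = f.read()

    # Text mode (default newline=None): all three endings become '\n'.
    with open(path) as f:
        text = f.read()

print(repr(raw))
print(repr(text))
```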

For most systems, there are applications to convert from one format to another. You can find them (and read more about this issue) at https://en.wikipedia.org/wiki/Newline. Or, of course, you could write one yourself.
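If you do decide to write a converter yourself, a minimal sketch might read the file in binary mode, rewrite every \r\n and lone \r as \n, and write the result back. The function name to_unix_newlines is hypothetical, and the sketch assumes the whole file fits comfortably in memory.

```python
import os
import tempfile

def to_unix_newlines(path):
    """Rewrite the file at path so every line ends with '\\n' only."""
    with open(path, 'rb') as f:
        data = f.read()
    # Replace '\r\n' first, so its '\r' is not turned into an extra newline.
    data = data.replace(b'\r\n', b'\n').replace(b'\r', b'\n')
    with open(path, 'wb') as f:
        f.write(data)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'mixed.txt')
    with open(path, 'wb') as f:
        f.write(b'one\r\ntwo\rthree\n')
    to_unix_newlines(path)
    with open(path, 'rb') as f:
        converted = f.read()

print(repr(converted))
```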