Lesson 7, Bit 5: Filenames, Paths, Error Handling, and Debugging
Filenames and Paths
Files are organized into directories (also called "folders"). Every running program has a "current directory", which is the default directory for most operations. For example, when you open a file for reading, Python looks for it in the current directory.
The os module provides functions for working with files and
directories ("os" stands for "operating system"). os.getcwd
returns the name of the current directory:
| Code | Output |
|---|---|
| import os | |
| cwd = os.getcwd() | |
| print(cwd) | C:\Python |
cwd stands for "current working directory". The result in this
example is C:\Python.
A string like 'C:\Python' that identifies a file or directory is
called a path.
A simple filename, like mbox.txt, is also considered a path, but
it is a relative path because it relates to the current directory. If the
current directory is C:\Python, the filename mbox.txt
would refer to C:\Python\mbox.txt.
A path that begins with a drive letter, such as C:\ (or with / on macOS
and Linux), does not depend on the current directory; it is called an
absolute path. To find the absolute path to a file, you can use
os.path.abspath:
| Code | Output |
|---|---|
| import os | |
| print(os.path.abspath('mbox.txt')) | C:\Python\mbox.txt |
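As a small illustration of the difference between relative and absolute paths, the sketch below uses os.path.isabs and os.path.abspath from the standard library. The helper name describe_path is made up for this example, and the absolute path it prints depends on the directory you run it from.

```python
import os

def describe_path(path):
    """Return (is_absolute, absolute_form) for a path string.

    describe_path is a hypothetical helper, not part of the os module.
    """
    return os.path.isabs(path), os.path.abspath(path)

is_abs, full = describe_path('mbox.txt')
print(is_abs)               # False: a bare filename is a relative path
print(full)                 # the current directory joined with 'mbox.txt'
print(os.path.isabs(full))  # True: abspath always returns an absolute path
```

Note that os.path.abspath does not check whether the file exists; it only combines the name with the current directory.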
os.path provides other functions for working with filenames and
paths. For example, os.path.exists checks whether a file or
directory exists:
| Code | Result |
|---|---|
| import os | |
| os.path.exists('mbox.txt') | True |
If it exists, os.path.isdir checks whether it's a directory:
| Code | Result |
|---|---|
| os.path.isdir('mbox.txt') | False |
| os.path.isdir('C:\\Python') | True |
A quick note: the backslash appears twice in the string
'C:\\Python' above. Why is that? Remember that the backslash is the
escape character that begins escape sequences for special characters such as
newlines (\n) and tabs (\t). When Python sees a backslash in a string
literal, it assumes an escape sequence is coming. As a result, you need two
backslashes to produce one: the first is the escape character and the second
is the character to be displayed.
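To make the doubled backslash concrete, here is a short sketch comparing an escaped string with a raw string. Raw strings (the r prefix) are a standard Python feature not otherwise covered in this lesson; the variable names are only for illustration.

```python
escaped = 'C:\\Python'   # two backslashes in the source, one in the string
raw = r'C:\Python'       # raw string: backslashes are taken literally

print(escaped)           # C:\Python
print(escaped == raw)    # True
print(len(escaped))      # 9
```

Either form works for Windows paths; the raw string simply saves you from doubling every backslash.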
Similarly, os.path.isfile checks whether it's a file.
| Code | Result |
|---|---|
| os.path.isfile('words.txt') | True |
| os.path.isfile('C:\\Python') | False |
os.listdir returns a list of the files (and other directories)
in the given directory:
| Code | Result |
|---|---|
| os.listdir('C:\\Python') | ['append.txt', 'DLLs', 'Doc', 'examples.py', 'include', 'labs.py', 'Lib', 'libs', 'LICENSE.txt', 'mbox.txt', 'NEWS.txt', 'output.txt', 'python.exe', 'python3.dll', 'python35.dll', 'pythonw.exe', 'README.txt', 'Scripts', 'tcl', 'Tools', 'vcruntime140.dll', 'words.txt'] |
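The functions above can be combined to sort the entries of a directory into files and subdirectories. This is only a sketch: the function name classify and the throwaway directory it is tried on are assumptions made for the example, so that it runs on any machine.

```python
import os
import tempfile

def classify(dirname):
    """Split the entries of dirname into (files, directories), both sorted."""
    files, dirs = [], []
    for name in os.listdir(dirname):
        # listdir returns bare names, so join them with the directory first.
        path = os.path.join(dirname, name)
        if os.path.isdir(path):
            dirs.append(name)
        elif os.path.isfile(path):
            files.append(name)
    return sorted(files), sorted(dirs)

# Build a small temporary directory so the example does not depend on C:\Python.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, 'Lib'))
    with open(os.path.join(tmp, 'words.txt'), 'w') as fout:
        fout.write('hello\n')
    demo_files, demo_dirs = classify(tmp)

print(demo_files)  # ['words.txt']
print(demo_dirs)   # ['Lib']
```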
To demonstrate these functions, the following example "walks" through a directory, prints the names of all the files, and calls itself recursively on all the directories.
| Line | Code | Notes |
|---|---|---|
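| 1 | import os | Import the os module. |
| 2 | def walk(dirname): | Define the walk function, which takes a directory name. |
| 3 | for name in os.listdir(dirname): | os.listdir returns the names of the entries in dirname; we loop over them. |
| 4 | path = os.path.join(dirname, name) | os.path.join combines the directory and the entry name into a full path. |
| 5 | if os.path.isfile(path): | We check to see whether path is a file. |
| 6 | print(path) | If it is a file, then we display the full path. |
| 7 | else: | Otherwise… |
| 8 | walk(path) | We run this same function for that directory. |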
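Because a table cannot show indentation, here is the same walk function written out as a runnable block. The function body follows the eight lines in the table; the temporary directory and the file names used to try it out are assumptions added so the sketch runs anywhere.

```python
import os
import tempfile

def walk(dirname):
    """Print the full path of every file under dirname."""
    for name in os.listdir(dirname):
        path = os.path.join(dirname, name)
        if os.path.isfile(path):
            print(path)
        else:
            # Anything that is not a file is assumed to be a directory.
            walk(path)

# Try it on a throwaway directory (hypothetical names) instead of C:\Python.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, 'DLLs'))
    for parts in (('append.txt',), ('DLLs', 'py.ico')):
        with open(os.path.join(tmp, *parts), 'w') as fout:
            fout.write('')
    walk(tmp)
```

The standard library also provides os.walk, which performs this kind of traversal for you; writing it by hand is mainly useful for seeing how the os.path functions fit together.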
Here is the output to this program when we use 'C:\\Python' as
dirname:
C:\Python\append.txt
C:\Python\DLLs\py.ico
C:\Python\DLLs\pyc.ico
C:\Python\DLLs\pyexpat.pyd
C:\Python\DLLs\select.pyd
C:\Python\DLLs\sqlite3.dll
(about 1,200 more lines)
Catching Exceptions with Files
A lot of things can go wrong when you try to read and write files. If you try
to open a file that doesn't exist, you get a FileNotFoundError:
| Code | Result |
|---|---|
| fin = open('bad_file') | FileNotFoundError: [Errno 2] No such file or directory: 'bad_file' |
If you don't have permission to access a file, you get a PermissionError:
| Code | Result |
|---|---|
| fout = open('/etc/passwd', 'w') | PermissionError: [Errno 13] Permission denied: '/etc/passwd' |
And if you try to open a directory for reading, you get an IsADirectoryError:
| Code | Result |
|---|---|
| fin = open('/home') | IsADirectoryError: [Errno 21] Is a directory: '/home' |
To avoid these errors, you could use functions like
os.path.exists and os.path.isfile, but it would take a
lot of time and code to check all the possibilities (if "Errno 21"
is any indication, there are at least 21 things that can go wrong).
It is better to go ahead and try, and deal with problems if they happen,
which is exactly what the try statement does. Here is how you
might use it when opening a file:

    try:
        fin = open('bad_file')
    except:
        print('Something went wrong.')

Python starts by executing the try clause. If all goes well, it
skips the except clause and proceeds. If an exception occurs, it
jumps out of the try clause and runs the except
clause.
Handling an exception with a try statement is called catching an
exception. In this example, the except clause prints an error
message that is not very helpful. In general, catching an exception gives you a
chance to fix the problem, or try again, or at least end the program
gracefully.
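One way to make the error message more helpful is to catch the specific exceptions shown in the tables above rather than everything at once. The following is a sketch, not the only way to do it: the function name try_open and the status strings are made up for illustration. Also note that on Windows, opening a directory typically raises PermissionError rather than IsADirectoryError.

```python
def try_open(path):
    """Try to open path for reading and return a short status string.

    try_open is a hypothetical helper written for this example.
    """
    try:
        fin = open(path)
    except FileNotFoundError:
        return 'not found'
    except IsADirectoryError:
        return 'is a directory'
    except PermissionError:
        return 'permission denied'
    else:
        # The open succeeded, so close the file before reporting success.
        fin.close()
        return 'ok'

# Assumes no file named 'bad_file' exists in the current directory.
print(try_open('bad_file'))
```

All three exception types are subclasses of OSError, so a single "except OSError:" clause would catch them together when you do not need to tell them apart.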
Going back to our example, we should assume that the open call might fail and add recovery code for that case, as follows:

    file_name = input('Enter the file name: ')
    try:
        fin = open(file_name)
    except:
        print("File cannot be opened:", file_name)
    else:
        count = 0
        for line in fin:
            line = line.strip()
            if not line.startswith('Subject:'):
                continue
            else:
                count += 1
        print("There were", count, "subject lines in", file_name)

Now when our user (or QA team) types in silliness or bad file names, we "catch" them and recover gracefully:
Enter the file name: mbox.txt
There were 1797 subject lines in mbox.txt

Enter the file name: na na boo boo
File cannot be opened: na na boo boo
Protecting the open call is a good example of the proper use of
try and except in a Python program. We use the term
"Pythonic" when we are doing something the "Python way". We might say that the
above example is the Pythonic way to open a file.
Once you become more skilled in Python, you can engage in repartee with other Python programmers to decide which of two equivalent solutions to a problem is "more Pythonic". The goal to be "more Pythonic" captures the notion that programming is part engineering and part art. We are not always interested in just making something work; we also want our solution to be elegant and to be appreciated as elegant by our peers.
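As one possible variation on the subject-counting example in this section, the sketch below wraps the logic in a function, catches OSError instead of every exception, and uses a with statement so the file is closed automatically. The function name count_subject_lines, the choice to return None on failure, and the sample file contents are all assumptions made for this illustration.

```python
import os
import tempfile

def count_subject_lines(file_name):
    """Return the number of lines starting with 'Subject:'.

    Returns None if the file cannot be opened.
    """
    try:
        fin = open(file_name)
    except OSError:
        return None
    with fin:  # closes the file when the block ends
        return sum(1 for line in fin if line.strip().startswith('Subject:'))

# Try it on a small made-up file so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, 'sample.txt')
    with open(sample, 'w') as fout:
        fout.write('Subject: hello\nFrom: someone@example.com\nSubject: again\n')
    print(count_subject_lines(sample))       # 2

# Assumes no file named 'na na boo boo' exists in the current directory.
print(count_subject_lines('na na boo boo'))  # None
```

Returning a value instead of printing lets the caller decide how to report a missing file.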
Debugging
When you are reading and writing files, you might run into problems with whitespace. These errors can be hard to debug because spaces, tabs, and newlines are normally invisible:
| Code | Output |
|---|---|
| s = '1 2\t 3\n 4' | |
| print(s) | 1 2	 3 |
| |  4 |
The built-in function repr can help. It takes any object as an
argument and returns a string representation of the object. For strings, it
represents whitespace characters with backslash sequences:
| Code | Output |
|---|---|
| s = '1 2\t 3\n 4' | |
| print(repr(s)) | '1 2\t 3\n 4' |
This can be helpful for debugging.
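A typical case where repr pays off is a comparison that fails for no visible reason. In this sketch, the string stands in for a line read from a file with its newline still attached; the variable name and value are made up for illustration.

```python
line = 'mbox.txt\n'              # e.g. a line read from a file

print(line == 'mbox.txt')        # False, and print(line) would not show why
print(repr(line))                # 'mbox.txt\n'
print(repr(line.rstrip('\n')))   # 'mbox.txt'
```

Once repr reveals the trailing newline, stripping it before the comparison fixes the problem.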
One other problem you might run into is that different systems use different
characters to indicate the end of a line. Some systems use a newline,
represented as \n. Others use a return character, represented as
\r. Some use both. If you move files between different systems,
these inconsistencies might cause problems.
For most systems, there are applications to convert from one format to another. You can find them (and read more about this issue) at https://en.wikipedia.org/wiki/Newline. Or, of course, you could write one yourself.
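If you did want to write a converter yourself, a minimal sketch might look like the following. It works on bytes so that Python does not translate line endings behind the scenes; the function name to_unix_newlines is an assumption, and a real tool would also need to read and write the files involved.

```python
def to_unix_newlines(data):
    """Convert CRLF and lone CR line endings in a bytes object to LF."""
    # Replace the two-character ending first so it is not converted twice.
    return data.replace(b'\r\n', b'\n').replace(b'\r', b'\n')

sample = b'one\r\ntwo\rthree\n'
print(to_unix_newlines(sample))   # b'one\ntwo\nthree\n'
```

To apply it to a file, you would open the file in binary mode ('rb' and 'wb') so the bytes pass through unchanged.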
