Lesson 7, Bit 3: Writing and Appending Data

Writing Files

To write a file, you have to open it with mode 'w' as a second parameter:

Code:
fout = open('output.txt', 'w')

print(fout)

Output:
<_io.TextIOWrapper name='output.txt' mode='w' encoding='cp1252'>

If the file already exists, opening it in write mode clears out the old data and starts fresh, so be careful!

If the file doesn't exist, a new one is created.

The write method of the file handle object puts data into the file. The write method accepts a string as its input:

line1 = "This here's the wattle,\n"
fout.write(line1)

Again, the file object keeps track of where it is, so if you call write again, it adds the new data to the end.

We must manage the ends of lines as we write to the file by explicitly inserting the newline character when we want to end a line. The print function automatically appends a newline, but the write method does not.

line2 = 'the emblem of our land.\n'
fout.write(line2)
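To make the difference in newline handling concrete, here is a minimal sketch that sends text to a file with both write and print. The file name newline_demo.txt and the temporary directory are assumptions made for illustration, and print's file argument is used only to direct its output into the file.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory, so nothing real is overwritten.
path = os.path.join(tempfile.mkdtemp(), 'newline_demo.txt')

fout = open(path, 'w')
fout.write('first')          # write adds nothing after the text
fout.write(' second\n')      # so we supply the newline ourselves
print('third', file=fout)    # print appends a newline for us
fout.close()

fin = open(path)
contents = fin.read()
fin.close()

print(repr(contents))        # 'first second\nthird\n'
```

The two write calls end up on the same line because no newline separates them.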

When you are done writing, you have to close the file to make sure that the last bit of data is physically written to the disk so it will not be lost if the power goes off.

fout.close()

It is good programming practice to always close your files when you are finished with them. When we are writing files, we certainly want to explicitly close the files so as to leave nothing to chance.
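One common way to make sure a file is always closed is Python's with statement, which closes the file automatically when the indented block ends. The lesson itself uses explicit close calls, so treat this as an alternative sketch; the temporary directory is an assumption made so the example does not touch your own files.

```python
import os
import tempfile

# Hypothetical location for the example file.
path = os.path.join(tempfile.mkdtemp(), 'output.txt')

# The file is closed when the with block ends, even if an error occurs inside it.
with open(path, 'w') as fout:
    fout.write("This here's the wattle,\n")
    fout.write('the emblem of our land.\n')

print(fout.closed)           # True

with open(path) as fin:
    written = fin.read()

print(written)
```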

The writelines Method

If you have a list of strings, you can write them all into the file in one fell swoop using the writelines method.

Code:
my_list = ['Line 1', 'Line 2', 'Line 3']

fout = open('output.txt', 'w')

fout.writelines(my_list)

fout.close()

Output:
None

When we re-open output.txt in read mode and display the lines, this is what we get:

Code:
fout = open('output.txt', 'r')

for line in fout:
    print(line)

Output:
Line 1Line 2Line 3

Notice that there are no line breaks: despite its name, writelines does not add them. Each string in your list must already end with a newline character if you want the strings to land on separate lines.
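If your list does not already contain line breaks, one option is to add them as the strings are handed to writelines. This is a small sketch rather than the only approach; the temporary directory is an assumption made for illustration.

```python
import os
import tempfile

# Hypothetical location for the example file.
path = os.path.join(tempfile.mkdtemp(), 'output.txt')

my_list = ['Line 1', 'Line 2', 'Line 3']

fout = open(path, 'w')
# Append a newline to each string as it is written.
fout.writelines(line + '\n' for line in my_list)
fout.close()

fin = open(path)
lines_read = fin.readlines()
fin.close()

print(lines_read)            # ['Line 1\n', 'Line 2\n', 'Line 3\n']
```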

Appending to Files

What if you want to merely append data to an existing file?  You can use the append mode of 'a':

Code:
fout = open('append.txt', 'a')

print(fout)

Output:
<_io.TextIOWrapper name='append.txt' mode='a' encoding='cp1252'>

Just like with the write mode, append will create the file if it does not already exist. Unlike write mode, append will leave all of the existing data intact, and will merely add data to the end of the file.

We add to the file using the write method:

line = "I am a lumberjack\n"
fout.write(line)

And we close the file the same way:

fout.close()

Later when we reopen the file and add another line to it, the original line will still be there:

Code:
fout = open('append.txt', 'a')

fout.write("and I'm okay")

fout.close()

fin = open('append.txt', 'r')

print(fin.read())

fin.close()

Output:
I am a lumberjack
and I'm okay
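The contrast between write mode and append mode can be seen side by side in the following sketch. The file name modes.txt and the helper read_all are hypothetical names introduced only for this example.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'modes.txt')


def read_all(name):
    """Return the whole contents of a text file (illustrative helper)."""
    fin = open(name)
    text = fin.read()
    fin.close()
    return text


fout = open(path, 'w')       # creates the file
fout.write('first\n')
fout.close()

fout = open(path, 'a')       # keeps 'first' and adds to the end
fout.write('second\n')
fout.close()
after_append = read_all(path)

fout = open(path, 'w')       # clears out everything written so far
fout.write('third\n')
fout.close()
after_write = read_all(path)

print(repr(after_append))    # 'first\nsecond\n'
print(repr(after_write))     # 'third\n'
```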

Available Modes

We have seen the modes 'r', 'w', and 'a'.  You can add a plus sign + to the mode to increase its functionality. It can get confusing, so here is a chart which might help:

Function                          r    r+   w    w+   a    a+
read file                         X    X    -    X    -    X
write to file                     -    X    X    X    X    X
create file if it doesn't exist   -    -    X    X    X    X
truncate file (delete contents)   -    -    X    X    -    -
start at beginning of file        X    X    X    X    -    -
start at end of file              -    -    -    -    X    X
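Two rows of the chart are easy to trip over: 'r+' starts at the beginning of the file, so writing replaces existing characters, while 'a+' starts at the end, so you have to move back to the start before reading. The sketch below illustrates both; the file name plus_modes.txt is hypothetical, and seek(0) is the standard file method for returning to the beginning.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'plus_modes.txt')

fout = open(path, 'w')
fout.write('abcdef\n')
fout.close()

# 'r+' starts at the beginning, so these two characters replace 'ab'.
f = open(path, 'r+')
f.write('XY')
f.close()

# 'a+' starts at the end; go back to the start before reading.
f = open(path, 'a+')
f.write('tail\n')
f.seek(0)
contents = f.read()
f.close()

print(repr(contents))        # 'XYcdef\ntail\n'
```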

By default, open assumes that you are working with a text file. If you wish to work with a binary file, add 'b' to any of the above modes. You can also explicitly specify a text file by adding 't':

Mode     Description
'r+b'    Read and write a binary file.
'ab'     Append to a binary file.
'w+t'    Read and write a text file, truncating the file first and auto-creating it if it doesn't exist.
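As a short illustration of the binary modes, the sketch below writes raw bytes, appends one more, and reads them back. In binary mode the write method takes a bytes object rather than a string. The file name data.bin is a hypothetical example.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'data.bin')

fout = open(path, 'wb')              # write binary
fout.write(bytes([0, 1, 2, 255]))    # bytes, not str
fout.close()

fout = open(path, 'ab')              # append binary
fout.write(b'\x10')
fout.close()

fin = open(path, 'rb')               # read binary
data = fin.read()
fin.close()

print(data)                          # b'\x00\x01\x02\xff\x10'
```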

You should use the mode which provides the functionality you need, without providing too much functionality.  If you only need to read a file, then don't use a mode which allows you to read and write to the file.

Dictionaries, Lists, and Tuples: File Example

One of the common uses of a dictionary is to count the occurrence of words in a file with some written text. Let's start with a very simple file of words taken from the text of Act 2, Scene 2 of Romeo and Juliet.  Here is a small sample from romeo-full.txt:

Romeo and Juliet

Act 2, Scene 2

SCENE II. Capulet's orchard.

Enter ROMEO
ROMEO

He jests at scars that never felt a wound.
JULIET appears above at a window

But, soft! what light through yonder window breaks?
It is the east, and Juliet is the sun.
Arise, fair sun, and kill the envious moon,
Who is already sick and pale with grief,
That thou her maid art far more fair than she:

We will write a Python program to read through the lines of the file, break each line into a list of words, and then loop through each of the words in the line and count each word using a dictionary.

You will see that we have two for loops. The outer loop reads the lines of the file, and the inner loop iterates through each of the words on that particular line. This pattern is called nested loops because one loop (the inner loop) runs inside the other (the outer loop).

Because the inner loop executes all of its iterations each time the outer loop makes a single iteration, we think of the inner loop as iterating "more quickly" and the outer loop as iterating more slowly.

The combination of the two nested loops ensures that we will count every word on every line of the input file.

fname = input('Enter the file name: ')

try:
    fhand = open(fname)

except OSError:
    print('File cannot be opened:', fname)
    exit()

counts = dict()

for line in fhand:
    words = line.split()

    for word in words:
        if word not in counts:
            counts[word] = 1

        else:
            counts[word] += 1

print(counts)

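When we run the program, we see a raw dump of all of the counts in no useful order. Python 3.7 and later list the words in the order they were first seen; older versions, such as the one that produced the sample below, list them in arbitrary hash order.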

Enter the file name: romeo-full.txt
{'A': 2, 'name.': 1, 'sweet;': 1, 'foot,': 1, 'region': 1, 'that': 23, 'Sweet': 1, 'entreat': 1, 'think': 2, 'compliment!': 1, 'night.': 1, 'JULIET,': 2, 'walls': 1, 'Echo': 1, 'Exit': 2, 'come': 2, 'wanting': 1, 'Sweet,': 2, 'else,': 1, 'orchard': 1, 'sea,': 2, 'who': 3, 'one': 2, 'place?': 1, 'little': 1, 'May': 1, 'sight;': 1, 'was': 1, 'cheek!': 1, 'blessed': 2, ...}

It is a bit inconvenient to look through the dictionary to find the most common words and their counts, so we need to add some more Python code to produce more helpful output. We will use list sorting to make this happen by adding this code:

# Sort the dictionary by value
lst = list()

for key, val in counts.items():
    lst.append((val, key))

lst.sort(reverse=True)

for val, key in lst[:10]:
    print(val, key)

The first part of the program which reads the file and computes the dictionary that maps each word to the count of words in the document is unchanged. But instead of simply printing out counts and ending the program, we construct a list of (val, key) tuples and then sort the list in reverse order.

Since the value is first, it will be used for the comparisons. If there is more than one tuple with the same value, the sort looks at the second element (the key). Because the whole sort is reversed, tuples with the same value end up in reverse alphabetical order of the key, which is why "my" appears before "and" and "a" in the output.
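The tuple comparison can be tried on a small dictionary. The four counts below are copied from the program's output for illustration; everything else follows the sorting code above.

```python
# A few word counts taken from the sample output, for illustration only.
counts = {'my': 22, 'a': 22, 'I': 60, 'and': 22}

lst = list()

for key, val in counts.items():
    lst.append((val, key))           # value first, so it drives the comparison

lst.sort(reverse=True)

print(lst)   # [(60, 'I'), (22, 'my'), (22, 'and'), (22, 'a')]
```

The three words that share a count of 22 are ordered by their keys, and because the sort is reversed they come out in reverse alphabetical order.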

At the end we write a nice for loop which does a multiple assignment iteration and prints out the ten most common words by iterating through a slice of the list (lst[:10]).

So now the output finally looks like what we want for our word frequency analysis.

60 I
31 to
30 the
29 thou
28 JULIET
27 ROMEO
23 that
22 my
22 and
22 a

The fact that this complex data parsing and analysis can be done with an easy-to-understand 19-line Python program is one reason why Python is a good choice as a language for exploring information.