Lesson 7, Bit 3: Writing and Appending Data
Writing Files
To write a file, you have to open it with mode 'w' as a second
parameter:
| Code | Output |
|---|---|
| fout = open('output.txt', 'w') | <_io.TextIOWrapper name='output.txt' mode='w' encoding='cp1252'> |
If the file already exists, opening it in write mode clears out
the old data and starts fresh, so be careful!
If the file doesn't exist, a new one is created.
The write method of the file handle object puts data into the file. The
write method accepts a string as its input:
line1 = "This here's the wattle,\n"
fout.write(line1)
Again, the file object keeps track of where it is, so if you call
write again, it adds the new data to the end.
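The two behaviours described so far (successive writes continue from where the last one stopped, and re-opening in 'w' mode wipes the file) can be checked with a small sketch. The scratch path in a temporary directory is an assumption made here so that no real file is overwritten; the variable names are illustrative only.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory, so nothing real is overwritten.
path = os.path.join(tempfile.mkdtemp(), 'output.txt')

fout = open(path, 'w')
fout.write('first ')       # written at the start of the new file
fout.write('second')       # continues where the previous write stopped
fout.close()

fin = open(path, 'r')
after_two_writes = fin.read()
fin.close()
print(after_two_writes)    # first second

# Re-opening in 'w' mode discards what was there before.
fout = open(path, 'w')
fout.write('fresh start')
fout.close()

fin = open(path, 'r')
after_reopen = fin.read()
fin.close()
print(after_reopen)        # fresh start
```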
We must manage the ends of lines ourselves as we write to the file,
explicitly inserting the newline character wherever we want a line to end. The
print function automatically appends a newline, but the
write method does not.
line2 = 'the emblem of our land.\n'
fout.write(line2)
When you are done writing, you have to close the file to make sure that the last bit of data is physically written to the disk so it will not be lost if the power goes off.
fout.close()
It is good programming practice to always close your files when you are finished with them. When we are writing files, we certainly want to explicitly close the files so as to leave nothing to chance.
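One common way to make sure a file is always closed is Python's with statement, which closes the file when the indented block ends. The lesson itself does not use it, so treat this as an optional sketch. It also contrasts write, which needs an explicit newline, with print, which adds one. The temporary path is an assumption for illustration.

```python
import os
import tempfile

# Hypothetical scratch file so the example does not touch real data.
path = os.path.join(tempfile.mkdtemp(), 'output.txt')

# The with statement closes the file when the block ends, even if an error occurs.
with open(path, 'w') as fout:
    fout.write("This here's the wattle,\n")        # we supply the newline ourselves
    print('the emblem of our land.', file=fout)    # print appends the newline for us

file_is_closed = fout.closed                       # True once the block has ended

with open(path) as fin:
    lines = fin.readlines()
print(lines)
```

Both lines end up terminated by a newline, even though only the first string contained one explicitly.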
The writelines Method
If you have a list of strings, you can write them all into the file in one
fell swoop using the writelines method.
| Code | Output |
|---|---|
| my_list = ['Line 1', 'Line 2', 'Line 3'] | None |
When we re-open output.txt in read mode and display the lines,
this is what we get:
| Code | Output |
|---|---|
| fout = open('output.txt', 'r') | Line 1Line 2Line 3 |
Notice that there are no line breaks. Something to keep in mind when
using writelines: your list needs to have line breaks included if
you want to separate your strings with line breaks.
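A minimal sketch of one way to add the line breaks: append '\n' to each string as it is handed to writelines. The temporary path is an assumption for illustration; my_list is the list from the example above.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'output.txt')

my_list = ['Line 1', 'Line 2', 'Line 3']

with open(path, 'w') as fout:
    # writelines accepts any iterable of strings; add the newline to each one.
    fout.writelines(line + '\n' for line in my_list)

with open(path) as fin:
    contents = fin.read()
print(contents)
```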
Appending to Files
What if you want to merely append data to an existing file? You can use
the append mode of 'a':
| Code | Output |
|---|---|
| fout = open('append.txt', 'a') | <_io.TextIOWrapper name='append.txt' mode='a' encoding='cp1252'> |
Just like with the write mode, append will create the file if it does not
already exist. Unlike write mode, append will leave
all of the existing data intact, and will merely add data to the end of the
file.
We add to the file using the write method:
line = "I am a lumberjack\n"
fout.write(line)
And we close the file the same way:
fout.close()
Later when we reopen the file and add another line to it, the original line will still be there:
| Code | Output |
|---|---|
| fout = open('append.txt', 'a') | I am a lumberjack |
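The whole append cycle can be sketched end to end as follows. The temporary path and the text of the second line are assumptions made for illustration.

```python
import os
import tempfile

# Hypothetical scratch file; it does not exist yet.
path = os.path.join(tempfile.mkdtemp(), 'append.txt')

fout = open(path, 'a')              # 'a' creates the file because it is missing
fout.write('I am a lumberjack\n')
fout.close()

fout = open(path, 'a')              # re-opening in 'a' keeps the existing line
fout.write('A second line\n')       # illustrative text, added at the end
fout.close()

fin = open(path, 'r')
contents = fin.read()
fin.close()
print(contents)
```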
Available Modes
We have seen the modes 'r', 'w', and
'a'. You can add a plus sign + to the mode to
increase its functionality. It can get confusing, so here is a chart which might
help:
| Function | r | r+ | w | w+ | a | a+ |
|---|---|---|---|---|---|---|
| read file | X | X | | X | | X |
| write to file | | X | X | X | X | X |
| create file if it doesn't exist | | | X | X | X | X |
| truncate file (delete contents) | | | X | X | | |
| start at beginning of file | X | X | X | X | | |
| start at end of file | | | | | X | X |
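A few rows of the chart can be checked directly. This is only a sketch: the file names and the temporary directory are assumptions, and it reads the file to show where each mode starts rather than inspecting the position itself.

```python
import os
import tempfile

folder = tempfile.mkdtemp()                  # hypothetical scratch directory
path = os.path.join(folder, 'modes.txt')

with open(path, 'w') as f:                   # create a file with known contents
    f.write('abc\n')

with open(path, 'r+') as f:                  # starts at the beginning, keeps contents
    r_plus_read = f.read()

with open(path, 'a+') as f:                  # starts at the end of the file
    a_plus_read = f.read()                   # so there is nothing left to read
    f.seek(0)                                # move back to the beginning
    a_plus_after_seek = f.read()

with open(path, 'w+') as f:                  # truncates the file on opening
    w_plus_read = f.read()

missing = os.path.join(folder, 'missing.txt')
try:
    open(missing, 'r')                       # 'r' never creates a file
    r_created_file = True
except FileNotFoundError:
    r_created_file = False

print(repr(r_plus_read), repr(a_plus_read), repr(a_plus_after_seek),
      repr(w_plus_read), r_created_file)
```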
By default, open assumes that you are working with a text file. If
you wish to work with a binary file, you can add 'b'
to any of the above modes. You can also explicitly specify a text file
by adding 't':
| Mode | Description |
|---|---|
| 'r+b' | Read and write a binary file. |
| 'ab' | Append to a binary file. |
| 'w+t' | Read and write a text file, truncating the file first and auto-creating it if it doesn't exist. |
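In binary mode, write takes bytes objects rather than strings, and read returns bytes. A minimal sketch, assuming a hypothetical scratch file named data.bin in a temporary directory:

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'data.bin')

with open(path, 'wb') as f:          # write a new binary file
    f.write(b'\x00\x01')

with open(path, 'ab') as f:          # append two more bytes to it
    f.write(bytes([2, 255]))

with open(path, 'rb') as f:          # read everything back as bytes
    data = f.read()

print(data)                          # b'\x00\x01\x02\xff'
print(list(data))                    # [0, 1, 2, 255]
```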
You should use the mode which provides the functionality you need, without providing too much functionality. If you only need to read a file, then don't use a mode which allows you to read and write to the file.
Dictionaries, Lists, and Tuples: File Example
One of the common uses of a dictionary is to count the occurrence of words in
a file with some written text. Let's start with a very simple file of words
taken from the text of Act 2, Scene 2 of Romeo and Juliet. Here is a small
sample from romeo-full.txt:
Romeo and Juliet
Act 2, Scene 2
SCENE II. Capulet's orchard.
Enter ROMEO
ROMEO
He jests at scars that never felt a wound.
JULIET appears above at a window
But, soft! what light through yonder window breaks?
It is the east, and Juliet is the sun.
Arise, fair sun, and kill the envious moon,
Who is already sick and pale with grief,
That thou her maid art far more fair than she:
We will write a Python program to read through the lines of the file, break each line into a list of words, and then loop through each of the words in the line and count each word using a dictionary.
You will see that we have two for loops. The outer loop reads the lines of the file, and the inner loop iterates through each of the words on that particular line. This pattern is called nested loops because one loop (the inner loop) runs inside the body of the other (the outer loop).
Because the inner loop executes all of its iterations each time the outer loop makes a single iteration, we think of the inner loop as iterating "more quickly" and the outer loop as iterating more slowly.
The combination of the two nested loops ensures that we will count every word on every line of the input file.
fname = input('Enter the file name: ')
try:
    fhand = open(fname)
except:
    print('File cannot be opened:', fname)
    exit()
counts = dict()
for line in fhand:
    words = line.split()
    for word in words:
        if word not in counts:
            counts[word] = 1
        else:
            counts[word] += 1
print(counts)
When we run the program, we see a raw dump of all of the counts in unsorted hash order.
Enter the file name: romeo-full.txt
{'A': 2, 'name.': 1, 'sweet;': 1, 'foot,': 1, 'region': 1, 'that': 23,
'Sweet': 1, 'entreat': 1, 'think': 2, 'compliment!': 1, 'night.': 1, 'JULIET,':
2, 'walls': 1, 'Echo': 1, 'Exit': 2, 'come': 2, 'wanting': 1, 'Sweet,': 2,
'else,': 1, 'orchard': 1, 'sea,': 2, 'who': 3, 'one': 2, 'place?': 1, 'little':
1, 'May': 1, 'sight;': 1, 'was': 1, 'cheek!': 1, 'blessed': 2, ...}
It is a bit inconvenient to look through the dictionary to find the most common words and their counts, so we need to add some more Python code to produce more helpful output. We will use list sorting to make this happen by adding this code:
# Sort the dictionary by value
lst = list()
for key, val in counts.items():
    lst.append((val, key))
lst.sort(reverse=True)
# Each tuple is (val, key), so unpack in that order
for val, key in lst[:10]:
    print(val, key)
The first part of the program which reads the file and computes the
dictionary that maps each word to the count of words in the document is
unchanged. But instead of simply printing out counts and ending the program, we
construct a list of (val, key) tuples and then sort the list in
reverse order.
Since the value is first, it will be used for the comparisons. If there is more than one tuple with the same value, the sort looks at the second element (the key). Because we sort in reverse, tuples with the same value end up in reverse alphabetical order of the key.
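The tie-breaking rule can be seen on a handful of (val, key) tuples taken from the output of this program. This is a small standalone sketch; the list name pairs is illustrative.

```python
# A few (count, word) tuples, deliberately out of order.
pairs = [(22, 'and'), (60, 'I'), (22, 'a'), (22, 'my'), (23, 'that')]

# Tuples compare element by element: count first, then word on a tie.
pairs.sort(reverse=True)

print(pairs)
# [(60, 'I'), (23, 'that'), (22, 'my'), (22, 'and'), (22, 'a')]
```

The three words with a count of 22 come out as my, and, a, which is reverse alphabetical order.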
At the end we write a nice for loop which does a multiple assignment
iteration and prints out the ten most common words by iterating through a slice
of the list (lst[:10]).
So now the output finally looks like what we want for our word frequency analysis.
60 I
31 to
30 the
29 thou
28 JULIET
27 ROMEO
23 that
22 my
22 and
22 a
The fact that this complex data parsing and analysis can be done with an easy-to-understand 19-line Python program is one reason why Python is a good choice as a language for exploring information.
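To experiment without typing a file name each time, the same steps can be wrapped in a function and run on a couple of lines held in memory. This is a sketch, not the program above: the function name top_words, its parameter n, and the use of io.StringIO as a stand-in for an open file are all assumptions made for illustration.

```python
import io

def top_words(fhand, n=10):
    """Return the n most common words in fhand as (count, word) tuples."""
    counts = dict()
    for line in fhand:                  # outer loop: one line at a time
        for word in line.split():       # inner loop: each word on that line
            if word not in counts:
                counts[word] = 1
            else:
                counts[word] += 1
    lst = list()
    for key, val in counts.items():
        lst.append((val, key))          # count first, so it drives the sort
    lst.sort(reverse=True)
    return lst[:n]

# Two lines from the sample text, held in memory instead of in a file.
sample = io.StringIO(
    "It is the east, and Juliet is the sun.\n"
    "Arise, fair sun, and kill the envious moon,\n"
)
result = top_words(sample, 3)
print(result)
# [(3, 'the'), (2, 'is'), (2, 'and')]
```

Note that punctuation stays attached to the words ('sun.' and 'sun,' are counted separately), just as in the raw dictionary dump shown earlier.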