Lesson 7, Bit 4: Databases and Pickles
Databases
A database is a file that is organized for storing data. Many databases are organized like a dictionary in the sense that they map from keys to values. The biggest difference between a database and a dictionary is that the database is on disk (or other permanent storage), so it persists after the program ends.
The module dbm provides an interface for creating and updating
database files. As an example, I'll create a database that contains captions for
image files.
Opening a database is similar to opening other files:
| Code | Notes |
|---|---|
| `import dbm` | The mode |
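Here is a minimal sketch of the opening step. The database name `captions` and the temporary directory are assumptions made for illustration; the mode `'c'` is the standard dbm flag that opens a database for reading and writing and creates it if it doesn't exist yet.

```python
import dbm
import os
import tempfile

# Work in a throwaway directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "captions")  # hypothetical database name

    # 'c' opens the database for reading and writing, creating it if needed.
    db = dbm.open(path, "c")
    was_empty = "cleese.png" not in db  # a brand-new database has no keys yet
    db.close()

    # Opening again with 'c' reuses the existing database instead of failing.
    db = dbm.open(path, "c")
    db.close()

print(was_empty)
```

The exact files that appear on disk depend on which dbm backend your Python installation uses, so it is safer not to rely on a particular file extension.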
When you create a new item, dbm updates the database file.

    db['cleese.png'] = 'Photo of John Cleese.'

When you access one of the items, dbm reads the file:
| Code | Result |
|---|---|
| `db['cleese.png']` | `b'Photo of John Cleese.'` |
The result is a bytes object, which is why it
begins with b. A bytes object is similar to a string in many ways.
When you get farther into Python, the difference becomes important, but for now
we can ignore it.
If you make another assignment to an existing key, dbm replaces
the old value:
| Code | Result |
|---|---|
| `db['cleese.png'] = 'A silly walk'` | |
| `db['cleese.png']` | `b'A silly walk'` |
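The store, read, and replace steps can be sketched end to end as follows. The temporary directory and the database name `captions` are assumptions for illustration. The last line shows one way to turn the bytes result back into a string, assuming the value was stored as UTF-8 text.

```python
import dbm
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "captions")  # hypothetical database name

    # The with statement closes the database when the block ends.
    with dbm.open(path, "c") as db:
        db["cleese.png"] = "Photo of John Cleese."
        first = db["cleese.png"]       # values come back as bytes

        db["cleese.png"] = "A silly walk"  # same key: old value is replaced
        second = db["cleese.png"]

# Decode the bytes if you need a regular string again.
caption = second.decode("utf-8")

print(first)    # b'Photo of John Cleese.'
print(second)   # b'A silly walk'
print(caption)  # A silly walk
```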
Not every dictionary method works with database objects; items, for example, is not supported by every dbm backend. The keys method is available, so you can loop over the keys:

    for key in db.keys():
        print(key, db[key])

As with other files, you should close the database when you are done:

    db.close()

Pickling
A limitation of dbm is that the keys and values have to be strings or bytes. If you try to use any other type, you get an error.
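To see the limitation in action, here is a small sketch. It uses the dbm.dumb backend explicitly, because that backend ships with every Python installation and is known to raise a TypeError for non-string values; other backends may report the problem differently. The database name `scores` and the key `high_score` are made up for this example.

```python
import dbm.dumb
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "scores")  # hypothetical database name

    with dbm.dumb.open(path, "c") as db:
        try:
            db["high_score"] = 42          # an int is neither str nor bytes
            error_name = None
        except TypeError as err:
            error_name = type(err).__name__

        # Converting the value to a string first works.
        db["high_score"] = str(42)
        stored = db["high_score"]

print(error_name)  # TypeError
print(stored)      # b'42'
```

Converting to a string by hand works for simple values like this one, but it gets awkward quickly for lists or dictionaries.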
The pickle module can help. It’s part of the Python standard library, so it’s always available. It’s fast; the bulk of it is written in C, like the Python interpreter itself. It can store arbitrarily complex Python data structures.
What can the pickle module store?
- All the native datatypes that Python supports: booleans, integers, floating point numbers, strings, bytes objects, byte arrays, and None.
- Lists, tuples, dictionaries, and sets containing any combination of native datatypes.
- Lists, tuples, dictionaries, and sets containing any combination of lists, tuples, dictionaries, and sets containing any combination of native datatypes (and so on, to the maximum nesting level that Python supports).
- Functions (with caveats).
First, we will use the dumps function to show what pickling effectively does to an object.
pickle.dumps takes an object as a parameter and returns its serialized form as a bytes object (dumps is short for "dump string"):
| Code | Result |
|---|---|
| `import pickle` | `b'\x80\x03]q\x00(K\x01K\x02K\x03e.'` |
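As a sketch of a call that produces bytes like these, the snippet below pickles the list `[1, 2, 3]`. Passing `protocol=3` is an assumption on my part: the leading `\x80\x03` in the result indicates protocol 3, and newer versions of Python default to a later protocol, so without that argument you would see different bytes.

```python
import pickle

t = [1, 2, 3]

# protocol=3 is pinned here only so the bytes match the result shown above.
s = pickle.dumps(t, protocol=3)

print(s)        # b'\x80\x03]q\x00(K\x01K\x02K\x03e.'
print(type(s))  # <class 'bytes'>
```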
So what just happened?
The pickle module takes a Python data structure and
serializes the data structure using a data format
called "the pickle protocol."
The pickle protocol is Python-specific; there is no guarantee
of cross-language compatibility. You probably couldn't take the
bytes you just produced and do anything useful with them
in Perl, PHP, Java, or any other language.
Not every Python data structure can be serialized by the pickle
module. The pickle protocol has changed several times as new data
types have been added to the Python language, but there are still
limitations.
As a result of these changes, there is no guarantee of compatibility between different versions of Python itself. Newer versions of Python support the older serialization formats, but older versions of Python do not support newer formats (since they don't support the newer data types).
Unless you specify otherwise, the functions in the pickle
module use the module's default protocol (pickle.DEFAULT_PROTOCOL). In recent
versions of Python this is one of the newer protocols, though not always the
highest one available (pickle.HIGHEST_PROTOCOL). A newer protocol gives you more
flexibility in the types of data you can serialize, but it also means that the
resulting file will not be readable by older versions of Python that do not
support that protocol.
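If you need a pickle that an older Python can read, you can choose the protocol yourself with the `protocol` argument. The sketch below is illustrative only; the dictionary `data` is made up, and protocol 2 is just one example of an older format.

```python
import pickle

data = {"name": "shoplist", "items": ["apple", "mango", "carrot"]}  # sample data

# These two module constants tell you what your Python uses and supports.
print(pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)

# Ask for an older protocol explicitly, or for the newest one available.
old_style = pickle.dumps(data, protocol=2)
newest = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

# loads detects the protocol from the data itself, so both round-trip.
restored_old = pickle.loads(old_style)
restored_new = pickle.loads(newest)
```

Notice that you never tell loads which protocol was used; the first bytes of the pickle record that.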
The current pickle protocols are binary formats: the functions in the pickle
module produce and consume bytes, not text. Be sure to open your pickle files in
binary mode. In Python 3, writing a pickle to a file opened in text mode fails with a TypeError.
Pickles and Saving to a File
Now that we have used the dumps function to see what pickling
actually does to our objects, let's see how we can use the dump
function (note: no "s" in dump) to write a pickle to a file.
The dump function accepts a minimum of two arguments: the object
that we are pickling and the file in which we are storing the data, like
this:

    pickle.dump(object, file)

Here is an example where we take a shopping list and save it for later in a file.
| Line | Code | Notes |
|---|---|---|
| 1 | `import pickle` | Import the pickle module. |
| 2 | `shoplistfile = 'shoplist.data'` | The name of the file where we will store the object. |
| 3 | `shoplist = ['apple', 'mango', 'carrot']` | The list of things to buy. Notice that this is a Python list. |
| 4 | `fout = open(shoplistfile, 'wb')` | Open the file in write mode. Notice the `'wb'`: the `b` opens the file in binary mode. |
| 5 | `pickle.dump(shoplist, fout)` | Dump the object to a file. |
| 6 | `fout.close()` | Close the file. |
Hooray! We successfully pickled the list.
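The same steps can also be written with a with statement, which closes the file for you. This is a sketch rather than the only way to do it; the temporary directory is an assumption so the example cleans up after itself. The second half shows what happens in Python 3 if you forget the `b` in the mode.

```python
import os
import pickle
import tempfile

shoplist = ["apple", "mango", "carrot"]

with tempfile.TemporaryDirectory() as tmpdir:
    shoplistfile = os.path.join(tmpdir, "shoplist.data")

    # Binary mode: the file is closed automatically when the block ends.
    with open(shoplistfile, "wb") as fout:
        pickle.dump(shoplist, fout)
    size = os.path.getsize(shoplistfile)

    # Text mode: pickle writes bytes, which a text file refuses to accept.
    try:
        with open(shoplistfile, "w") as fout:
            pickle.dump(shoplist, fout)
        text_mode_error = None
    except TypeError as err:
        text_mode_error = type(err).__name__

print(size > 0)         # True
print(text_mode_error)  # TypeError
```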
Reading Pickle Data from a File
Now that we have a pickled file, we want to be able to get that data
back. We can use the load function to accomplish this.
First we need to open our file in binary mode for reading. Then we can use the
load function to "unpickle" the data and make it usable again.
The load function takes the open file object as its argument.
Let's continue with our shopping list example.
| Line | Code | Notes |
|---|---|---|
| 7 | `fin = open(shoplistfile, 'rb')` | Read back from the storage. |
| 8 | `storedlist = pickle.load(fin)` | Load the object from the file. |
| 9 | `print(storedlist)` | Display the list. |
Here is our output:
['apple', 'mango', 'carrot']
It's our original list! We can pick up where we left off and do whatever we want with it.
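Putting the writing and reading halves together, a complete round trip might look like the sketch below. The temporary directory is an assumption for illustration; the file name and list come from the example above.

```python
import os
import pickle
import tempfile

shoplist = ["apple", "mango", "carrot"]

with tempfile.TemporaryDirectory() as tmpdir:
    shoplistfile = os.path.join(tmpdir, "shoplist.data")

    # Write the pickle in binary mode.
    with open(shoplistfile, "wb") as fout:
        pickle.dump(shoplist, fout)

    # Read it back, also in binary mode.
    with open(shoplistfile, "rb") as fin:
        storedlist = pickle.load(fin)

print(storedlist)  # ['apple', 'mango', 'carrot']
```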
Note, however, that this is not the original list object; it is a new object with the same value. Here's an example where we pickle a list, then unpickle it.
| Code | Result |
|---|---|
| `import pickle` | `b'\x80\x03]q\x00(K\x01K\x02K\x03e.'` |
| `t2 = pickle.loads(s)` | `[1, 2, 3]` |
Although the new object has the same value as the old, it is not (in general) the same object:
| Code | Result |
|---|---|
| `t1 == t2` | `True` |
| `t1 is t2` | `False` |
In other words, pickling and then unpickling has the same effect as copying the object.
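To make the comparison with copying concrete, here is a short sketch that uses a nested list (my own example, not one from the lesson). It suggests that the copy you get from pickling behaves like copy.deepcopy: even the inner list is a new object.

```python
import copy
import pickle

t1 = [1, [2, 3]]                       # a list with a nested list
t2 = pickle.loads(pickle.dumps(t1))    # pickle, then unpickle
t3 = copy.deepcopy(t1)                 # an ordinary deep copy

same_value = t1 == t2          # the contents are equal
same_object = t1 is t2         # but it is a different list
inner_shared = t1[1] is t2[1]  # and the nested list was copied too

print(same_value, same_object, inner_shared)  # True False False
```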
You can use pickle to store non-strings in a database. In fact,
this combination is so common that it has been encapsulated in a module called
shelve, although we don't have time to discuss it in this
course.
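Here is a rough sketch of doing that combination by hand: pickle each value into bytes before storing it, and unpickle it after reading it back. The database name `shopping` and the keys are hypothetical.

```python
import dbm
import os
import pickle
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "shopping")  # hypothetical database name

    # Store lists by pickling them into bytes first.
    with dbm.open(path, "c") as db:
        db["fruit"] = pickle.dumps(["apple", "mango"])
        db["vegetables"] = pickle.dumps(["carrot"])

    # Read the bytes back and unpickle them into lists again.
    with dbm.open(path, "r") as db:
        fruit = pickle.loads(db["fruit"])
        vegetables = pickle.loads(db["vegetables"])
        keys = sorted(db.keys())

print(fruit)       # ['apple', 'mango']
print(vegetables)  # ['carrot']
print(keys)        # [b'fruit', b'vegetables']
```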
Multiple Pickles in one File
We can store multiple pickles in a single file as well. The caveat here is that you have to remember which order they went in, because it follows a "first-in / first-out" rule:
| Code | Result |
|---|---|
| `import pickle` | |
That saved each list as a pickle into one file. Now let's extract them:
| Code | Result |
|---|---|
| `fin = open('list.dat', 'rb')` | `['apple', 'mango', 'carrot']` |
First in - first out. You have to keep track of it.
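A complete version of this idea might look like the sketch below. The second list, `todolist`, is invented for the example, and the file lives in a temporary directory. If you don't know how many pickles a file holds, one approach is to keep calling load until it raises EOFError.

```python
import os
import pickle
import tempfile

shoplist = ["apple", "mango", "carrot"]
todolist = ["wash car", "pay bills"]  # hypothetical second list

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "list.dat")

    # Each dump call appends one pickle to the open file.
    with open(path, "wb") as fout:
        pickle.dump(shoplist, fout)
        pickle.dump(todolist, fout)

    # Each load call reads the next pickle, in the order they were written.
    loaded = []
    with open(path, "rb") as fin:
        while True:
            try:
                loaded.append(pickle.load(fin))
            except EOFError:  # no more pickles in the file
                break

print(loaded[0])  # ['apple', 'mango', 'carrot']
print(loaded[1])  # ['wash car', 'pay bills']
```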