How to Pickle: A Pickling and Unpickling Tutorial

Python offers three distinct modules in the standard library that allow you to serialize and deserialize objects: the marshal module, the json module, and the pickle module. In this tutorial, we explain how to pickle and unpickle objects with the Python pickle module.

This tutorial helps you learn how to pickle with ease, with examples for a better understanding. It also covers pickling files, protocol formats, exceptions, functions, picklable and unpicklable types, and the advantages and disadvantages of pickling.

Python Pickle Module | Object Serialization in Python

The pickle module implements binary protocols for serializing and de-serializing a Python object structure. “Pickling” is the process by which a Python object hierarchy is turned into a byte stream. The inverse operation is “unpickling”, where a byte stream (from a binary file or bytes-like object) is turned back into an object hierarchy. Pickling (and unpickling) is also referred to as “serialization”, “marshalling”, or “flattening”. To avoid confusion, the terms used here are “pickling” and “unpickling”.

Prerequisites

Knowledge of the following is required:

  1. Python 3
  2. Basic Python data structures like a dictionary
  3. File operations in Python

Introduction About pickle in Python

Literally, the term pickle means storing something in a saline solution. Only here, instead of vegetables, it’s objects. Not everything in life can be seen as 0s and 1s (gosh! philosophy), but pickling helps us achieve that since it converts many kinds of complex Python objects to 0s and 1s (byte streams). The resulting byte stream can be converted back into Python objects by a process known as unpickling.

Why Pickle?

As we are dealing with binary, the data is not written but dumped, and likewise it is not read but loaded. For instance, when you play a game like ‘Dave’ and you reach a certain level, you would want to save it, right? This game has various attributes, such as health and gems collected.

So when you save your game, say at level 7 when you have one heart for health and three thousand points, an object is created from a class Dave with these values. When you click the ‘Save’ button, this object is serialized and saved, or in other words, pickled. Needless to say, when you restore a saved game, you will be loading data from its pickled state, thus unpickling it.

The real-world uses of Pickling and Unpickling are widespread as they allow you to easily send data from one server to another, and store it in a file or database.

WARNING: Never unpickle data received from an untrusted source, as this poses serious security risks. The pickle module cannot detect malicious data: a specially crafted pickle can execute arbitrary code while it is being unpickled.
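To make the warning above concrete, here is a minimal and deliberately harmless sketch. The class name Malicious and the printed message are made up for illustration; a real attacker would substitute a dangerous callable for print.

```python
import pickle

class Malicious:
    # __reduce__ tells pickle how to rebuild the object: "call this
    # callable with these arguments". The call happens inside loads().
    def __reduce__(self):
        return (print, ("this code ran during unpickling",))

payload = pickle.dumps(Malicious())

# Loading the bytes calls print(...); no Malicious object comes back,
# only whatever the callable returned.
result = pickle.loads(payload)
print(result)
```

The point of the sketch is that loads() ran a function chosen by whoever produced the bytes, which is why pickled data should only be loaded from sources you trust.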

Pickling and unpickling are available only after the pickle module has been imported. You can do this with the following statement:

import pickle

Pickle at Work

Now let’s see a simple example of how to pickle a dictionary.

import pickle
emp = {1:"A",2:"B",3:"C",4:"D",5:"E"}
# The with statement closes the file automatically, even if dump() fails
with open("Emp.pickle", "wb") as pickling_on:
    pickle.dump(emp, pickling_on)

Note the usage of “wb” instead of “w”, as pickle writes bytes rather than text. At this point, you can open the Emp.pickle file in the current working directory with a text editor such as Notepad and see how the pickled data looks; since it is binary, most of it will not be human-readable.

So, now that the data has been pickled, let’s work on how to unpickle this dictionary.

# Open in binary read mode; the with statement closes the file when done
with open("Emp.pickle", "rb") as pickle_off:
    emp = pickle.load(pickle_off)
print(emp)

Now you will get the employees dictionary as we initialized it earlier. Note the usage of “rb” instead of “r”, as we are reading bytes. This is a very basic example, so be sure to try more on your own.

If you want the pickled data as a bytes object instead of writing it to a file, use dumps(). Similarly, to rebuild an object from a bytes object rather than from a file, use loads().
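As a small illustration of dumps() and loads(), the sketch below round-trips the same employees dictionary entirely in memory. The variable names pickled_emp and restored_emp are chosen here for illustration only.

```python
import pickle

emp = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E"}

# dumps() returns the pickled data as a bytes object; no file is involved
pickled_emp = pickle.dumps(emp)
print(type(pickled_emp))

# loads() rebuilds an equal, but separate, object from those bytes
restored_emp = pickle.loads(pickled_emp)
print(restored_emp == emp)
print(restored_emp is emp)
```

This form is handy when the bytes are going somewhere other than a local file, for example into a database column or over a network connection.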

Understanding Python Pickling with Example

import pickle
  
def storeData():
    # initializing data to be stored in db
    Omkar = {'key' : 'Omkar', 'name' : 'Omkar Pathak',
    'age' : 21, 'pay' : 40000}
    Jagdish = {'key' : 'Jagdish', 'name' : 'Jagdish Pathak',
    'age' : 50, 'pay' : 50000}
  
    # database
    db = {}
    db['Omkar'] = Omkar
    db['Jagdish'] = Jagdish
      
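    # It's important to use binary mode. 'wb' overwrites the file; with 'ab'
    # every run would append another pickle and load() would read only the first
    with open('examplePickle', 'wb') as dbfile:
        # source, destination
        pickle.dump(db, dbfile)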
  
def loadData():
    # binary mode is also required for reading
    with open('examplePickle', 'rb') as dbfile:
        db = pickle.load(dbfile)
    for keys in db:
        print(keys, '=>', db[keys])
  
if __name__ == '__main__':
    storeData()
    loadData()

Output:

omkarpathak-Inspiron-3542:~/Documents/Python-Programs$ python P60_PickleModule.py
Omkar => {'age': 21, 'name': 'Omkar Pathak', 'key': 'Omkar', 'pay': 40000}
Jagdish => {'age': 50, 'name': 'Jagdish Pathak', 'key': 'Jagdish', 'pay': 50000}

Data Stream Format | Protocol Formats of the Python pickle Module

The data stream format is referred to as the protocol which specifies the output format of the pickled data. There are several protocol versions that are available. You must be aware of the protocol version to avoid compatibility issues.

  • Protocol version 0 – the original human-readable (text-based) format, backward compatible with earlier versions of Python.
  • Protocol version 1 – an old binary format that is also compatible with earlier versions of Python.
  • Protocol version 2 – introduced in Python 2.3; it provides much more efficient pickling of new-style classes.
  • Protocol version 3 – introduced in Python 3.0; it has explicit support for bytes objects and cannot be unpickled by Python 2.x.
  • Protocol version 4 – added in Python 3.4. It adds support for very large objects, pickling more kinds of objects, and some data format optimizations.
  • Protocol version 5 – added in Python 3.8. It adds support for out-of-band data and speeds up in-band data.

Note that the protocol version is saved as part of the pickle data format, so it is detected automatically when unpickling and does not need to be specified. To pickle data with a specific protocol, pass the protocol argument to dump() or dumps().

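To find the highest protocol version supported by your Python installation, use the following attribute after importing the pickle module. Note that this is the highest protocol available, not necessarily the one used by default (that is pickle.DEFAULT_PROTOCOL).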

pickle.HIGHEST_PROTOCOL
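The following sketch shows one way to explore the protocols on your own interpreter. The sample dictionary and the sizes variable are illustrative; the numbers printed depend on your Python version, so none are shown here.

```python
import pickle

data = {"name": "Omkar", "scores": [1, 2, 3]}

print("highest protocol:", pickle.HIGHEST_PROTOCOL)
print("default protocol:", pickle.DEFAULT_PROTOCOL)

# Pickle the same object with every available protocol and record the size
sizes = {}
for proto in range(pickle.HIGHEST_PROTOCOL + 1):
    blob = pickle.dumps(data, protocol=proto)
    # No protocol is passed to loads(); it is detected from the data
    assert pickle.loads(blob) == data
    sizes[proto] = len(blob)

print(sizes)
```

Choosing a lower protocol only matters when the data must be read by an older Python version; otherwise the default is usually a reasonable choice.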

Python Pickling Exceptions

Some of the common exceptions to look out for:

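  1. pickle.PicklingError: This exception is raised when the pickler encounters an object that doesn’t support pickling. Depending on the object, other exceptions such as TypeError or AttributeError may be raised instead.
  2. pickle.UnpicklingError: This exception is raised when there is a problem unpickling an object, for example when a file contains corrupted data.
  3. EOFError: This exception is raised when the end of the file is reached before a complete pickle has been read, for example when loading from an empty file.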
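The exceptions above can be handled roughly as in the sketch below. The helper name describe_load and the sample byte strings are hypothetical, chosen only to trigger each case.

```python
import pickle

def describe_load(raw):
    """Try to unpickle raw bytes and report what happened."""
    try:
        return ("ok", pickle.loads(raw))
    except EOFError:
        # Nothing to read, e.g. an empty file
        return ("eof", None)
    except pickle.UnpicklingError:
        # The bytes are not a valid pickle
        return ("corrupt", None)

good = pickle.dumps([1, 2, 3])
print(describe_load(good))
print(describe_load(b""))
print(describe_load(b"\x00not a pickle"))

# Pickling can fail too; the exact exception type depends on the object
try:
    pickle.dumps(lambda x: x)
except (pickle.PicklingError, AttributeError, TypeError) as exc:
    print("cannot pickle:", type(exc).__name__)
```

Catching several exception types on the pickling side is a deliberate choice here, since objects that do not support pickling do not all fail in the same way.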

Functions of Pickle Module

  1. pickle.dump(obj, file, protocol=None, *, fix_imports=True, buffer_callback=None): Write the pickled representation of the object obj to the open file object file. This is equivalent to Pickler(file, protocol).dump(obj).
  2. pickle.dumps(obj, protocol=None, *, fix_imports=True, buffer_callback=None): Return the pickled representation of the object obj as a bytes object, rather than writing it to a file.
  3. pickle.load(file, *, fix_imports=True, encoding="ASCII", errors="strict", buffers=None): Read the pickled representation of an object from the open file object file and return the reconstituted object hierarchy specified therein. This is equivalent to Unpickler(file).load().
  4. pickle.loads(data, /, *, fix_imports=True, encoding="ASCII", errors="strict", buffers=None): Return the reconstituted object hierarchy of the pickled representation data of an object. data must be a bytes-like object.
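The relationship between these functions and the Pickler and Unpickler classes can be sketched as follows. An in-memory io.BytesIO buffer stands in for a real file here, which is an assumption made to keep the example self-contained.

```python
import io
import pickle

obj = {"lion": "yellow", "kitty": "red"}

# Function form: dump() writes to any binary file-like object
buf_a = io.BytesIO()
pickle.dump(obj, buf_a, protocol=4)

# Class form: Pickler(file, protocol).dump(obj)
buf_b = io.BytesIO()
pickle.Pickler(buf_b, protocol=4).dump(obj)

# Rewind both buffers before reading them back
buf_a.seek(0)
buf_b.seek(0)

# Reading back: load(file) versus Unpickler(file).load()
from_load = pickle.load(buf_a)
from_unpickler = pickle.Unpickler(buf_b).load()
print(from_load == from_unpickler == obj)
```

The class form is mainly useful when you want to subclass Pickler or Unpickler to customize behavior; for everyday use the module-level functions are enough.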

Pickling Files

To use pickle, start by importing it in Python.

import pickle

In this tutorial, we will pickle a simple dictionary. A dictionary is a collection of key : value pairs; you will save it to a file and then load it again. The dictionary can be declared as follows:

dogs_dict = { 'Ozzy': 3, 'Filou': 8, 'Luna': 5, 'Skippy': 10, 'Barco': 12, 'Balou': 9, 'Laika': 16 }

Next, to pickle this dictionary, you first need to define the name of the file you will write it to; in this case, the file name is dogs. Note that the file does not have an extension.

To open the file, simply use the open() function. The first argument is the name of the file. The second argument is 'wb', where w means that you’ll be writing to the file and b means binary mode, so the data will be written as bytes. If you forget the b, a TypeError (write() argument must be str, not bytes) will be raised. Sometimes you may come across a slightly different notation, w+b, which opens the file for both writing and reading; for the purposes of this example it works the same way.

filename = 'dogs'
outfile = open(filename,'wb')

After opening the file for writing, you can use pickle.dump(), which accepts two arguments: the object you want to pickle and the file to which the object has to be saved. In this instance, the former will be dogs_dict, while the latter will be outfile.

Remember to use close() to close the file!

pickle.dump(dogs_dict,outfile)
outfile.close()
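To complete the round trip, the pickled file can be read back with pickle.load(). The sketch below repeats the steps above and then loads the dictionary again; the temporary directory and the name new_dict are assumptions made so the example does not leave files behind.

```python
import os
import pickle
import tempfile

dogs_dict = {'Ozzy': 3, 'Filou': 8, 'Luna': 5, 'Skippy': 10,
             'Barco': 12, 'Balou': 9, 'Laika': 16}

# A temporary directory keeps the sketch from touching real files
with tempfile.TemporaryDirectory() as tmpdir:
    filename = os.path.join(tmpdir, 'dogs')

    # 'wb': write in binary mode; the file is closed on leaving the block
    with open(filename, 'wb') as outfile:
        pickle.dump(dogs_dict, outfile)

    # 'rb': read in binary mode and rebuild the dictionary
    with open(filename, 'rb') as infile:
        new_dict = pickle.load(infile)

print(new_dict == dogs_dict)
```

Using the with statement instead of explicit close() calls means the file is closed even if pickling raises an exception.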

Inside the Python pickle Module

Basically, the Python pickle module includes four functions:

  1. pickle.dump(obj, file, protocol=None, *, fix_imports=True, buffer_callback=None)
  2. pickle.dumps(obj, protocol=None, *, fix_imports=True, buffer_callback=None)
  3. pickle.load(file, *, fix_imports=True, encoding="ASCII", errors="strict", buffers=None)
  4. pickle.loads(data, /, *, fix_imports=True, encoding="ASCII", errors="strict", buffers=None)

Advantages of Using Python Pickling

  1. Helps in saving complicated data.
  2. Quite easy to use; it requires only a few lines of code.
  3. Saved data is binary and not easily readable, but this should not be relied on for security, since pickled data is neither encrypted nor protected against tampering.
Disadvantages

  1. Non-Python programs may not be able to reconstruct pickled Python objects.
  2. Security risks in unpickling data from malicious sources.

Picklable and Unpicklable Types

From the sections above, you know that the pickle module can serialize many more types than the json module. However, not everything is picklable. Unpicklable objects include database connections, open network sockets, running threads, and others.

If you find yourself faced with an unpicklable object, there are a few things you can do. The first option is to use a third-party library such as dill.

The dill module extends the capabilities of pickle. According to its official documentation, it lets you serialize additional types such as functions with yields, nested functions, lambdas, and many more.
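A quick way to check whether an object is picklable is simply to try it. In the sketch below, the helper can_pickle and the result variables are hypothetical names, and the dill part only runs if that third-party package happens to be installed.

```python
import pickle
import threading

def can_pickle(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False

dict_ok = can_pickle({"a": [1, 2, 3]})    # plain data structures work
lock_ok = can_pickle(threading.Lock())    # locks cannot be pickled
lambda_ok = can_pickle(lambda x: x * 2)   # neither can lambdas
print(dict_ok, lock_ok, lambda_ok)

# Optional: dill is a third-party package and may not be installed
try:
    import dill
except ImportError:
    dill = None

if dill is not None:
    # dill.dumps()/dill.loads() mirror the pickle functions
    double = dill.loads(dill.dumps(lambda x: x * 2))
    print(double(21))
```

Keep in mind that dill shares pickle's security model, so the same warning about untrusted data applies.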

pickle dump Example

# Save a dictionary into a pickle file.
import pickle

favorite_color = { "lion": "yellow", "kitty": "red" }

# The with statement ensures the file is flushed and closed
with open( "save.p", "wb" ) as f:
    pickle.dump( favorite_color, f )

Conclusion

Pickling is considered an advanced topic, so keep practicing and learning to get the hang of it. Be sure to have a look at these interesting topics related to pickling: Pickler, Unpickler, and cPickle. Happy Pythoning!
