Memory management in Python

Yes, you read that right: “memory management in Python”. You may be wondering why you would need to manage memory in a high-level language like Python. The short answer is that you don’t have to, but understanding how variables and objects are stored can help you write programs that consume less memory and run more efficiently. For example: how can we make a class consume less memory, and how can we make sure the garbage collector reclaims unneeded memory as soon as possible? Before we start, knowing a little about memory bloat and memory leaks gives better insight into why this topic matters. Take a look at this great article about memory bloat on Rails, and the problems it can cause. To begin with, we need to understand how Python objects are stored.

How Python objects are stored in memory

Before we dive in, keep in mind that in Python everything is an object. If you come from a C/C++ or Java background, you know that in C/C++ we must first declare a variable, which reserves space in memory according to the specified data type, and then the value is stored in it. Python does not really have variables in that sense; instead it has “names”. A name is anything by which we refer to a value/object. An object/value can have many names.

>>> X = 56
>>> Y = 56

Here X and Y are both names that refer to the integer object 56.

So, what happens when we create these objects?

Any object in Python is always derived from “PyObject” (the base struct for everything in CPython), which stores the bare minimum of information. Put as simply as possible, every object carries its type, a reference count, and a value. The reference count is the number of references (such as names) that point to the exact same object. So, for the example above, things look as shown below.

Type: Integer
Refcount: 2
Value: 56

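As both X and Y refer to the same value 56, the reference count for the integer object 56 is 2 in this simplified picture (in a real interpreter, small integers such as 56 are shared by many other references, so the actual count is much higher). Reference counting is the main mechanism CPython uses to reclaim memory. When the reference count of an object drops to 0, CPython deallocates that object immediately; a separate cyclic garbage collector handles objects that reference each other in cycles.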
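The counting described above can be observed with sys.getrefcount(). The following is a minimal sketch, assuming CPython; Payload is a hypothetical placeholder class (not something from this article), used because a fresh object has a predictable count while a small integer does not. Only the differences between the counts are meaningful, since the call itself holds a temporary reference.

```python
import sys
import weakref


class Payload:
    """Hypothetical placeholder class; instances can be weakly referenced."""


first = Payload()
base = sys.getrefcount(first)         # includes a temporary reference held by the call

second = first                        # a second name for the same object
after_alias = sys.getrefcount(first)

del second                            # remove that name again
after_del = sys.getrefcount(first)

print(after_alias - base, after_del - base)   # 1 0

# A weak reference does not keep the object alive, so it lets us watch it disappear.
probe = weakref.ref(first)
del first                             # the last strong reference is gone
print(probe() is None)                # True on CPython: the object was freed right away
```

Other Python implementations may free the object later, so the last line is a CPython behaviour rather than a language guarantee.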

Now let’s consider a slightly more complex example to understand references.

>>> A = [1, 2, 3]
>>> B = A

Now B is another name (an alias) for the list that A refers to. There are a couple of ways to check whether two names refer to the same object in memory. We can use the built-in id() function and compare the ids, or we can use the “is” operator, which returns a Boolean. In the case above, id() should return the same value for A and B, and the “is” operator should return True. If we relate id() to C, then in CPython the id is simply the memory address of the object.

>>> id(A), id(B)
(2176039028552, 2176039028552)
>>> A is B
True

This means that if we change the list through A, the change is visible through B too.

>>> A[1] = 5
>>> B
[1, 5, 3]
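When the shared change is not wanted, the usual remedy is to copy the list instead of aliasing it. A small sketch follows; the names C and D are introduced here only for illustration.

```python
import copy

A = [1, 2, 3]
B = A               # alias: a second name for the same list
C = list(A)         # shallow copy: a new list object with the same elements
D = copy.copy(A)    # the same kind of shallow copy, via the copy module

A[1] = 5

print(B)            # [1, 5, 3]  (B is the same object as A)
print(C)            # [1, 2, 3]  (C is a separate list)
print(A is B, A is C, A == C)   # True False False
```

Both copies are shallow: the new list is a distinct object, but its elements are still shared with the original, which only matters when the elements are themselves mutable.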

Now let’s try this again with Integers.

>>> X = 56
>>> Y = 56
>>> X is Y
True
>>> X = 89
>>> X, Y
(89, 56)

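Wait! What happened here? Isn’t the value of Y supposed to change to 89? It is not, because this time we did not modify an object: integers are immutable, so X = 89 simply rebinds the name X to a different integer object, while Y still refers to 56. With the list, A[1] = 5 changed the shared object in place, which is why B saw the change. There is still something happening under the hood, though: X is Y returned True even though we wrote 56 twice, and that is due to CPython’s integer caching policy. CPython caches the integer values from -5 to 256. In the next blog we will see exactly what is happening here and what some of the exceptions are.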
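Both effects can be checked separately. This is a sketch assuming CPython; the integers for the cache check are built with int() at runtime (a choice made here so that the compiler cannot merge equal literals), and the names small_a, small_b, big_a and big_b are illustrative.

```python
X = 56
Y = 56
id_before = id(X)

X = 89                        # rebinds the name X to a different int object
print(id(X) == id_before)     # False: X now refers to another object
print(Y)                      # 56: the object Y refers to was never modified

# Build the values at runtime so equal literals cannot be merged by the compiler.
small_a, small_b = int("256"), int("256")
big_a, big_b = int("257"), int("257")

print(small_a is small_b)     # True on CPython: 256 is inside the cached range
print(big_a is big_b)         # False on CPython: 257 is outside it
```

The cache is an implementation detail, so code should compare integers with == and keep “is” for identity checks.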

Python object at lower level

If you are coming from a C/C++ background, I have something extra for you that gives deeper insight into how Python works. The traditional Python implementation is written in C and is known as CPython. Now let’s see how PyObject is implemented in it.

typedef struct _object {
    _PyObject_HEAD_EXTRA
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
} PyObject;

This is straight from the source of CPython, which can be found here. According to its developers:

Objects are structures allocated on the heap.  Special rules apply to the use of objects to ensure they are properly garbage-collected. Objects are never allocated statically or on the stack; they must be accessed through special macros and functions only.

This also hints at one reason why Python is slower than many languages: because almost every object is allocated on the heap at runtime, allocation adds noticeable runtime overhead.
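One visible consequence of every value being a heap object with its own header is its size. The sketch below uses sys.getsizeof() to print the size of a few built-in values; the exact numbers depend on the platform and the CPython version, so none are shown here.

```python
import sys

# Sizes are in bytes and include the object header (reference count and type pointer).
for value in (56, 2**100, 3.14, "", [], ()):
    print(type(value).__name__, sys.getsizeof(value))

# Even a small integer takes more room than the 4 or 8 bytes of a C integer.
print(sys.getsizeof(56) > 8)
```

Note that for containers sys.getsizeof() reports only the container itself, not the objects it refers to.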

Let’s see what this structure actually does. _PyObject_HEAD_EXTRA is a macro that, in special debug builds (when Py_TRACE_REFS is defined), adds two pointers, _ob_next and _ob_prev, which link the object into a doubly-linked list of all live heap objects. In normal builds the macro expands to nothing.

Py_ssize_t ob_refcnt is pretty self-explanatory: it stores the reference count of the object.

The last and most important part of this struct is struct _typeobject *ob_type. In CPython, even types are stored as objects. There is an explanation given by the developers in the source file which reminds me of Inception.

An object has a ‘type’ that determines what it represents and what kind of data it contains.  An object’s type is fixed when it is created. Types themselves are represented as objects; an object contains a pointer to the corresponding type object.  The type itself has a type pointer pointing to the object representing the type ‘type’, which contains a pointer to itself!).

Put simply, everything in Python is derived from PyObject, which means that an object describing a type is also a PyObject. As we have seen above, a PyObject must have a pointer to a type, so where should the type object representing ‘type’ point? The most obvious answer is: to itself. Let’s try to visualise this scenario. (only relevant information is shown below)
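The self-referential type pointer can also be observed from Python itself with the built-in type() function, which returns the type object an object points to. A short sketch:

```python
value = 56

print(type(value) is int)     # True: the object 56 points to the type object int
print(type(int) is type)      # True: int is itself an object whose type is 'type'
print(type(type) is type)     # True: 'type' points to itself

# Types are objects, and object is a type.
print(isinstance(type, object), isinstance(object, type))   # True True
```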

To access the data of a stored object when all we have is a PyObject pointer, we need to cast it to a pointer to a longer type (in inheritance terms, a child or derived class), which we can determine from the stored type pointer. Once the pointer has been cast to that type, we can access the object’s data and information.

References:

https://docs.python.org/3/c-api/memory.html

https://medium.com/@tyastropheus/tricky-python-i-memory-management-for-mutable-immutable-objects-21507d1e5b95

https://www.youtube.com/watch?v=F6u5rhUQ6dU

https://wsvincent.com/python-wat-integer-cache/