Python Behind the Scenes (Part 1)
Python is interpreted. Everything in Python is an object. Python is dynamically typed.
In our Python programming journey, we come across these statements quite a few times. In this 2-part blog series, I will be touching on these points to try and get behind the scenes of Python.
Coming from a background of C programming, Python’s way of doing things, so to speak, has always fascinated me. As I was coding in Python, I would always try to imagine how things are working under the hood. In Part 1, I will do my best to present a simple behind-the-scenes picture of how Python works to execute your program.
Allow me to progress from C to Python in this blog. I will try to keep it simple.
The following is a short depiction of what a typical program looks like when loaded in memory.
Pic 1 (courtesy - https://microcontrollerslab.com/wp-content/uploads/2019/08/Stack-and-Heap-Memory-allocation.jpg)
· Text/code segment contains the program instructions. Essentially, the instruction register on your processor is loaded with instructions fetched from this segment.
· Initialized data segment contains initialized global and static variables.
· Uninitialized data segment (bss) contains uninitialized global and static variables.
· Heap is an area from which memory can be dynamically allocated. Addresses from heap memory can be stored in pointers/reference variables and passed around the program as and when required (note this point, as it plays a role in how memory is managed in Python; more on this in Part 2).
· Stack is where the current function’s variables (local variables, parameters) and some information about the calling function, such as the return address, are stored. At any given point in time, the stack pointer register on the processor points to the top of the stack. Instructions use the stack pointer plus an offset to access a particular stack (local) variable.
A C program’s memory layout maps directly onto the picture shown above because of the compilation -> linking -> loading pipeline.
C is a compiled language.
Essentially, a C compiler produces object code files (.o on Unix and .obj on Windows), which are platform specific (dependent on the underlying OS and architecture).
After linking with any libraries, either statically or dynamically at load time (Pic 2), what we end up with is a program in memory with ready-to-execute machine instructions that the underlying platform understands (as in Pic 1).
Python is both a compiled and interpreted language.
So, what really occurs when we run a Python program with python app.py? As we can see, there is no separate compilation or build stage that we have to run ourselves. What we are doing is directly invoking the Python interpreter with the python command: the operating system loads and executes the interpreter, which receives app.py as its input. In other words, the memory layout discussed in the previous section is that of the interpreter program.
The most widely used as well as the original Python interpreter is CPython, written in C! This is what you would be using if you installed from python.org.
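If you are curious which interpreter is actually running your program, the standard library can report it. The snippet below is only a small sketch; the function name describe_interpreter is made up for illustration.

```python
import platform
import sys


def describe_interpreter():
    """Collect a few facts about the interpreter executing this script."""
    return {
        # "CPython" for the reference interpreter; other implementations
        # report their own name (for example "PyPy").
        "implementation": platform.python_implementation(),
        "version": platform.python_version(),
        # Path of the interpreter binary that the operating system loaded.
        "executable": sys.executable,
    }


for key, value in describe_interpreter().items():
    print(f"{key}: {value}")
```

On an installation from python.org, the implementation line should read CPython.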
Note – from here on, the terms ‘interpreter’, ‘Python Virtual Machine’ and ‘PVM’ are used interchangeably. The implementation details discussed are specific to CPython.
Wait, so whatever happened to our Python program app.py?!
Let’s dive in.
After the interpreter loads and performs its initialization (loading and configuring the built-in Python modules and so on), it compiles our Python program app.py to platform-independent bytecode. Any imports in our program are carried out later, when the bytecode for the import statements executes. Bytecode instructions are not actual machine instructions. The only machine instructions executing at this point are those of the interpreter itself.
Bytecode consists of instructions understood by the interpreter, much as x86 instructions are understood by an x86 processor or PPC instructions are decoded by a PPC processor. The difference is that the decoding is not hardwired but programmed into the interpreter. This is the reason we also refer to the interpreter as the Python Virtual Machine or PVM (similar to the Java Virtual Machine that executes Java code). For the eager, you can check out what bytecode looks like: just type python -m dis app.py in your terminal.
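The compilation step can also be observed from within Python through the built-in compile() function. The sketch below is only an illustration, not how the python command itself is implemented: it compiles a small program held in a string (the file name "app.py" passed to compile() is just a label here) and shows that the result is a code object holding raw bytecode bytes.

```python
import types

# Source text of a small program, held in a string for illustration.
source = (
    "def sum(x, y):\n"
    "    return x + y\n"
    "\n"
    "z = sum(10, 20)\n"
)

# Compile the source to a code object; nothing is executed yet.
module_code = compile(source, "app.py", "exec")
print(type(module_code))          # <class 'code'>
print(type(module_code.co_code))  # <class 'bytes'>

# The function body is compiled to its own code object, which is stored
# among the constants of the module's code object.
nested = [c for c in module_code.co_consts if isinstance(c, types.CodeType)]
print([c.co_name for c in nested])  # ['sum']

# Executing the code object is a separate step.
namespace = {}
exec(module_code, namespace)
print(namespace["z"])  # 30
```

Compiling and executing are two distinct steps here, which mirrors the two stages described in this post.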
Here is an example –
    def sum(x,y):
        return x+y

    z = sum(10, 20)
python -m dis test.py yields -
     20           0 LOAD_CONST               0 (<code object sum at 0x000002631844C500, file ".\test.py", line 20>)
                  2 LOAD_CONST               1 ('sum')
                  4 MAKE_FUNCTION            0
                  6 STORE_NAME               0 (sum)

     23           8 LOAD_NAME                0 (sum)
                 10 LOAD_CONST               2 (10)
                 12 LOAD_CONST               3 (20)
                 14 CALL_FUNCTION            2
                 16 STORE_NAME               1 (z)
                 18 LOAD_CONST               4 (None)
                 20 RETURN_VALUE

    Disassembly of <code object sum at 0x000002631844C500, file ".\test.py", line 20>:
     21           0 LOAD_FAST                0 (x)
                  2 LOAD_FAST                1 (y)
                  4 BINARY_ADD
                  6 RETURN_VALUE
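The same kind of listing can be produced from inside a program with the dis module. This is a small sketch; the function is named add here (rather than sum) only to avoid shadowing the built-in, and the exact opcode names and offsets you see depend on your Python version, so they may differ from the listing above.

```python
import dis


def add(x, y):
    return x + y


# dis.get_instructions yields one Instruction object per bytecode instruction.
for instr in dis.get_instructions(add):
    print(instr.offset, instr.opname, instr.argrepr)

# The opcode names alone, in execution order.
opnames = [instr.opname for instr in dis.get_instructions(add)]
print(opnames)
```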
So far, we have bytecode, which the processor cannot execute directly. Next is the interpretation stage.
The interpreter (the PVM, or CPython in the context of this blog) executes the bytecode instructions, similar to how a real machine’s instructions are decoded and executed by the underlying processor. This is the reason the interpreter is called a virtual machine. The Python virtual machine, the Java virtual machine, and the .NET Framework’s Common Language Runtime are all process virtual machines. Simply defined, they provide a virtual execution environment in which the program compiled from the original source code executes.
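To make the idea of "decoding and executing" bytecode concrete, here is a toy stack machine written in Python. It is a teaching sketch only: CPython's real evaluation loop is written in C and handles far more instructions. The function run, the tuple-based instruction format and the hand-written program are all assumptions made for this illustration; only the opcode names are borrowed from the listing above.

```python
def run(instructions, constants, names):
    """Execute a list of (opname, argument) pairs on a toy stack machine."""
    stack = []       # operand stack for intermediate values
    variables = {}   # maps variable names to their current values
    pc = 0           # index of the next instruction to execute

    while pc < len(instructions):
        opname, arg = instructions[pc]
        pc += 1
        if opname == "LOAD_CONST":
            stack.append(constants[arg])
        elif opname == "LOAD_NAME":
            stack.append(variables[names[arg]])
        elif opname == "STORE_NAME":
            variables[names[arg]] = stack.pop()
        elif opname == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opname == "RETURN_VALUE":
            return stack.pop(), variables
        else:
            raise ValueError(f"unknown instruction: {opname}")
    return None, variables


# Hand-written program equivalent to:  x = 10; z = x + 20
program = [
    ("LOAD_CONST", 0),
    ("STORE_NAME", 0),
    ("LOAD_NAME", 0),
    ("LOAD_CONST", 1),
    ("BINARY_ADD", None),
    ("STORE_NAME", 1),
    ("LOAD_CONST", 2),
    ("RETURN_VALUE", None),
]
result, variables = run(program, constants=[10, 20, None], names=["x", "z"])
print(variables)  # {'x': 10, 'z': 30}
```

The if/elif chain plays the role of the processor's instruction decoder, except that it is ordinary program logic rather than hardware.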
Two main features of how CPython (the Python interpreter) executes our Python programs are listed below –
1. It creates a code object for every source code block in our Python program, be it a module or a function. This ties in perfectly with the statement that everything in Python is an object. This is why even a function we write in Python comes with attributes and even callable methods, just like any other object. CPython maintains meta-information in its data structures for each object, including code and function objects.
2. Just as functions are executed on the stack in actual programs/processes running in memory, CPython maintains its own call stack, creating a stack frame for every Python function/code block being executed. This way it can keep track of things like which bytecode instruction to return to once the current function returns.
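Both features can be poked at from ordinary Python code. The sketch below uses made-up function names (add, callee, caller) and relies on inspect.currentframe(), which is available in CPython but may return None on other implementations.

```python
import inspect
import types


def add(x, y):
    return x + y


# Feature 1: a function is an object with attributes and callable methods,
# and it carries its compiled code object along.
print(type(add).__name__)         # function
print(add.__name__)               # add
print(add.__code__.co_varnames)   # ('x', 'y')
print(add.__call__(10, 20))       # 30
print(isinstance(add.__code__, types.CodeType))  # True


# Feature 2: every call gets its own frame, linked to the caller's frame.
def callee():
    frame = inspect.currentframe()  # frame of this call to callee()
    # f_back points to the frame of the function that called us.
    return frame.f_code.co_name, frame.f_back.f_code.co_name


def caller():
    return callee()


print(caller())  # ('callee', 'caller')
```

The chain of f_back links is the call stack that CPython keeps for our program.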
Behind the scenes of a Python program’s execution is another very intelligent program, the Python interpreter, doing two things – implementing the semantics of the Python language and providing a virtual execution environment for the original Python program.
I hope this helped provide a better picture of what is happening under the hood when you run your Python program 😊
In Part 2, I will be touching on the very interesting topic of how Python does its memory management.