What, then, is a "code object"?
Code Objects
Code objects are a low-level detail of the CPython implementation. Each one represents a chunk of executable code that hasn’t yet been bound into a function.
PyCodeObject
- The C structure of the objects used to describe code objects. The fields of this type are subject to change at any time.
PyTypeObject PyCode_Type
- This is an instance of PyTypeObject representing the Python code type.
int PyCode_GetNumFree(PyCodeObject *co)
- Return the number of free variables in co.
PyCodeObject* PyCode_New(int argcount, int kwonlyargcount, int nlocals, int stacksize, int flags, PyObject *code, PyObject *consts, PyObject *names, PyObject *varnames, PyObject *freevars, PyObject *cellvars, PyObject *filename, PyObject *name, int firstlineno, PyObject *lnotab)
- Return a new code object. If you need a dummy code object to create a frame, use PyCode_NewEmpty() instead. Calling PyCode_New() directly can bind you to a precise Python version since the definition of the bytecode changes often.
PyCodeObject* PyCode_NewEmpty(const char *filename, const char *funcname, int firstlineno)
- Return a new empty code object with the specified filename, function name, and first line number. It is illegal to exec() or eval() the resulting code object.
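The same idea can be seen from the Python side. The following is a minimal sketch, not part of the quoted C API documentation: it compiles a source string, pulls out the code object of a nested function before any function exists, and only then binds it into a callable. The names `source`, `module_code`, `func_code` and `add_one` are illustrative assumptions.

```python
import types

# Compile a module-level source string; no function object exists yet.
source = "def add_one(x):\n    return x + 1\n"
module_code = compile(source, "<example>", "exec")

# The body of add_one is stored as a nested code object among the
# constants of the module-level code object.
func_code = next(
    const for const in module_code.co_consts
    if isinstance(const, types.CodeType)
)

# Binding the code object to a globals dict turns it into a function.
add_one = types.FunctionType(func_code, {})
result = add_one(41)
print(func_code.co_name, result)
```

Until `types.FunctionType` is called, `func_code` is just a chunk of executable code with a name, which is the "not yet bound into a function" state the description refers to.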
Reading this text feels like falling into a fog!
With only a glimmer of light, it is hard to make out the shapes ☻
29.12. inspect
— Inspect live objects
Source code: Lib/inspect.py
The inspect module provides several useful functions to help get information about live objects such as modules, classes, methods, functions, tracebacks, frame objects, and code objects. For example, it can help you examine the contents of a class, retrieve the source code of a method, extract and format the argument list for a function, or get all the information you need to display a detailed traceback.
There are four main kinds of services provided by this module: type checking, getting source code, inspecting classes and functions, and examining the interpreter stack.
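As a rough illustration of three of these services, here is a short sketch using only functions that exist in the inspect module. The function `greet` and the helper `current_function_name` are hypothetical names made up for the example. Getting source code (`inspect.getsource`) is left out because it needs the code to live in a real file.

```python
import inspect


def greet(name, punctuation="!"):
    return "Hello, " + name + punctuation


# Type checking: is this object a function, and is its __code__ a code object?
is_function = inspect.isfunction(greet)
is_code = inspect.iscode(greet.__code__)

# Inspecting functions: recover the parameter names from the signature.
parameter_names = list(inspect.signature(greet).parameters)


# Examining the interpreter stack: the current frame knows which code
# object it is executing (inspect.currentframe is CPython-specific).
def current_function_name():
    return inspect.currentframe().f_code.co_name


print(is_function, is_code, parameter_names, current_function_name())
```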
……
| Type | Attribute | Description |
| code | co_argcount | number of arguments (not including keyword only arguments, * or ** args) |
| | co_code | string of raw compiled bytecode |
| | co_cellvars | tuple of names of cell variables (referenced by containing scopes) |
| | co_consts | tuple of constants used in the bytecode |
| | co_filename | name of file in which this code object was created |
| | co_firstlineno | number of first line in Python source code |
| | co_flags | bitmap of CO_* flags |
| | co_lnotab | encoded mapping of line numbers to bytecode indices |
| | co_freevars | tuple of names of free variables (referenced via a function’s closure) |
| | co_kwonlyargcount | number of keyword only arguments (not including ** arg) |
| | co_name | name with which this code object was defined |
| | co_names | tuple of names other than arguments and function locals (for example globals and attributes) |
| | co_nlocals | number of local variables |
| | co_stacksize | virtual machine stack space required |
| | co_varnames | tuple of names of arguments and local variables |
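To make a few of these attributes concrete, the sketch below reads them from a small closure. The functions `make_adder` and `add` are hypothetical examples, chosen so that arguments, keyword-only arguments, locals, globals, free variables and cell variables all show up.

```python
def make_adder(n):
    def add(x, *, scale=1):
        total = (x + n) * scale
        return abs(total)
    return add


add = make_adder(3)
code = add.__code__

print(code.co_argcount)        # positional arguments: x
print(code.co_kwonlyargcount)  # keyword-only arguments: scale
print(code.co_varnames)        # arguments first, then other locals
print(code.co_nlocals)         # length of co_varnames here
print(code.co_names)           # names that are neither arguments nor locals
print(code.co_freevars)        # names taken from the enclosing scope
print(make_adder.__code__.co_cellvars)  # names the inner function closes over
```

Note that `abs` lands in `co_names` while `total` lands in `co_varnames`, which is the distinction between the two rows in the table.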
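It is like climbing a mountain without a map!
Who knows where one stands, this deep in the clouds?
So it is best to set out with a guide ☆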
The dis module allows disassembly of Python code into the individual instructions executed by the Python interpreter (usually CPython) for each line. Passing a module, function or other piece of code to the dis.dis function prints a human-readable representation of the underlying, disassembled bytecode. This is useful for analyzing and hand-tuning tight loops or performing other kinds of necessary, fine-grained optimizations.
Basic Usage
The main function you will interact with when disassembling Python code is dis.dis. It takes either a function, method, class, module, code string, generator or byte sequence of raw bytecode and prints the disassembly of that code object to stdout (if no explicit file argument is specified). In the case of a class, it disassembles each method (including static and class methods). For a module, it disassembles all functions in that module.
Let’s see this in practice. Take the following code:
import dis


class Foo(object):
    def __init__(self):
        pass

    def foo(self, x):
        return x + 1


def bar():
    x = 5
    y = 7
    z = x + y
    return z


def main():
    dis.dis(bar)  # disassembles `bar`
    dis.dis(Foo)  # disassembles each method in `Foo`


main()  # without this call nothing is printed
This will print:
 14           0 LOAD_CONST               1 (5)
              3 STORE_FAST               0 (x)

 15           6 LOAD_CONST               2 (7)
              9 STORE_FAST               1 (y)

 16          12 LOAD_FAST                0 (x)
             15 LOAD_FAST                1 (y)
             18 BINARY_ADD
             19 STORE_FAST               2 (z)

 17          22 LOAD_FAST                2 (z)
             25 RETURN_VALUE
Disassembly of __init__:
  8           0 LOAD_CONST               0 (None)
              3 RETURN_VALUE

Disassembly of foo:
 11           0 LOAD_FAST                1 (x)
              3 LOAD_CONST               1 (1)
              6 BINARY_ADD
              7 RETURN_VALUE
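Since dis.dis prints rather than returns its result, one way to get the listing as a string is to pass the file argument mentioned above. The sketch below does that with an in-memory buffer; the names `buffer`, `listing` and `opnames` are illustrative. The exact instructions and offsets depend on the Python version, so they will likely differ from the listing above on a newer interpreter.

```python
import dis
import io


def bar():
    x = 5
    y = 7
    z = x + y
    return z


# Redirect the disassembly into an in-memory text buffer.
buffer = io.StringIO()
return_value = dis.dis(bar, file=buffer)
listing = buffer.getvalue()

# dis.get_instructions yields the same instructions as objects.
opnames = [ins.opname for ins in dis.get_instructions(bar)]

print(return_value)  # dis.dis itself returns None
print(listing)
```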
Also, we can disassemble an entire module from the command line using python -m dis module_file.py. Either way, at this point, we should probably discuss the format of the disassembly output. The columns printed are the following:
1. The original line of code the disassembly is referencing.
2. The address of the bytecode instruction.
3. The name of the instruction.
4. The index of the argument in the code block’s name and constant table.
5. The human-friendly mapping from the argument index (4) to the actual value or name being referenced.
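Most of the columns listed above are also available programmatically. The following sketch uses dis.get_instructions and reads the address, instruction name, argument and human-friendly argument text from each instruction. The source-line column is left out because the attribute that carries it differs between Python versions. The name `rows` is an illustrative assumption.

```python
import dis


def bar():
    x = 5
    y = 7
    z = x + y
    return z


# One tuple per instruction: (address, name, argument, readable argument).
rows = [
    (ins.offset, ins.opname, ins.arg, ins.argrepr)
    for ins in dis.get_instructions(bar)
]

for offset, opname, arg, argrepr in rows:
    print(offset, opname, arg, argrepr)
```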
For (4), it is important to understand that all code objects in Python, that is, isolated code blocks like functions, have internal name and constant tables. These tables are simply lists: the constant table holds constants such as string literals, numbers or special values such as None that appear at least once in the code block, while the name table holds a list of variable names. These variable names are, in turn, keys into a dictionary mapping such symbols to actual values. The reason why instruction arguments are indices into tables and not the values stored in those tables is so that arguments can have uniform length (always two bytes in the CPython release shown here). As you can imagine, storing variable-length strings in the bytecode directly would make advancing a program counter a great deal more complex.
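The table lookup described above can be checked by hand. This is a small sketch under the assumption that, for a LOAD_CONST instruction, the argument is an index into co_consts; other instructions encode their arguments differently across Python versions, so only LOAD_CONST is examined. The function `greet` and the name `resolved` are hypothetical.

```python
import dis


def greet():
    greeting = "hello"
    return len(greeting)


code = greet.__code__

# For each LOAD_CONST, pair the raw argument (an index) with the value
# that dis resolved for it.
resolved = [
    (ins.arg, ins.argval)
    for ins in dis.get_instructions(greet)
    if ins.opname == "LOAD_CONST"
]

for index, value in resolved:
    # Looking the index up in the constant table gives the same value.
    print(index, repr(value), repr(code.co_consts[index]))

print(code.co_consts)    # constant table
print(code.co_names)     # name table (globals, attributes)
print(code.co_varnames)  # local variable names
```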
……
Interpreting Bytecode
Disassembled bytecode instructions are already quite low-level (a.k.a. cool). However, we can go even deeper and understand the bytecode itself, i.e. the binary or hexadecimal representation of the instructions in compiled and assembled bytecode. For this, let’s define a function and mess a little more with its __code__ property:
def function():
    x = 5
    l = [1, 2]
    return len(l) + x
Through function.__code__ we can gain access to the code object associated with the function. Furthermore, function.__code__.co_code returns the actual bytecode:
In [1]: function.__code__.co_code
Out[1]: b'd\x01\x00}\x00\x00d\x02\x00d\x03\x00g\x02\x00}\x01\x00t\x00\x00|\x01\x00\x83\x01\x00|\x00\x00\x17S'
Yes! Bytes! Just what I like for breakfast. But what can we actually make of these delicious bites of bytecode? Well, we know that these bytes specify instructions, some taking arguments and some not. Each instruction occupies a single byte, and arguments (such as the indices into the name and constant tables) occupy further bytes. Fortunately, the dis module (as well as the opcode module) provides an opname table and an opmap map. The former is a simple list, laid out such that indexing it with the opcode of an instruction returns the name (mnemonic) of that instruction. The latter, dis.opmap, maps instruction mnemonics to their bytecode numbers:
In [1]: dis.opname[69]
Out[1]: 'GET_YIELD_FROM_ITER'

In [2]: dis.opmap['LOAD_CONST']
Out[2]: 100
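Putting the two tables to work, the sketch below decodes the very first byte of a function’s raw bytecode by hand and compares it with what dis reports. Opcode numbers and the first instruction itself vary between Python versions, so no concrete values are assumed here. The names `raw`, `first_opcode` and `round_trip` are illustrative.

```python
import dis


def function():
    x = 5
    l = [1, 2]
    return len(l) + x


raw = function.__code__.co_code

# The first byte of the raw bytecode is the opcode of the first instruction.
first_opcode = raw[0]
first_name = dis.opname[first_opcode]

# opmap goes the other way: mnemonic to opcode number.
load_const = dis.opmap["LOAD_CONST"]
round_trip = dis.opname[load_const]

# Cross-check the manual decoding against dis itself.
first_instruction = next(iter(dis.get_instructions(function)))

print(first_opcode, first_name, first_instruction.opname)
print(load_const, round_trip)
```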
So, if we know the byte value describing a certain instruction, we now know how to get the instruction name. All that’s left is interpreting the arguments of these instructions. For this we need to know whether or not the instruction takes arguments in the first place. To get this information, we can make use of dis.hasconst, dis.hasname, dis.hasjrel, dis.hasjabs and others. Each of these is a list in the dis module that contains the opcodes taking a constant argument, a name argument, a relative or absolute jump target, or another kind of parameter. For example, dis.hasnargs is also such a list, containing all opcodes related to function calls, such as CALL_FUNCTION, CALL_FUNCTION_VAR (for calls passing *args) or CALL_FUNCTION_KW (for calls passing **kwargs). It is noteworthy that if an instruction takes arguments at all, it can only take a single argument occupying exactly 16 bits (two bytes).
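A classification along these lines might look like the sketch below. The helper `argument_kind` is hypothetical, not part of the dis module. It relies only on dis.hasconst, dis.hasname, dis.hasjrel, dis.hasjabs, dis.haslocal and dis.HAVE_ARGUMENT. dis.hasnargs is deliberately not used, since it is specific to the Python version discussed in the text.

```python
import dis


def argument_kind(opcode):
    """Return a rough label for the kind of argument an opcode takes."""
    if opcode in dis.hasconst:
        return "constant"
    if opcode in dis.hasname:
        return "name"
    if opcode in dis.hasjrel or opcode in dis.hasjabs:
        return "jump"
    if opcode in dis.haslocal:
        return "local"
    if opcode >= dis.HAVE_ARGUMENT:
        return "other"
    return None  # the instruction takes no argument


for mnemonic in ("LOAD_CONST", "LOAD_GLOBAL", "RETURN_VALUE"):
    print(mnemonic, argument_kind(dis.opmap[mnemonic]))
```

Checking membership in the has* lists first and falling back to HAVE_ARGUMENT keeps the helper usable even for opcodes that none of the lists mention.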