漫步瀏覽山城景緻,不覺間走出廓外。眼前視野一片開闊,正可以欣賞對舉論辨也︰
Dynamic Typing: What the Compiler Doesn’t Know
One thing you’ve probably heard is that Python is a “dynamic” language—particularly that it’s “dynamically typed”. The work we’ve done to this point sheds some light on this description.
One of the things “dynamic” means in this context is that a lot of work is done at run time. We saw earlier that the Python compiler doesn’t have much information about what the code actually does. For example, consider the short function mod
below. mod
takes two arguments and returns the first modulo the second. In the bytecode, we see that the variables a
and b
are loaded, then the bytecode BINARY_MODULO
performs the modulo operation itself.
>>> def mod(a, b): ... return a % b >>> dis.dis(mod) 2 0 LOAD_FAST 0 (a) 3 LOAD_FAST 1 (b) 6 BINARY_MODULO 7 RETURN_VALUE >>> mod(19, 5) 4
Calculating 19 %
5 yields 4—no surprise there. What happens if we call it with different arguments?
>>> mod("by%sde", "teco") 'bytecode'
What just happened? You’ve probably seen this syntax before, but in a different context:
>>> print("by%sde" % "teco") bytecode
Using the symbol %
to format a string for printing means invoking the instruction BINARY_MODULO
. This instruction mods together the top two values on the stack when the instruction executes—regardless of whether they’re strings, integers, or instances of a class you defined yourself. The bytecode was generated when the function was compiled (effectively, when it was defined) and the same bytecode is used with different types of arguments.
The Python compiler knows relatively little about the effect the bytecode will have. It’s up to the interpreter to determine the type of the object that BINARY_MODULO
is operating on and do the right thing for that type. This is why Python is described as dynamically typed: you don’t know the types of the arguments to this function until you actually run it. By contrast, in a language that’s statically typed, the programmer tells the compiler up front what type the arguments will be (or the compiler figures them out for itself).
The compiler’s ignorance is one of the challenges to optimizing Python or analyzing it statically—just looking at the bytecode, without actually running the code, you don’t know what each instruction will do! In fact, you could define a class that implements the __mod__
method, and Python would invoke that method if you use %
on your objects. So BINARY_MODULO
could run any code at all!
Just looking at the following code, the first calculation of a % b
seems wasteful.
def mod(a,b): a % b return a %b
Unfortunately, a static analysis of this code—the kind of you can do without running it—can’t be certain that the first a % b
really does nothing. Calling __mod__
with%
might write to a file, or interact with another part of your program, or do literally anything else that’s possible in Python. It’s hard to optimize a function when you don’t know what it does! In Russell Power and Alex Rubinsteyn’s great paper “How fast can we make interpreted Python?”, they note, “In the general absence of type information, each instruction must be treated as INVOKE_ARBITRARY_METHOD
.”
Conclusion
Byterun is a compact Python interpreter that’s easier to understand than CPython. Byterun replicates CPython’s primary structural details: a stack-based interpreter operating on instruction sets called bytecode. It steps or jumps through these instructions, pushing to and popping from a stack of data. The interpreter creates, destroys, and jumps between frames as it calls into and returns from functions and generators. Byterun shares the real interpreter’s limitations, too: because Python uses dynamic typing, the interpreter must work hard at run time to determine the correct behavior of a program.
I encourage you to disassemble your own programs and to run them using Byterun. You’ll quickly run into instructions that this shorter version of Byterun doesn’t implement. The full implementation can be found at https://github.com/nedbat/byterun—or, by carefully reading the real CPython interpreter’s ceval.c
, you can implement it yourself!
Peephole Optimizer
Peephole optimizer: optimize Python code objects. It is implemented with the BytecodeBlocks
class.
It is based on the peephole optimizer of CPython 3.6 which is written in C.
…
Example
Code:
import dis from bytecode.peephole_opt import PeepholeOptimizer code = compile('print(5+5)', '<string>', 'exec') print("Non-optimized:") dis.dis(code) print() code = PeepholeOptimizer().optimize(code) print("Optimized:") dis.dis(code)
Output of Python 3.6 patched with the PEP 511 with python -o noopt
(to disable the builtin peephole optimizer):
Non-optimized: 1 0 LOAD_NAME 0 (print) 3 LOAD_CONST 0 (5) 6 LOAD_CONST 0 (5) 9 BINARY_ADD 10 CALL_FUNCTION 1 (1 positional, 0 keyword pair) 13 POP_TOP 14 LOAD_CONST 1 (None) 17 RETURN_VALUE Optimized: 1 0 LOAD_NAME 0 (print) 3 LOAD_CONST 0 (10) 6 CALL_FUNCTION 1 (1 positional, 0 keyword pair) 9 POP_TOP 10 LOAD_CONST 1 (None) 13 RETURN_VALUE
……
Optimizations
Optimizations implemented in the peephole optimizer:
- Constant folding
- unary operations: +a, -a, ~a
- binary operations:
- a + b, a – b, a * b, a / b, a // b, a % b, a ** b
- a << b, a >> b, a & b, a | b, a ^ b
- BUILD_TUPLE: convert to a constant
- Replace BUILD_TUPLE <n> + UNPACK_SEQUENCE <n> and BUILD_LIST <n> + UNPACK_SEQUENCE <n> with ROT_TWO (2 arguments) or ROT_THREE+ROT_TWO (3 arguments). For BUILD_LIST, if inputs are LOAD_CONST, rewrite LOAD_CONST in the reverse order.
- BUILD_LIST + COMPARE_OP(in/not in): convert list to tuple
- BUILD_SET + COMPARE_OP(in/not in): convert set to frozenset
- COMPARE_OP:
- replace
not(a is b)
witha is not b
- replace
not(a is not b)
witha is b
- replace
not(a in b)
witha not in b
- replace
not(a not in b)
witha in b
- replace
- Remove NOP instructions
- Dead code elimination
- Remove unreachable code after a final operation (
Instr.is_final()
) - Remove unreachable blocks (
Block
)
- Remove unreachable code after a final operation (
- Replace UNARY_NOT+POP_JUMP_IF_FALSE with POP_JUMP_IF_TRUE
- Optimize jumps
- Replace unconditional jumps to RETURN_VALUE with RETURN_VALUE
- Replace jumps to unconditional jumps with jumps to the final target
- Remove unconditional jumps to the following block
For tuples, constant folding is only run if the result has 20 items or less.
By design, only basic optimizations can be implemented. A peephole optimizer has a narrow view on the bytecode (a few instructions) and only a very limited knownledge of the code.
反思聚物比類之所以重要耶?