Rock It 《ML》JupyterLab 【丁】Code《七》語義【六】解讀‧三外

漫步瀏覽山城景緻,不覺間走出廓外。眼前視野一片開闊,正可以欣賞對舉論辨也︰

Dynamic Typing: What the Compiler Doesn’t Know

One thing you’ve probably heard is that Python is a “dynamic” language—particularly that it’s “dynamically typed”. The work we’ve done to this point sheds some light on this description.

One of the things “dynamic” means in this context is that a lot of work is done at run time. We saw earlier that the Python compiler doesn’t have much information about what the code actually does. For example, consider the short function mod below. mod takes two arguments and returns the first modulo the second. In the bytecode, we see that the variables a and b are loaded, then the bytecode BINARY_MODULO performs the modulo operation itself.

>>> def mod(a, b):
...    return a % b
>>> dis.dis(mod)
  2           0 LOAD_FAST                0 (a)
              3 LOAD_FAST                1 (b)
              6 BINARY_MODULO
              7 RETURN_VALUE
>>> mod(19, 5)
4

Calculating 19 % 5 yields 4—no surprise there. What happens if we call it with different arguments?

>>> mod("by%sde", "teco")
'bytecode'

What just happened? You’ve probably seen this syntax before, but in a different context:

>>> print("by%sde" % "teco")
bytecode

Using the symbol % to format a string for printing means invoking the instruction BINARY_MODULO. This instruction mods together the top two values on the stack when the instruction executes—regardless of whether they’re strings, integers, or instances of a class you defined yourself. The bytecode was generated when the function was compiled (effectively, when it was defined) and the same bytecode is used with different types of arguments.

The Python compiler knows relatively little about the effect the bytecode will have. It’s up to the interpreter to determine the type of the object that BINARY_MODULO is operating on and do the right thing for that type. This is why Python is described as dynamically typed: you don’t know the types of the arguments to this function until you actually run it. By contrast, in a language that’s statically typed, the programmer tells the compiler up front what type the arguments will be (or the compiler figures them out for itself).

The compiler’s ignorance is one of the challenges to optimizing Python or analyzing it statically—just looking at the bytecode, without actually running the code, you don’t know what each instruction will do! In fact, you could define a class that implements the __mod__ method, and Python would invoke that method if you use % on your objects. So BINARY_MODULO could run any code at all!

Just looking at the following code, the first calculation of a % b seems wasteful.

def mod(a,b):
    a % b
    return a %b

Unfortunately, a static analysis of this code—the kind of you can do without running it—can’t be certain that the first a % b really does nothing. Calling __mod__ with% might write to a file, or interact with another part of your program, or do literally anything else that’s possible in Python. It’s hard to optimize a function when you don’t know what it does! In Russell Power and Alex Rubinsteyn’s great paper “How fast can we make interpreted Python?”, they note, “In the general absence of type information, each instruction must be treated as INVOKE_ARBITRARY_METHOD.”

Conclusion

Byterun is a compact Python interpreter that’s easier to understand than CPython. Byterun replicates CPython’s primary structural details: a stack-based interpreter operating on instruction sets called bytecode. It steps or jumps through these instructions, pushing to and popping from a stack of data. The interpreter creates, destroys, and jumps between frames as it calls into and returns from functions and generators. Byterun shares the real interpreter’s limitations, too: because Python uses dynamic typing, the interpreter must work hard at run time to determine the correct behavior of a program.

I encourage you to disassemble your own programs and to run them using Byterun. You’ll quickly run into instructions that this shorter version of Byterun doesn’t implement. The full implementation can be found at https://github.com/nedbat/byterun—or, by carefully reading the real CPython interpreter’s ceval.c, you can implement it yourself!

 

揣摩那『一動』『一靜』旨趣!

Peephole Optimizer

Peephole optimizer: optimize Python code objects. It is implemented with the BytecodeBlocksclass.

It is based on the peephole optimizer of CPython 3.6 which is written in C.

Example

Code:

import dis
from bytecode.peephole_opt import PeepholeOptimizer

code = compile('print(5+5)', '<string>', 'exec')
print("Non-optimized:")
dis.dis(code)
print()

code = PeepholeOptimizer().optimize(code)
print("Optimized:")
dis.dis(code)

Output of Python 3.6 patched with the PEP 511 with python -o noopt (to disable the builtin peephole optimizer):

Non-optimized:
  1           0 LOAD_NAME                0 (print)
              3 LOAD_CONST               0 (5)
              6 LOAD_CONST               0 (5)
              9 BINARY_ADD
             10 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             13 POP_TOP
             14 LOAD_CONST               1 (None)
             17 RETURN_VALUE

Optimized:
  1           0 LOAD_NAME                0 (print)
              3 LOAD_CONST               0 (10)
              6 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
              9 POP_TOP
             10 LOAD_CONST               1 (None)
             13 RETURN_VALUE

……

Optimizations

Optimizations implemented in the peephole optimizer:

  • Constant folding
    • unary operations: +a, -a, ~a
    • binary operations:
      • a + b, a – b, a * b, a / b, a // b, a % b, a ** b
      • a << b, a >> b, a & b, a | b, a ^ b
    • BUILD_TUPLE: convert to a constant
    • Replace BUILD_TUPLE <n> + UNPACK_SEQUENCE <n> and BUILD_LIST <n> + UNPACK_SEQUENCE <n> with ROT_TWO (2 arguments) or ROT_THREE+ROT_TWO (3 arguments). For BUILD_LIST, if inputs are LOAD_CONST, rewrite LOAD_CONST in the reverse order.
    • BUILD_LIST + COMPARE_OP(in/not in): convert list to tuple
    • BUILD_SET + COMPARE_OP(in/not in): convert set to frozenset
    • COMPARE_OP:
      • replace not(a is b) with a is not b
      • replace not(a is not b) with a is b
      • replace not(a in b) with a not in b
      • replace not(a not in b) with a in b
  • Remove NOP instructions
  • Dead code elimination
    • Remove unreachable code after a final operation (Instr.is_final())
    • Remove unreachable blocks (Block)
  • Replace UNARY_NOT+POP_JUMP_IF_FALSE with POP_JUMP_IF_TRUE
  • Optimize jumps
    • Replace unconditional jumps to RETURN_VALUE with RETURN_VALUE
    • Replace jumps to unconditional jumps with jumps to the final target
    • Remove unconditional jumps to the following block

For tuples, constant folding is only run if the result has 20 items or less.

By design, only basic optimizations can be implemented. A peephole optimizer has a narrow view on the bytecode (a few instructions) and only a very limited knownledge of the code.

 

反思聚物比類之所以重要耶?