Rock It 《ML》JupyterLab 【丁】Code《七》語義【六】解讀‧二

為了深入理解

操作語義學

操作語義學計算機科學中的一個概念,它是使得電腦程式在數學上更加嚴謹的一種手段。其它類似的手段包括提供形式語義學,包括公理語義學指稱語義

一個計算機語言的操作語義描述一段合理的程序是怎樣被理解為一系列計算機步驟的。這些步驟就是這個程序的意義。在函數程式語言中一段終結性的序列在最後一步的返回程序的值。(由於一個程序可能是非非決定的,一般來說一個程序能夠有許多不同的計算步驟和許多不同的返回值。)

操作語義最早被用來定義Algol 68的語義。下面這句話引用修正的ALGOL 68報告:

一個使用嚴格語言編寫的程序的意義是通過一個假設的計算機來執行該程序的組成部分時完成的行動來解釋的。(Algol68,第二章)

丹納·司科特是第一個在今天的這個定義下使用操作語義這個概念的(Plotkin04b)。以下是司科特關於形式語義學的講稿,其中他提到了語義的「操作」觀點。

把目光注意使得語義在更『抽象』和更『清晰』可以,但是假如把操作方面完全忽略的話這個計劃毫無用處。(Scott70

 

的論旨,且讓我們將目光從 Python AST

compile(source, filename, mode, flags=0, dont_inherit=False, optimize=-1)

Compile the source into a code or AST object. Code objects can be executed by exec() or eval(). source can either be a normal string, a byte string, or an AST object. Refer to the ast module documentation for information on how to work with AST objects.

The filename argument should give the file from which the code was read; pass some recognizable value if it wasn’t read from a file ('<string>' is commonly used).

The mode argument specifies what kind of code must be compiled; it can be 'exec' if source consists of a sequence of statements, 'eval' if it consists of a single expression, or 'single' if it consists of a single interactive statement (in the latter case, expression statements that evaluate to something other than None will be printed).

The optional arguments flags and dont_inherit control which future statements affect the compilation of source. If neither is present (or both are zero) the code is compiled with those future statements that are in effect in the code that is calling compile(). If the flags argument is given and dont_inherit is not (or is zero) then the future statements specified by the flags argument are used in addition to those that would be used anyway. Ifdont_inherit is a non-zero integer then the flags argument is it – the future statements in effect around the call to compile are ignored.

Future statements are specified by bits which can be bitwise ORed together to specify multiple statements. The bitfield required to specify a given feature can be found as the compiler_flag attribute on the _Featureinstance in the __future__ module.

The argument optimize specifies the optimization level of the compiler; the default value of -1 selects the optimization level of the interpreter as given by -O options. Explicit levels are 0 (no optimization; __debug__ is true), 1 (asserts are removed, __debug__ is false) or 2 (docstrings are removed too).

This function raises SyntaxError if the compiled source is invalid, and ValueError if the source contains null bytes.

If you want to parse Python code into its AST representation, see ast.parse().

Note

When compiling a string with multi-line code in 'single' or 'eval' mode, input must be terminated by at least one newline character. This is to facilitate detection of incomplete and complete statements in thecode module.

Warning

It is possible to crash the Python interpreter with a sufficiently large/complex string when compiling to an AST object due to stack depth limitations in Python’s AST compiler.

Changed in version 3.2: Allowed use of Windows and Mac newlines. Also input in 'exec' mode does not have to end in a newline anymore. Added the optimize parameter.

Changed in version 3.5: Previously, TypeError was raised when null bytes were encountered in source.

 

轉注到

bytecode

Python source code is compiled into bytecode, the internal representation of a Python program in the CPython interpreter. The bytecode is also cached in .pyc files so that executing the same file is faster the second time (recompilation from source to bytecode can be avoided). This “intermediate language” is said to run on a virtual machine that executes the machine code corresponding to each bytecode. Do note that bytecodes are not expected to work between different Python virtual machines, nor to be stable between Python releases.

A list of bytecode instructions can be found in the documentation for the dis module.

 

至於派生為何用『位元組碼』呢?或可讀一下維基百科詞條

Bytecode

Bytecode, also termed portable code or p-code, is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references (normally numeric addresses) that encode the result of compiler parsing and performing semantic analysis of things like type, scope, and nesting depths of program objects.

The name bytecode stems from instruction sets that have one-byte opcodes followed by optional parameters. Intermediate representations such as bytecode may be output by programming language implementations to ease interpretation, or it may be used to reduce hardware and operating system dependence by allowing the same code to run cross-platform, on different devices. Bytecode may often be either directly executed on avirtual machine (a p-code machine i.e., interpreter), or it may be further compiled into machine code for better performance.

Since bytecode instructions are processed by software, they may be arbitrarily complex, but are nonetheless often akin to traditional hardware instructions: virtual stack machines are the most common, but virtual register machines have been built also.[1][2] Different parts may often be stored in separate files, similar to object modules, but dynamically loaded during execution.

Execution

A bytecode program may be executed by parsing and directly executing the instructions, one at a time. This kind of bytecode interpreter is very portable. Some systems, called dynamic translators, or just-in-time (JIT) compilers, translate bytecode into machine code as necessary at runtime. This makes the virtual machine hardware-specific, but doesn’t lose the portability of the bytecode. For example, Java and Smalltalk code is typically stored in bytecoded format, which is typically then JIT compiled to translate the bytecode to machine code before execution. This introduces a delay before a program is run, when bytecode is compiled to native machine code, but improves execution speed considerably compared to interpreting source code directly, normally by around an order of magnitude (10x).[3]

Because of its performance advantage, today many language implementations execute a program in two phases, first compiling the source code into bytecode, and then passing the bytecode to the virtual machine. There are bytecode based virtual machines of this sort for Java, Python, PHP,[4] Tcl, mawk and Forth (however, Forth is seldom compiled via bytecodes in this way, and its virtual machine is more generic instead). The implementation of Perl and Ruby 1.8 instead work by walking an abstract syntax tree representation derived from the source code.

More recently, the authors of V8[5] and Dart[6] have challenged the notion that intermediate bytecode is needed for fast and efficient VM implementation. Both of these language implementations currently do direct JIT compiling from source code to machine code with no bytecode intermediary.[7]

 

有一些初步認識也。

然後朝向

disDisassembler for Python bytecode

Source code: Lib/dis.py


The dis module supports the analysis of CPython bytecode by disassembling it. The CPython bytecode which this module takes as an input is defined in the file Include/opcode.h and used by the compiler and the interpreter.

CPython implementation detail: Bytecode is an implementation detail of the CPython interpreter. No guarantees are made that bytecode will not be added, removed, or changed between versions of Python. Use of this module should not be considered to work across Python VMs or Python releases.

Changed in version 3.6: Use 2 bytes for each instruction. Previously the number of bytes varied by instruction.

Example: Given the function myfunc():

def myfunc(alist):
    return len(alist)

the following command can be used to display the disassembly of myfunc():

>>> dis.dis(myfunc)
  2           0 LOAD_GLOBAL              0 (len)
              2 LOAD_FAST                0 (alist)
              4 CALL_FUNCTION            1
              6 RETURN_VALUE

(The “2” is a line number).

 

大步前進呦☆