gpython: a Python interpreter written in Go "batteries not included"
Gpython is a Python 3.4 interpreter written in Go. This is the story of how it came to be, how it works and where it is going.
This includes a quick run through how an interpreted language like Python/Gpython works with a dip into the Virtual Machine (VM), lexing source, parsing it and compiling to byte code.
Genesis
In 2013 I had a health problem which meant I needed to take 3 months off work, so I needed a project to take my mind off things. I came up with the ridiculous idea of porting the Python interpreter to Go as a project which would take at least 3 months and would keep me occupied. It also had the nice property that it was completely self contained - just me and 380,000 lines of C and 483,000 lines of Python so I could pick it up and put it down according to how I was feeling.
So with that overambitious plan decided, I worked out how I was going to attack the project:
- Implement some Python objects, just enough for…
- Implement a 100% compatible byte code interpreter
- Implement the lexer
- Implement the parser
- Implement the compiler
- …
- PROFIT!
I ended up completing step 1 & 2 in the 3 months I was off work and the other steps a lot later ;-)
1. Python objects
Before we get to the Virtual Machine (VM) we need to define some Python objects to act upon. Everything in Python is an object. Python objects are implemented as C structures with a pointer to a type, so this suggested that the Python object should be implemented as an interface in Go.
|
|
Where Type
is defined like this:
|
|
If you are familiar with Python you’ll see some things you recognise
there, the name of the object, its docstring, a dictionary etc. Note
that in Python types are also objects hence the ObjectType
.
Python objects are reference counted and garbage collected when not in use. Gpython uses the go garbage collector rather than re-implement this.
Strings
Because Gpython objects are implemented by an interface, we can use native go types to implement them. Here is how the string type is implemented.
|
|
That makes interoperability between Gpython and Go really easy. If
you want a Gpython string you just cast it with py.String("gpython
string")
and likewise a type assertion gets a native string from a
Python object.
Methods
Objects wouldn’t be much use without methods. Here Gpython takes a hybrid approach to methods.
There are two ways of defining methods:
- As specially named methods on objects
- As items in the
Type
’s dictionary (Dict
)
Here is how the __len__
method is defined on strings:
|
|
2. The VM
Now we’ve defined a few objects we are ready to start implementing the byte code interpreter.
Unlike most of Python, the VM isn’t well documented. It is implemented in ceval.c. This contains a lot of highly optimised C code!
The Python byte code is 8 bit op-codes which implement a stack
machine. All things on the stack are Python objects. The Python
stack is part of the Frame
object which you can see below. There is
one Frame
per function, so if you call a function, Python makes a
new Frame
.
|
|
A typical byte code operation might be BINARY_ADD
. This pops two
values off the stack, adds them together and pushes the result back on
the stack.
|
|
Opcodes can have a 32 bit integer parameter which is encoded with
variable length extension op codes. This is used for program control
for example in the JUMP_ABSOLUTE
instruction which is for jumping to
different parts of the program.
|
|
VM complete!
Skipping lightly over a few details in the VM now we’ve reached the point we can run Python programs!
Sort of…
Well we have to get Python to compile the Python source code into byte code for us and then we can run the byte code with Gpython so we aren’t self hosting yet… but we are getting there.
Next step is compiling and running our own code. This is broken down into:
- Lexing
- Parsing
- Compiling
3. Lexing python
Lexing is the process of taking a stream of input (UTF-8 in gpython’s case) and turning it into tokens. The tokens are traditionally written in UPPER_CASE and are numeric values.
Gpython’s lexer is a from scratch implementation of the excellent specification.
For example we turn the input Python code:
|
|
Into a slice of LexToken
:
|
|
Unusually for a lexer, the Python lexer encodes the white space at the
start of lines with the INDENT
and DEDENT
tokens. You can see
also the lexer records the position of each symbol for error reporting
in the ast.Pos
type.
4. Parsing Python
Once the input source has been tokenised it is ready to feed into a parser. The parser takes the stream of lexed tokens and outputs an Abstract Syntax Tree. The AST is a tree representation of the source code suitable for compilation.
Gpython’s parser is implemented with goyacc. This will be immediately familiar to anyone who has used yacc or bison but it probably will seem a bit strange if you haven’t used a parser generator before.
The parser is implemented in
parser.y
which is a mixture of goyacc directives and go code. This is
processed by the goyacc tool into the go code (using go generate
to
control the process).
Here is how the for
statement is defined. You can see the tokens
expected in the input FOR
, IN
and the literal :
. An exprlist
is a sequence of expressions eg a
or a, b
. A testlist
is an
expression for the for
statement to range over. suite
is the body
of the for loop and the optional_else
(not shown) defines what to do
with the else:
part of the for
loop. The $N
refer to the input
line so $2
is the exprlist
.
|
|
As an example this input Python code:
|
|
Would produce this AST. You can see the for
the two parameters a,
b
the iterable b
and the body pass
and the else
clause with
break
in.
|
|
Python has an AST defined as part of the language and gpython follows that to build the tree of objects that represents an input program.
5. Compile
Now we have the AST we can compile it!
Compiling is the process of transforming the AST into byte codes which can be run directly by the VM. In a C compiler the output would be machine language, but here we output Python byte codes instead.
In gpython this is actually a two step process. The compiler outputs an internal assembler format which has labels and then that internal assembler format is assembled into byte code.
The compiler in gpython was mostly written from scratch, however I did get quite a bit of assistance looking at the Python source code for the tricky bits. Having used exactly the same AST representation as Python was helpful here. The assembler was written entirely from scratch.
Given that the Python VM is a stack machine it is comparatively easy to implement the compiler - we don’t have to allocate registers or keep track of memory or anything like that.
Here is a very cut down version of the expression compiler which shows
how the binary operators get compiled. An Expr
node represents an
arbitrarily complicated expression, eg 1+2
or f(x) + 7*y
. It
typically contains other Expr
nodes. Expr
nodes can be variables
or constants or other Expr
nodes.
|
|
Let’s see how we might compile the expression a+b
which would be represented by this AST:
|
|
Running that through Expr
would first recurse to evaluate
node.Left
(a
) which would compile LOAD_GLOBAL 'a'
then it would
recurse to evaluate node.Right
(b
) which would compile
LOAD_GLOBAL 'b'
finally it would compile the op code BINARY_ADD
to
add the two items on the stack and replacing them with a+b
to
produce something like this:
1 2 3 |
1 0 LOAD_GLOBAL 0 (a) 3 LOAD_GLOBAL 1 (b) 6 BINARY_ADD |
You can use Python’s dis module to explore bytecodes further.
Achievement unlocked?
Now gpython can Lex, Parse, Compile and Run Python code. Job done?
Well no unfortunately. What makes Python great is the fact that there are a lot of well written modules included in the standard library. Unfortunately for gpython a great number of these are written in C rather than Python which means porting them is really hard work.
Hence the tag line “batteries not included” :-(
It was at that point that the gpython project ground to a halt for several years. It contained a working Python interpreter, but very few modules and I just couldn’t bring myself to release it like that.
However after chatting to various gophers I decided to throw it over
the wall release it anyway, warts and all, and I did so in 2018.
Sebastien Binet very kindly offered to have it as part of the go-python organisation so it could live with other Go and Python crossover projects.
Performance
Performance was never a goal of gpython, but I often get asked how fast is it?
Here is the ubiquitous pystone benchmark
1 2 3 |
$ gpython examples/pystone.py Pystone(1.1) time for 50000 passes = 1.6536743640899658 This machine benchmarks at 30235.698808522993 pystones/second |
vs python3.4
1 2 3 |
$ python3.4 examples/pystone.py Pystone(1.1) time for 50000 passes = 0.312686 This machine benchmarks at 159905 pystones/second |
So gpython is 5 times slower than Python 3.4 on the pystone benchmark.
However gpython is a lot faster than Python for long number
operations - I think because the go big.Int
implementation is faster
than the Python one.
1 2 |
$ gpython examples/pi_chudnovsky_bs.py chudnovsky_gmpy_bs: digits 100000 time 0.6917450428009033 |
1 2 |
$ python3.4 examples/pi_chudnovsky_bs.py chudnovsky_gmpy_bs: digits 100000 time 3.047863006591797 |
Gpython is 4.4x faster than Python on this cherry picked example ;-)
Gpython in your browser
go1.11 came with a web assembly backend so I decided to port gpython to the web. It turned out to be quite straight forward, so here you can try gpython in your browser!
There is a go/wasm implementation and a gopherjs implementation each with their own bugs!
Future
I’d like to fill out the modules for gpython. I think gpython could re-use some of the work done by the grumpy project to increase its coverage here.
I’d also like to explore adding go routines and channel support to gpython but I don’t plan to introduce the global interpreter lock (GIL).
If anyone would like to help with any of that the gpython team is
always looking for contributors - pull requests here
please :-) You can chat with
the go-python community (of which gpython is part) at
go-python@googlegroups.com
or on the Gophers Slack in the
#go-python
channel.
Thank you for reading and I hope you enjoyed that meander through gpython! If you have any questions then please contact me at @njcw or nick@craig-wood.com