gpython: a Python interpreter written in Go "batteries not included"
Gpython is a Python 3.4 interpreter written in Go. This is the story of how it came to be, how it works and where it is going.
This includes a quick run through how an interpreted language like Python/Gpython works with a dip into the Virtual Machine (VM), lexing source, parsing it and compiling to byte code.
In 2013 I had a health problem which meant I needed to take 3 months off work, so I needed a project to take my mind off things. I came up with the ridiculous idea of porting the Python interpreter to Go as a project which would take at least 3 months and would keep me occupied. It also had the nice property that it was completely self contained - just me and 380,000 lines of C and 483,000 lines of Python so I could pick it up and put it down according to how I was feeling.
So with that overambitious plan decided, I worked out how I was going to attack the project:
- Implement some Python objects, just enough for…
- Implement a 100% compatible byte code interpreter
- Implement the lexer
- Implement the parser
- Implement the compiler
I ended up completing step 1 & 2 in the 3 months I was off work and the other steps a lot later ;-)
1. Python objects
Before we get to the Virtual Machine (VM) we need to define some Python objects to act upon. Everything in Python is an object. Python objects are implemented as C structures with a pointer to a type, so this suggested that the Python object should be implemented as an interface in Go.
Type is defined like this:
If you are familiar with Python you’ll see some things you recognise
there, the name of the object, its docstring, a dictionary etc. Note
that in Python types are also objects hence the
Python objects are reference counted and garbage collected when not in use. Gpython uses the go garbage collector rather than re-implement this.
Because Gpython objects are implemented by an interface, we can use native go types to implement them. Here is how the string type is implemented.
That makes interoperability between Gpython and Go really easy. If
you want a Gpython string you just cast it with
string") and likewise a type assertion gets a native string from a
Objects wouldn’t be much use without methods. Here Gpython takes a hybrid approach to methods.
There are two ways of defining methods:
- As specially named methods on objects
- As items in the
Type’s dictionary (
Here is how the
__len__ method is defined on strings:
2. The VM
Now we’ve defined a few objects we are ready to start implementing the byte code interpreter.
Unlike most of Python, the VM isn’t well documented. It is implemented in ceval.c. This contains a lot of highly optimised C code!
The Python byte code is 8 bit op-codes which implement a stack
machine. All things on the stack are Python objects. The Python
stack is part of the
Frame object which you can see below. There is
Frame per function, so if you call a function, Python makes a
A typical byte code operation might be
BINARY_ADD. This pops two
values off the stack, adds them together and pushes the result back on
Opcodes can have a 32 bit integer parameter which is encoded with
variable length extension op codes. This is used for program control
for example in the
JUMP_ABSOLUTE instruction which is for jumping to
different parts of the program.
Skipping lightly over a few details in the VM now we’ve reached the point we can run Python programs!
Well we have to get Python to compile the Python source code into byte code for us and then we can run the byte code with Gpython so we aren’t self hosting yet… but we are getting there.
Next step is compiling and running our own code. This is broken down into:
3. Lexing python
Lexing is the process of taking a stream of input (UTF-8 in gpython’s case) and turning it into tokens. The tokens are traditionally written in UPPER_CASE and are numeric values.
For example we turn the input Python code:
Into a slice of
Unusually for a lexer, the Python lexer encodes the white space at the
start of lines with the
DEDENT tokens. You can see
also the lexer records the position of each symbol for error reporting
4. Parsing Python
Once the input source has been tokenised it is ready to feed into a parser. The parser takes the stream of lexed tokens and outputs an Abstract Syntax Tree. The AST is a tree representation of the source code suitable for compilation.
Gpython’s parser is implemented with goyacc. This will be immediately familiar to anyone who has used yacc or bison but it probably will seem a bit strange if you haven’t used a parser generator before.
The parser is implemented in
which is a mixture of goyacc directives and go code. This is
processed by the goyacc tool into the go code (using
go generate to
control the process).
Here is how the
for statement is defined. You can see the tokens
expected in the input
IN and the literal
is a sequence of expressions eg
a, b. A
testlist is an
expression for the
for statement to range over.
suite is the body
of the for loop and the
optional_else (not shown) defines what to do
else: part of the
for loop. The
$N refer to the input
$2 is the
As an example this input Python code:
Would produce this AST. You can see the
for the two parameters
b the iterable
b and the body
pass and the
else clause with
Python has an AST defined as part of the language and gpython follows that to build the tree of objects that represents an input program.
Now we have the AST we can compile it!
Compiling is the process of transforming the AST into byte codes which can be run directly by the VM. In a C compiler the output would be machine language, but here we output Python byte codes instead.
In gpython this is actually a two step process. The compiler outputs an internal assembler format which has labels and then that internal assembler format is assembled into byte code.
The compiler in gpython was mostly written from scratch, however I did get quite a bit of assistance looking at the Python source code for the tricky bits. Having used exactly the same AST representation as Python was helpful here. The assembler was written entirely from scratch.
Given that the Python VM is a stack machine it is comparatively easy to implement the compiler - we don’t have to allocate registers or keep track of memory or anything like that.
Here is a very cut down version of the expression compiler which shows
how the binary operators get compiled. An
Expr node represents an
arbitrarily complicated expression, eg
f(x) + 7*y. It
typically contains other
Expr nodes can be variables
or constants or other
Let’s see how we might compile the expression
a+b which would be represented by this AST:
Running that through
Expr would first recurse to evaluate
a) which would compile
LOAD_GLOBAL 'a' then it would
recurse to evaluate
b) which would compile
LOAD_GLOBAL 'b' finally it would compile the op code
add the two items on the stack and replacing them with
produce something like this:
1 2 3
1 0 LOAD_GLOBAL 0 (a) 3 LOAD_GLOBAL 1 (b) 6 BINARY_ADD
You can use Python’s dis module to explore bytecodes further.
Now gpython can Lex, Parse, Compile and Run Python code. Job done?
Well no unfortunately. What makes Python great is the fact that there are a lot of well written modules included in the standard library. Unfortunately for gpython a great number of these are written in C rather than Python which means porting them is really hard work.
Hence the tag line “batteries not included” :-(
It was at that point that the gpython project ground to a halt for several years. It contained a working Python interpreter, but very few modules and I just couldn’t bring myself to release it like that.
However after chatting to various gophers I decided to
throw it over
the wall release it anyway, warts and all, and I did so in 2018.
Sebastien Binet very kindly offered to have it as part of the go-python organisation so it could live with other Go and Python crossover projects.
Performance was never a goal of gpython, but I often get asked how fast is it?
Here is the ubiquitous pystone benchmark
1 2 3
$ gpython examples/pystone.py Pystone(1.1) time for 50000 passes = 1.6536743640899658 This machine benchmarks at 30235.698808522993 pystones/second
1 2 3
$ python3.4 examples/pystone.py Pystone(1.1) time for 50000 passes = 0.312686 This machine benchmarks at 159905 pystones/second
So gpython is 5 times slower than Python 3.4 on the pystone benchmark.
However gpython is a lot faster than Python for long number
operations - I think because the go
big.Int implementation is faster
than the Python one.
$ gpython examples/pi_chudnovsky_bs.py chudnovsky_gmpy_bs: digits 100000 time 0.6917450428009033
$ python3.4 examples/pi_chudnovsky_bs.py chudnovsky_gmpy_bs: digits 100000 time 3.047863006591797
Gpython is 4.4x faster than Python on this cherry picked example ;-)
Gpython in your browser
go1.11 came with a web assembly backend so I decided to port gpython to the web. It turned out to be quite straight forward, so here you can try gpython in your browser!
I’d like to fill out the modules for gpython. I think gpython could re-use some of the work done by the grumpy project to increase its coverage here.
I’d also like to explore adding go routines and channel support to gpython but I don’t plan to introduce the global interpreter lock (GIL).
If anyone would like to help with any of that the gpython team is
always looking for contributors - pull requests here
please :-) You can chat with
the go-python community (of which gpython is part) at
or on the Gophers Slack in the
Thank you for reading and I hope you enjoyed that meander through gpython! If you have any questions then please contact me at @njcw or email@example.com