Writing Go assembly functions with PeachPy
What is PeachPy
PeachPy is a Python-based framework for writing modules in assembly. It automates away some of the details and allows you to use Python to generate repetitive assembly code sequences.
PeachPy supports writing modules that you can use directly from Go for x86-64. (It also supports NaCl and syso modules, but I won’t cover those in this post.)
This post is going to be mostly about what you need to know about integrating PeachPy and less a tutorial about PeachPy specifically.
All the code for this post is at github.com/dgryski/peachpy-examples.
A Simple Example
Let’s start with a simple function: one that takes two integers and returns their sum.
|
|
Next, we’ll add a function stub with a go:generate
directive:
|
|
When we run go generate
, we get the following assembly code:
|
|
Let’s go back through the PeachPy code piece-by-piece.
|
|
First, we declare our two arguments. The types we declare here are used to determine where and how the arguments are passed to the function, and for the auto-generated function prototype. The Go the function prototype is only in a comment in the generated assembly file. There is no capability to match these arguments to the actual function stub, so a mismatch here won’t be detected.
|
|
This line declares a function add
with two arguments (f1, f2)
and with a
return value of type uint64_t
.
|
|
Next, we acquire two registers. In this case, we don’t care what the actual registers are. PeachPy will assign two for us.
|
|
These are PeachPy pseudo-instructions which will load the arguments into the
chosen registers. This is where details are hidden allowing portable assembly.
With the Go calling convention the arguments are passed on the stack, and so
PeachPy generates the appropriate MOV
instructions. If PeachPy generates
code for the regular C calling convention, those will load the arguments from
the appropriate registers instead.
|
|
This last block does the addition and then sets the return value. Here again
RETURN
is a PeachPy pseudo-instruction which will generate the appropriate
store into the stack for Go’s calling convention.
Note also that the operands are Intel syntax order (destination first), rather than the Plan9/AT&T order (source first). PeachPy generates the correct assembly syntax depending on the target.
We can then call this as a normal Go function.
|
|
Working with structs
This example will talk about passing a struct by reference to a function.
First, in order to be able to access any of the fields, you need to know where
their offset relative to the base address of the struct. The unsafe.Offsetof
function will do this for you, but so can Dominik Honnef’s structlayout
tool.
For example, given
|
|
We can query the offsets of all the fields with:
1 2 3 4 5 |
$ structlayout github.com/dgryski/peachpy-examples/struct foo foo.zot uint16: 0-2 (size 2, align 2) foo.bar uint16: 2-4 (size 2, align 2) padding: 4-8 (size 4, align 0) foo.qux uint64: 8-16 (size 8, align 8) |
And here’s our PeachPy code:
|
|
This time we have a few differences. First, the argument type is declared as a
ptr()
. This is the equivalent of a C’s void *
Go’s uintptr
.
Second, when declaring the function itself, there is only a single argument
which, as PeachPy is expecting a tuple as the second argument, we have to write
as (f, )
.
The body of the function starts off the same. We load the address of the
struct (the first argument) into reg_f_base
.
Next, we declare a temporary register v
to store the sum.
Our next two instructions load the value of bar
(a word-sized value at offset
2) and add qux
(offset 8)
Finally, we set v
as the return value from the function.
|
|
Our stub file includes an additional directive before the function declaration.
The //go:noescape
directive tells the compiler that the pointer passed to
this function does not escape to the heap or into the return values. Without
this, the compiler would have no choice but to allocate the struct on the heap.
Now, the compiler’s escape analysis can tell it to potentially be allocated on
the stack instead (as it will be with our main
function). This can be a
significant win. On the other hand, stack allocation can break the alignment
needed by certain SSE instructions, requiring the use of the unaligned version
instead.
Here is our main
:
|
|
And running it we get the expected result:
1
|
sum = 50200 |
Working with slices
There are two ways to work with slices.
You can pass the pass the address of the first element and work with that but then your pure-Go version and your asm version will have different parameter lists. This matches more closely with how C would call your code and makes it easier to use your assembly code for both C and Go.
On the other hand, since you can’t do pointer-arithmetic in Go, your pure-Go version (you are writing pure-Go versions of your assembly routines, right?) will still need to be passed a slice. Which means you’re going to have code that looks like
1 2 3 4 5 6 |
var total uint64 if useGo { total = addGo(x) } else { total = addAsm(&x[0], len(x)) } |
But if you’re just writing Go code, it’s nicer if they’re the same. (You can still call your slice-expecting-assembly routine from C – you just need to pass the length twice to pretend to be the capacity).
Let’s look at an example. Here are the arguments to the function:
|
|
A slice is passed as three arguments: a pointer to the data, the length, and
the capacity. Technically, len
and cap
are signed ints. However, we’re
going to use size_t
instead, even though it’s unsigned. (The signedness here
is only used for cosmetic purposes when generating the prototype comment for
the generated assembly and doesn’t affect the instructions at all. Also, using
size_t
is semantically nicer than using ptrdiff_t
which is an appropriately
sized signed integer. PeachPy only uses the size of the integer when
generating stack offsets. If you want to be exact and you know you’re only
going to be running on amd64, you could use the exact int64_t
instead.
Here our function declaration mentions the three arguments we’re expecting. We
can also see the use of PeachPy’s with Loop()
construct to sum up all the
elements of the slice.
|
|
While our assembly stub just lists a single slice:
|
|
The go vet
tool will check that the assembly arguments for a slice are named
correctly. If you have an argument s
which is a slice, the three arguments
to your assembly function should be s_base
, s_len
, and s_cap
.
Working with strings
A string is just a slice without a capacity, so you only have to declare two arguments: a data pointer and a length.
Debugging
Delve supports single-stepping through assembly code, which is nice. It doesn’t yet support viewing any of the extended registers.
Since PeachPy supports the standard C calling convention, you can also call your assembly routines from C and debug it with any of the standard Linux debuggers without having to worry about the Go runtime specifics.
For example, the slice code above we can generate a sysv elf object file with
|
|
and call it from C with
|
|
Testing
Testing assembly code is no different from testing regular code. However, one suggestion is to always have a pure-Go version of whatever your assembly routine is doing. This can be used on platforms, but also as a base-version to compare your optimized implementation with. If you use build-tags to choose an implementation at build-time, you’ll probably need to rename the functions when fuzzing so you can have both functions in your test binary.
The inputs can be from your regular test suite, or explored via go-fuzz. For more on using go-fuzz for comparing implementations, please see DNS Parser, meet Go fuzzer
Finally, you can also leverage PeachPy and do your testing in Python.
This will require constructing argument lists via Python’s ctypes
module.
A simple example to test our code which works with slices:
|
|
By having your tests in your python code, they will be run each time PeachPy generates the assembly code.
Resources
The official docs for the Go assembler at https://golang.org/doc/asm. They’re useful to read, but remember that PeachPy will be taking care of the many of the details for you regarding the syntax and calling convention.
The PeachPy sources
Finally, at GolangUK 2016, Michael Munday gave a talk on Dropping Down: Go Functions in Assembly.