Andrei Tudor Călin
Dec 6, 2018 6 min read

On the tension between generic code and special cases

The io.Reader and io.Writer interfaces appear in practically all Go programs, and represent the fundamental building blocks for dealing with streams of data. An important feature of Go is that the abstractions around objects such as sockets, files, or in-memory buffers are all expressed in terms of these interfaces. When a Go program speaks to the outside world, it almost always does so through io.Readers and io.Writers, irrespective of the specific platform or communication medium it uses. This universality is a key factor in making code that deals with streams of data composable and re-usable¹.

This post examines the design and implementation of io.Copy, a function which connects a Reader to a Writer in perhaps the simplest way possible: it transfers data from one to the other.

In the general case², io.Copy allocates a buffer, then alternates reading from the source reader into the buffer with writing from the buffer to the destination writer. This works well for many cases, and is certainly correct from a semantic point of view.

That being said, what if for some particular choice of reader and writer, we could do better? How could we teach Copy about it?

Code that uses high level abstractions such as Reader and Writer must often answer questions like these, and must deal with this tension. In general, different platforms, programming languages, or even libraries deal with this question in different ways.

Let’s examine the case of io.Copy in particular, in the hope of acquiring more general wisdom.

One possible try: teaching Copy about specific types

Imagine a Copy that looks like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13


package hypotheticalio

import "bytes"

func Copy(dst Writer, src Reader) (int64, error) {
	switch s := src.(type) {
	case *bytes.Buffer:
		n, err := dst.Write(s.Bytes())
		return int64(n), err
	default:
		// generic code path
	}
}

Notice how our hypothetical io package now imports bytes so that it can use the Buffer type in the type switch. This prohibits bytes from ever importing io, because Go does not allow circular imports. Perhaps we do not notice the problem just yet, and we move on.

Time goes by, and we discover even more special cases worth considering:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21


package hypotheticalio

import (
	"bytes"
	"net"
	"os"
)

func Copy(dst Writer, src Reader) (int, error) {
	switch s := src.(type) {
	case *bytes.Buffer:
		n, err := dst.Write(s.Bytes())
		return int64(n), err
	case *net.TCPConn:
		return platformSpecificThings(dst, s)
	case *os.File:
		return differentPlatformSpecificCode(dst, s)
	default:
		// generic code path
	}
}

The code for Copy is changing a lot, even though the meaning of the code has not changed at all. Not only that, but Copy now concerns itself with platform-specific bits, and it knows about operating systems, networking, and so on. It used to be nice and generic, but it is now a difficult to maintain mess of special cases.

It seems like something has gone wrong. This Copy does accommodate both special cases and generic code, but it pays a terrible price to do so, and it imposes terrible restrictions upon the rest of the world.

Perhaps a better try: decoupling Copy from the world using interfaces

Instead of teaching Copy about specific types, the io package introduces two interfaces: ReaderFrom and WriterTo.

A ReaderFrom can be thought of as an object that consumes the data from a Reader into itself. By contrast, a WriterTo can be thought of as an object that pushes the data out of itself into a Writer.

Conceptually, a data transfer from an object to another occurs in both cases, but the way the transfer is expressed makes all the difference. Copy doesn’t need to know anything specific about the types it is working with anymore. If one of them implements ReaderFrom or WriterTo, Copy calls that method, and performs no other work. Copy now looks like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11


package io

func Copy(dst Writer, src Reader) (int64, error) {
	if wt, ok := src.(WriterTo); ok {
		return wt.WriteTo(dst)
	}
	if rt, ok := dst.(ReaderFrom); ok {
		return rt.ReadFrom(src)
	}
	// generic code path
}

Something interesting has happened: compared to the hypothetical scenario from before, Copy now has very little reason to ever change. It is completely generic once again. Not only that, but it can delegate to pieces of code which do have more specific knowledge of types just as well as it did before.

Nothing comes for free, though, and this loose coupling has its own cost. Capabilities are no longer known statically to Copy through specific types, but must be discovered dynamically at runtime, using type assertions.

Interestingly, instead of manifesting itself through messy code, high maintenance costs and prohibitive import restrictions, the tension between generic code and special cases now manifests itself through the loss of compile time information. For package such as io, which is imported by the whole world, this certainly seems like a trade worth making.

Callers can specialize io.Copy by themselves, without changing the function itself. All they need to do is implement io.ReaderFrom or io.WriterTo. The standard library does this in many places. For example:

*bytes.Buffer has both a WriteTo, which drains the buffer into an io.Writer, and a ReadFrom which fills the buffer from an io.Reader
*net.TCPConn has a ReadFrom, which may use sendfile(2) (or a similar interface) on most platforms
the net/http implementation of ResponseWriter has a ReadFrom which may make use of the aforementioned sendfile(2) special case

It is important to note that these are all optimizations which should not affect the semantics of programs in any way. As such, the worst thing that can happen to clients of package io is that a specific optimization might not kick in. Let’s examine one such case. Consider the following wrapper type:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10


type CountingWriter struct {
	W io.Writer
	N int64
}

func (cw *CountingWriter) Write(b []byte) (int, error) {
	n, err := cw.W.Write(b)
	cw.N += int64(n)
	return n, err
}

When used as an io.Writer, *CountingWriter hides the properties of the underlying Writer from callers. As such, code that relies on detecting capabilities at runtime, such as io.Copy, will only see an io.Writer when it looks at a *CountingWriter.

Callers that nevertheless want a specific feature of the underlying Writer to be used in such cases must accomodate for it themselves, by discovering the interesting capabilities and using types with more specific wrapper methods. This can be prohibitively difficult in certain cases³.

Furthermore, note how io.ReaderFrom and io.WriterTo do not appear in the signature of io.Copy. They appear in the documentation instead: a far weaker contract.

Closing thoughts

One way or another, the fundamental tension between generic code and special cases appears in any code that deals with abstractions. To accomodate both, the nature of Go interfaces enables one specific kind of loose coupling between components, but this method is not without its subtle costs. Even so, the end result can remain elegant and easy to maintain.

compare Go to platforms which exhibit red-blue issues ^[return]
see the source code here ^[return]
observe the combinatorical explosion which this library solves ^[return]

« Go and WebAssembly: running Go programs in your browser How to Send and Receive SMS: Implementing a GSM Protocol in Go »

Gopher Academy Blog

On the tension between generic code and special cases

One possible try: teaching Copy about specific types

Perhaps a better try: decoupling Copy from the world using interfaces

Closing thoughts

Explore →