Using a JSON File as a Database Safely in Go
There are definitely problems with using a JSON file as a database, but sometimes the simplicity of having no extra dependencies makes it an attractive option. The two biggest problems are performance and managing concurrent reads and writes.
We can’t do much about performance, but with Go, managing concurrent reads and writes is a breeze!
Below is a walkthrough of a method for managing file access so that a JSON file can safely be used as a database.
Building a “TODO” service backed by a json file database
The general pattern I’ll be implementing is: set up a channel and push read/write jobs onto it. Meanwhile, run a goroutine which consumes those jobs, ensuring exclusive access to the JSON file. (Remember, Go’s philosophy towards concurrency is: Don’t communicate by sharing memory; share memory by communicating. This is why we’re going to pipe requests to the component responsible for accessing the JSON db file rather than screwing around with lock files or synchronization.)
Follow along below where I’ll put it all together, or jump here for the finished product: a full REST service for managing your todos, backed by a JSON file database.
db.json
Here’s what my database looks like. It’s just a map of id => Todo. (I’m using a UUID for the id.)

```json
{
  "1a6e9148-ebe5-4bf0-9675-b76f9fab7b72": {
    "id": "1a6e9148-ebe5-4bf0-9675-b76f9fab7b72",
    "value": "Hello World"
  },
  "3e39df85-9851-4ce9-af0c-0dd831e3b970": {
    "id": "3e39df85-9851-4ce9-af0c-0dd831e3b970",
    "value": "Hello World2"
  }
}
```
Todo
And here’s the API model we’ll be marshalling/unmarshalling it with:

```go
type Todo struct {
	Id    string `json:"id"`
	Value string `json:"value" binding:"required"`
}
```
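Just to make the mapping concrete, here’s a quick sketch (my own illustration, not part of the finished example) of decoding the db.json above into this model:

```go
// Read the whole file and decode it into the map of id => Todo.
content, err := ioutil.ReadFile("./db.json")
if err != nil {
	log.Fatal(err)
}
todos := make(map[string]Todo)
if err := json.Unmarshal(content, &todos); err != nil {
	log.Fatal(err)
}
fmt.Println(todos["1a6e9148-ebe5-4bf0-9675-b76f9fab7b72"].Value) // "Hello World"
```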
main.go
In the entry point we set up our job channel, start our job processor so we’re ready when the jobs start rolling in, and then initialize a TodoClient, which insulates us from the details of the job channel.
```go
db := "./db.json"

// create channel to communicate over
jobs := make(chan Job)

// start watching jobs channel for work
go ProcessJobs(jobs, db)

// create client for submitting jobs / providing interface to db
client := &TodoClient{Jobs: jobs}
```
Job Processor
This is the hub of our database. ProcessJobs is run as a goroutine, so it just hangs out in an infinite for loop waiting for work in the form of a Job. A job’s Run method is where the work happens: it takes in the database data (all of it! Remember, this is never going to be performant with a ton of data, so let’s just make things easy on ourselves and only operate on our database in its entirety) and returns the updated database data. The job processor then writes the modified database model back to disk before moving on to the next job. (There’s also a shortcut in place: if Run returns nil, nothing was modified, so we can skip the write.)
```go
type Job interface {
	ExitChan() chan error
	Run(todos map[string]Todo) (map[string]Todo, error)
}

func ProcessJobs(jobs chan Job, db string) {
	for {
		j := <-jobs

		// Read the database
		todos := make(map[string]Todo)
		content, err := ioutil.ReadFile(db)
		if err == nil {
			err = json.Unmarshal(content, &todos)
		}

		// Run the job
		var todosMod map[string]Todo
		if err == nil {
			todosMod, err = j.Run(todos)
		}

		// If there were modifications, write them back to the database.
		// Note the plain assignments to err (not :=), so errors from Run,
		// Marshal, and WriteFile aren't shadowed and reach the submitter
		// via the exit channel.
		if err == nil && todosMod != nil {
			var b []byte
			if b, err = json.Marshal(todosMod); err == nil {
				err = ioutil.WriteFile(db, b, 0644)
			}
		}

		j.ExitChan() <- err
	}
}
```
Read Todos Job
Here’s one of our jobs for interacting with the database. This job simply implements the Job interface and adds in a “todos” channel so we can also return data. Since the job processor is in charge of accessing the db file, all the Run function does is pass the todos map to the todos response channel and return nil, since there were no modifications.
```go
// Job to read all todos from the database
type ReadTodosJob struct {
	todos    chan map[string]Todo
	exitChan chan error
}

func NewReadTodosJob() *ReadTodosJob {
	return &ReadTodosJob{
		todos:    make(chan map[string]Todo, 1),
		exitChan: make(chan error, 1),
	}
}

func (j ReadTodosJob) ExitChan() chan error {
	return j.exitChan
}

func (j ReadTodosJob) Run(todos map[string]Todo) (map[string]Todo, error) {
	j.todos <- todos
	return nil, nil
}
```
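For the write side, a job looks almost the same, except Run returns the modified map so the processor knows to persist it. Here’s a sketch of what one might look like (my own illustration; the write jobs in the finished example may differ in the details):

```go
// Job to insert or replace a single todo (illustrative sketch)
type WriteTodoJob struct {
	todo     Todo
	exitChan chan error
}

func NewWriteTodoJob(todo Todo) *WriteTodoJob {
	return &WriteTodoJob{
		todo:     todo,
		exitChan: make(chan error, 1),
	}
}

func (j WriteTodoJob) ExitChan() chan error {
	return j.exitChan
}

func (j WriteTodoJob) Run(todos map[string]Todo) (map[string]Todo, error) {
	// Mutate the in-memory model and return it; a non-nil result
	// tells the job processor to write the file back to disk.
	todos[j.todo.Id] = j.todo
	return todos, nil
}
```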
Todo Client
This is the piece the rest of your application will interact with. It encapsulates the mess of pushing jobs onto the channel and waiting for the result and the completion signal on the error channel. It also maps the raw database model into a more reasonable result (a slice, in this case).
```go
// client for submitting jobs and providing a repository-like interface
type TodoClient struct {
	Jobs chan Job
}

func (c *TodoClient) GetTodos() ([]Todo, error) {
	arr := make([]Todo, 0)

	// Create and submit the job
	job := NewReadTodosJob()
	c.Jobs <- job

	// Wait for processing to complete
	if err := <-job.ExitChan(); err != nil {
		return arr, err
	}

	// Collect the found Todos
	todos := <-job.todos

	// Convert map to slice
	for _, value := range todos {
		arr = append(arr, value)
	}
	return arr, nil
}
```
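A write path through the client has the same shape. Here’s a sketch using the hypothetical WriteTodoJob from earlier (the finished example also generates the UUID at this layer, which I’ve left out):

```go
// Submit a write job and wait for it to be persisted (illustrative
// sketch; assumes todo.Id has already been populated).
func (c *TodoClient) AddTodo(todo Todo) error {
	job := NewWriteTodoJob(todo)
	c.Jobs <- job

	// Wait for the processor to run the job and write the file
	return <-job.ExitChan()
}
```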
Exposing it to the web
At this point, you might be thinking: “The only reason we have to worry about concurrent writes is because you put the read/write operations in a goroutine. A single routine would provide safe reads and writes too.”
But as soon as we turn this into a web service, all bets are off. Below I layer in an HTTP server (using the Gin framework) to utilize our TodoClient and illustrate the example.
main.go
Same as before, but now there’s a /todo endpoint for getting all todos. The full example is more built out, with POST, GET by id, PUT, and DELETE endpoints (sketched just after the snippet below).
```go
db := "./db.json"

// create channel to communicate over
jobs := make(chan Job)

// start watching jobs channel for work
go ProcessJobs(jobs, db)

// create dependencies
client := &TodoClient{Jobs: jobs}
handlers := &TodoHandlers{Client: client}

// configure routes
r := gin.Default()
r.GET("/todo", handlers.GetTodos)

// start web server
r.Run(":8080")
```
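The remaining routes in the full example follow the same pattern; registering them would look something like this (the handler names here are my guesses at the shape, not copied from the project):

```go
r.POST("/todo", handlers.AddTodo)
r.GET("/todo/:id", handlers.GetTodo)
r.PUT("/todo/:id", handlers.SaveTodo)
r.DELETE("/todo/:id", handlers.DeleteTodo)
```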
Handlers
And last but not least, we leverage the TodoClient to get some data… safely!
```go
type TodoHandlers struct {
	Client *TodoClient
}

// Get all todos as a slice
func (h *TodoHandlers) GetTodos(c *gin.Context) {
	todos, err := h.Client.GetTodos()
	if err != nil {
		log.Print(err)
		c.JSON(500, "Internal Server Error")
		return
	}
	c.JSON(200, todos)
}
```
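A POST handler would be just as thin. Here’s a sketch built on the hypothetical AddTodo client method from earlier (it uses current Gin’s ShouldBind; the finished example’s version also assigns the UUID before submitting):

```go
// Create a todo from the posted JSON body (illustrative sketch)
func (h *TodoHandlers) AddTodo(c *gin.Context) {
	var todo Todo
	if err := c.ShouldBind(&todo); err != nil {
		c.JSON(400, "Bad Request")
		return
	}
	if err := h.Client.AddTodo(todo); err != nil {
		log.Print(err)
		c.JSON(500, "Internal Server Error")
		return
	}
	c.JSON(201, todo)
}
```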
Final thoughts
Why is this better than managing lock files, synchronizing access, and more generally sharing the database across goroutines? Because those techniques impose limitations on your design. If instead you compose your app such that modifications to the shared resource are communicated to a single component (the ProcessJobs goroutine) responsible for modifying the resource, you’ve eliminated the need to share direct access to the database. Again, this is the Go concurrency philosophy: Don’t communicate by sharing memory; share memory by communicating.
Thanks for following along! Check out the full example.
p.s. What led me down this path? We use Composer for package management in our PHP stack and need to run a private package repo for tracking internal code. Satis is an excellent tool for building private Composer package repositories, but it is just a static site generator that gets its repository list from a JSON config file. I used the techniques above to create Satis-Go, which exposes the config file as a REST API and performs the static generation of the package repository index when modifications are made. From there, adding an admin UI and incorporating webhooks for triggering re-indexing was straightforward.