Building the Library of Babel in Go

Building the Library of Babel in Go

- 21 mins

Most of my Go experience before this project was through IaC with Pulumi. Useful work, but it barely scratches the surface of the language. I have been writing Rust for a while, and I found Go’s concurrency model genuinely odd at first. Channels and goroutines felt strange coming from Rust’s ownership model. The syntax took getting used to too. But the more I used it, the more it grew on me. Go is not trying to be Rust. It is simpler, more opinionated, and that simplicity turns out to be a feature.

This project was an excuse to go deeper.

The Library of Babel

The Library of Babel is a short story by Jorge Luis Borges. The premise is a library that contains every possible 410-page book. Not just the books that have been written, but every combination of characters that could exist. Your biography, before you have lived it. This documentation. Complete nonsense. Every text, in every order, somewhere on a shelf.

Jonathan Basile built a working web implementation of this. A few months ago, this video by Merlin on YouTube was randomly recommended to me. I watched it and immediately fell into a rabbit hole about deterministic randomness.

The interesting part is not that the library contains everything. It is how it does it. The content of any page is not stored anywhere. It is computed on demand, deterministically, from its address. Search for a piece of text, and you get back a location in the library where that exact text appears, generated from the search itself as a seed. The library does not remember your query. It just consistently gives you the same page for the same input every time.

That idea stuck with me. I wanted to build it.

The Character Set and Base29 Encoding

The original Borges concept uses a small alphabet. Most implementations converge on a 29-character set: lowercase a to z, a space, a comma, and a period. That gives you enough to write something coherent, and 29 is a prime number, which has nice properties for base conversion. Some implementations go further: this TypeScript implementation by tdjsnelling extends the charset to 32 characters. I chose to stick with the original 29. I did borrow the addressing system from that project though, which I found quite intuitive.

charset := " abcdefghijklmnopqrstuvwxyz,."

Each page in the library is 3,200 characters (40 lines of 80 characters). To find a page, you convert its content into a single base-29 number. That number, decomposed into its hierarchical parts, gives you the address. The conversion treats each character as a digit in base 29: space = 0, a = 1, b = 2, and so on through to period = 28.

Going the other way (address back to page content) is repeated division by 29, collecting remainders, then mapping each remainder back to a character. The library is just a very large number space dressed up as rooms and shelves.

The thing that makes search work is using the search text itself as a seed to generate a deterministic position. When you search for “hello”, the library does not scan billions of pages looking for the word. It hashes “hello” with SHA256, seeds an RNG from that hash, generates a full 3,200-character page with “hello” planted at a position within it, then converts that page to its base-29 address.

func (l Library) seedPageChars(text string, variant int) string {
    input := fmt.Sprintf("%s\x00%d", strings.ToLower(text), variant)
    textHash := sha256.Sum256([]byte(input))
    textSeed := int64(binary.BigEndian.Uint64(textHash[:8]))
    rng := rand.New(rand.NewSource(textSeed))

    maxPosition := charsPerPage - len(text)
    position := rng.Intn(maxPosition + 1)

    pageChars := make([]byte, charsPerPage)
    for i := range charsPerPage {
        pageChars[i] = l.charset[rng.Intn(len(l.charset))]
    }

    copy(pageChars[position:], []byte(strings.ToLower(text)))
    return string(pageChars)
}

seedPageChars is called by generateBase29Number, which walks the page character by character and builds the actual base-29 number:

func (l Library) generateBase29Number(text string, variant int) (*big.Int, error) {
    result := big.NewInt(0)
    pageChars := l.seedPageChars(text, variant)

    for _, char := range pageChars {
        result.Mul(result, l.base)
        index := l.charToIndex[char]
        result.Add(result, big.NewInt(int64(index)))
    }

    return result, nil
}

Each character is treated as a digit: multiply the running total by 29, add the character’s index. After 3,200 characters you have a single very large integer that uniquely encodes the page. That number is what gets decomposed into the hexagon, wall, shelf, book, and page coordinates.

The variant parameter is how multiple results work. Searching with variant 0 gives you one address, variant 1 gives you a different one. The same text plus a different variant integer produces a different seed, a different page, a different address. Every text “appears” in the library an essentially unlimited number of times, just at different coordinates.

How Many Results?

This is where the math got interesting, and I will be honest: I relied on AI a lot here because combinatorics is not really my forte.

The number of locations where a given text appears should scale inversely with the length of that text. A single character like “a” appears everywhere. The phrase “to be or not to be” appears much less frequently. A full paragraph is vanishingly rare.

I modeled this with exponential decay:

const exponentialDecayRate = 1.10

func (l Library) GetOccurrenceCount(text string) int {
    textLen := len(text)
    maxCount := 1_000_000_000 // 1 billion for single characters
    baseCount := maxCount / int(math.Pow(exponentialDecayRate, float64(textLen-1)))
    baseCount = max(1, baseCount)

    hash := sha256.Sum256([]byte(strings.ToLower(text)))
    seed := int64(binary.BigEndian.Uint64(hash[:8]))
    rng := rand.New(rand.NewSource(seed))

    variation := max(1, baseCount/4)
    adjustment := rng.Intn(2*variation) - variation

    return max(1, baseCount+adjustment)
}

Starting at 1 billion for a single character, each additional character divides the result by 1.10. By the time you reach a moderately long phrase, you are down to a handful of results. The ±25% variation seeded from the text hash gives each query a consistent but unique feel. The tradeoff is that the billion cap is completely hardcoded. It is a reasonable approximation of “effectively infinite”, not something derived from first principles. Good enough for this.

The Addressing System

Every page in the library has a five-part address: <hexagon>.<wall>.<shelf>.<book>.<page>. The structure mirrors Borges’ description: hexagonal rooms with four walls, each holding five shelves, each shelf holding 32 books, each book with 410 pages.

Converting a base-29 number to an address is a series of modular divisions:

func newFromBase29Number(n *big.Int) *Location {
    temp, quotient := new(big.Int).Abs(n), new(big.Int)

    page := new(big.Int)
    quotient, page = quotient.DivMod(temp, big.NewInt(pagesPerBook), page)
    temp.Set(quotient)

    book := new(big.Int)
    quotient, book = quotient.DivMod(temp, big.NewInt(booksPerShelf), book)
    temp.Set(quotient)

    shelf := new(big.Int)
    quotient, shelf = quotient.DivMod(temp, big.NewInt(shelvesPerWall), shelf)
    temp.Set(quotient)

    wall := new(big.Int)
    quotient, wall = quotient.DivMod(temp, big.NewInt(wallsPerHexagon), wall)
    temp.Set(quotient)

    return &Location{
        Hexagon: quotient.Text(36),
        Wall:    int(wall.Int64()),
        Shelf:   int(shelf.Int64()),
        Book:    int(book.Int64()),
        Page:    int(page.Int64()) + 1,
    }
}

The hexagon identifier is whatever is left after stripping out wall, shelf, book, and page. It can be an astronomically large number, so it is stored as a base-36 string (digits and lowercase letters). An address ends up looking something like 3q7z.2.1.14.207. The hierarchy is reversible: you can reconstruct the original base-29 number from any valid address, which is how browsing works.

Pagination

One thing the original concept glosses over is what pagination even means when a search for “a” returns close to a billion results.

The variant approach makes this tractable. Variants are sequential integers starting from 0. Pagination is just offset and limit over that sequence:

func (l Library) SearchPaginated(text string, offset, limit int) ([]*Location, error) {
    totalCount := l.GetOccurrenceCount(text)

    if offset < 0 {
        return nil, errors.New("offset cannot be negative")
    }
    if limit <= 0 {
        return nil, errors.New("limit must be positive")
    }
    if offset >= totalCount {
        return []*Location{}, nil
    }

    endIndex := min(offset+limit, totalCount)
    locations := make([]*Location, 0, endIndex-offset)

    for variant := offset; variant < endIndex; variant++ {
        bigInt, err := l.generateBase29Number(text, variant)
        if err != nil {
            return nil, fmt.Errorf("error generating location for variant %d: %w", variant, err)
        }
        locations = append(locations, newFromBase29Number(bigInt))
    }

    return locations, nil
}

Each page of results is a slice of variant integers. No state, no cursors, no database. You can reproduce any page of results for any query deterministically.

Hacks and Loose Ends

There are a few places in the code where I left honest markers of imperfection.

The significant one is in base29NumberToString. The browse API accepts any valid address, and hexagon identifiers can be any integer from 1 up to the maximum big.Int. If you manually navigate to a small address like 1.0.0.0.1, the corresponding base-29 number is tiny, and decoding it back to characters produces a single-character page. A page with one character is not ideal. The hack is to take whatever characters the conversion produced, hash them, seed an RNG from that hash, and fill the remaining space. It preserves determinism (the same small address always fills the same way) but it is not elegant.

Concurrency and the Worker Pool

SearchStream was my first real exercise in Go concurrency. Rather than blocking until all results are computed, it returns a channel immediately and lets the caller consume locations as they are generated. The full implementation looks like this:

func (l Library) SearchStream(ctx context.Context, text string) (<-chan *Location, error) {
    totalCount := l.GetOccurrenceCount(text)
    locationChan, workerChan := make(chan *Location), make(chan int, 100)
    numWorkers := runtime.NumCPU()
    var wg sync.WaitGroup

    for range numWorkers {
        wg.Go(func() {
            for {
                select {
                case <-ctx.Done():
                    return
                case variant, ok := <-workerChan:
                    if !ok {
                        workerChan = nil
                        continue
                    }
                    bigInt, err := l.generateBase29Number(text, variant)
                    if err != nil {
                        continue
                    }
                    location := newFromBase29Number(bigInt)
                    select {
                    case locationChan <- location:
                    case <-ctx.Done():
                        return
                    }
                }
            }
        })
    }

    go func() {
        defer close(workerChan)
        for variant := range totalCount {
            select {
            case workerChan <- variant:
            case <-ctx.Done():
                return
            }
        }
    }()

    go func() {
        defer close(locationChan)
        wg.Wait()
    }()

    return locationChan, nil
}

There are three moving parts.

The dispatcher goroutine sends variant integers (0, 1, 2…) into a buffered workerChan. The buffer size of 100 means the dispatcher can stay ahead of the workers without blocking. It stops early if the context is cancelled.

The worker goroutines (one per CPU) pull variants from workerChan, compute a location, and send it to locationChan. Each of the two send operations checks ctx.Done() so that a cancelled context unblocks workers whether they are waiting to receive a job or waiting to send a result.

The closer goroutine waits for all workers to finish via sync.WaitGroup, then closes locationChan. The consumer knows the stream is exhausted when the channel closes.

Goroutine Leaks

The more interesting case is when you only want a subset of results. The test below reads exactly 100 locations then stops:

func TestLibrarySearchStream(t *testing.T) {
    library := New()
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    results, err := library.SearchStream(ctx, searchText)
    if err != nil {
        t.Errorf("search stream failed: %v", err)
    }

    locations := []*Location{}
    limit := 100
    for range limit {
        location := <-results
        locations = append(locations, location)
    }
    cancel()

    if l := len(locations); l != limit {
        t.Errorf("expected %d locations, got %d", limit, l)
    }
}

Once the loop exits, we call cancel() explicitly rather than relying on defer. Both work, but calling it immediately after collecting the results we need is the right instinct: it signals the workers to stop as soon as possible rather than waiting until the function returns.

Without that cancellation, you get a goroutine leak. Here is the scenario. locationChan is unbuffered. Once the consumer stops reading, the next worker trying to send a location will block forever on this:

select {
case locationChan <- location: // blocked: nobody reading
case <-ctx.Done():             // never fires: context not cancelled
    return
}

All workers stall. wg.Wait() never returns. locationChan is never closed. Everything is stuck.

Cancelling the context is what breaks the deadlock. The case <-ctx.Done() branch fires in every worker, they all return, wg.Wait() completes, and locationChan closes. Whether you call cancel() explicitly after collecting what you need or rely on defer, the important thing is that it always runs.

The dispatcher has the same guard. If the context is cancelled while the dispatcher is still enqueuing variants, it exits immediately rather than blocking on a full workerChan:

select {
case workerChan <- variant:
case <-ctx.Done():
    return
}

The Nil Channel Trick

There is one subtlety in how workers handle a closed workerChan. When the dispatcher has sent all variants and its goroutine exits, it closes workerChan. A worker receives the zero value with ok = false, then does something that looks odd:

case variant, ok := <-workerChan:
    if !ok {
        workerChan = nil
        continue
    }

It sets its local copy of workerChan to nil and loops. A receive on a nil channel blocks forever. On the next iteration, the worker’s select effectively becomes:

select {
case <-ctx.Done():
    return
case variant, ok := <-nil: // blocks forever, never selected
    ...
}

The worker is now parked, waiting only for context cancellation. This is intentional. The alternative would be for the worker to return immediately when workerChan closes, but that would decrement the WaitGroup before the worker has necessarily finished sending its last computed location to locationChan. Staying alive until ctx.Done() ensures the worker fully drains before the WaitGroup completes and locationChan is closed.

Coming from Rust, this all felt foreign. In Rust the compiler enforces the invariants: ownership, Send/Sync bounds, lifetime constraints. In Go you reason about it yourself and encode the contract in patterns like this one. The channel syntax took time to internalize, especially select across multiple channels. I leaned heavily on Learning Go by Jon Bodner throughout this project, particularly for the concurrency chapter and general idiomatic patterns. It is a well-written book and I would recommend it to anyone coming from a statically typed language who finds Go’s conventions a bit opaque at first. The mental model holds up once it clicks. Channels are typed queues, goroutines are lightweight threads, and the patterns (worker pool, fan-out, context cancellation) are consistent everywhere you look in the ecosystem.

The CLI and Web App

You can interact with the library through a CLI or the web app at babel.c12i.xyz. For the CLI, I used kong. My bias towards it is straightforward: coming from Rust, I am used to defining CLI commands as structs and letting the framework derive the parser from struct field tags, which is exactly how clap works. kong is the closest equivalent in Go, and that familiarity made it an easy call over something like cobra.

For the web layer, I used Gin. It is fast, the routing API is clean, and the choice did not need much deliberation. The UI is mostly AI-generated. I gave Claude some rough requirements for a search form, results list, and browse view with previous/next navigation, and iterated from there. The interesting parts of this project were in the library core, not the frontend.


This was a genuinely enjoyable side project. It pushed me further into Go than IaC work ever did, gave me a working understanding of goroutines and channels, and produced something I find kind of beautiful: an infinite library you can search, that requires no storage, generates every page on demand, and always gives you the same answer for the same question.

The code is open source at github.com/c12i/babel-go. The web app is live at babel.c12i.xyz. Go check what the library has to say about your name.

comments powered by Disqus
rss facebook twitter github gitlab youtube mail spotify leetcode lastfm instagram linkedin google google-plus pinterest medium vimeo stackoverflow reddit quora quora keybase