2025-07-27

A solution to jub0bs’s inlinability challenge

I just stumbled upon a very nice Go puzzle from jub0bs, I invite you to give it a go before reading the rest of this post.

Here is the link: https://jub0bs.com/posts/2025-04-30-inlinability-challenge/

I decided to try it myself, and I came up with what I think is a valid solution, with a lower inlining cost (74 vs the author’s 78). I also think it’s different enough from the author’s solution to be worthy of note.

The original implementation’s bug, and the idea

NB: In this section I ramble a bit about how I came up with the intuition of a solution, feel free to skip to the next section for the actual solution.

The first thing I did after cloning the author’s code was to read the tests. One of them immediately jumped out at me. The a tolerated amount of OWS and nothing else test tests that the string "\t " (2 OWS) can be successfully trimmed down to the empty string when n==1. It seemed to imply that it was OK to divide an OWS-only string into two halves of up to n OWS. I found that very strange, given that the initial implementation of TrimOWS uses two successive functions to trim first from the left, then from the right of the string.

	s, ok := trimLeftOWS(s, n)
	if !ok {
		return "", false
	}
	s, ok = trimRightOWS(s, n)
	if !ok {
		return "", false
	}

I expected TrimOWS to fail because from the trimLeftOWS function’s point of view, "\t " should start with 2 OWS and that should be enough to return an error, without trimRightOWS even running. Splitting the OWS equally between left and right would require synchronization that I just didn’t see in the code.

At that point we could also wonder, what do we want TrimOWS to return in that case? Well, the contract of the function is a bit ambiguous, it says

If no more than n bytes of OWS are found at the start of s and no more than n bytes of OWS are found at the end of s, it returns the trimmed result and true

It doesn’t say whether these OWS from the start and end can be the same or not, if there are no non-OWS in between to unambiguously separate them. It’s a specification hole, and I think there are two reasonable options:

We double-count, i.e. "\t " is 2 OWS at the start and 2 OWS at the end, In that case TrimOWS("\t ", 1) should return false.
We avoid double-counting and divide the OWS equally from both sides, i.e. "\t " is 1 OWS at the start and 1 OWS at the end. In that case TrimOWS("\t ", 1) should return "", true.

Given the test case I mentioned above, I’d wager the author intended option #2. But then logically, the string "\t \t " (4 OWS) could be cleanly divided into 2 starting OWS and 2 ending OWS, and thus TrimOWS("\t \t ", 2) should also return "", true, right? The author’s implementation instead returns false here. This inconsistency is why I would call it a bug.

It turns out, this odd behavior is caused by a logic/off-by-one error in the trimLeftOWS function, whereby one extra OWS can be trimmed without returning false as long as the string becomes empty.

// these correctly return _, false because "a" is preceded by one-too-many OWS
trimLeftOWS(" a", 0)
trimLeftOWS("  a", 1)
trimLeftOWS("   a", 2)
// these should return _, false as well, but they return "", true instead
trimLeftOWS(" ", 0)
trimLeftOWS("  ", 1)
trimLeftOWS("   ", 2)

This bug means we allow strings with n+1 OWS, while the contract of option #2 should allow strings with 2*n OWS. In the case of n==1, that works because 1+1 == 2*1, but this fails for any higher values of n. It’s not caught in tests since there are no cases for n!=1.

The fix of the underlying OB1 error is very simple:

@@ -24,12 +24,12 @@ func TrimOWS(s string, n int) (string, bool) {
 func trimLeftOWS(s string, n int) (string, bool) {
        var i int
        for len(s) > 0 {
-               if i > n {
-                       return "", false
-               }
                if !isOWS(s[0]) {
                        break
                }
+               if i >= n {
+                       return "", false
+               }
                s = s[1:]
                i++
        }

However, by fixing it, it assumes we follow the contract option #1, disallowing strings that contain (only) more than n OWS.

ANYWAY, why am I boring you with all of this? That bug itself is not very important, but it gave me the feeling that contract option #2 was the intended one, and in that case, the implementation with 2 loops, trimming one side after the other, is not very practical in order to implement that behavior. It we want to remove OWS from both sides at the same time, it makes more sense to use a single loop. Having only one loop also seemed like a good idea to keep the inlining cost down. Therefore, that’s what I went with.

My solution

Here is a very straightforward one-loop solution:

func TrimOWS(s string, n int) (string, bool) {
	for ; n >= 0; n-- {
		noMoreOWS := true
		if len(s) > 0 && isOWS(s[0]) {
			s = s[1:]
			noMoreOWS = false
		}
		if len(s) > 0 && isOWS(s[len(s)-1]) {
			s = s[:len(s)-1]
			noMoreOWS = false
		}
		if noMoreOWS {
			return s, true
		}
	}
	return "", false
}

It iterates up to n times, removing OWS from both sides every time, until there’s no more OWS; or if n is reached, it returns false. Quite simple and readable, I think.

This passes all the tests, has no (implicit) bounds checks (we do them explicitly with the len(s) > 0)… Is it inlinable?

$ go build -gcflags '-m=2' . 2>&1 | grep TrimOWS
./ows.go:9:6: can inline TrimOWS with cost 80 as: func(string, int) (string, bool) { for loop; return "", false }

Yes! Just barely! Can we do better?

Further optimised

At that point, I was pretty happy to have found a solution, so I read the rest of the author’s post, and implemented some of the hints he gave.

First, since we’re targeting Go 1.22+, we can use a range-over-integer:

-	for ; n >= 0; n-- {
+	for range n + 1 {

This alone brings the inlining cost down to 77, already beating the author’s 78.

Second, we can use named returns. The author uses “naked” returns which is just a special case of named returns, but here we can benefit from actually naming the returned boolean so we can use it in the function’s body:

func TrimOWS(s string, n int) (s2 string, ok bool) {
	s2 = s
	for range n + 1 {
		ok = true
		if len(s2) > 0 && isOWS(s2[0]) {
			s2 = s2[1:]
			ok = false
		}
		if len(s2) > 0 && isOWS(s2[len(s2)-1]) {
			s2 = s2[:len(s2)-1]
			ok = false
		}
		if ok {
			return
		}
	}
	return
}

That’s my ultimate solution, which has an inlining cost of 74.

Benchmarks

Of course that solution has distinct performance characteristics. Since it removes OWS from both sides at once, we can expect it to be slower in cases where there is too much leading OWS, where the two-loop version would have just done half the work before returning false. It would also probably be slower with higher n values and asymmetrical OWS since it would continuously check isOWS on both ends even if one is not OWS.

But in other cases it can actually be faster! Let’s compare with the author’s solution:

goos: linux
goarch: amd64
pkg: headers
cpu: AMD Ryzen 7 3700X 8-Core Processor             
                                                      │ author_solution │             my_solution             │
                                                      │     sec/op      │   sec/op     vs base                │
TrimOWS/empty-16                                            1.797n ± 1%   1.794n ± 3%        ~ (p=0.591 n=10)
TrimOWS/no_OWS-16                                           3.083n ± 2%   2.319n ± 1%  -24.77% (p=0.000 n=10)
TrimOWS/internal_OWS-16                                     3.083n ± 1%   2.308n ± 4%  -25.11% (p=0.000 n=10)
TrimOWS/leading_and_trailing_OWS-16                         4.104n ± 3%   3.631n ± 1%  -11.52% (p=0.000 n=10)
TrimOWS/a_tolerated_amount_of_OWS_around_non-OWS-16         4.107n ± 0%   3.623n ± 1%  -11.81% (p=0.000 n=10)
TrimOWS/a_tolerated_amount_of_OWS_and_nothing_else-16       3.590n ± 1%   2.827n ± 1%  -21.27% (p=0.000 n=10)
TrimOWS/non-OWS_whitespace-16                               3.333n ± 0%   3.943n ± 2%  +18.27% (p=0.000 n=10)
TrimOWS/too_much_leading_OWS-16                             2.253n ± 2%   3.672n ± 1%  +62.98% (p=0.000 n=10)
TrimOWS/too_much_trailing_OWS-16                            4.377n ± 1%   3.908n ± 1%  -10.73% (p=0.000 n=10)
TrimOWS/too_much_leading_and_trailing_OWS-16                2.267n ± 1%   3.626n ± 1%  +59.95% (p=0.000 n=10)
geomean                                                     3.081n        3.065n        -0.50%

On average it’s very close, and it has less extreme worst-case values. I wouldn’t say it’s necessarily better in general, but at least for the n==1 case it seems fine.

Thanks to jub0bs for the wonderful challenge!

<- Previous post
Fixing Waveshare e-paper display partial updates

Next post ->
Generating previews of RAW images in Go, fast