Generating previews of RAW images in Go, fast

I’m currently toying with making an image viewer in Go, because I’m unhappy with Eye-of-Gnome’s slowness and lack of support for Fujifilm’s RAW format (.RAF files). Yes, I know eog’s being replaced by loupe which is much faster, but it still doesn’t support RAFs, and I don’t like its keyboard shortcuts. Anyway, I just wanted to try to make one for fun.

This article is not about the viewer itself but about one small part of it that I learned along the way: how to open and generate previews of RAF image files.

The image in this article is taken from the awesome raw.pixls.us project which contains RAW files for many camera models, all with CC0/Public domain licensing. Feel free to download a sample X-T5 RAF file here to replicate my findings, or better yet, try to do the same with the RAW format of your favorite camera brand!

(Not) decoding the RAW data

That may or may not seem obvious, but to generate a preview of a RAW image, you absolutely do not want to decode (demosaic) the raw data itself, for 2 main reasons.

The first reason is speed, since demosaicing is very CPU intensive. I won’t even try to find a way to do that in Go as we’ll avoid it entirely anyway. But in order to have some baseline, let’s try demosaicing with simple_dcraw which is included in libraw:

$ time simple_dcraw DSCF0021.RAF

real    0m4.214s
user    0m32.729s
sys     0m1.001s

More than 4 seconds of wall time and 8x that in CPU time. Ouch.

For the second reason, we can just take a look at the image that that command generated:

(Image: the dcraw-demosaiced output, with a heavy green tint due to wrong white balance)

That’s obviously not right… RAW files are, well, raw, and they need a whole bunch of post-processing to get a usable image, not just demosaicing but also white balance, contrast adjustments, cropping…

Even a more complete pipeline with something like darktable-cli won’t give us an output that looks good out of the box, and it will be even slower than the plain libraw conversion.

Getting the embedded image

Luckily we don’t have to decode the raw data. Raw files contain an embedded JPEG image (or several, as we’ll see later) for the very purpose of previewing the file. This JPEG has the camera processing applied and looks correct. At least, that’s true of Fujifilm’s RAFs but I expect most, if not all, other camera manufacturers do the same.

To learn the structure of a RAF, we can take a look at the awesome Fileformats wiki.

According to the wiki, the JPEG image offset and length are 2 big-endian uint32 numbers starting at offset 16 + 4 + 8 + 32 + 4 + 20 = 84 = 0x54.
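To double-check that arithmetic, here’s a small Go sketch that sums the field sizes from the wiki and decodes the two big-endian values from a byte slice. The 16 bytes it uses are copied from the row at 0x50 of the hexdump; `jpegInfoOffset` and `previewLocation` are hypothetical helpers for illustration only.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Field sizes preceding the JPEG offset/length pair, as listed on the
// Fileformats wiki: 16 + 4 + 8 + 32 + 4 + 20.
var headerFields = []int{16, 4, 8, 32, 4, 20}

// jpegInfoOffset returns the byte offset of the JPEG offset/length pair.
func jpegInfoOffset() int {
	sum := 0
	for _, n := range headerFields {
		sum += n
	}
	return sum
}

// previewLocation (hypothetical helper) decodes the big-endian JPEG offset
// and length from the first bytes of a RAF file.
func previewLocation(hdr []byte) (offset, length uint32) {
	p := jpegInfoOffset()
	return binary.BigEndian.Uint32(hdr[p:]), binary.BigEndian.Uint32(hdr[p+4:])
}

func main() {
	// Rebuild the first 0x60 bytes of the file: only the row at 0x50
	// matters here, so everything else is left zeroed.
	hdr := make([]byte, 0x60)
	copy(hdr[0x50:], []byte{
		0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x94,
		0x00, 0x3d, 0x49, 0xd3, 0x00, 0x3d, 0x4b, 0x0c,
	})
	off, n := previewLocation(hdr)
	fmt.Printf("field at 0x%x: offset=0x%x length=0x%x (%d bytes)\n",
		jpegInfoOffset(), off, n, n)
}
```

Running it prints `field at 0x54: offset=0x94 length=0x3d49d3 (4016595 bytes)`.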

Let’s see what we have:

$ hexdump -C DSCF0021.RAF | head
00000000  46 55 4a 49 46 49 4c 4d  43 43 44 2d 52 41 57 20  |FUJIFILMCCD-RAW |
00000010  30 32 30 31 46 46 31 37  39 35 30 32 58 2d 54 35  |0201FF179502X-T5|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 30 31 30 30  |............0100|
00000040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000050  00 00 00 00 00 00 00 94  00 3d 49 d3 00 3d 4b 0c  |.........=I..=K.|
00000060  00 00 56 f4 00 3d a2 00  02 54 39 a0 00 00 00 02  |..V..=...T9.....|
00000070  02 54 39 a0 00 00 00 00  00 00 00 00 00 00 00 00  |.T9.............|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000090  00 00 00 00 ff d8 ff e1  ff a8 45 78 69 66 00 00  |..........Exif..|

Reading 2 big-endian uint32 values starting at 0x54, we get an offset of 0x94 (148) and a length of 0x3d49d3 (4,016,595 bytes).

And if we take a look at offset 0x94, what do we have? ff d8 ff ..., yep, that’s a JPEG! Let’s extract it:

$ < DSCF0021.RAF tail -c +$((0x94 + 1)) | head -c $((0x3d49d3)) > embedded.jpg
$ file embedded.jpg
embedded.jpg: JPEG image [...] 4416x2944, components 3

We got ourselves a nice 4416x2944 (13 Mpixels), 4 MB JPEG. That’s actually a very high-res preview!

Now let’s try to use that in a Go program. For compatibility with RAW files from a wide range of manufacturers, I think the best would be to use libraw bindings to parse the file and get the preview image. But for the purposes of this blog post, let’s concentrate on Fuji RAFs and do it as simply as possible.

Generating a preview in Go

Let’s say we want to open the RAF file and find the embedded preview image. Since it’s larger than needed, we’ll also resize it to a more manageable 300x200 px preview.

Here’s some code that can do that. To keep it as short as possible, I’ll keep error handling minimal; sorry if it’s not very idiomatic Go.

package main

import (
	"encoding/binary"
	"fmt"
	"image"
	"image/jpeg"
	"io"
	"os"
	"time"

	"golang.org/x/image/draw"
)

const TARGET_WIDTH, TARGET_HEIGHT = 300, 200

func checkErr(err error) {
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
}

func findPreviewImage(filename string) io.Reader {
	fd, err := os.Open(filename)
	checkErr(err)

	var hdr struct {
		_      [0x54]byte
		Offset uint32
		Length uint32
	}
	err = binary.Read(fd, binary.BigEndian, &hdr)
	checkErr(err)

	_, err = fd.Seek(int64(hdr.Offset), io.SeekStart)
	checkErr(err)

	return io.LimitReader(fd, int64(hdr.Length))
}

func decodeJPEG(r io.Reader) image.Image {
	img, err := jpeg.Decode(r)
	checkErr(err)
	return img
}

func resize(img image.Image, width, height int) image.Image {
	dst := image.NewRGBA(image.Rect(0, 0, width, height))
	draw.BiLinear.Scale(dst, dst.Rect, img, img.Bounds(), draw.Over, nil)
	return dst
}

func main() {
	t0 := time.Now()
	rdr := findPreviewImage(os.Args[1])
	img := decodeJPEG(rdr)
	t1 := time.Now()
	fmt.Println("decoded embedded image:", img.Bounds().Max, "took:", t1.Sub(t0))

	resized := resize(img, TARGET_WIDTH, TARGET_HEIGHT)
	t2 := time.Now()
	fmt.Println("resized image:", resized.Bounds().Max, "took:", t2.Sub(t1))
}

$ time ./genpreview DSCF0021.RAF
decoded embedded image: (4416,2944) took: 236.011982ms
resized image: (300,200) took: 169.488725ms

real    0m0.408s
user    0m0.398s
sys     0m0.007s

Nice, it works. But… 240 ms to read and decode the preview? And 170 ms to resize? Sure, the whole process takes less than half a second and that’s much better than demosaicing, but if you have to generate hundreds of previews, it quickly adds up.

Can we go faster?

Changing the library

The thing is, Go’s standard image library, being pure Go, is not as efficient as some highly optimized C libraries. Some would even say that it’s slow.

We can replace the standard library’s JPEG decoder with libjpeg(-turbo) through the github.com/pixiv/go-libjpeg bindings (which use cgo). It’s an easy change:

@@ -4,11 +4,11 @@
 	"encoding/binary"
 	"fmt"
 	"image"
-	"image/jpeg"
 	"io"
 	"os"
 	"time"
 
+	"github.com/pixiv/go-libjpeg/jpeg"
 	"golang.org/x/image/draw"
 )
 
@@ -40,7 +40,7 @@
 }
 
 func decodeJPEG(r io.Reader) image.Image {
-	img, err := jpeg.Decode(r)
+	img, err := jpeg.Decode(r, &jpeg.DecoderOptions{})
 	checkErr(err)
 	return img
 }

$ time ./genpreview DSCF0021.RAF
decoded embedded image: (4416,2944) took: 63.597905ms
resized image: (300,200) took: 172.552176ms

real    0m0.239s
user    0m0.226s
sys     0m0.006s

Cool, we cut ~170 ms of decode time with a 2-line change. Sure, it adds a dependency on an external C library, but I think it’s worth it.

Can we go faster?

Partial decode

The JPEG format encodes blocks of 8x8 pixels by transforming them into the frequency domain with a discrete cosine transform (DCT). This allows a neat trick if we want to decode an image at a size smaller than the original: we can simply ignore the higher-frequency coefficients when computing the inverse DCT. If we only need 1/8 of the original width and height, we can even skip the inverse DCT altogether and just use the DC (constant) component of each 8x8 block.

libjpeg implements these kinds of scaled decoding optimizations, and we can activate them by setting a target size (ScaleTarget) in jpeg.DecoderOptions. The library then automatically computes a scaling ratio that speeds up decoding while guaranteeing that the resulting image is at least as large as the target size.
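Here’s a rough sketch of how such a ratio can be picked. It assumes only the power-of-two ratios 1/1, 1/2, 1/4 and 1/8 that classic libjpeg supports (libjpeg-turbo also offers other M/8 ratios), and it rounds output dimensions up the way libjpeg does. It isn’t necessarily go-libjpeg’s exact rule, and `pickScaleDenom` is a hypothetical name. For our 4416x2944 preview and a 600x400 target, 1/8 would give 552x368, which is too small, so it settles on 1/4.

```go
package main

import "fmt"

// scaledSize returns the output size of a 1/denom scaled decode,
// rounding up like libjpeg does.
func scaledSize(w, h, denom int) (int, int) {
	return (w + denom - 1) / denom, (h + denom - 1) / denom
}

// pickScaleDenom (hypothetical) returns the largest power-of-two
// denominator whose scaled output still covers the target size.
func pickScaleDenom(w, h, targetW, targetH int) int {
	for _, denom := range []int{8, 4, 2} {
		sw, sh := scaledSize(w, h, denom)
		if sw >= targetW && sh >= targetH {
			return denom
		}
	}
	return 1 // no reduction possible: decode at full size
}

func main() {
	denom := pickScaleDenom(4416, 2944, 2*300, 2*200)
	sw, sh := scaledSize(4416, 2944, denom)
	fmt.Printf("scale 1/%d -> %dx%d\n", denom, sw, sh)
}
```

It prints `scale 1/4 -> 1104x736`.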

Note, however, that scaled decoding can introduce artefacts and isn’t as good as a bilinear resize. The recommended technique is thus to scale-decode to a somewhat larger size than required and then resize to the final size. Let’s aim for twice our desired final target size.

 func decodeJPEG(r io.Reader) image.Image {
-	img, err := jpeg.Decode(r, &jpeg.DecoderOptions{})
+	img, err := jpeg.Decode(r, &jpeg.DecoderOptions{
+		ScaleTarget: image.Rect(0, 0, TARGET_WIDTH*2, TARGET_HEIGHT*2),
+	})
 	checkErr(err)
 	return img
 }

$ time ./genpreview DSCF0021.RAF
decoded embedded image: (1104,736) took: 44.648346ms
resized image: (300,200) took: 12.396289ms

real    0m0.060s
user    0m0.056s
sys     0m0.005s

Now we’re cooking! You can see that we decoded the image at 1/4 of its width and height (1/16 of the pixels). Not only did this cut the decoding time by about 30%, but even better, because the resizer now starts from a much smaller image, the resizing step became more than 10x faster!

Bonus: even faster (but smaller)

A funny thing is that just as the RAF file contains a smaller JPEG for previewing, that embedded JPEG itself embeds an even smaller JPEG! It is really small though… Like, 160x120 px. But if that’s all you need, getting that doubly-embedded JPEG is going to be the fastest way to get any kind of preview of the RAW file.

The thumbnail is stored in the EXIF data of the larger JPEG preview image. To retrieve it in Go, we can keep our findPreviewImage function from before, parse the preview’s EXIF data with the github.com/rwcarlsen/goexif library, and simply call its dedicated JpegThumbnail method.

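import (
	"bytes"
	"encoding/binary"
	"fmt"
	"image"
	"io"
	"os"
	"time"

	"github.com/pixiv/go-libjpeg/jpeg"
	"github.com/rwcarlsen/goexif/exif"
)

// checkErr and findPreviewImage are unchanged from the previous program.

func getThumbnailFromJPEG(r io.Reader) image.Image {
	ex, err := exif.Decode(r)
	checkErr(err)
	thumb, err := ex.JpegThumbnail()
	checkErr(err)
	img, err := jpeg.Decode(bytes.NewReader(thumb), &jpeg.DecoderOptions{})
	checkErr(err)
	return img
}

func main() {
	t0 := time.Now()
	rdr := findPreviewImage(os.Args[1])
	img := getThumbnailFromJPEG(rdr)
	t1 := time.Now()
	fmt.Println("decoded embedded thumbnail image:", img.Bounds().Max, "took:", t1.Sub(t0))
}
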
$ time ./getthumbnail DSCF0021.RAF
decoded embedded thumbnail image: (160,120) took: 1.555741ms

real    0m0.006s
user    0m0.001s
sys     0m0.003s

And that’s after dropping the page cache (sudo sh -c 'echo 1 >/proc/sys/vm/drop_caches'). With the RAW file in cache, the process only takes around 400 µs.