1 - The Inciting Incident
Since the beginning of the year, I have been following the Raytracing In One Weekend series, writing a raytracer in Zig. It has been a great learning experience and if you are into graphics programming at all, I highly recommend it.
At the end of book two, you are left with a functioning, albeit very basic, single-threaded raytracer that is capable of quite a broad range of light simulation techniques - including subsurface scattering, which is more than I expected.
Still, my current focus is on bringing 3D graphics to the web, and fast. In order to do that, I was going to need help from the GPU.
As it turns out, people have been doing raytracing on the web for years now. I found this very cool example project "webGL2 - Path tracing a lot of polygons - fast", which is now about 8 years old. The page is little more than a feature demo rather than a detailed discussion of the technique and its implementation; nevertheless, I found the results intriguing enough to dig into the code itself.
One of the techniques that jumped out immediately was the use of PNG images to transfer the polygon data that is being displayed. Since the raytracing happens inside a fragment shader, a full-screen rectangular plane is rendered to create a fragment for every pixel. Then, all the geometry detail is created in the fragment shader by pathtracing triangles that are read from two textures:
- The "Grid Texture", containing the description of a 128 * 128 * 128 grid, where every grid cell contains the number of triangles that are contained within the cell, as well as a 2D coordinate into a second texture:
- The "Triangle Texture", which lists triangles in RGB triplets. One pixel has 3 color components, which is enough to describe a vertex - 3 pixels are enough to store the three vertices of a triangle.
Multiple cells can share the same row of the Triangle Texture, but a cell will only ever occupy one row. Rows usually end with some padding.
This post deals with the Grid Texture exclusively, although the layout of the Triangle Texture will become important later...
1.1 - The Grid Texture
In the Grid Texture data, each value is encoded as a 32-bit floating point value, so with 3 values per cell, the raw data is:
grid_resolution = 128
values_per_cell = 3
bytes_per_value = 4
(grid_resolution ** 3) * values_per_cell * bytes_per_value
(128 ** 3) * 3 * 4 = 25_165_824 # bytes
(25_165_824) / (1024 ** 2) = 24 # mebibytes
Raw data size of the Grid Texture data
... exactly 24 mebibytes in size.
The PNG-compressed image on the other hand is 1'311'637 bytes long (roughly 1.25 MiB), meaning a reduction to just over 5% of the raw data. Not bad.
2 - Packing Bits
But also not great, to be honest.
Storing all values as 32-bit floats is excessive, because the render code has a hard-coded limit of 512 triangles per cell - and the most populated cell in the scene only holds 449. Instead, we can store the count as a 9-bit unsigned integer, saving 23 bits per value and a float-to-int conversion in the shader.
And why stop here?
2.1 - Packing the X-Coordinate
The Triangle Texture data has a size of 2048 * 9384 pixels (19'218'432 pixels in total). As each pixel has 4 color channels (RGBA) with 8 bits each, one pixel has enough bits to store one 32-bit floating point number.
The Triangle Texture that is uploaded to WebGL is a float32 texture with RGB channels and has a size of 4096 * 1564 pixels. Multiplied by the 3 channels, we get 4096 * 1564 * 3 = 19'218'432 again - the same count, with each raw RGBA pixel providing the 32 bits of one float value.
This means that the highest coordinate we need to address in the X dimension is 4096 = 2^12, so 12 bits. Just like with the number of triangles in a cell, we can use a smaller, unsigned integer to store the texture coordinate. For completeness, we do the same for the Y-coordinate.
This leaves us with 12 + 12 + 9 = 33 bits to encode what was previously 3 * 32 = 96 bits. A reduction to 34.375% of the original size. But it also leaves us with a problem: there is no way to encode 33 bits into a texture without expensive (and annoying) bit-shifting in the shader. Also, we would always have to read more than one pixel from the texture - either always two, if we pad out the rest (very wasteful), or two or three, if we pack the 33-bit values tightly. This is arguably even worse.
There is one bit too many.
It pays to think about the content of the data. The polygon texture stores triangles that are always 3 pixels wide (with 3 color values per pixel = 9 values needed for a triangle). It will never be necessary to address each individual X coordinate. Instead, we can address them as triplets and not lose anything.
This means that instead of 4096 possible addresses, we only need 4096 / 3 ≈ 1366, and since 1366 < 2048 = 2^11, 11 bits are enough to store the X-coordinate.
All right! That brings us down to a smooth 32 bits per cell, reducing the size to exactly 1/3rd of the original without any loss of precision!
Unless of course, I am missing something...
2.2 - Packing the Y-Coordinate
Looking at the data stored in the grid, we can see that the number of triangles in some cells is negative. Clearly, that cannot be right, can it?
As it turns out, negative numbers are indeed correct - if a cell contains no triangles, the distance to the nearest cell with triangles is stored as a negative number.
And that means that 9 bits is only enough for the integers from 0 to 511 (inclusive) - we need another bit to indicate whether the count value is negative or positive!
So we are back to 33 bits per cell 😖
Fortunately, with a 128^3 grid, there is no way that the distance will ever exceed 127, which fits into 7 bits - two less than what we already use for the triangle count. So at least there is no need for additional new bits here.
Unlike on the X-axis, there is no way to skip rows in the Y direction. We need to address every single row, as every row can contain meaningful data.
But there is something weird in the code. The height constant of the texture is hard-coded as only 391 pixels. Then, right when the actual texture size is required, that number is multiplied by 4, giving us the familiar height of 1564. Why is that?
Turns out, the data is vertically segmented into 4 sections of equal height:
- The first section contains the vertex positions of all triangles. It is what I have described so far.
- The next section of the texture is used to store the vertex normals. We still need three 3D vectors per triangle (one normal per vertex), so we can reuse a texture coordinate from the first section and just move it down 391 texels.
- The next 391 lines contain ... garbage data? The data consists of a number pattern that repeats itself, but has no obvious meaning. It is also not accessed in the raytrace shader code, and in fact setting the whole section to zero has no effect on the renderer's output.
- And the last 391 lines are all zeros. Not sure why; this project is clearly a stripped-down demo of a larger system, and maybe they stored some other per-vertex information here that was omitted for the website. Or was it just padding to make a square texture? Well, it's not square now ... so I have no idea.
Of these 4 sections, only the first two are actually read by the shader. Even better, the addresses in the second section are at a constant offset from the addresses in the first section. This means that we only need a total of 1024 addressable rows, which can be achieved with only 10 bits. This is 2 bits less than before, leaving us a comfortable 31 bits to address a 4096^2 polygon texture.
Actually, 31 bits are no good either - the texel holds 32 bits either way, and a spare bit would just go to waste. So I chose to use 11 bits for both the X- and Y-axis, which allows a maximum texture size of 4096 * 8192 (2048 addressable rows * 4 sections), giving us plenty of rows to store even more triangles. Considering that the sample scene is quite detailed (for something meant to be displayed in a browser) and only uses 391 of the 2048 possible rows, I am quite confident that this will be enough.
In the end we have 11 bits for the X-coordinate, 11 bits for the Y-coordinate, 1 bit for the sign (S) and 9 bits for the count/distance (C), for a total of 32 bits.
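To make the layout concrete, here is a minimal sketch of the packing and unpacking in JavaScript. The exact bit order (X in the high bits, count in the low bits) and the helper names are my own choices for illustration - the actual code may arrange the fields differently:
function packCell(x, y, count, isDistance) {
    // x: triplet index into the Triangle Texture row (11 bits)
    // y: row within the first (vertex position) section (11 bits)
    // count: triangle count, or distance to the nearest occupied cell (9 bits)
    // isDistance: the sign bit - true if the cell is empty and count is a distance
    const sign = isDistance ? 1 : 0;
    return (((x & 0x7ff) << 21) | ((y & 0x7ff) << 10) | (sign << 9) | (count & 0x1ff)) >>> 0;
}
function unpackCell(packed) {
    return {
        x: (packed >>> 21) & 0x7ff,
        y: (packed >>> 10) & 0x7ff,
        isDistance: ((packed >>> 9) & 1) === 1,
        count: packed & 0x1ff,
    };
}
How the 11 + 11 + 1 + 9 bits could be packed and unpacked
The shader-side unpack is the same routine, just with GLSL's integer bit operations.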
Very nice.
3 - Give me the data
To test everything, I changed the code to read the original data from the PNG, decompress it into a byte array, and then repack it into an array 1/3 its size, using the 32-bit unsigned integer format described above.
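The repacking itself is then a single pass over the cells. A sketch, where decompressedPngBytes stands in for the raw byte array, assuming the bytes are reinterpreted as the original three floats per cell in the order count, X, Y (the actual order within a cell may differ), with packCell from the sketch above:
const floats = new Float32Array(decompressedPngBytes.buffer); // 128^3 * 3 floats
const packed = new Uint32Array(floats.length / 3);            // 1/3 the size
for (let cell = 0; cell < packed.length; cell++) {
    const count = floats[cell * 3 + 0]; // negative = distance to the nearest occupied cell
    const x     = floats[cell * 3 + 1]; // X texel coordinate, always a multiple of 3
    const y     = floats[cell * 3 + 2]; // Y row within the first section
    packed[cell] = packCell(x / 3, y, Math.abs(count), count < 0);
}
Repacking the grid cells into 32-bit unsigned integers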
Then I modified the raytrace shader and made sure that everything worked as expected (it did) and that it ran at least as fast as before (it ran about 10% faster, which I credit to using integers instead of floats for indexing and, as a result, texelFetch operations instead of texture lookups).
Now all I had to do was get the data out of the browser again. How hard can it be?
First try: Create an offscreen canvas, write the image to the canvas, get the canvas contents, and save it as an image. I get a PNG image and ... it does not work.
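In code, the first try looked roughly like this, using the same 2048 * 1024 dimensions as the TGA export further down - at 4 bytes per pixel, that is exactly enough for the 128^3 packed cells:
const canvas = document.createElement("canvas");
canvas.width = 2048;
canvas.height = 1024;
const ctx = canvas.getContext("2d");
// Reinterpret the packed Uint32Array as RGBA bytes and draw them.
const pixels = new ImageData(new Uint8ClampedArray(packed.buffer), canvas.width, canvas.height);
ctx.putImageData(pixels, 0, 0);
// Encode the canvas contents as a PNG and download it.
canvas.toBlob((blob) => { /* ...trigger the download, as in the TGA code below... */ }, "image/png");
The first (failed) attempt at exporting the data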
What happened?
Looking at the data, it seemed almost correct - but only almost. The first mismatch was a 65 in the original data, which somehow became a 64 in the data read back from the canvas.
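A sketch of the comparison that surfaced the mismatch, reading the pixels straight back out of the canvas from above:
const readback = ctx.getImageData(0, 0, canvas.width, canvas.height).data;
const original = new Uint8Array(packed.buffer);
for (let i = 0; i < original.length; i++) {
    if (readback[i] !== original[i]) {
        console.log(`first mismatch at byte ${i}: ${original[i]} became ${readback[i]}`);
        break;
    }
}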
This one took a while to track down. And while all the bit fiddling in chapter 2 was fun and games, this is the problem that prompted me to actually blog about the whole thing.
You see, the browser, being a visitor in many a strange place, not all of which have your best interests at heart, is built for security.
In fact, there is no way to access image data directly without going through a canvas element to sanitize the data.
That would be all good, but the canvas is not meant to handle binary data, it is meant to handle visual data. If you write data to a canvas, you can expect to see the result with your eyes - but you are not guaranteed to get the same data back.
This is pretty well explained in this Stack Overflow post, but the gist is: The canvas stores pixel values with premultiplied alpha. That means, if you give it an RGBA value, it takes the A, normalizes it to a floating point value between 0 and 1 by dividing it by 255, and then multiplies that number with each of the RGB color values.
When you read the data back out again, each of the premultiplied RGB values is divided by the normalized A value, and you get the same value back ... if it weren't for the fact that 8 bits offer precious little precision, especially once you multiply by a real number.
Let's look at an example:
r = 64
a = 54
a_factor = a / 255 # 0.2117647
r_premultiplied = round(r * a_factor) # round(13.5529412) => 14
r_final = round(r_premultiplied / a_factor) # 66
How 64 becomes 66
This example is written in Python; who knows what kind of rounding algorithm your browser uses. But whatever it is, it cannot be perfect. The precision is just not there in 8 bits per color channel.
Of course, I could have dumped everything into a JSON array text file and done the image conversion outside the browser, but that would only mean that I would need to write additional code to read the JSON back into a binary format - and that is asking for trouble.
I could also write out a binary blob, which is marginally better, but that would only mean that I need more code in another place to do the binary-to-image conversion there. That is no better than writing it right there in JavaScript.
The "Raytracing In One Weekend" series uses the PPM format, which does not support alpha channels. That was no good, if I did not want to dress up my RGBA data as RGB.
Fortunately, I remembered an old renderer I had written for educational purposes that output data in the TGA format, which is simple to write out:
// Predefined constants.
const data = ...; // some TypedArray
const width = 2048;
const height = 1024;
// Define the output buffer.
const byteWidth = width * 4;
const buffer = new ArrayBuffer(18 + byteWidth * height);
const writer = new DataView(buffer);
// Write the header (all offsets are into the fixed 18-byte TGA header).
writer.setUint8(2, 2); // image type 2: uncompressed true-color
writer.setUint16(12, width, true); // width, little-endian
writer.setUint16(14, height, true); // height, little-endian
writer.setUint8(16, 32); // 32 bits per pixel
writer.setUint8(17, 8); // image descriptor: 8 alpha bits, bottom-left origin
// Write the image
const src = new Uint8Array(data.buffer);
if (src.length !== byteWidth * height) {
console.error(`Invalid size ${width}x${height} for source of byte length ${src.length}`);
} else {
// Pixels in TGA are stored in BGRA order, and we flip over the Y-axis.
for (let i = 0; i < src.length; i += 4) {
const resIdx = i + 18;
const srcIdx = (height - (Math.floor(i / byteWidth) + 1)) * byteWidth + (i % byteWidth);
writer.setUint8(resIdx + 0, src[srcIdx + 2]); // B
writer.setUint8(resIdx + 1, src[srcIdx + 1]); // G
writer.setUint8(resIdx + 2, src[srcIdx + 0]); // R
writer.setUint8(resIdx + 3, src[srcIdx + 3]); // A
}
}
// Download the binary file.
const blob = new Blob([buffer], { type: "image/x-tga" });
const url = URL.createObjectURL(blob);
const a = document.createElement("a");
a.href = url;
a.download = "grid.bin.tga";
a.click();
URL.revokeObjectURL(url);
One thing to notice is that color channels in TGA are written in the order BGRA, and that the Y-axis is flipped.
The resulting TGA file can be opened in GIMP, for example, and exported as PNG. In the export dialog, make sure to enable the option "Save color values from transparent pixels", otherwise all pixels with zero alpha will be zeroed out.
With the changes from chapter 2, we have cut the size of the transferred PNG down to 575'100 bytes - roughly 2.3% of the raw data and 44% of the original PNG compression.
Very nice.
4 - Can we go even lower?
Why stop here?
PNG is an old format and its compression is rather basic. WebP on the other hand is new, shiny, and has wide enough browser support (especially, I figure, among browsers that can render WebGL2 graphics) that we can use it instead.
Since we already have the image in GIMP, you might be tempted to export it as a WEBP file instead. And why not? It looks the same!
Unfortunately though, the WEBP exporter also clears out the color values of fully transparent pixels, and unlike with the PNG exporter, there is no way in the UI to stop it.
So we have to resort to the command line, the ground truth of all software.
I am running Ubuntu and was able to install cwebp from the official Ubuntu repository (the package is called webp). With the PNG in hand (cwebp does not understand TGA files), I was finally able to write out a valid WEBP file with all data intact by running
cwebp -z 9 -mt -exact input.png -o output.webp
which gives me a file of only 262'268 bytes. Here, -z 9 selects the slowest but most efficient lossless preset, -mt enables multi-threading, and -exact preserves the color values of fully transparent pixels. That is an incredible reduction to 1% of the raw / 20% of the original compressed data, without any loss of precision.
And since both PNG and WEBP are treated as opaque Image objects by the browser, the front-end code did not have to change at all.
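For illustration, the loading path stays something like the sketch below - gl and gridTexture are stand-ins for the project's actual WebGL context and texture handle. The two pixelStorei flags are worth pointing out, as they keep the browser from premultiplying or color-managing the binary data during upload:
const img = new Image();
img.onload = () => {
    // Hand the bytes to WebGL untouched.
    gl.pixelStorei(gl.UNPACK_PREMULTIPLY_ALPHA_WEBGL, false);
    gl.pixelStorei(gl.UNPACK_COLORSPACE_CONVERSION_WEBGL, gl.NONE);
    gl.bindTexture(gl.TEXTURE_2D, gridTexture);
    gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, img);
};
img.src = "grid.webp"; // was "grid.png" - nothing else changes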
5 - Final Thoughts
Does it matter? Probably not. I mean, the 10% faster execution time certainly will, but the reduction in asset size will only affect a few people, the first time they load the page. Still, page loading times are important, and waste is, by definition, something you should avoid if possible.