RGBA-Pal and PNG-8

Motivation

Some time ago, I was reading the PNG specification, and I noticed something.  With indexed-color (palette-style) PNG images, individual palette entries can have arbitrary opacity values.  Typically, at least from what I'd seen, an indexed-color PNG image might have one palette entry that's completely transparent, with the rest completely opaque.  This means a pixel in the image is either transparent or one of 255 fully opaque colors, like a GIF image with a transparent color.  The only images I'd seen with translucent pixels (either for anti-aliasing the edge of an opaque object against the background, or for actual translucent parts of the image) were stored as full-color RGB plus an alpha channel.  Furthermore, Paint Shop Pro 8 (the latest, most feature-rich image editor I have access to) offers PNG export in indexed-color mode with an optional single transparent color, or in RGB or grayscale mode with an optional alpha channel.  As far as I knew, no software made use of multiple translucent palette entries when writing PNG images.

Time went by, and I found myself producing quite a few images which contained few colors but required translucent parts.  These were mostly maps, diagrams, and simple illustrations, sometimes nearly a megapixel in size.  Saving these as full-color RGBA images resulted in file sizes around 100 kB in some cases, which is annoyingly big for people on slow Internet connections.  (Yes, this is still a concern: not everyone has broadband access at home, mobile Internet access can be very slow in rural areas, and there are still other situations where a person's only means of accessing the Internet offers limited bandwidth.)  The last straw was a weather map overlay, part of a weather radar page I made for convenient access from my own mobile device.  I didn't want the highway markers to be fully opaque, as they would obscure too much of the radar image, and the full RGBA version of the overlay was just too big.  I came up with a version that used dithering to achieve the desired effect, more elegantly than PSP's built-in error diffusion dithering.  Although I'd arrived at a satisfactory result, I was tired of employing workarounds like that.

Enter RGBA-Pal

I briefly looked into libpng, but I'm still a bit intimidated by some aspects of coding in C++.  Although I had done well with that language in high school, I never managed to figure out Visual Studio at home, and standard gcc is a few steps further outside my comfort zone.  Also, I couldn't see in the documentation how the actual pixel data would be accessed and manipulated.  This echoed a previous failed experiment with a DLL version of libpng modified to work with Visual Basic.

I can't remember when I first found out about pyPNG, but when I looked into it again it seemed quite accessible.  I'm not exactly an experienced Python programmer, but there's enough reference material online that I can make it work.  I've always been good at thinking through exactly how a computer would accomplish a task, so with the right reference material I can adapt to almost any procedural programming language.

Stage one: load the RGBA image

pyPNG makes it fairly easy to read in the pixel data from a PNG image.  It will even present that data in a few different forms; in my case, as an 8-bit-per-channel RGB+Alpha image, even if that's not how it was originally saved.  (A Grayscale+Alpha image will be converted to RGBA for me; indexed-color images, or images without transparency, will also be converted to RGBA, but I don't see more than very limited use cases for having my program read such images.)  The pyPNG module does this in a way that can be optimized for streaming, but my algorithm needs to iterate over the data a few times, so my program loads the whole image into memory.
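
Here's a minimal sketch of that loading step with pyPNG (reading the default input file; illustrative, not the program's actual code):

    import png  # the pyPNG module

    reader = png.Reader(filename='test.png')
    # asRGBA8 presents any supported PNG as 8-bit-per-channel RGBA rows
    width, height, pixels, info = reader.asRGBA8()
    # pyPNG streams rows lazily; materialize them, since the algorithm
    # iterates over the data a few times
    rows = [list(row) for row in pixels]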

Stage two: convert colors and build the nybtree

The image is converted to a linear RGBA color space, and a copy is also made in the CIE 1976 L*a*b* color space, with the alpha channel carried along (I'll write this as L*a*b*A).  (In case you were wondering, internal calculations use "premultiplied alpha" with a bias, essentially storing each color as if it were composited onto a medium gray.)  The next step is done within the same pixel loop, so I'm counting it as the same stage.
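
In sketch form, those conversions might start roughly like this (simplified; assume channel values scaled to the range 0 to 1, and a 50% gray bias, with the real code differing in details such as the exact gray level):

    def srgb_to_linear(c):
        # undo sRGB gamma; c is a channel value in [0, 1]
        if c <= 0.04045:
            return c / 12.92
        return ((c + 0.055) / 1.055) ** 2.4

    def gray_biased_premultiply(r, g, b, a, gray=0.5):
        # store each color channel as if composited onto medium gray
        return tuple(a * c + (1.0 - a) * gray for c in (r, g, b)) + (a,)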

It seems the most common color quantization technique is the octree method.  I did a bit of research, and decided I could adapt this to a four-dimensional color space (to account for the alpha channel).  In the octree method, the color cube is subdivided into octants, and each octant is further subdivided into still more octants until every unique color in the image has its own subdivision.  Since I'm dealing with four dimensions, my algorithm will divide a hypercube into sixteen smaller hypercubes, and so on.  Instead of an octree, I'm dealing with a hexadectree, but that word is too bulky.  A nybble is four bits of data, enough to select from 16 choices, so I call it a nybtree.  Anyway, my nybtree is built from the L*a*b*A representation of the image data, because perceptual uniformity is important in this step.
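
To make the nybble concrete: selecting which of a node's sixteen children a color falls into takes one bit from each of the four channels, something like this sketch (which simplifies by assuming each channel has been scaled to an 8-bit integer):

    def child_index(l, a, b, alpha, depth):
        # take one bit per channel at this depth; the 4-bit result
        # (a nybble) selects one of the node's 16 children
        bit = 7 - depth
        return (((l >> bit) & 1) << 3 |
                ((a >> bit) & 1) << 2 |
                ((b >> bit) & 1) << 1 |
                ((alpha >> bit) & 1))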

So the nybtree starts with just one node, representing the entire L*a*b*A gamut.  Each pixel is registered in the nybtree, and registration consists of the following logic.  The deepest node representing a range of colors that includes the current pixel is found.  If that node hasn't yet collected any pixels, its statistics are updated to include the color of the current pixel.  Otherwise, if that node's statistics reflect a single color (possibly from multiple pixels) and the current pixel is also that color, the pixel is added to those statistics.  If it's a different color from the node's existing statistics, the node is subdivided, with the existing statistics moved to the appropriate child node, and the current pixel is then registered into its appropriate child node.
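
A simplified sketch of that logic, reusing child_index from the previous sketch (real nodes carry fuller statistics than a count and a single color):

    class NybNode(object):
        def __init__(self):
            self.children = None  # None, or a list of 16 child slots
            self.count = 0        # pixels collected at this node
            self.color = None     # the single color collected so far

    def register(node, color, depth=0):
        # descend to the deepest existing node covering this color
        while node.children is not None:
            i = child_index(color[0], color[1], color[2], color[3], depth)
            if node.children[i] is None:
                node.children[i] = NybNode()
            node, depth = node.children[i], depth + 1
        if node.count == 0 or color == node.color:
            node.color = color  # first pixel here, or the same color again
            node.count += 1
        else:
            # two distinct colors landed here: subdivide, move the existing
            # statistics to the right child, then retry the current pixel
            node.children = [None] * 16
            i = child_index(node.color[0], node.color[1], node.color[2],
                            node.color[3], depth)
            node.children[i] = NybNode()
            node.children[i].color = node.color
            node.children[i].count = node.count
            node.color, node.count = None, 0
            register(node, color, depth)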

Stage three: prune the nybtree

Since each node in the nybtree represents a potential palette entry, and we can't save a PNG image with thousands of palette entries, we need to prune the tree a bit.  Each node carries statistics: notably the number of pixels represented, the average pixel color (technically the total color energy of all represented pixels), and the standard deviation of the pixel colors (technically the total squared difference from the average of all the represented pixels).  Generally, by "color" I mean the whole 4-dimensional value, including opacity.  In the process of pruning, nodes of the nybtree which have children (representing hypercubes that were subdivided into smaller hypercubes) get all their child nodes combined into them (effectively reintegrating the sub-hypercubes), and their statistics are combined as well.  This is done until the number of nodes without children is less than or equal to the number of palette entries desired.  Since combining nodes leads to higher error statistics on the combined node, the nodes with the smallest error statistics are pruned first.  The error statistics are naturally weighted by population, so this results in more colors being allocated to the parts of the color cube that are well-used in the image.
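
One way to make such statistics combine by simple addition is to store per-channel sums and sums of squares rather than the average and deviation directly; here's a sketch along those lines, with illustrative field names rather than the program's actual ones:

    def node_error(node):
        # total squared deviation from the mean, summed over channels:
        # SS = Q - S*S/n, where S is the sum of values and Q the sum of squares
        if node.n == 0:
            return 0.0
        return sum(q - s * s / float(node.n)
                   for s, q in zip(node.sums, node.sumsq))

    def prune_into(parent):
        # fold every child back into the parent (assumes the children are
        # leaves, i.e. pruning proceeds from the bottom of the tree upward);
        # counts, sums, and sums of squares all combine by plain addition
        for child in parent.children:
            if child is None:
                continue
            parent.n += child.n
            parent.sums = [x + y for x, y in zip(parent.sums, child.sums)]
            parent.sumsq = [x + y for x, y in zip(parent.sumsq, child.sumsq)]
        parent.children = None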

Stage four: build the palette

Every remaining nybtree node that represents at least one pixel now becomes a palette entry, but the RGBA color of each palette entry isn't initially known.  Here's where the image pixels are iterated a second time.  Each pixel's L*a*b*A value is evaluated to determine which nybtree node represents it; then that pixel's RGBA value is registered in a histogram for the corresponding palette entry.  These histograms are used to calculate the median RGBA value of the pixels collected for each palette entry, and that median color becomes the value of the palette entry.  I use the median instead of the mean to ensure that large areas of the image in one flat color are represented by exactly that color.
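
For illustration, pulling the median out of a histogram might look like this; I'm showing a per-channel median (one way of reading the description above), where each palette entry keeps four histograms, one per RGBA channel:

    def median_from_histogram(hist, total):
        # hist maps a channel value (0-255) to a pixel count; walk the
        # values in order until half of the pixels are accounted for
        seen = 0
        for value in sorted(hist):
            seen += hist[value]
            if seen * 2 >= total:
                return value

    def palette_entry(channel_hists, total):
        # take the median of each RGBA channel independently
        return tuple(median_from_histogram(h, total) for h in channel_hists)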

Stage five: paint the image

My program iterates over every pixel for a third time, finally matching each one to its nearest (in linear RGBA space) palette entry.  If the user has specified dithering, the error is then distributed to pixels yet to be evaluated.  My error diffusion kernel is a custom one, based on Floyd-Steinberg but adjusted to favor small horizontal artifacts over vertical artifacts, because of the nonlinear horizontal blurring that I've observed on CRT monitors.  I could have done this color matching by distance in L*a*b*A space, but to correctly estimate the dithering error I would still want to calculate it in linear RGBA space; that would mean converting each subsequent pixel's color, plus the accumulated error from previous pixels, from RGB to L*a*b* on the fly, and I doubt the color-matching improvement would be significant enough to warrant the extra calculation.
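
Here's a sketch of the diffusion step; the classic Floyd-Steinberg weights stand in for the custom kernel, whose actual weights aren't listed here:

    # (dx, dy, weight) for the neighbors not yet visited, assuming a
    # left-to-right scan; classic Floyd-Steinberg weights as a stand-in
    KERNEL = [(1, 0, 7.0/16), (-1, 1, 3.0/16), (0, 1, 5.0/16), (1, 1, 1.0/16)]

    def diffuse(err_buf, x, y, err, width, height, amount):
        # spread this pixel's quantization error (a 4-tuple in linear RGBA)
        # to pixels yet to be evaluated; amount is the DitherAmount argument
        for dx, dy, w in KERNEL:
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                for c in range(4):
                    err_buf[ny][nx][c] += err[c] * w * amount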

Saving the image is essentially one line of code, so I'm not going to write it up as an additional section in this document.
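
For the curious, with pyPNG that line might look roughly like the sketch below; palette and index_rows are illustrative names for a list of (R, G, B, A) tuples (pyPNG writes the alpha values into the PNG's tRNS chunk) and one row of palette indices per image row:

    with open('output.png', 'wb') as f:
        png.Writer(width, height, palette=palette, bitdepth=8).write(f, index_rows)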

Use Cases

The primary intended use of this utility is to convert a full RGBA PNG image into an indexed-color PNG file with a smaller file size but similar visual appearance.  However, I believe I've implemented a high-quality color reduction algorithm that's also suitable for images with no transparency at all; in fact, if all the pixels share the same opacity, the algorithm degenerates to a high-quality octree reduction.  In other words, this program is suitable for reducing grayscale, grayscale+alpha, RGB, and RGB+alpha images to indexed-color images.  However, for some of these use cases, other software may do an adequate job much faster.

As of version 1r1462, the command line to use RGBA-Pal is:

    python rgba-pal.py InputFile NumColors DitherAmount OutputFile

All arguments are optional, but if an argument is omitted, all arguments after it must be omitted as well.  The default arguments are:

    test.png 256 0.0 InputFile+8.png
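
For example, a hypothetical run that reduces map.png to 64 colors with a dither amount of 0.5 might look like:

    python rgba-pal.py map.png 64 0.5 map64.png
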
This code was written for and tested in Python 2.7.  I have no idea whether it works in Python 3.x; I think I've read that pyPNG can work with 3.x, but there's an extra step somewhere.

Downloads

Version 1r1462 (2014 August 24)

To Do

Copyright / License

Copyright ©2014. Free to use as-is. All other rights reserved until I decide on a license.