Images in space

A common project attempted by programmers of a certain naiveté is classification: for example, classifying images based on their characteristics. A lot of applied AI is about classifying stuff. The applications include a search engine for similar images, automatic facial recognition, and identification of astronomical phenomena in the sky (or medical phenomena in MRIs and other images).

I’ve made a few amateur attempts at this, largely centred around a “filesys” database in which I regularly record metadata for every file in my computer system. Having such data can be helpful for finding misplaced files, or estimating the size of various things, or confirming that files are uncorrupted (by comparing stored MD5s). But mostly it’s for curiosity value.

Eventually the database started recording image data: a thumbnail, and several image features. There was no need for it at the time, but I had in mind yet another classification project.

So today I have 135932 images in the database. Not all are present on the system, because some will be from previous snapshots, and have since departed.

The features collected for each image are:

ravg    Average red value
gavg    Average green value
bavg    Average blue value
savg    Average saturation value
lavg    Average luminosity value
rsd     Standard deviation of red value
gsd     Standard deviation of green value
bsd     Standard deviation of blue value
ssd     Standard deviation of saturation value
lsd     Standard deviation of luminosity value
rlavg   Average difference between red and luminosity
glavg   Average difference between green and luminosity
blavg   Average difference between blue and luminosity
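A sketch of how these features might be computed with numpy. The exact luminosity and saturation formulas are assumptions on my part, since the post does not specify them; ITU-R BT.601 luma and an HSV-style saturation are used here:

```python
import numpy as np

def image_features(rgb):
    """Compute the 13 global features for an H x W x 3 array of floats in [0, 1]."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # Assumed formulas: BT.601 luma and HSV-style saturation.
    lum = 0.299 * r + 0.587 * g + 0.114 * b
    mx, mn = rgb.max(axis=-1), rgb.min(axis=-1)
    sat = np.where(mx > 0, (mx - mn) / np.maximum(mx, 1e-12), 0.0)
    return {
        'ravg': r.mean(), 'gavg': g.mean(), 'bavg': b.mean(),
        'savg': sat.mean(), 'lavg': lum.mean(),
        'rsd': r.std(), 'gsd': g.std(), 'bsd': b.std(),
        'ssd': sat.std(), 'lsd': lum.std(),
        'rlavg': (r - lum).mean(),
        'glavg': (g - lum).mean(),
        'blavg': (b - lum).mean(),
    }
```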

Not a lot of analysis went into selecting these, but they were simple to implement. Once sufficient data was accumulated, some analysis could be done to see how useful they are. These are all global properties (a single value for each image), but local properties may turn out to be helpful too (such as properties for each quadrant of an image). Such future features can be recovered, to an extent, since the thumbnails are retained. The main problem there is that thumbnail generation is done by adaptive sampling, which reduces the noise in the image; features that depend on noise will be quite different between the thumbnail and the original.

Once a reasonable chunk of data is collected, we then need to analyse it somehow! In the absence of an established plan for doing so, we can at least try to visualise it. Each image is a point in 13-dimensional space. This is difficult to visualise. It can also be difficult to analyse: the curse of dimensionality means that it is harder to find clusters of images close to each other in space. So if the aim of your analysis is to find similar images, then you have to contend with the fact that each 10% increase in a search radius increases the volume searched, and hence the potential number of matches, by about 245% (since 1.1^13 ≈ 3.45). A minor inaccuracy in the distance between images has a massive effect on the accuracy of results.
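The effect is easy to quantify: the volume of a ball in d dimensions scales as r^d, so even a small increase in search radius multiplies the volume to search dramatically.

```python
# In 13 dimensions, a 10% larger search radius multiplies the
# candidate volume by 1.1 ** 13.
growth = 1.1 ** 13
print(f"{growth:.2f}")  # prints 3.45
```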

Luckily there is a common technique for reducing high-dimensional data into something simpler, but which retains as much of the original characterisation as possible: Principal Components Analysis.

The idea is that you find the 13 axes — the principal components — in the spatial data which best characterise it. In practice you choose each axis based on how much variance the points have when projected along it, with each new axis orthogonal to those already chosen. The axes will not necessarily line up with a single feature; they will be vectors in the 13-dimensional space.
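A minimal PCA sketch using numpy's singular value decomposition, assuming the features have been gathered into a hypothetical matrix X of shape (n_images, 13):

```python
import numpy as np

def pca(X, k=2):
    """Return the top-k principal axes, their variances, and the projected points."""
    Xc = X - X.mean(axis=0)               # centre each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                   # rows are the principal axes
    strengths = S[:k] ** 2 / (len(X) - 1) # variance captured by each axis
    projected = Xc @ components.T         # k-dimensional coordinates per image
    return components, strengths, projected
```

Singular values come back sorted in decreasing order, so the first rows of Vt are automatically the strongest components.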

Once the principal components are determined, we can select the first two of them to use as the axes of a two-dimensional visualisation. The first two components for the image data are:

Strength    ravg    gavg    bavg    savg    lavg     rsd     gsd     bsd     ssd     lsd   rlavg   glavg   blavg
   0.294   0.458   0.063  -0.173  -0.219   0.517  -0.089  -0.284   0.169   0.018   0.160  -0.324   0.457  -0.018
   0.261   0.478   0.074  -0.144   0.025   0.019   0.029   0.515  -0.393  -0.025   0.160  -0.305  -0.353   0.295

These are not themselves easily visualised; the value for each feature roughly indicates how useful that feature is in classifying images, and how closely the axis aligns with it. Projecting the image points onto these axes, and representing each by a single pixel, we can generate a visualisation of the set of images:

It almost looks like a nebula. Since the axes chosen have been combinations of all dimensions, the pure colours have been spread out in a circle around the outside. These are mostly outliers. Unsurprisingly a lot of images have a blend of colours and are presented as greyish pixels in the middle. Assuming that the 13 features chosen are useful, a visualisation using characteristics other than simple image colour might yield more interesting results.
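The scatter plot could be produced along these lines; `proj` (the 2-D PCA coordinates) and `colours` (an RGB value per image, e.g. its average colour) are assumed inputs, not names from the original database:

```python
import numpy as np

def render(proj, colours, size=512):
    """Plot each image as a single coloured pixel on a black canvas."""
    canvas = np.zeros((size, size, 3))
    lo, hi = proj.min(axis=0), proj.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)     # avoid division by zero
    scaled = (proj - lo) / span * (size - 1)   # fit points into the canvas
    xs = scaled[:, 0].astype(int)
    ys = scaled[:, 1].astype(int)
    canvas[ys, xs] = colours                   # one pixel per image
    return canvas
```

Where several images fall on the same pixel, the last one drawn wins; a denser plot might accumulate or blend instead.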

The next three principal components are:

Strength    ravg    gavg    bavg    savg    lavg     rsd     gsd     bsd     ssd     lsd   rlavg   glavg   blavg
   0.059   0.474   0.076  -0.135   0.264  -0.482   0.118  -0.251   0.221   0.003   0.161   0.373   0.103   0.358
   0.036  -0.258  -0.249  -0.931  -0.008  -0.053  -0.025   0.006  -0.019   0.035   0.001   0.000   0.000  -0.000
   0.031   0.470   0.071  -0.152   0.024   0.017   0.020  -0.007  -0.001   0.001  -0.480   0.256  -0.207  -0.634

The strengths here are a lot weaker, which means these axes each capture much less of the variation; considerable variation remains on the later axes and is lost when we project onto these ones. A visualisation using these three components as the colour values for each point is:

Again, it’s rather astronomical in appearance. Given that this type of visualisation is a matter of putting scattered dots on a black background, that is perhaps not unexpected.

A way of squeezing an additional dimension into the visualisation would be to create a 3D model. Adjustable rotation would allow a human to move it around to find the best angle from which to view a particular cluster of points.

Apart from projecting from high-dimensional space into something easily visualised in 2 or 3 dimensions, Principal Components Analysis also makes clustering easier. Clustering is a way of grouping the images by similarity. I have not attempted this, but it is a logical future step towards building an image search engine.
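As a sketch of what that next step might look like, here is a minimal k-means clustering over the reduced points. The choice of k-means is my assumption; the post does not commit to an algorithm:

```python
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    """Group points into k clusters by iterative centre refinement."""
    rng = np.random.default_rng(seed)
    # Start from k randomly chosen points as initial centres.
    centres = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centre.
        d = np.linalg.norm(points[:, None] - centres[None], axis=-1)
        labels = d.argmin(axis=1)
        # Move each centre to the mean of its members.
        for j in range(k):
            if (labels == j).any():
                centres[j] = points[labels == j].mean(axis=0)
    return labels, centres
```

Clustering in the PCA-reduced space, rather than the raw 13 dimensions, also sidesteps some of the curse-of-dimensionality trouble mentioned earlier.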

Another next step is to indicate human-identified groups on the visualisation. Take, for instance, all my photos of St Paul’s Cathedral. Do they appear near each other in 13-dimensional image space? Given the features I’ve used so far, I would be pleasantly surprised if that was the case. But features for texture (stone) or overall image shape (pointy and dome-like) might make it more likely.

This entry was posted in Math, Programming.

