A common project attempted by programmers of a certain naiveté is classification. For example, classifying images based on their characteristics. A lot of applied AI is about classifying stuff. The applications are things like a search engine for similar images, or automatic facial recognition, or identification of astronomical phenomena in the sky (or medical phenomena in MRIs and other images).
I’ve made a few amateur attempts at this, largely centred around a “filesys” database in which I regularly record metadata for every file in my computer system. Having such data can be helpful for finding misplaced files, or estimating the size of various things, or confirming that files are uncorrupted (by comparing stored MD5s). But mostly it’s for curiosity value.
Eventually the database started recording image data: a thumbnail, and several image features. There was no need for it at the time, but I had yet another classification project in mind.
So today I have 135932 images in the database. Not all are present on the system, because some will be from previous snapshots, and will since have departed.
The features collected for each image are:
|ravg||Average red value|
|gavg||Average green value|
|bavg||Average blue value|
|savg||Average saturation value|
|lavg||Average luminosity value|
|rsd||Standard deviation of red value|
|gsd||Standard deviation of green value|
|bsd||Standard deviation of blue value|
|ssd||Standard deviation of saturation value|
|lsd||Standard deviation of luminosity value|
|rlavg||Average difference between red and luminosity|
|glavg||Average difference between green and luminosity|
|blavg||Average difference between blue and luminosity|
Not a lot of analysis was put into selecting these, but they were simple to implement. Once sufficient data was accumulated, some analysis could be done to see how useful they are. These are all global properties (a single value for each image), but local properties may turn out to be helpful too (such as properties for each quadrant of an image). Such features could be computed retrospectively, to an extent, since the thumbnails are retained. The main problem there is that the thumbnail generation is done by adaptive sampling, which reduces the noise in the image — features that depend on noise will be quite different between the thumbnail and the original.
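For concreteness, here is one way the feature extraction might be sketched in Python with Pillow and NumPy. This is an illustration, not the original implementation; in particular, the Rec. 601 luminosity weighting below is an assumption about how "luminosity" is defined.

```python
import numpy as np
from PIL import Image

def image_features(img):
    """Compute the 13 global colour features for one image.

    Channel values are scaled to 0..1. "Luminosity" here is the
    Rec. 601 weighted average of R, G and B (an assumption; the
    original database may define it differently).
    """
    rgb = np.asarray(img.convert("RGB"), dtype=float) / 255.0
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    s = np.asarray(img.convert("HSV"), dtype=float)[..., 1] / 255.0
    l = 0.299 * r + 0.587 * g + 0.114 * b
    return {
        "ravg": r.mean(), "gavg": g.mean(), "bavg": b.mean(),
        "savg": s.mean(), "lavg": l.mean(),
        "rsd": r.std(), "gsd": g.std(), "bsd": b.std(),
        "ssd": s.std(), "lsd": l.std(),
        "rlavg": (r - l).mean(),
        "glavg": (g - l).mean(),
        "blavg": (b - l).mean(),
    }
```

Each image then becomes a row of 13 numbers, ready to be stored alongside its thumbnail.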
Once a reasonable chunk of data is collected, we then need to analyse it somehow! In the absence of an established plan for doing so, we can at least try to visualise it. Each image is a point in 13-dimensional space, which is difficult to visualise. It can also be difficult to analyse: the curse of dimensionality means that it is harder to find clusters of images close to each other in space. If the aim of your analysis is to find similar images, you have to contend with the fact that each 10% increase in a search radius increases the searched volume — and hence the potential number of matches — by about 245%, since volume scales as r¹³ and 1.1¹³ ≈ 3.45. A minor inaccuracy in the distance between images therefore has a massive effect on the accuracy of the results.
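The volume arithmetic behind that effect is easy enough to check:

```python
# In d dimensions the volume of a ball scales as r**d, so enlarging a
# search radius by 10% multiplies the searched volume (and, for
# uniformly scattered points, the expected number of matches) by:
d = 13
factor = 1.1 ** d
print(f"x{factor:.2f}, i.e. about a {100 * (factor - 1):.0f}% increase")
```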
Luckily there is a common technique for reducing high-dimensional data into something simpler, but which retains as much of the original characterisation as possible: Principal Components Analysis.
The idea is that you find the 13 axes in the spatial data — the principal components — which best characterise the images. In practice you rank each axis by how much variance the points have when projected onto it. The axes will not necessarily line up with a single feature; they will be vectors in the 13-dimensional space.
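A minimal sketch of that computation, using NumPy's eigendecomposition of the covariance matrix (production code would more likely use an SVD or a library routine, but the idea is the same):

```python
import numpy as np

def principal_components(X):
    """PCA of an (n_images, 13) feature matrix.

    Returns (components, variances): the 13 unit-length axes as rows,
    sorted by the variance of the data projected onto each.
    """
    Xc = X - X.mean(axis=0)                # centre each feature
    cov = np.cov(Xc, rowvar=False)         # 13x13 covariance matrix
    variances, axes = np.linalg.eigh(cov)  # eigh: cov is symmetric
    order = np.argsort(variances)[::-1]    # largest variance first
    return axes[:, order].T, variances[order]
```

The rows of the returned matrix are the principal components; projecting the centred data onto the first few of them gives the low-dimensional coordinates used below.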
Once the principal components are determined, we can select the first 2 of them to use as the axes of a 2-dimensional visualisation. The two components in the case of the image data are:
These are not themselves easily visualised; the value for each feature roughly indicates how useful that feature is in classifying images, and how much the axis aligns with it. Projecting the image points onto these axes, and representing each by a single pixel, we can generate a visualisation of the set of images:
It almost looks like a nebula. Since the axes chosen have been combinations of all dimensions, the pure colours have been spread out in a circle around the outside. These are mostly outliers. Unsurprisingly a lot of images have a blend of colours and are presented as greyish pixels in the middle. Assuming that the 13 features chosen are useful, a visualisation using characteristics other than simple image colour might yield more interesting results.
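The kind of pixel-scatter rendering described above can be sketched as follows, assuming the points have already been projected into two dimensions (the function and its parameters are illustrative, not from the original code):

```python
import numpy as np
from PIL import Image

def scatter_image(points2d, size=512):
    """Render 2-D points as single pixels on a black canvas,
    brightening where several points land on the same pixel."""
    canvas = np.zeros((size, size))
    # Rescale each axis to fit the canvas (guarding against a
    # degenerate axis where all points share one value).
    lo, hi = points2d.min(axis=0), points2d.max(axis=0)
    span = np.maximum(hi - lo, 1e-12)
    xy = ((points2d - lo) / span * (size - 1)).astype(int)
    for x, y in xy:
        canvas[y, x] += 64  # overlapping points add up
    return Image.fromarray(np.clip(canvas, 0, 255).astype(np.uint8))
```

Feeding colour values per point, rather than a fixed brightness, gives the coloured variants shown later.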
The next three principal components are:
These are a lot weaker, meaning there is still substantial variation along the remaining axes, and that variation will be lost when we project onto these ones. A visualisation using these as the colour values for each point is:
Again, it’s rather astronomical in appearance. Given that this type of visualisation is a matter of putting scattered dots on a black background, that is perhaps not unexpected.
A way of squeezing an additional dimension into the visualisation would be to create a 3D model. Adjustable rotation would allow a human to move it around to find the best angle from which to view a particular cluster of points.
Apart from projecting from high-dimensional space into something easily visualised in 2 or 3 dimensions, Principal Components Analysis also makes clustering easier. Clustering is a way of grouping the images by similarity. I have not attempted this, but it is a logical future step towards building an image search engine.
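As a sketch of what that clustering step could look like, here is a plain-NumPy k-means, which groups points by repeatedly assigning each to its nearest centre and recomputing the centres. K-means is just one common choice; nothing above commits to a particular algorithm.

```python
import numpy as np

def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means on an (n, d) array of points.

    Returns (labels, centres): a cluster index per point, and the
    final cluster centres.
    """
    rng = np.random.default_rng(seed)
    # Initialise centres by picking k distinct points at random.
    centres = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iterations):
        # Distance of every point to every centre, shape (n, k).
        d = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for i in range(k):
            if (labels == i).any():
                centres[i] = points[labels == i].mean(axis=0)
    return labels, centres
```

Running this on the PCA-reduced coordinates rather than the raw 13 features would be one way to soften the curse-of-dimensionality problem mentioned earlier.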
Another next step is to indicate human-identified groups on the visualisation. Take, for instance, all my photos of St Paul’s Cathedral. Do they appear near each other in 13-dimensional image space? Given the features I’ve used so far, I would be pleasantly surprised if that was the case. But features for texture (stone) or overall image shape (pointy and dome-like) might make it more likely.