As many of you may know, one great pastime of mine is thought-experiments about robotic gardening.
I’ve bought a mini-ITX board for building my robot, so the obvious next step was to think about how the robot should think.
I’ve been interested in Artificial Neural Networks for a few years, and they seem like the right way to go about what I want.
The problem I decided to focus on was this:
Given a photo of what the robot is facing, have it figure out whether the photo is of something organic or inorganic.
A very simplistic diagram of how the machine might do this is shown below:
The above shows a very basic neural net. I think it’s called a “feed-forward” net, because signals only flow one way, from left to right: each column of units feeds the next column, and nothing loops back (note that the rightmost column is not connected to the leftmost).
In the actual net, the “input” units would correspond to individual pixels of the image. The diagram is most definitely not to scale: hundreds of input units would be required, and many more than just two hidden units. Two or more hidden layers might be needed as well, but you get the picture.
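To make that a bit more concrete, here is a rough Python sketch of a single pass through such a net. Everything specific in it is an assumption for illustration: the layer sizes, the random (untrained) weights, the sigmoid activation, and the names `layer` and `feed_forward` are all made up, and a real net would need training before its answer meant anything.

```python
import math
import random

def sigmoid(x):
    """Squash any number into the range 0..1."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One column of units: each unit takes a weighted sum of the previous column."""
    return [sigmoid(sum(w * v for w, v in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def feed_forward(pixels, hidden_w, hidden_b, out_w, out_b):
    """Input column -> hidden column -> output column, with no loops back."""
    hidden = layer(pixels, hidden_w, hidden_b)
    return layer(hidden, out_w, out_b)

random.seed(0)
N_PIXELS, N_HIDDEN = 16, 4  # hypothetical sizes; a real photo needs far more

# Untrained, random weights, purely so that the sketch runs.
hidden_w = [[random.uniform(-1, 1) for _ in range(N_PIXELS)] for _ in range(N_HIDDEN)]
hidden_b = [0.0] * N_HIDDEN
out_w = [[random.uniform(-1, 1) for _ in range(N_HIDDEN)]]  # a single "is organic" unit
out_b = [0.0]

pixels = [random.random() for _ in range(N_PIXELS)]  # stand-in for a tiny greyscale photo
is_organic = feed_forward(pixels, hidden_w, hidden_b, out_w, out_b)[0]
print("is organic:", is_organic)
```

The output is a number between 0 and 1, which could be read as how strongly the net believes the photo is of something organic.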
This net, when trained, would give an adequate answer. But then, the question arose – could the same net be used to provide more detail?
i.e., what if we want to know whether what we’re looking at is a nettle?
Logically, it would be possible to rebuild the network with just that question in mind, but it occurred to me that it may be possible to do both at the same time.
The two answers come from the same hidden data. This may end up with a little less accuracy, as the hidden neurons now have to serve two different end goals, instead of one.
Looking at the diagram, though, it becomes clear that the “is nettle” unit is not availing itself of all available data. One major point about nettles is that they’re organic, so there really should be a link between the “is organic” and “is nettle” units. It would drastically aid in accuracy, I believe.
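As a sketch of what “the same hidden data” means in code, assuming the same made-up setup as before (random untrained weights, sigmoid units, hypothetical sizes and names): the hidden column is computed once, and both output units read from it.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One column of units, each taking a weighted sum of the previous column."""
    return [sigmoid(sum(w * v for w, v in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

random.seed(1)
N_PIXELS, N_HIDDEN = 16, 4                   # hypothetical sizes
OUTPUT_NAMES = ["is organic", "is nettle"]   # one output unit per question

# Untrained, random weights, for illustration only.
hidden_w = [[random.uniform(-1, 1) for _ in range(N_PIXELS)] for _ in range(N_HIDDEN)]
hidden_b = [0.0] * N_HIDDEN
out_w = [[random.uniform(-1, 1) for _ in range(N_HIDDEN)] for _ in OUTPUT_NAMES]
out_b = [0.0] * len(OUTPUT_NAMES)

def classify(pixels):
    hidden = layer(pixels, hidden_w, hidden_b)  # computed once...
    outputs = layer(hidden, out_w, out_b)       # ...and shared by both output units
    return dict(zip(OUTPUT_NAMES, outputs))

answers = classify([random.random() for _ in range(N_PIXELS)])
print(answers)
```

Only the weights going into each output unit differ, so training has to find hidden units that are useful for both questions at once.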
There is a subtle effect which would appear in the above network…
Let’s say that the network is looking at a photo of a brick wall. That photo is then replaced by a photo of a nettle. The units are all updated one at a time, from left column to right column, top to bottom.
A point to note here is that the “is nettle” unit would be updated before the “is organic” unit.
I expect that “is organic” would be very tightly bound to the answer to “is nettle”, so its weighting would be pretty high. But, as the “is organic” unit would still be holding the answer to the brick wall question by the time it is polled by “is nettle”, the “is nettle” unit would most likely not recognise the picture of a nettle for what it was.
Interestingly, it would get it right when the exact same image was put through immediately afterwards.
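Here is a small sketch of that stale-answer effect. The weights are hand-picked rather than trained, and the two-number “hidden” summaries of each photo are invented, so this only illustrates the update order, not a real classifier.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical hidden-layer activations for each photo: [leaf-shaped, green]
BRICK_WALL = [0.0, 0.0]
NETTLE = [1.0, 1.0]

def update(hidden, state):
    """Update the output units one at a time, "is nettle" before "is organic"."""
    # "is nettle" leans heavily on "is organic", but reads the value left over
    # from the previous pass, because "is organic" has not been updated yet.
    state["is nettle"] = sigmoid(4.0 * hidden[0] + 8.0 * state["is organic"] - 8.0)
    state["is organic"] = sigmoid(10.0 * hidden[1] - 5.0)
    return dict(state)  # snapshot of the answers after this pass

state = {"is organic": 0.0, "is nettle": 0.0}
update(BRICK_WALL, state)            # the robot has been staring at a wall
first_look = update(NETTLE, state)   # the photo changes to a nettle
second_look = update(NETTLE, state)  # the same photo, put through again

print("first look: ", first_look)
print("second look:", second_look)
```

With these weights, “is nettle” stays below 0.5 on the first look even though “is organic” has already flipped, and only rises above 0.5 on the second look.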
I think that is similar to how we ourselves take a moment to re-orient ourselves when suddenly changing focus from concentrating on one subject to another.
Expanding on that, I think it would be interesting to have every neuron connected directly to every other neuron. It would lead to some slower results, but I think that it would allow much more accurate results over time.
For example, in video, if every frame was considered one at a time, with absolutely no memory of what had been viewed the time before, then it may be possible to get drastically different results from each frame. However, if, for example, the previous frame was of a man standing in a field, then with the new fully connected network, the network would be pre-disposed to expect a man standing in a field. I think this may be called “feedback”; nets with loops like this are usually described as “recurrent”.
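A toy sketch of the difference, using a single output unit and invented numbers. The per-frame “evidence” values, the weights, and the function names are all assumptions; the only point is that one unit also takes its own previous answer as an input.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def without_memory(evidence):
    """Judge each frame purely on its own."""
    return sigmoid(8.0 * evidence - 4.0)

def with_feedback(evidence, previous_output):
    """Judge each frame, but lean on the answer given for the previous frame."""
    return sigmoid(4.0 * evidence + 6.0 * previous_output - 4.0)

# Hypothetical "man standing in a field" evidence per frame; the fourth frame is a glitch.
frames = [1.0, 1.0, 1.0, 0.0, 1.0, 1.0]

memoryless = [without_memory(e) for e in frames]

remembered = []
previous = 0.0
for e in frames:
    previous = with_feedback(e, previous)  # the output loops back in as an input
    remembered.append(previous)

for i, (a, b) in enumerate(zip(memoryless, remembered), start=1):
    print(f"frame {i}: no memory {a:.2f}, with feedback {b:.2f}")
```

On the glitch frame the memoryless unit drops below 0.5, while the unit with feedback stays above it. The other side of that trade is that the unit with feedback takes a frame or two to become confident at the start, and would take a few frames to let go if the man really did walk away.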
This will be very useful for my robot, as it means I can track actual live video, and not have to rely on just still frames.