So anyway, I moved house (long story short), meaning that I get to think more clearly: the new place is less cluttered, and the route to work involves crossing fewer roads.
This morning, I was thinking about my current project – I’m writing a recurrent connectionist network so my new robot can learn to recognise things like grass and rubbish (to cut the former, and remove the latter).
The walk was getting tiring, so I was thinking about Segways as well, and wondering how easy it might be to make one.
This eventually evolved into an idea for a new transport system – you get a load of little robots (my gardening ones, for example), and get them to form a platform. Then a load more of them form another platform on top. Then, you stand on the top.
The “carpet” would move in the direction you lean. Of course, the speed wouldn’t be too impressive, but it would be better than walking.
When the lower layer encounters a rock on the road or something, it moves around it. The upper layer robots interlock with each other to allow the lower level bots to do this without having too much pressure from above.
When you reach where you are going, the robots then disperse and continue their gardening around the new area.
You could even form a baggage train using this idea – a few carpet networks would follow each other in marching-ant form.
This would be easier to do than to create a robot which does your gardening for you…
I dropped into a charity shop on the way home today and came across a copy of Artificial Intelligence, by Elaine Rich and Kevin Knight, for €2. I couldn’t resist it. I spent the rest of my walk reading the connectionist chapter. It described everything very clearly, even though my eyes rolled back in my head and I started gibbering when I came across some maths in it.
It turns out that the model of neural network that I have chosen to build for the recognition engine in my gardening robot is actually closer to a Boltzmann Machine than a Hopfield Network. The difference appears to be that Hopfield Networks give binary outputs, and are therefore kind of jerky in response, while a Boltzmann Machine gives more of an analogue output, which allows fuzzy results (instead of “Yes, that is a cat” in the former, you get “That’s probably a cat” in the latter, which would be more accurate).
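To make that difference concrete, here’s a minimal sketch (the function names and numbers are mine, not from the book): a Hopfield-style unit thresholds its weighted input into a hard 0 or 1, while a Boltzmann-style unit turns the same input into a graded firing probability.

```python
import math

def hopfield_unit(weighted_sum):
    # Hopfield-style unit: hard threshold, binary output
    return 1 if weighted_sum >= 0 else 0

def boltzmann_unit(weighted_sum, temperature=1.0):
    # Boltzmann-style unit: a graded value between 0 and 1
    # (the probability of firing) rather than a hard yes/no
    return 1.0 / (1.0 + math.exp(-weighted_sum / temperature))

# The same weak evidence gives a jerky answer in one case, a fuzzy one in the other
print(hopfield_unit(0.3))              # "Yes, that is a cat"
print(round(boltzmann_unit(0.3), 2))   # "That's probably a cat"
```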
Another interesting part of that chapter was its treatment of recurrent networks, which allow a neural net to do things like learn to speak or learn to walk – generally anything with a list of actions that must be performed in sequence. This is something I have had an interest in since I started thinking about how to make my robot mobile. The first generation of my bot will run on tank treads, but once I am confident that the prototype works, I will be considering insect-like legs, which take up less room, and allow the robot to step over vegetation without damaging it too much.
Stay tuned – I hope to have the first release of my Rekog engine complete by next weekend – I’m getting the hang of KDE programming. That engine will be multi-purpose – it will be a general recognition engine, usable by other people for other purposes (facial recognition, etc); not specifically what I planned it for.
New Scientist has an article about a study which is homing in on particular neurons which fire when a person recognises an image of a person.
What I find surprising about this is that the concept is very simple to understand, but it seems to be taking researchers decades to come to the point – they seem surprised to find single neurons firing, as a single neuron is a very simple cell, so how could it hold an abstract concept?
I’ve been doing a lot of thinking about neural networks recently, as I’m working on a robotic gardening machine, which will eventually be put to good use in my own garden to help with my farming.
During my own thinking on this, I’ve also come to the realisation that one single neuron can hold an entire complex memory. When you think about it, a neuron includes not just itself, but its connections to the neurons around it. It is the connections that give a neuron its “intelligence”. A memory, then, is the sum of a neuron’s connections.
Now, it’s not quite as simple as that… the connections take input from other neurons, which in turn are calculated from further connections. In short, a simple yes/no question is actually quite complex when you try to work it out with neurons, but when you get the answer, you can trace back along the connections and get a very rich “reason” for the solution.
For instance, the article mentions Halle Berry. Now, for me, Halle Berry rings several bells – a very nice golf swing in a certain film I can’t remember the name of being the strongest. So, for me at least, the neuron (or small group of neurons) that recognises Halle also links the recognition strongly to that scene. There is also an image of her face, and for some reason, a Michael Jackson video (did she play an Egyptian queen in a video?).
That’s at least four neurons, each of which, if I think about them, will throw up a load more connections.
I think that the various neurons help to keep the memory strong. In Artificial Neural Networks, changing a single neuron is discouraged if it has strong connections to many others, as that change will affect the results of those other neurons.
I think that this is why mnemonic memory works so well. In Mnemonics, in order to remember a single item, you try to link it with something you already know. For example, in the old Memory Palace method, you imagine a walk through your house, or another familiar place. Each room that you enter, you can associate with a certain thought. For more memories, you can associate individual points of interest in the room – shelves, windows, corners, etc.
For instance, let’s say you are to remember a shopping list of “bananas, lightbulbs, baby food, and clothes pegs”, you could associate it with my own house like this: “I walk into my house. Before I can enter, I need to push a huge inflated banana out of the way. On my left is a lavatory. In that room, the walls are covered in blinking lightbulbs. Further on, I reach the main hall. The floor is cobbled with jars of baby food. I walk over the jars into the sitting room, where my girlfriend is sitting, trying to stick as many clothes pegs to her face as possible”.
Now, by associating the front door with a banana, for instance, you are doing a few things – you strengthen connections between your front door and bananas, you also connect bananas with your front door, and the absurdity of the situation impresses the connections further. Later on, when you reach the shopping market, you don’t need to remember what was on your list – you just need to go through your memory palace a room at a time.
What is very important about this is that you have used only two items of memory (your front door, and bananas) to remember a third item – that bananas are on your list.
I wonder – is the sum of possible memories far greater than the sum of neurons available to you? It seems to me that it’s dependent more on the connections than the neurons.
As many of you may know, one great pastime of mine is thought-experiments about robotic gardening.
I’ve bought a mini-itx board for building my robot, so the obvious next step was to think about how the robot should think.
I’ve been interested in Artificial Neural Networks for a few years, and they seem like the right way to go about what I want.
The problem I decided to focus on was this:
Given a photo of what the robot is facing, make it figure out whether the photo is of something organic or inorganic.
A very simplistic diagram of how the machine might do this is shown below:
The above shows a very basic neural net. I think it’s called a “feed-forward” net, because each column of units feeds only into the next column along (note that the rightmost column is not connected back to the leftmost).
In the actual net, the “input” units would correspond to individual pixels of the image. The diagram is most definitely not to scale – hundreds of input units would be required, and many more than just two hidden units – possibly two or more hidden layers would be required as well, but you get the picture.
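For anyone curious, a toy version of that feed-forward pass might look like this in Python (the layer sizes are made up, and the weights are random and untrained):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Toy sizes: a real net would have hundreds of pixel inputs
n_input, n_hidden, n_output = 4, 2, 1

# Random (untrained) weights; training would adjust these
w_hidden = rng.normal(size=(n_hidden, n_input))
w_output = rng.normal(size=(n_output, n_hidden))

def feed_forward(pixels):
    # Each column only feeds the next one along: input -> hidden -> output
    hidden = sigmoid(w_hidden @ pixels)
    return sigmoid(w_output @ hidden)

pixels = np.array([0.2, 0.9, 0.4, 0.1])   # stand-in for image pixels
print(feed_forward(pixels))               # "is organic" score between 0 and 1
```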
This net, when trained, would give an adequate answer. But then, the question arose – could the same net be used to provide more detail?
ie; What if we want to know if what we’re looking at is a nettle?
Logically, it would be possible to rebuild the network with just that question in mind, but it occurred to me that it may be possible to do both at the same time.
The two answers come from the same hidden data. This may end up with a little less accuracy, as the neurons are now providing answers tailored to two different end goals, instead of one.
Looking at the diagram, though, it becomes clear that the “is nettle” unit is not availing itself of all available data. One major point about nettles is that they’re organic, so there really should be a link between the “is organic” and “is nettle” units. It would drastically aid in accuracy, I believe.
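A sketch of what that extra link might look like, with the “is organic” answer fed directly into the “is nettle” unit (all the weights and sizes here are arbitrary placeholders):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
n_input, n_hidden = 4, 3

w_hidden = rng.normal(size=(n_hidden, n_input))
w_organic = rng.normal(size=n_hidden)   # hidden -> "is organic"
w_nettle = rng.normal(size=n_hidden)    # hidden -> "is nettle"
w_cross = 2.0                           # "is organic" -> "is nettle" link

def classify(pixels):
    hidden = sigmoid(w_hidden @ pixels)
    is_organic = sigmoid(w_organic @ hidden)
    # "is nettle" sees the hidden units AND the "is organic" answer
    is_nettle = sigmoid(w_nettle @ hidden + w_cross * is_organic)
    return is_organic, is_nettle
```

Both outputs share the same hidden units, but “is nettle” now gets the organic verdict as an extra piece of evidence.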
There is a subtle effect which would appear in the above network…
Let’s say that the network is looking at a photo of a brick wall. That photo is then replaced by a photo of a nettle. The units are all updated one at a time, from left column to right column, top to bottom.
A point to note here is that the “is nettle” unit would be updated before the “is organic” unit.
I expect that “is organic” would be very tightly bound to the answer to “is nettle”, so its weightings would be pretty high. But, as the “is organic” unit in this case would still be holding the answer to the brick wall question when it is polled by “is nettle”, the “is nettle” unit would most likely not recognise the picture of a nettle for what it was.
Interestingly, it would get it right when the exact same image was put through immediately afterwards.
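You can simulate that stale-value effect with toy numbers (the update formula below is invented purely to illustrate the ordering problem, not a real activation rule):

```python
# State left over from the previous frame (the brick wall): organic ~ 0
state = {"is_organic": 0.05, "is_nettle": 0.02}

def update_for_nettle_photo(state):
    # Units are updated one at a time; "is nettle" is polled first,
    # so it still sees the stale "is organic" value from the wall
    is_nettle = 0.3 + 0.6 * state["is_organic"]   # toy formula
    is_organic = 0.95                             # nettle evidence is strong
    return {"is_organic": is_organic, "is_nettle": is_nettle}

state = update_for_nettle_photo(state)   # first pass: the nettle is missed
first = state["is_nettle"]
state = update_for_nettle_photo(state)   # same image again: now recognised
second = state["is_nettle"]
print(round(first, 2), round(second, 2))
```

The first pass under-scores the nettle because it is reasoning from the wall’s leftover state; the second pass, on the identical image, gets it right.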
I think that is similar to how we ourselves take a moment to re-orient when our focus suddenly shifts from one subject to another.
Expanding on that, I think it would be interesting to have every neuron connected directly to every other neuron. It would lead to some slower results, but I think that it would allow much more accurate results over time.
For example, in video, if every frame were considered one at a time, with absolutely no memory of what had been viewed before, it may be possible to get drastically different results from each frame. However, if, for example, the previous frame was of a man standing in a field, then with the new connection-based network, the network would be predisposed to expect a man standing in a field. I think this may be called “feedback”.
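A crude sketch of that feedback idea: blend each frame’s raw score with the previous output, so one bad frame can’t flip the answer (the scores and the 0.7 feedback weight are made-up numbers):

```python
def smoothed_score(frame_scores, feedback=0.7):
    # Feed part of the previous answer back in, so the network is
    # predisposed to expect what it saw in the last frame
    out, history = 0.0, []
    for s in frame_scores:
        out = feedback * out + (1 - feedback) * s
        history.append(round(out, 3))
    return history

# Noisy per-frame scores for "man standing in a field";
# frame 3 is badly misclassified on its own
raw = [0.9, 0.85, 0.1, 0.9, 0.88]
print(smoothed_score(raw))
```

With feedback, the dip at frame 3 stays well above the raw score of 0.1, instead of the answer flickering off for one frame.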
This will be very useful for my robot, as it means I can track actual live video, and not have to rely on just still frames.