Archive for the 'neural networks' Category

Yesterday’s attempts at ANN training seemed at first to be successful, but I had overlooked one simple but curious flaw - there was training going on all the time, and each test was run 10 times… This means that each neuron was trying different values all the time until it got the right one, then it would be the next neuron’s turn (a bit of a simplified answer, but I don’t know how to describe what was actually happening). This ended up causing the tests to look a lot more successful than they actually were.

This morning, I did a lot of work figuring out how to get around the problem. It turns out the problem is not with the neural network - that appears to be working perfectly. The problem is with the method of training.

Just like with people, you cannot just throw a net into a series of 26 tests which it has never seen before and expect it to learn them any time soon. For any particular neuron, 25 of the 26 tests will have “No” as the answer, and it is too easy for the neuron to just answer “No” to everything and score over 96% correct.
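The arithmetic behind that figure can be checked with a few lines of JavaScript. This is only an illustration; `alwaysNoAccuracy` is a made-up helper name, not part of the ANN code.

```javascript
// Hypothetical helper: the score of a neuron that answers "No" to every test.
// totalTests    - number of tests in the set (26 letters here)
// positiveTests - how many of them have "Yes" as the right answer (1 per neuron)
function alwaysNoAccuracy(totalTests, positiveTests) {
  return (totalTests - positiveTests) / totalTests;
}

const accuracy = alwaysNoAccuracy(26, 1);
console.log((accuracy * 100).toFixed(1) + "%"); // 25 out of 26, about 96.2%
```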

Instead, you need to start with just one test, keep trying until that’s right, then add another test, keep going until they’re both right, then add another test, etc.
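A minimal sketch of that incremental schedule, assuming a callback that runs one test (including any weight correction) and reports whether the net got it right. The names `trainIncrementally`, `trainStep` and `maxCycles` are invented for this example and do not come from the actual ANN code.

```javascript
// Grow the active test set one test at a time.
// trainStep(test) is a hypothetical callback: it runs one test, applies any
// correction, and returns true if the net answered that test correctly.
function trainIncrementally(tests, trainStep, maxCycles) {
  let active = 1; // number of tests currently in play
  let cycles = 0;
  while (active <= tests.length && cycles < maxCycles) {
    cycles++;
    let allCorrect = true;
    for (let i = 0; i < active; i++) {
      if (!trainStep(tests[i])) allCorrect = false;
    }
    // Only add the next test once every active test was answered correctly.
    if (allCorrect) active++;
  }
  return { passed: Math.min(active - 1, tests.length), cycles: cycles };
}

// Toy stand-in for a net: it gets a test right from the second time it sees it.
const seen = new Set();
const toyStep = function (test) {
  const ok = seen.has(test);
  seen.add(test);
  return ok;
};
const result = trainIncrementally(["A", "B", "C"], toyStep, 100);
console.log(result.passed, result.cycles);
```

The `maxCycles` cap is there only so the loop ends if the net never masters the current set.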

Even that was not enough, though - it turns out that the capital letters “B” and “E” are very similar to each other, as are “H” and “M” (at least, in my sequence, they are).

I managed to improve the learning of the neurons by following rules such as these:

  • If a test’s answer is “No” but the neuron says “Yes” (i.e. has a returned value above 0), then adjust the weights of the neuron (correct/punish it).
  • If a test’s answer is “Yes”, and the neuron is anywhere less than 75% certain of Yes, then adjust/reward the neuron.

In all other cases (neuron is certain of Yes and is right, or neuron is vaguely sure of No and is right) leave the neuron’s weights alone.
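The two rules above boil down to a single check. This sketch assumes the neuron’s output runs from -1 to 1 and that “75% certain of Yes” means an output of at least 0.75; both are assumptions, as are the names `needsCorrection` and `YES_CONFIDENCE`.

```javascript
// Assumed mapping: "75% certain of Yes" is an output of 0.75 on a -1..1 scale.
const YES_CONFIDENCE = 0.75;

// Returns true when the neuron's weights should be adjusted for this test.
function needsCorrection(expectedYes, output) {
  if (!expectedYes) {
    // Answer is "No": only correct the neuron if it said "Yes" (output above 0).
    return output > 0;
  }
  // Answer is "Yes": reward/adjust until the neuron is confident enough.
  return output < YES_CONFIDENCE;
}

console.log(needsCorrection(false, 0.1));  // true: said Yes, answer was No
console.log(needsCorrection(false, -0.1)); // false: vaguely sure of No, and right
console.log(needsCorrection(true, 0.5));   // true: right, but not certain enough
console.log(needsCorrection(true, 0.9));   // false: certain of Yes, and right
```

Because a correct “No” is left alone as soon as the output drops to 0 or below, the neuron never gets pushed towards an extreme “No” that would be hard to undo later.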

This has helped to avoid the problem where neurons get extremely confident of Yes or No and are hard to correct when a similar test to a previously learned one comes along (O and Q for example).

It’s still not perfect, but perfection takes time…

Last week, I wrote a neural network that could balance a stick. That was a simple problem which really only takes a single neuron to figure out.

This week, I promised to write a net which could learn to recognise letters.

demo

For this, I enhanced the network a bit. I added a more sensible weight-correction algorithm, and separated the code (ANN code).

I was considering whether hidden units were required at all for this task. I didn’t think so - in my opinion, recognising the letter ‘I’ for example, should depend on some information such as “does it look like a J, and not like an M?” - in other words, recognising a letter depends on how confident you are about whether other values are right or wrong.

The network I chose to implement is, I think, called a “simple recurrent network” with stochastic elements. This means that every neuron reads from every other neuron and not itself, and corrections are not exact - there is a small element of randomness or “noise” in there.
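As a rough illustration of “every neuron reads from every other neuron and not itself” with noisy corrections, here is a small sketch. It is not the ANN code itself: the tanh activation, the delta-style update, the additive noise, and the names `makeNet`, `activate` and `correct` are all assumptions made for the example. Input neurons would simply have their values set directly.

```javascript
// Build a net where weights[i][j] is how strongly neuron i listens to neuron j.
function makeNet(size) {
  const weights = [];
  for (let i = 0; i < size; i++) weights.push(new Array(size).fill(0));
  return { size: size, weights: weights, values: new Array(size).fill(0) };
}

// Output of neuron i: weighted sum of every OTHER neuron's value (assumed tanh).
function activate(net, i) {
  let sum = 0;
  for (let j = 0; j < net.size; j++) {
    if (j !== i) sum += net.weights[i][j] * net.values[j]; // never reads itself
  }
  return Math.tanh(sum);
}

// Nudge neuron i towards target. The correction is not exact: each weight also
// gets a small random jitter in the range -noise..noise (assumed additive).
function correct(net, i, target, rate, noise, rng) {
  const random = rng || Math.random;
  const error = target - activate(net, i);
  for (let j = 0; j < net.size; j++) {
    if (j === i) continue;
    const jitter = (random() * 2 - 1) * noise;
    net.weights[i][j] += rate * error * net.values[j] + jitter;
  }
}

const net = makeNet(3);
net.values = [0, 1, -1]; // neurons 1 and 2 hold some current values
correct(net, 0, 1, 0.1, 0.01);
console.log(activate(net, 0));
```

Passing the random source in as `rng` is just a convenience so the noise can be switched off or made repeatable when testing.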

The popular choice for this kind of test is a feed-forward network, which is trained through back-propagation. That network requires hidden units, and each output (is it N, is it Q) is totally ignorant of the other outputs, which I think is a detriment.

My network variant has just finished a training run after 44 training cycles. That is proof that the simple recurrent network can learn to recognise simple letters without relying on hidden units.

Another interesting thing about the method I used is how the training works. Instead of throwing a huge list of tests at the network, I have 26 tests, but only a set number of them are run in each cycle depending on how many were gotten right up until then. For example, a training cycle with 13 tests will only be allowed if the network previously successfully passed 12 tests.

There are still a few small details I’d want to be sure about before pronouncing this test an absolute success, but I’m very happy with it.

Next week, I hope to have this demo re-written in Java, and a new demo recognising flowers in full-colour pictures (stretching myself, maybe…).

As always, this has the end goal of being inserted in a tiny robot which will then do my gardening for me. Not a mad idea, I think you’re beginning to see - just a lot of work.

Update: As I thought, there were some points which were not quite perfect. There was a part of the algorithm which would artificially boost the success of the net. With those deficiencies corrected, it takes over 500 cycles to get to 6 correct letters. I think this can be improved… (moments later - now only takes 150+ to reach 6 letters)

Last century, when I worked for Orbism, a co-worker, Olivier Ansaldi (now working for Google) showed me a Java applet he was working on which learned how to balance a stick using a neural net.

I decided to try it myself, and yesterday wrote a neural net that does it.

demo (Firefox only)

There is a total of 3 neurons in the net - 1 bias, 1 input (stick angle) and 1 output.
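For a sense of scale, a net that small amounts to one weighted sum. The sketch below assumes a tanh output and a weights object with `bias` and `angle` fields; those details, and the name `stickNetOutput`, are invented for illustration and are not taken from the demo.

```javascript
// One output neuron reading a bias neuron (always 1) and the stick angle.
// The sign of the result can be read as which way to push the platform.
function stickNetOutput(weights, angle) {
  const bias = 1; // the bias neuron always outputs 1
  return Math.tanh(weights.bias * bias + weights.angle * angle);
}

// With no bias weight and a positive angle weight, the output follows the lean.
const trained = { bias: 0, angle: 1 };
console.log(stickNetOutput(trained, 0.2) > 0);  // stick leans one way
console.log(stickNetOutput(trained, -0.2) < 0); // stick leans the other way
```

Training then only has two numbers to find, which fits with it taking so few iterations.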

It usually takes about 20 iterations to train the net. Sometimes, it gets trained in such a way that the platform waggles back and forward like a drunk, and sometimes it gets trained so perfectly that it’s damn boring to watch (basically, it’s a platform with a stationary stick on it).

Anyway… for my next trick, I’ll try building a net which can recognise letters and numbers.

Partly, this was an itch I wanted to scratch. But also, I want to write it in C++ and am better at JavaScript, so I wrote it in JS first to test the idea before I attempt a port.

I’m getting interested in my robot gardener idea again, so am building up a net that I can use for it.

Some points about how this differs from “proper” ANNs.

  • Training is done against a single neuron at a time (not important in this case, as there are only three neurons anyway).
  • This net will attach all “normal” neurons to all other neurons. I don’t like the “hidden layer” model of ANNs - I think they’re limited.
  • No back-prop algorithms are used - I don’t trust “perfect” nets and prefer a bit of organic learning in my nets.
  • The net code itself is object-oriented and self-sufficient. It would be possible to take the code and use in another JS project.