27 Aug

rekog

As some people know, I’ve been thinking about robotic gardening for years. It’s about time I started on it.

The robot’s brain will consist of a few modules, some of which are:

  • A recognition engine, for distinguishing between plants I want, plants I don’t want, and other miscellaneous items (sky, dirt, people, etc).
  • A mapping engine. This engine will take multiple images as its input, and build up a 3D internal representation of the world based on those images.
  • The rest. I’ll worry about these when the time comes, but basically they consist of a scheduling engine, code for controlling the various parts of the robot, and sundry other bits.

I’ve started on the recognition engine, which I am calling rekog, as I’m using the KDE widget set to construct it.

In the end, the finished code will expose a simple API for importing an image and retrieving a list of probable identities for the image’s contents.
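To make that concrete, here is a rough sketch of what such an API might look like, written in Python for brevity. Nothing here is real rekog code: the names Recogniser, identify and Identity are made up for illustration, and the scorer argument is just a stand-in for whatever the trained engine ends up being.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Identity:
    label: str          # e.g. "nettles"
    probability: float  # share of the total score, between 0 and 1


class Recogniser:
    """Hypothetical facade over the trained engine (not rekog's real API)."""

    def __init__(self, labels: Sequence[str],
                 scorer: Callable[[Sequence[float]], Sequence[float]]):
        self.labels = list(labels)
        self.scorer = scorer  # stand-in for the trained network

    def identify(self, pixels: Sequence[float], top: int = 3) -> List[Identity]:
        # Assumes the scorer returns one non-negative score per label.
        scores = list(self.scorer(pixels))
        if len(scores) != len(self.labels):
            raise ValueError("scorer must return one score per label")
        total = sum(scores) or 1.0  # avoid dividing by zero
        ranked = sorted(zip(self.labels, scores),
                        key=lambda pair: pair[1], reverse=True)
        return [Identity(label, score / total) for label, score in ranked[:top]]


# Usage with a fake scorer that ignores the pixels entirely.
engine = Recogniser(["sky", "nettles", "dirt"], lambda pixels: [1.0, 3.0, 0.0])
guesses = engine.identify([0.2, 0.7, 0.1], top=2)
```

The caller hands over pixel data and gets back a ranked list, with the most probable identity first.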

Of course, to get there I need to be able to see what the engine is doing, so I’m building a graphical program for importing the images and building test data.

The program will be simple to use: build a list of images, create some “neurons” describing what the images contain (“vegetation”, “sky”, “road”, “grass”, “nettles”, “potatoes”), and click “train”. After that, the engine should be reasonably ready for use with live images.

The “brain” of the engine will be a neural network, or at least, an approximation of what I think is a neural network (I’m not an educated computer scientist, so I may be wrong). In particular, it will be similar to a “Hopfield Network”.

The difference between a Hopfield network and the usual feed-forward perceptron is that a Hopfield network is recurrent: there are no “layers”, each neuron is linked to every other neuron (though not to itself), and the outputs feed back in as inputs, so a “memory” of sorts remains in the system.
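For anyone who wants to see the idea in code, here is a minimal textbook-style Hopfield network in plain Python. It is only a sketch of the classic version (neurons take the values +1 and -1, weights are learned with the Hebbian rule), not the rekog engine itself, and the function names train_hopfield and recall are my own inventions.

```python
from typing import List, Sequence


def train_hopfield(patterns: Sequence[Sequence[int]]) -> List[List[float]]:
    """Build a symmetric weight matrix from +1/-1 patterns (Hebbian rule)."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for pattern in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # a neuron is not linked to itself
                    weights[i][j] += pattern[i] * pattern[j] / n
    return weights


def recall(weights: List[List[float]], state: Sequence[int],
           max_sweeps: int = 10) -> List[int]:
    """Update neurons one at a time until the state stops changing."""
    s = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            # Every other neuron contributes to neuron i's input.
            field = sum(weights[i][j] * s[j] for j in range(len(s)))
            new_value = 1 if field >= 0 else -1
            if new_value != s[i]:
                s[i] = new_value
                changed = True
        if not changed:
            break  # settled into a stored memory (or a stable mix of them)
    return s


# Store two patterns, then recover the first from a copy with one bit flipped.
stored = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
weights = train_hopfield(stored)
noisy = [-1, 1, 1, 1, -1, -1, -1, -1]
recalled = recall(weights, noisy)
```

There is no input layer or output layer here. The whole network is one weight matrix, and "recognising" something means letting the state settle towards the nearest stored pattern.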

An advantage of the memory effect is that an image seen just a moment ago can influence the result for the image you are looking at now. This means that if the engine is confident that the image it’s looking at contains, say, nettles, then there is a greater chance that it will recognise the nettles in the next image, which is taken from a foot away.
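A toy way to picture that carry-over, assuming the engine produces a confidence per label for each image: blend the previous image's confidences into the current ones. This is a deliberately simplified stand-in for the network's memory, not how a Hopfield network actually stores state, and the blend function and its carry parameter are hypothetical.

```python
from typing import Dict


def blend(previous: Dict[str, float], evidence: Dict[str, float],
          carry: float = 0.5) -> Dict[str, float]:
    """Mix the last image's confidences into the current image's confidences.

    carry=0 ignores the past entirely; carry=1 ignores the new image.
    """
    labels = set(previous) | set(evidence)
    return {
        label: carry * previous.get(label, 0.0)
               + (1.0 - carry) * evidence.get(label, 0.0)
        for label in labels
    }


# The last image was almost certainly nettles; the new one is ambiguous.
last_image = {"nettles": 0.9}
this_image = {"nettles": 0.4, "grass": 0.5}
combined = blend(last_image, this_image)
best = max(combined, key=combined.get)
```

On its own, the new image leans slightly towards grass, but with the memory of the previous image mixed in, nettles comes out on top.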

Anyway, I’m enjoying learning visual programming. I never got the hang of GUI stuff back when I was working in C in the 20th century. The concepts I’ve learned working with Unobtrusive JavaScript and AJAX (two forms of event-driven programming in web pages, the second of them asynchronous) are really helping me get the hang of it.