Category Archives: robotic gardening

very modest bounty

In an effort to get some work done on my robot, I’d like to offer a small bounty for help with some parts. I’m going to start off very small, but hopefully as time advances (and my finances improve), I’ll be able to offer larger sums.

The first bounty will be a very modest €50 for a C console application which can do the following:

  • Accept two filenames from the command line as input, which correspond to two photographs of one scene, taken parallel to each other, separated by about 6 inches. The cameras may not be exactly parallel, and may be slightly twisted or unfocused, relative to each other, so the application should be relatively intelligent about that.
  • From those files, extrapolate 3D shapes and textures, and echo them to STDOUT. The format of the output should be well known. I offer VRML and X3D as ideas, but if there is another more-apt language, then that will be acceptable.
  • Any textures needed should be output as images to the current directory as 1.png, 2.png, etc.
  • The application should run in Linux, and must not use up a huge amount of RAM.
  • The application may use other libraries to achieve the goal (in fact, I’d recommend this, to make maintenance easier), but those libraries must be open source.

The closing date for this is July 1. All solutions offered will be compared for efficiency and accuracy, against a few different models created in POV-Ray, and also against real-life photographs.

If no solutions are submitted, I will announce the project again, with a higher bounty (probably increasing by another 50, or 100).

All applications submitted should have the source-code readily available.

Once an application is accepted, it will immediately be released as open source. You will be asserted as the author of the code, but the ownership of the code will be public.

I know the bounty is kind of low, but it’s really just to drum up interest, and I’m not all that wealthy ;-)

This project is really an experiment. If it is successful, I have many other projects that I want done, which I will also announce bounties for.

syncing and bots

Been a while. I took two weeks off work after I got married, to just relax and play some geetar (practicing my speed soloing with some classical riffs and a metronome).

So, I was just reading through my daily blog list today and noted someone talking about kde.org and opensync.org collaborating to help reduce personal data redundancy.

I have a big problem with personal data redundancy. I use five computers on a regular basis. My laptop, my home server, my work machine (this one), my robot, and the office server. Forget about the bot, as I don’t do any personal computing on it, but on all of the others, I run programs such as Firefox, IE, Konqueror, Thunderbird and Kontact on a regular basis (as well as many others – vim, etc).

Sharing email is not a problem, as I use IMAP. However there are a number of itches that I’m tired of scratching:

  • Only yesterday, I was wishing that Firefox and Konqueror could share the same bookmarks file, as I have a well-categorised list of personal bookmarks on my laptop which I would not like to have to rebuild every time I move to a different browser or machine. Okay, I can share the same bookmark file between various Firefoxes on separate machines, by using NFS or something (which is insecure, slow, and can cause locking problems), but it’s a shame that there does not seem to be a simple way of sharing bookmarks between two browsers on the same machine without having to manually export and import it every time something changes!
  • In work, I do my coding in vim, on the office server, in KDE, viewed via VNC. I used to use Synergy for it, when I was sitting right next to the office server, but we moved everything around here and removed the monitor from the server, so I need to use VNC. One crap thing about VNC is that I can’t seem to copy from outside it, and paste inside it. This means that if, for example, I am debugging some JavaScript for IE, and need to copy something from the output and paste it into what I’m working on, then I can’t do it! I need to either do a complex notepad and ftp dance, or write it out by hand. It would be nice to just select, copy, and paste. The same applies the other way around. If I come across an interesting blog post in Akregator in the VNC window, I can’t just copy the link and paste it in Firefox or IE on the main workstation.
  • The only other example I can think of is the preferences I have for vim – I have customised PHP and JavaScript folding commands, which I use on all four of my Linux boxes, and the three production servers that I manage. That’s seven copies each of two files. That’s a pain to keep up to date…
  • The only thing I can think of that would work for keeping the files up to date is to keep those specific files in an exported NFS directory on a trusted computer (trusted not to go bye-bye in the middle of an important job). Unfortunately, NFS is not secure for a list of reasons, and I cannot think of a better way. Anyone?

Aaaanyway… I also spent some time working on my bot.

As people may know, my goal is to build a bot that can do my gardening for me. While many people just point and laugh (in the building here, I’m known as “Luke Skywalker” (actually, it was Anakin that built C3P0, but correcting the misnomer would prove that I’m a little geeky)), I am of the absolute and firm belief that this is possible, and inevitable.

So, what have I done already? Nothing much, I guess. I built a shell which holds an Epia ME6000 Mini-ITX mainboard. This board has a Netgear WG311 wireless card attached, as well as two generic webcams. The whole lot is powered by a 12V lead-acid battery, connected through a PW200M power converter.

What that means, is that I have a working computer which does not rely on any cable connections for power or networking. At the moment, I guess the coolest thing I can do with it is to plonk it down somewhere in the garden, then go inside and do some bird-watching via the cameras, on a different computer.

I guess the next thing to do is to attach some wheels to it so it will be an actual moving robot! For that, I will need some tank tracks (feel free to buy them for me ;-) ).

Once the tank tracks are attached and working, I will have a remote control robot that I can move with my laptop, and see exactly what it sees.

After that, things get difficult… but I’ll get to that!

great bargain in a cheap shop

I dropped into a charity shop on the way home today and came across a copy of Artificial Intelligence, by Elaine Rich and Kevin Knight for €2. I couldn’t resist it. I spent the rest of my walk reading the connectionist chapter. It described everything very clearly, even though my eyes rolled back in my head and I started gibbering when I came across some maths in it.

It turns out that the model of neural network that I have chosen to build for the recognition engine in my gardening robot is actually closer to a Boltzmann Machine than a Hopfield Network. The difference appears to be that Hopfield Networks give binary outputs, and are therefore kind of jerky in response, while a Boltzmann Machine gives more of an analogue output, which allows fuzzy results (instead of “Yes, that is a cat” in the former, you get “That’s probably a cat” in the latter, which would be more accurate).
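
To make that difference a bit more concrete, here is a minimal sketch, assuming the usual textbook picture rather than anything specific from the book: a Hopfield-style unit thresholds its summed input into a hard 0 or 1, while a Boltzmann-style unit turns the same sum into a probability of being switched on. The function names, the sample inputs and the temperature parameter are all made up for the illustration.

```python
import math

def hopfield_unit(total_input, threshold=0.0):
    """Hard decision: 1 if the summed input exceeds the threshold, else 0."""
    return 1 if total_input > threshold else 0

def boltzmann_on_probability(total_input, temperature=1.0):
    """Probability that a stochastic unit switches on (logistic function)."""
    return 1.0 / (1.0 + math.exp(-total_input / temperature))

# Compare the two answers for a weak "no", a borderline case and a strong "yes".
for total in (-2.0, 0.1, 2.0):
    print(total, hopfield_unit(total), round(boltzmann_on_probability(total), 2))
```

The borderline input is where the two differ most: the threshold unit says a flat "yes", while the probabilistic unit says something closer to "maybe".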

Another interesting part of that chapter was its treatment of recurrent networks, which allow a neural net to do things like learn to speak or learn to walk – generally anything which has a list of actions which must be performed in sequence. This is something I have had an interest in since I started thinking about how to make my robot mobile. The first generation of my bot will run on tank treads, but once I am confident that the prototype works, I will be considering insect-like legs, which take up less room, and allow the robot to step over vegetation without damaging it too much.

Stay tuned – I hope to have the first release of my Rekog engine complete by next weekend – I’m getting the hang of KDE programming. That engine will be multi-purpose – it will be a general recognition engine, usable by other people for other purposes (facial recognition, etc); not specifically what I planned it for.

why am I trying to build a robot anyway?

People look at me as if I’m insane when I tell them of my efforts to build a robot capable of automatically growing and caring for vegetables – or they laugh and say I will still be trying in twenty years’ time.

To the latter, I just smile, and say “probably”. To the former, though – I don’t understand why people think it is crazy.

This post should hopefully explain my rationale, and maybe even convince you that it’s insane to not at least try it.

Ever since they were invented, computers have been touted as the device which would make everything so much more efficient than before.

But then – so is the same said of every other tool – the plough, for example, helps to till the ground faster, the printing press allows books to be shared faster. The insert-tool-here allows you to insert-job-here faster.

However – in all cases, creating faster and more efficient tools simply allows you to do more work – it doesn’t save you any labour – it just allows you to do more labour in the same time.

Are you working a five hour week? If not, then you are still working around the same lengthy hours that people have done for centuries – you’re just getting more of it done on time.

Efficiency is not the solution! The solution is to remove the need for the work altogether – not to make the work easier to do.

So – here is an alternative idea:

Sit back and relax. Now, think to yourself – why are you working? Is it to get money? What do you need money for?

When I think of it, the reasons for money can be broken into two essential categories and an “optional” category:

  • Essential: housing
  • Essential: food
  • Optional: everything else

The “Optional” category includes such crap as Entertainment, Clothing, Transport, etc. I am not advocating that you should completely ignore those things in your quest for an efficient life – just be aware that they are optional – they are not absolutely necessary for you to be able to live comfortably and without hassle.

When it comes down to it, food is really the most important thing you can spend money on. You can live without entertainment, clothes and a place to live, but you cannot live without food. Let’s ignore the homeless route, though – it’s not comfortable, and we want to be comfortable.

So, assuming we own our own house (work with me here…), the only thing left that you require, that is costing you money, is food. You need to learn to provide food for yourself, without using money.

The easiest way to do this is through gardening (or “farming” – whatever you want to call it). To grow potatoes, for instance, it’s just necessary that you put some potatoes onto some soil, and dump some compost or dead weeds on top of them, and weed it every now and then.

However, that’s just exchanging your office job for a job in the garden – I mean, you could just as well be working for the same amount of hours and buy the potatoes without touching the garden (back to square one). The true way to escape from the drudgery is to have someone else do the job for you.

But – who? If you hire a gardener to do it, then you’ll need to get a job in order to pay the gardener (back to square one again), so you need to somehow get your gardening done from someone that does not require payment.

Let’s take a detour: imagine what your life would be like if you owned your own house, and your food was provided completely free…

You would never need to work, except if you wanted to purchase something. I cannot emphasise how important that is – you would never need to work, unless you wanted to purchase something.

In fact, you could, if you wanted, live out your entire life just chilling out in the back garden of your house, as your robot toiled in the fields. You could learn to enjoy watching the clouds instead of TV. You could learn to get over your need for neat clothes and just let it all hang out. Anything you ever did that could be called “work” would be done purely for the pleasure of it.

If you do feel the need for entertainment, then why not pop around and play a game of cards with someone, or join a band, etc? If you feel the need for clothes, then learn to make your own, or barter for them from someone else.

Once you have everything that you need, the pressure to provide things that you want will disappear.

Sounds fantastic, right?

That’s where my robot comes in.

Robots do not require any form of pay, except in the form of electricity, which can be provided freely anyway, once you’ve installed a wind farm, Stirling engine, solar array, or any other alternative renewable electricity source.

I really do not understand why people don’t see it as essential that this dream be brought to fruition – I consider it personally to be the ultimate goal in any civilization to free themselves from all drudgery and allow themselves to do just basically what they want, whenever they want, and without needing to spend 90% of their waking hours hunched over a keyboard.

neurons for memory

New Scientist has an article about a study which is homing in on particular neurons which fire when a person recognises an image of a person.

What I find surprising about this is that the concept is very simple to understand, but it seems to be taking researchers decades to come to the point – they seem surprised to find single neurons firing, as a single neuron is a very simple organism, so how could it hold an abstract concept?

I’ve been doing a lot of thinking about neural networks recently, as I’m working on a robotic gardening machine, which will eventually be put to good use in my own garden to help with my farming.

During my own thinking on this, I’ve also come to the realisation that one single neuron can hold an entire complex memory. When you think of it, a neuron includes not just itself, but its connections to the neurons around it. It is the connections that give a neuron its “intelligence”. A memory, then, is the sum of a neuron’s connections.

Now, it’s not quite as simple as that… the connections take input from other neurons, which in turn are calculated from further connections. In short, a simple yes/no question is actually quite complex when you try to work it out with neurons, but when you get the answer, you can trace back on the connections and get a very rich “reason” for the solution.

For instance, the article mentions Halle Berry. Now, for me, Halle Berry rings several bells – a very nice golf swing in a certain film I can’t remember the name of being the strongest. So, for me at least, the neuron (or small group of neurons) that recognises Halle also links the recognition strongly to that scene. There is also an image of her face, and for some reason, a Michael Jackson video (did she play an Egyptian queen in a video?).

That’s at least four neurons, each of which, if I think about them, will throw up a load more connections.

I think that the various neurons help to keep the memory strong. In Artificial Neural Networks, changing a single neuron is discouraged if it has strong connections to many others, as that change will affect the results of those other neurons.

I think that this is why mnemonic memory works so well. In Mnemonics, in order to remember a single item, you try to link it with something you already know. For example, in the old Memory Palace method, you imagine a walk through your house, or another familiar place. Each room that you enter, you can associate with a certain thought. For more memories, you can associate individual points of interest in the room – shelves, windows, corners, etc.

For instance, let’s say you are to remember a shopping list of “bananas, lightbulbs, baby food, and clothes pegs”, you could associate it with my own house like this: “I walk into my house. Before I can enter, I need to push a huge inflated banana out of the way. On my left is a lavatory. In that room, the walls are covered in blinking lightbulbs. Further on, I reach the main hall. The floor is cobbled with jars of baby food. I walk over the jars into the sitting room, where my girlfriend is sitting, trying to stick as many clothespegs to her face as possible”.

Now, by associating the front door with a banana, for instance, you are doing a few things – you strengthen connections between your front door and bananas, you also connect bananas with your front door, and the absurdity of the situation impresses the connections further. Later on, when you reach the shopping market, you don’t need to remember what was on your list – you just need to go through your memory palace a room at a time.

What is very important about this is that you have used only two items of memory (your front door, and bananas) to remember a third item – that bananas are on your list.

I wonder – is the sum of possible memories far greater than the sum of neurons available to you? It seems to me that it’s dependent more on the connections than the neurons.

Ramble finished…

thinking about thinking

As many of you may know, one great pastime of mine is thought-experiments about robotic gardening.

I’ve bought a mini-itx board for building my robot, so the obvious next step was to think about how the robot should think.

I’ve been interested in Artificial Neural Networks for a few years, and they seem like the right way to go about what I want.

The problem I decided to focus on was this:

Given a photo of what the robot is facing, make it figure out whether the photo is of something organic or inorganic.

A very simplistic diagram of how the machine might do this is shown below:

The above shows a very basic neural net. I think it’s called a “feed-forward” net, because each column of units is connected directly to just the adjacent columns (note that the rightmost column is not connected to the leftmost).

In the actual net, the “input” units would correspond to individual pixels of the image. The image is most definitely not to-scale – hundreds of input units would be required, and much more than just two hidden units – possibly two or more layers would be required as well, but you get the picture.

This net, when trained, would give an adequate answer. But then, the question arose – could the same net be used to provide more detail?

ie; What if we want to know if what we’re looking at is a nettle?

Logically, it would be possible to rebuild the network with just that question in mind, but it occurred to me that it may be possible to do both at the same time.

The two answers come from the same hidden data. This may end up with a little less accuracy, as the neurons are now providing answers tailored to two different end goals, instead of one.
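
As a rough sketch of that idea (not the actual net, which does not exist yet), here is a forward pass where one hidden column feeds both an “is organic” and an “is nettle” output. The weights are invented and untrained, there are only four “pixels”, and the sigmoid squashing function is my assumption.

```python
import math

def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))

def layer(inputs, weights, biases):
    """One column of units: each unit takes a weighted sum of the previous column."""
    return [sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)) + bias)
            for unit_weights, bias in zip(weights, biases)]

def forward(pixels, hidden_weights, hidden_biases, output_weights, output_biases):
    """Feed the pixels left to right; both outputs read the same hidden units."""
    hidden = layer(pixels, hidden_weights, hidden_biases)
    is_organic, is_nettle = layer(hidden, output_weights, output_biases)
    return is_organic, is_nettle

# Hypothetical, untrained weights: 4 input pixels, 2 hidden units, 2 outputs.
hidden_weights = [[0.5, -0.2, 0.1, 0.4], [-0.3, 0.8, 0.2, -0.1]]
hidden_biases = [0.0, 0.1]
output_weights = [[1.2, -0.7], [0.3, 0.9]]
output_biases = [-0.2, -0.5]

print(forward([0.1, 0.9, 0.4, 0.3],
              hidden_weights, hidden_biases, output_weights, output_biases))
```

Because the hidden weights are shared, training them to please one output also shifts what the other output sees, which is where the loss of accuracy would come from.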

Looking at the diagram, though, it becomes clear that the “is nettle” unit is not availing itself of all available data. One major point about nettles, is that they’re organic, so there really should be a link between the “is organic” and “is nettle” units. It would drastically aid in accuracy, I believe.

There is a subtle effect which would appear in the above network…

Let’s say that the network is looking at a photo of a brick wall. That photo is then replaced by a photo of a nettle. The units are all updated one at a time, from left column to right column, top to bottom.

A point to note here is that the “is nettle” unit would be updated before the “is organic” unit.

I expect that “is organic” would be very tightly bound to the answer to “is nettle”, so its weightings would be pretty high. But, as the “is organic” unit in this case would still be holding the answer to the brick wall question by the time it is polled by “is nettle”, the “is nettle” unit would most likely not recognise the picture of a nettle for what it was.

Interestingly, it would get it right when the exact same image was put through immediately afterwards.
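
That stale-answer effect can be shown with a toy version of the two output units. This is only a sketch: the two input features (greenness and leaf shape), the hand-picked weights and the hard step function are all invented to make the effect visible.

```python
def step(value):
    return 1 if value > 0 else 0

class TinyNet:
    """Two output units that keep their state between photos."""

    def __init__(self):
        self.is_organic = 0
        self.is_nettle = 0

    def update(self, greenness, leaf_shape):
        # "is nettle" is updated first, so it reads the OLD "is organic" value.
        self.is_nettle = step(0.6 * leaf_shape + 1.0 * self.is_organic - 1.2)
        self.is_organic = step(greenness - 0.5)
        return self.is_organic, self.is_nettle

net = TinyNet()
print(net.update(greenness=0, leaf_shape=0))  # brick wall
print(net.update(greenness=1, leaf_shape=1))  # nettle, first look
print(net.update(greenness=1, leaf_shape=1))  # same nettle, second look
```

On the first look at the nettle, “is organic” flips on but “is nettle” stays off; on the second look at the same image, both are on.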

I think that is similar to how we ourselves take a moment to re-orient ourselves when suddenly changing focus from concentrating on one subject to another.

Expanding on that, I think it would be interesting to have every neuron connected directly to every other neuron. It would lead to some slower results, but I think that it would allow much more accurate results over time.

For example, in video, if every frame was considered one at a time, with absolutely no memory of what had been viewed the time before, then it may be possible to get drastically different results from each frame. However, if, for example, the previous frame was of a man standing in a field, then with the new connection-based network, the network would be pre-disposed to expect a man standing in a field. I think this may be called “feed back”.

This will be very useful for my robot, as it means I can track actual live video, and not have to rely on just still frames.

finding the X,Z coordinate of a point visible in two parallel cameras

Okay – forgive the messy maths. It’s been a decade since I wrote any serious trigonometry, so this may be a little inefficient. All-in-all, though, I believe it is accurate.

I thought I would need a third image in order to measure Z, but figured it out eventually.


fig.1

Consider fig.1. In it, the blue rectangle is an object seen by the cameras (the black rectangles).

The red lines are the various viewing “walls” of the camera – the diagonals are the viewing angle, and the horizontal lines are the projected images of the real-life objects. We assume that the photographs are taken by the same camera, which is held parallel in the Z axis.

The green triangle portrays the distance between the camera view points, and the distance to the point we are determining. We cannot determine whether the final measurements will be in centimetres, metres, inches, etc, but we can safely assume that no matter what final measurement unit is used, it will be a simple multiple to correct all calculations arising from this.

The only numbers that we can be sure of are the camera angle C, the width in pixels of the view screen c, and the distances in pixels from the left of the view screens at which the projections cross the line c: h and k. As the final measurement unit does not matter, we can safely assign d any number we want.

What we are looking for are the distances, in d-type units, for x and z. We can safely leave y out of this for now. It’s a simple matter to work that out.

Right… here goes… We’ll start with the obvious.

G = (180-C)/2

As there are two equal angles in each of the camera view triangles, the line f is one wall of a right-angled triangle. Therefore, using the trig. rule that Tan is the Opposite divided by the Adjacent,

Tan(G) = f/(c/2)

Which, when re-ordered and neatened, gives us

f=(Tan(G)*c)/2

We can then work out B using the same rule.

Tan(B) = f/(h-(c/2))

Which neatens to

B=ATan(2f/(2h-c))

Using similar logic,

A=ATan(2f/(2k-c))

So, now that we have A and B, we can work out D

D=180-A-B

And because of the rule that angle sines divided by their opposite sides are the same for any triangle,

d/Sin(D) = a/Sin(A)

Therefore

a = (d*Sin(A))/Sin(D)

And via more simple trig:

z = a*Sin(B)

And

x = a*Cos(B)

Tada!

So, to get z with one equation:

z=((d*Sin(ATan(2*((Tan((180-C)/2)*c)/2)/(2k-c))))/Sin(180-(ATan(2*((Tan((180-C)/2)*c)/2)/(2k-c)))-ATan(2*((Tan((180-C)/2)*c)/2)/(2h-c))))*Sin(ATan(2*((Tan((180-C)/2)*c)/2)/(2h-c)))

Obviously, I’ll optimise that a little before writing it into my program…
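
Here is a sketch of how those steps might look as a function, computing f, B, A, D and a once each instead of expanding everything into one line. Since fig.1 is not reproduced here, the sign convention is my assumption: h is the pixel position in the left photo, k the position in the right photo, and both A and B are taken as the angles inside the green triangle. I have used atan2 rather than a plain ATan so that the angles stay inside the triangle when the point is off to one side of a camera. The function name and the sample numbers are made up.

```python
import math

def locate(C_deg, c, h, k, d=1.0):
    """Return (x, z) of a point seen at pixel h (left photo) and k (right photo).

    C_deg is the viewing angle in degrees, c the image width in pixels,
    and d the distance between the two camera positions (any unit).
    """
    G = math.radians((180.0 - C_deg) / 2.0)
    f = math.tan(G) * c / 2.0
    # Assumed convention: B is the angle at the left camera, A the angle at
    # the right camera, both measured inside the green triangle.
    B = math.atan2(f, h - c / 2.0)
    A = math.atan2(f, c / 2.0 - k)
    D = math.pi - A - B
    a = d * math.sin(A) / math.sin(D)   # sine rule
    return a * math.cos(B), a * math.sin(B)

# Made-up example: 90 degree view, 320 pixel wide photos, cameras one unit apart.
x, z = locate(C_deg=90, c=320, h=192, k=176, d=1.0)
print(round(x, 3), round(z, 3))
```

If h and k are equal, the two lines of sight are parallel and Sin(D) is zero (or as good as), so the answer is meaningless; the point is too far away to measure with that camera spacing.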

first stab at an algorithm to get 3D objects from two images

Update: This post has a few errors in its methods, but describes what I was thinking at the time. Check this out for a more accurate method.

This post is myself putting into words some ideas on my present project – creating an internal 3D model from images.

First off, take two photos of the location you want to map. The photos should be taken from a few steps apart so there is some difference between near and far objects. It is not necessary for the distance to be exact. What is necessary is that the cameras be parallel. For that, try aiming the camera through the room, to some distant focal point.

The next step is to find the points where the two images overlap best. See the previous post for how to do this.

Here comes the fun part – we need to come up with a 3D model which somehow matches both of those photos, including their differences.

The difficulties that I foresee here mostly have to do with lighting differences and certainty. A photograph of an object may look different based on time of day or angle or a myriad of other things, so you can’t just do an exact comparison – some amount of error must be allowable. This brings up uncertainty – if two pixels next to each other look alike, they may confuse the build engine.

Anyhow – we’ll press on and build those bridges when we come to them.

The simpler items to create will be those that overlap exactly with the separate viewpoints.

  1. Load up the images, and run them through the overlapper.
  2. Assume the distance between the POVs is equal to the X offset. We don’t need to use real-world measurements for this. Assume one pixel of the original images is equal to one “unit” in the internal world model.
  3. Crop everything which does not overlap.
  4. Assume the camera’s view angle is about 45° – makes the maths simpler…
  5. Assume the average distance (Z) in the overlapped area is equal to the X offset. We can fix this later with a third photo.
  6. For each visible pixel in the overlapped area in image 1, if there is a correlation (allowing for an error of, say, 1%) in image 2 at that exact location
    1. Create a small square (two triangles) in the 3D model at that X/Y coordinate, with distance Z.
    2. The square should be large enough that it would cover the whole pixel. Mark that pixel as “done”.
  7. Optimise the created triangles, merging where possible.

The next part is more difficult – we need to find correlations with pixels that appear in one image but are not at the same point in the other.

  1. For each number (D) between 1 and half the width of the overlap
    1. For each pixel in image1 which is not marked as “done”
      1. If the pixel in image1 matches with reasonable certainty to the pixel D pixels left or right of the corresponding position in image2, then create a 3D block at the calculated distance (simple trigonometry).

There will be pixels that do not match. They can wait for further photos.
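
The matching part of the two lists above can be sketched on a single row of greyscale pixels. This is only an illustration under my own assumptions: the images are already overlapped and cropped, a match means the two values differ by no more than a fixed tolerance, and the function name and sample rows are invented. Turning a shift into a distance is left out.

```python
def match_row(row1, row2, max_shift, tolerance=2):
    """Return the shift found for each pixel of row1, or None if nothing matched."""
    width = len(row1)
    shifts = [None] * width
    # Shift 0 is the "exact overlap" pass; larger shifts are tried afterwards.
    for shift in range(0, max_shift + 1):
        for x in range(width):
            if shifts[x] is not None:
                continue  # already marked as "done"
            for candidate in (x - shift, x + shift):
                if 0 <= candidate < width and abs(row1[x] - row2[candidate]) <= tolerance:
                    shifts[x] = shift
                    break
    return shifts

# A bright pixel on a dull background, one pixel further left in the second image.
row_a = [10, 10, 200, 10, 10, 10]
row_b = [10, 200, 10, 10, 10, 10]
print(match_row(row_a, row_b, max_shift=2))
```

The dull background pixels happily match each other at several shifts, which is exactly the uncertainty problem mentioned earlier: only the bright pixel gives a shift that can be trusted.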

Pretty easy, when I put it like that, right? I’m sure there will be some headaches ahead.

finding the points where two similar images overlap

Take a look at this:


So? It’s two photographs that overlap. What’s so special about that?

What’s special is that the overlapping was done by my computer. It’s the first step towards computer vision for my robots.

See, in order for my bots to know where they are visually, they must be able to compare a real-life camera view to a rendered internal model. The above overlap thing is a first step towards that internal model.

What happened to get the overlap coordinates, is that the two separate images were compared to each other at different points. The closest match (according to the algorithm I created) is the overlap you see above.
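
A minimal sketch of that kind of comparison is below. It is not the PHP script itself: I am assuming greyscale images held as lists of rows, a score equal to the mean absolute difference over the overlapping region, and a rule that ignores offsets with too little overlap. All of the names and the toy “scene” are made up.

```python
def overlap_score(image1, image2, dx, dy):
    """Mean absolute difference over the region where the images overlap."""
    total = 0
    count = 0
    for y, row in enumerate(image1):
        for x, value in enumerate(row):
            x2, y2 = x - dx, y - dy
            if 0 <= y2 < len(image2) and 0 <= x2 < len(image2[0]):
                total += abs(value - image2[y2][x2])
                count += 1
    score = (total / count) if count else None
    return score, count

def best_overlap(image1, image2, max_dx, max_dy, min_fraction=0.25):
    """Try every offset in range and return the (dx, dy) with the lowest score."""
    min_pixels = min_fraction * len(image1) * len(image1[0])
    best = None
    for dy in range(-max_dy, max_dy + 1):
        for dx in range(-max_dx, max_dx + 1):
            score, count = overlap_score(image1, image2, dx, dy)
            if count < min_pixels:
                continue  # too little overlap to trust the score
            if best is None or score < best[0]:
                best = (score, dx, dy)
    return best[1], best[2]

# Toy scene: two 5x6 windows onto it, the second starting 2 columns further right.
scene = [[3 * x * x + 17 * y for x in range(8)] for y in range(5)]
left = [row[0:6] for row in scene]
right = [row[2:8] for row in scene]
print(best_overlap(left, right, max_dx=3, max_dy=1))
```

Trying every offset is slow on real photographs, which is one reason to shrink the images or limit the search range first.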

It is possible, with this method, to determine how far away most things are, making it possible to build up a 3D model of the world through photographs.

The script is written in PHP at the moment (available here). I want to do a bit more debugging and optimisation on it before porting it to C.

The next step will be to build up a simple 3D copy of a room based on photographs.

robot localisation

I’m compiling KDevelop at the moment, in preparation for leaping into my simulation project.

While I’m at it, I’ve been looking into what other people have done to solve some of the problems I’m likely to face for this project.

For example, here is an innovative way to quickly distinguish one large area from another based on histograms.

The method I’ve been thinking of, though, is a little different…

The simulation daemon will need to simulate whatever sensors the real robot has. As I am planning on building the robot with cameras, the daemon must therefore have a ray-tracer built in.

The ray tracer will not need to be extremely correct, as this is, after all, just a training exercise designed to give the robot most of the instincts that it needs.

So anyway – the robot will be initialised with a belief that it is in one certain area of the grid map. Imagine this as the machine being turned on at a “home” area. The robot will be familiar with this area and will be virtually certain of what position the camera is in, and what angle, etc.

From that certainty, we can then make up a trainer designed to give the robot a sense of balance and location.

For instance, if the robot moves, then it will have a fair idea of how far it has moved, and in what direction. The problem that I’m trying to solve here, though, is that there is no certainty that the robot has actually moved that far – its sense of speed may be wrong, or its steering may be a little awry, or maybe it has slipped on a smooth patch of ground.

What we need is a second opinion so the robot can be certain, which is the purpose of the camera.

The base unit will always be assumed to be moving in an environment that has been manually created specifically for that purpose. Because of this, we can be certain that a few focal points, or landmarks will be present. For instance, the robot will assume that based on where it thinks it is, the path will look a certain way, including any expected junctions.

So, we need a ray tracer for the environment, to provide an accurate-ish view of the world, and we also need for the robot itself to be able to formulate what it expects to see, based on its believed position in the world.

What the robot will do, is to build up a simulated view of what it expects, with matching colours, in a perspective 3d image. This simulation will be repeated in a few different ways – from a point to the left, a point to the right, forward, backward, etc. The robot will then compare the simulations to the “real” image received from the daemon, and correct its believed location based on which image is closest to the reality.

The simplest way to do this (I think) is to do a full-sized compare using a root mean-squared deviation to find the closest fit. That’s potentially too CPU-intensive for real-time work, though, so it might be quicker to do a quick-fit test of areas around the expected answer to see which simulations should be tested in full.

For example, let’s say you have a grid of 7×3 ( [0,0] to [6,2] ) possible locations to test (assuming a test of only two dimensions, and that the camera is not expected to shift much up or down). The center location [3,1] is the one that the robot expects to be the best fit. Let’s say the real camera returns a 320×200 image in 24-bit colour. We can change this to 8-bit grayscale and shrink it to 80×50 for a quick test. The quick test will do a quick compare between the locations in the grid. Full tests will then be done in the area that provides the best matches.
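
Since this is still pure thought, the following is only a sketch of the quick-test-then-full-test idea. It assumes greyscale images held as lists of rows, a shrink done by averaging square blocks of pixels, and a shortlist of the best few quick matches; the function names, the `keep` parameter and the tiny flat test images are invented.

```python
import math

def rmsd(image_a, image_b):
    """Root-mean-squared deviation between two equal-sized greyscale images."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(image_a, image_b):
        for a, b in zip(row_a, row_b):
            total += (a - b) ** 2
            count += 1
    return math.sqrt(total / count)

def shrink(image, factor):
    """Average each factor x factor block of pixels into one pixel."""
    rows = len(image) // factor
    cols = len(image[0]) // factor
    small = []
    for r in range(rows):
        small_row = []
        for c in range(cols):
            block = [image[r * factor + i][c * factor + j]
                     for i in range(factor) for j in range(factor)]
            small_row.append(sum(block) / len(block))
        small.append(small_row)
    return small

def best_location(camera_image, simulations, factor=4, keep=3):
    """simulations maps a grid location to the image simulated for that location."""
    small_camera = shrink(camera_image, factor)
    # Quick test on the shrunken images, then full tests on the shortlist only.
    quick = sorted(simulations,
                   key=lambda loc: rmsd(small_camera, shrink(simulations[loc], factor)))
    shortlist = quick[:keep]
    return min(shortlist, key=lambda loc: rmsd(camera_image, simulations[loc]))

def flat(value, size=8):
    return [[value] * size for _ in range(size)]

candidates = {(2, 1): flat(90), (3, 1): flat(101), (4, 1): flat(130), (3, 0): flat(60)}
print(best_location(flat(100), candidates, keep=2))
```

With `factor=4`, a 320×200 image shrinks to 80×50 as in the example above, so the quick test touches one sixteenth of the pixels.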

Of course, the algorithm needs work, as it’s just pure thought at the moment, but I’ll see how it turns out as the program progresses.