Monthly Archives: October 2004

finding the X,Z coordinate of a point visible in two parallel cameras

Okay – forgive the messy maths. It’s been a decade since I wrote any serious trigonometry, so this may be a little inefficient. All-in-all, though, I believe it is accurate.

I thought I would need a third image in order to measure Z, but figured it out eventually.


Consider fig.1. In it, the blue rectangle is an object seen by the cameras (the black rectangles).

The red lines are the various viewing “walls” of the camera – the diagonals are the viewing angle, and the horizontal lines are the projected images of the real-life objects. We assume that the photographs are taken by the same camera, which is held parallel in the Z axis.

The green triangle portrays the distance between the camera view points, and the distance to the point we are determining. We cannot determine whether the final measurements will be in centimetres, metres, inches, etc, but we can safely assume that no matter what final measurement unit is used, it will be a simple multiple to correct all calculations arising from this.

The only numbers that we can be sure of are the camera angle C, the width in pixels of the view screen c, and the distance in pixals from the left of the view screens that the projections cross the line ch and j. As the final measurement unit does not matter, we can safely assign d any number we want.

What we are looking for are the distance in d-type units for x and z. We can safely leave y out of this for now. It’s a simple matter to work that out.

Right… here goes… We’ll start with the obvious.

G = (180-C)/2

As there are two equal angles in each of the camera view triangles, the line f is one wall of a right-angled triangle. Therefore, using the trig. rule that Tan is the Opposite divided by the Adjacent,

Tan(G) = f/(c/2)

Which, when re-ordered and neatened, gives us


We can then work out B using the same rule.

Tan(B) = f/(h-(c/2))

Which neatens to


Using similar logic,


So, now that we have A and B, we can work out D


An because of the rule that angle sines divided by their opposite sides are the same for any triangle,

d/Sin(D) = a/Sin(A)


a = (d*Sin(A))/Sin(D)

And via more simple trig,:

z = a*Sin(B)


x = a*Cos(B)


So, to get z with one equation:


Obviously, I’ll optimise that a little before writing it into my program…

first stab at an algorithm to get 3D objects from two images

Update: This post has a few errors in its methods, but describes what I was thinking at the time. Check this out for a more accurate method.

This post is myself putting into words some ideas on my present project – creating an internal 3D model from images.

First off, take two photos of the location you want to map. The photos should be taken from a few steps apart so there is some difference between near and far objects. It is not necessary for the distance to be exact. What isnecessary, is that the cameras be parallel. For that, try aiming the camera through the room, to some distant focal point.

The next step is to find the points where the two images overlap best. See the previous post for how to do this.

Here comes the fun part – we need to come up with a 3D model which somehow matches both of those photos, including their differences.

The difficulties that I forsee here mostly have to do with lighting differences and certainty. A photograph of an object may look different based on time of day or angle or a myriad of other things, so you can’t just do an exact comparison – some amount of error must be allowable. This brings up uncertainty – If two pixels next to each other look alike, they may confuse the build engine.

Anyhow – we’ll press on and build those bridges when we come to them.

The simpler items to create will be those that overlap exactly with the separate viewpoints.

  1. Load up the images, and run them through the overlapper.
  2. Assume the distance between the POVs is equal to the X offset. We don’t need to use real-world measurements for this. Assume one pixel of the original images is equal to one “unit” in the internal world model.
  3. Crop everything which does not overlap.
  4. Assume the camera’s view angle is about 45° – makes the maths simpler…
  5. Assume the average distance (Z) in the overlapped area is equal to the X offset. We can fix this later with a third photo.
  6. For each visible pixel in the overlapped area in image 1, if there is a correlation (allowing for an error of, say, 1%) in image 2 at that exact location
    1. Create a small square (two triangles) in the 3D model at that X/Y coordinate, with distance Z.
    2. The square should be large enough that it would cover the whole pixel. Mark that pixel as “done”.
  7. Optimise the created triangles, merging where possible.

The next part is more difficult – we need to find correlations with pixals that appear in one image but are not at the same point in the other.

  1. For each number (D) between 1 and half the width of the overlap
    1. For each pixel in image1 which is not marked as “done”
      1. If the pixel in image1 matches with reasonable certainty to the pixel D pixels left or right of the corresponding position in image2, then create a 3D block at the calculated distance (simple trigonometry).

There will be pixels that do not match. They can wait for further photos.

Pretty easy, when I put it like that, right? I’m sure there will be some headaches ahead.

finding the points where two similar images overlap

Take a look at this:

So? It’s two photographs that overlap. What’s so special about that?

What’s special is that the overlapping was done by my computer. It’s the first step towards computer vision for my robots.

See, in order for my bots to know where they are visually, they must be able to compare a real-life camera view to a rendered internal model. The above overlap thing is a first step towards that internal model.

What happened to get the overlap coordinates, is that the two separate images were compared to each other at different points. The closest match (according to the algorithm I created) is the overlap you see above.

It is possible, with this method, to determine how far away most things are, making it possible to build up a 3D model of the world through photographs.

The script is written in PHP at the moment (available here). I want to do a bit more debugging and optimisation on it before porting it to C.

The next step will be to build up a simple 3D copy of a room based on photographs.

robot localisation

I’m compiling KDevelop at the moment, in preparation for leaping into my simulation project.

While I’m at it, I’ve been looking into what other people have done to solve some of the problems I’m likely to face for this project.

For example, here is an innovative way to quickly distinguish one large area from another based on histograms.

The method I’ve been thinking of, though, is a little different…

The simulation daemon will need to simulate whatever sensors the real robot has. As I am planning on building the robot with cameras, the daemon must therefore have a ray-tracer built in.

The ray tracer will not need to be extremely correct, as this is, after all, just a training exercise designed to give the robot most of the instincts that it needs.

So anyway – the robot will be initialised with a belief that it is one certain area of the grid map. Imagine this as the machine being turned on at a “home” area. The robot will be familiar with this area and will be virtually certain of what position the camera is in, and what angle, etc.

From that certainty, we can then make up a trainer designed to give the robot a sense of balance and location.

For instance, if the robot moves, then it will have a fair idea of how far it has moved, and in what direction. The problem that I’m trying to solve here, though, is that there is no certainty that the robot has actually moved that far – its sense of speed may be wrong, or its steering may be a little awry, or maybe it has slipped on a smooth patch of ground.

What we need is a second opinion so the robot can be certain, which is the purpose of the camera.

The base unit will always be assumed to be moving in an environment that has been manually created specifically for that purpose. Because of this, we can be certain that a few focal points, or landmarks will be present. For instance, the robot will assume that based on where it thinks it is, the path will look a certain way, including any expected junctions.

So, we need a ray tracer for the environment, to provide an accurate-ish view of the world, and we also need for the robot itself to be able to formulate what it expects to see, based on its believed position in the world.

What the robot will do, is to build up a simulated view of what it expects, with matching colours, in a perspective 3d image. This simulation will be repeated in a few different ways – from a point to the left, a point to the right, forward, backward, etc. The robot will then compare the simulations to the “real” image received from the daemon, and correct its believed location based on which image is closest to the reality.

The simplest way to do this (I think) is to do a full-sized compare using a root mean-squared deviation to find the closest fit. That’s potentially too CPU-intensive for real-time work, though, so it might be quicker to do a quick-fit test of areas around the expected answer to see which simulations should be tested in full.

For example, let’s say you have a grid of 7×3 ( [0,0] to [6,2] ) possible locations to test (assuming a test of only two dimensions, and that the camera is not expected to shift much up or down), the center location [3,1] is the one that the robot expects to be the best fit. Let’s say the real camera returns a 320×200 image in 24b colour. We can change this to 8b grayscale and shrink it to 80×50 for a quick test. The quick test will do a quick compare between the locations in the grid. Full tests will then be done in the area that provides the best matches.

Of course, the algorithm needs work, as it’s just pure thought at the moment, but I’ll see how it turns out as the program progresses.

rubber room, here I come

I got my initial psychiatric evaluation today. I talked to the psychiatrist for about an hour and a half, and told me it was depression, with a possibility that it is manic depression (bi-polar disorder).

It’s kind of relieving to have an actual diagnosis confirmed. I was half-afraid that the shrink would find nothing wrong with me and send me away. I’ve lived with this for ten years now, and I was getting bloody tired of doctors not finding anything wrong.

After the meeting, I was sent for a blood test to find out if there was a physical reason for my tiredness. I warned the nurse that I have a small phobia of injections and especially blood tests. She gave me the option to either sit, or take a bed. Being an idiot, I chose to sit.

At first, I tried to just distract myself by reading the charts on the walls, while she got on with my arm. After a minute or so, though, I noticed a tingling in my fingers, and felt increasingly light-headed. I was a bit worried so I asked if she was almost done.

I think her answer was “Just trying to find a vein”. I only barely heard it, though, as my body chose that moment to faint.

I can only describe the experience of recovering from a faint as similar to being gently pushed awake from a very heavy slumber. I was disoriented, and took a few seconds to remember where I was.

The nurse said I had just turned white and started jerking. I was also sweating heavily, and felt weak and fragile for a while later.

With all the excitement, not a single drop of blood was extracted. I’m going to have to go back next week and have it done again. I was just not up to continuing with it today.

I’ve only ever fainted once before, and that was also from a blood test. In that case, I apparently just fell forward out of the chair I was sitting in, and it wsa just pure luck that there was another nurse there to catch me, or I might have given myself a bad head trauma.

That one was slightly different in feel to today’s episode – I remember it was as if my brain just expanded quickly and I got a “rush” before blacking out. I actually enjoyed that, I think.

gardenbot simulator – the beginning

Well, as I don’t know anything about robotics yet, and don’t have the cash to buy materials at the moment anyway, I thought a good start would be to write a simulator for the robots, so I could practice getting the intelligence right without worrying about accidental breakages and other physical problems.

Another bonus is that I hope to be able to run the simulator many times “real” time, so I can accomplish days of learning in just hours.

So – here is the plan.

I want to visually monitor the robots’ progress, so I need to learn graphical programming. Actually, I just realized I don’t – I can simply output interesting items to either png or mpg format and watch via browser (I’m more comfortable with web development than “pure” programming). Let’s call this an “environment daemon”.

I want each robot to be a standalone program, so I will need a way for them to get information about the environment they’re in. For this, I’ll write sensor functions for them, which will fake the data by requesting information from the environment daemon. A key feature to this is that; when I finally build the actual robots, I can simply replace the fake sensors with real ones, and hopefully the environment simulator will be close enough to reality that there will be not much work to do to equal the simulated progress.

The really interesting part for me at the moment is the base unit intelligence. The “base unit” will be the part of the robot which moves the whole thing around. I envision it resembling a stretched out version of the armoured tank thing from Aliens II – about 3 inches high, about 6 inches wide, and about 18 inches long.

I’ve been considering this unit for the last few weeks. In the real model, the unit will receive commands from a central garden server, which automates the tasks needed doing. The unit will be able to request a simple map of the area from the server, then should be able to plot a course to its destination which uses only designated travel areas (there will be a pathway-building robot as well at some point), not crashing with any other units, and not turning at speeds that would spill whatever is being carried.

Most of this is pretty simple – the only difficult part is the last bit – knowing what is the perfect speed for any operation. For that, I think I’ll need to train a small neural network. That’s what’s exciting about this. I love solving these problems, and I love watching my creations learning.

I was initially thinking that the training would be physically difficult – I intended to build a small box which would have contained a little liquid, and sensors that could tell when the liquid was tilted. The robot would then be instructed to drive from one point of the garden to another, along a path which would require a few turns. This would be done first slowly, then faster and faster as the robot learned what was safe to do and what was not.

Think of the above as a waiter carrying a small tray, on which a full glass of wine rests. Now imagine the waiter moving slowly from one part of a room to the other, navigating around people. Then, as the waiter grows more confident in his/her skill, the traversal speed increases.

Obviously, though, it would be much quicker to train the robot in a simulated environment! A single run, which may have initially taken five minutes to complete, could be completed in a few seconds, and with much lest energy cost.

So, there’s the plan! I’ll start on that probably Monday evening (I have guests this weekend!), and will hopefully have something up by the end of the following week which people can download and play with.

The only real difficulties that I see are my not knowing anything about network programming in C, my rusty C skills (haven’t written any in five years), and the visual aspect of the environment daemon. Everything else is just algorithms and neural nets, hopefully.