Michael Darrow

Artificial Intelligence

Final Draft

Robot Vision


Introduction

In order to create an intelligent robot able to survive and function in a human world, the robot would need the same sensory and cognitive capabilities as the beings that created it. These include vision, touch, taste, smell, and speech, the basic functions needed to survive alone in a human-built world. A robot capable of complete human interaction must adapt to the changes that affect humans without relying on direct data entry, and this ability to interpret the environment would be impossible without vision and the ability to process what is seen. Thus, modern AI robots mirror human development in how they see and process images, but they can still be improved in how they handle color and space.

Human Vision

The two main aspects of human vision are the ability to see color and the ability to perceive depth and space. The importance of color in a human world is best exemplified by a stoplight: green for go, yellow for slow, red for stop. Now suppose that instead of three separate lights there is one light that simply changes color. This raises several questions: first, what defines color, and how do humans determine which color is which? Second, which colors stand out to humans, and why? Lastly, how does a human world revolve around color?

Color

To answer the first question, that of color interpretation, you must understand the psychology of naming a color. What do we mean when we say the sky is "blue" and roses are "red"? How do you explain a blue sky to a blind man just given sight? You simply show him blue. The sky is blue because it has more blue than orange, and orange is orange because it has more orange than red. The psychological perception of color is subjective, so one person's perception may differ from another's, but luckily there is a high degree of universality within the color spectrum.

Within this universality lies the ability to understand why certain colors stand out more than others. To see this, you must narrow the universality of color down; warm colors and cool colors are one such division. "Warm colors generally include magenta, red, orange, yellow and yellow green (about.com)." They give us feelings of warmth, coziness and invitation, whereas cool colors, such as violet, blue, light blue, cyan and sea green, are associated with emotions ranging from calm and peace to sadness, withdrawal and repression.
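As a rough illustration (not a model drawn from the sources above), the warm/cool division can be sketched as a test on a color's hue angle; the exact cutoff angles used here are an assumption chosen to match the color lists in the previous paragraph:

```python
import colorsys

def classify_temperature(r, g, b):
    """Classify an RGB color (components 0-255) as 'warm' or 'cool'.

    Hues from magenta through red, orange and yellow to yellow-green
    are treated as warm; greens through blues to violet as cool.
    The 90/300 degree cutoffs are illustrative assumptions.
    """
    h, _, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    hue_deg = h * 360
    if hue_deg < 90 or hue_deg >= 300:
        return "warm"
    return "cool"
```

For example, pure red and magenta land on the warm side of the split, while blue and cyan land on the cool side.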

For instance, if you place someone in a room with a fire on one side and a pile of snow on the other, the subject, depending on mood, will find a comfortable place near the center of the room, balancing warm and cool in both temperature and color. A real-life example is the fashion industry: most ads revolve around this "spectral" line of color. You are either really hot, really cold, or somewhere in the middle. Humans thrive along this line, and our world is built around it. As noted before, a stoplight that changes color from green to yellow to red covers both ends of this "spectral" line, green on one side and red on the other.

Depth and Space

Once this "spectral" line is defined, we can move on to how the human mind interprets space and depth within it. Spatial vision is required by organisms to distinguish, identify and locate objects. Several factors contribute to this: the human's ability to judge size, the object's relation to the human, the object's relation to other objects, and the conditions that can affect all of these, such as lighting and color changes.

A hypothesis, not a guess, about size can drive several of a human's judgments about an object. For instance, weight can be "guessed" based on the size of the object. When I say "guessed" I am leaving out objects that are large but weigh little; that kind of analysis requires a credible knowledge base. But basically, the bigger something is, the more weight it will have within that space. Size is also a factor in the object's relation to the subject viewing it: the bigger the object appears, the closer it is, and the smaller it appears, the farther away it is. Once the distance between the subject and the object is determined, a "guess" can be made about how much distance lies between that object and the other objects in the space. Even while moving toward or away from the object, the subject "guesses" how near or far the object is at any given position within the movement.
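The size-based distance "guess" above resembles the standard pinhole-camera relation, distance = focal length × real size / apparent size. A minimal sketch, assuming the object's real height comes from a knowledge base and the focal length is known in pixels:

```python
def estimate_distance(real_height_m, image_height_px, focal_length_px):
    """Pinhole-camera distance 'guess': Z = f * H / h.

    real_height_m    -- known real-world height of the object (meters)
    image_height_px  -- apparent height of the object in the image (pixels)
    focal_length_px  -- camera focal length expressed in pixels
    """
    return focal_length_px * real_height_m / image_height_px
```

With these (made-up) numbers, a 1.7 m person appearing 340 pixels tall through an 800-pixel focal length would be estimated at 4 meters away; as the paragraph notes, the smaller the object appears, the larger the estimated distance becomes.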

Restrictions

In human vision there are still several factors that affect how a particular object is viewed, including lighting, shadowing and position. Under most conditions these "interferences" can be managed through focus and light diffusion. When a human needs to see a heavily lit object in the distance, "the muscles around the eye control the shape of the lens, this enables the eye to focus, and also control the size of the pupil to control the amount of light that enters the eye (osiris)." In other words, there are ways of determining what an object is under almost any circumstances, unless you count darkness. Of course, darkness is not a problem for a computer, which can take in many different types of images, such as night vision or full edge mappings. For the sake of this paper we will be looking at the Massachusetts Institute of Technology's Kismet.

A Robot Example

In Kismet, "the robot's vision system consists of four color CCD cameras mounted on a stereo active vision head (www.ai.mit.edu)." He has two cameras, one focused on the foreground and one on the background. These two "lipstick" cameras show the computer what to pay attention to and allow it to compute distance estimates. Kismet is an infant-like robot that maps facial expressions and recognizes tones of voice, which allows him to interpret a person's "feelings." He is so similar to an infant that he babbles and makes noises, expressing a very "young" personality. The cameras do much the same thing our eyes do: they decide what to pay attention to, focus on it, and then react to it. Kismet's ability to focus on moving objects and interact with them should make stationary objects easier to identify.

His ability to process the distance between himself and an object is very similar to human perception. As noted before, a human takes in the size of the object and the distance is "guessed." Kismet does the same thing, but with a limit on how near or far the object can be. If you are too close to Kismet he backs his head away or makes sounds until you are at a good focal distance. This restriction of the vision is a problem.
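A stereo head like Kismet's can turn the disparity between its two camera views into a depth estimate using the standard relation depth = focal length × baseline / disparity. The sketch below pairs that relation with a hypothetical comfort-range check in the spirit of the back-away behavior described above; the camera parameters and range limits are made-up numbers, not Kismet's actual values:

```python
def stereo_depth(focal_px, baseline_m, disparity_px):
    """Depth from stereo disparity: Z = f * B / d (standard relation)."""
    if disparity_px <= 0:
        return None  # no valid match between the two views
    return focal_px * baseline_m / disparity_px

def reaction(depth_m, near_m=0.3, far_m=2.0):
    """Hypothetical comfort-range check mimicking the described behavior."""
    if depth_m is None or depth_m > far_m:
        return "seek"        # object too far away to engage
    if depth_m < near_m:
        return "back away"   # object uncomfortably close
    return "engage"
```

With a 700-pixel focal length and a 10 cm baseline, a 70-pixel disparity corresponds to a depth of 1 meter, comfortably inside this sketch's engagement range.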

Conclusion

In conclusion, this writer feels that if you have a robot with the ability to centralize color and manipulate multiple focus distances, a logic board can be linked together to create an image association, thereby identifying the object and its movements. Because a digital camera sees in red, green and blue, make these the spectral line on which all the other colors lie. When a color is taken in, the computer checks its knowledge base for similar colors, then checks the same knowledge base for the size and structure of the image. If you have several cameras determining the color, the size (through an edge-mapping algorithm), and the distance between the camera and the object, you can feed these into a reasoning system, and a "guess" can be made about what the object is and what its intentions are.
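The color lookup described above can be sketched as a nearest-neighbor search in RGB space: the sensed color is compared against every named color in the knowledge base and the closest match wins. The knowledge-base entries here are hypothetical placeholders, not a real system's data:

```python
# Hypothetical mini knowledge base of named colors (RGB, 0-255).
KNOWLEDGE_BASE = {
    "red": (255, 0, 0),
    "green": (0, 128, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
}

def nearest_color(rgb):
    """Return the knowledge-base name closest to rgb by squared
    Euclidean distance in RGB space."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(KNOWLEDGE_BASE, key=lambda name: dist2(KNOWLEDGE_BASE[name], rgb))
```

As the next paragraph argues, the smaller this knowledge base is, the faster a comparable match can be found, since every entry must be compared against the sensed color.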

The strength of this conclusion is that the fewer objects you have to search within a knowledge base, the quicker you can come up with a comparable match. And if you have other cameras judging size, heat, texture and even depth, the robot can narrow the object down into a viable category capable of interpretation and understanding. To solve the problem of a huge knowledge base, you can narrow the memory down into a more general base. For instance, if you have one building and then another building, you do not need both pictures in the base; you just need a comparable "snapshot."

Because our world is built around color, almost everything in it can be placed in some kind of color category, and with those categories broken down into RGB, the world can be interpreted faster, allowing the robot to react faster. And who says the robot has to be an adult? After all, we have learned everything we are faced with, so why not teach the child robot the same lessons.

The weaknesses of this conclusion are the universality of color and the memory capacity. Within the universality of color lie many levels of subjective interpretation; red in one culture could mean something entirely different in another. The answer would be to narrow down the colors of the environment the robot lives in, or to give the robot a chance to learn and explore the culture before letting it go. Along with this teaching, providing the robot with a "general" knowledge base of "general" objects should narrow the search algorithm and yield a "guess" at what an object is. This also answers the problem of memory.

Finally, all of this combined enables the robot to function within any human environment. If you have a robot car trying to determine how to stop at a light that changes color, you need to apply all of these factors. The more we make robots like humans, the easier it will be for them, the robots not the humans, to get along as if they really were free-thinking, emotion-recognizing and even smart robots. And if we allow the robot to grow and learn, it will know what to do and what not to do, just like a human being. Now, as for humans getting along with robots, well, they can't even get along with other humans, but that's a whole different paper.

Bibliography

www.ai.mit.edu/projects/socialble/overview.html


http://psychology.about.com/library/weekly/aa031501a.htm?terms=vision


Regan, David. Human Perception of Objects. Sinauer Associates, Inc., 2000.


Davis, Steven. Color Perception, Vol. 9: Vancouver Studies in Cognitive Science. Oxford University Press, 2000.