How Do Humans Do It? | Introduction
First Principles of Computer Vision is a lecture series presented by Shree Nayar, who is on the faculty of the Computer Science Department, School of Engineering and Applied Sciences, Columbia University. Computer Vision is the enterprise of building machines that “see.” This series focuses on the physical and mathematical underpinnings of vision and has been designed for students, practitioners, and enthusiasts who have no prior knowledge of computer vision.
Video Transcript
Before we begin to develop tools to help us solve vision problems,
it's worth taking a look at how our human visual system works.
So here you see the human eye and the visual cortex.
Here you see the eye in the front.
The eye has a lens which projects the three-dimensional world
onto a two-dimensional image.
This two-dimensional image is being formed on the retina,
which is in the back here.
The retina, by the way, has some cells within it that do some early visual processing.
So there's a little bit of information reduction
that's happening on the retina itself.
And then the reduced image, so to speak, travels through the optic nerve right here and goes
to the lateral geniculate nucleus, which acts like a relay.
It's able to figure out what information needs to go to which part of the brain.
So it sends that information then back to the visual cortex right here.
And you can see that different parts of the visual cortex have been given different colors.
And that's to show you the parts that are responsible for analysis of shape, of color,
of motion, of texture, and so on and so forth.
So there's a lot we know about the human visual system, and yet it's amazing to me how little
we know.
We know, for instance, roughly where motion analysis takes place, but we have no idea
exactly what the circuit diagram, if you will,
is of that particular part of the brain.
We don't know how the neurons are connected to each other
and what their weights are.
So we don't have a detailed architecture
or a circuit, if you will, that can be mapped to silicon
so we can emulate the human visual system.
So in short, vision is easy for us,
but we're very far from understanding
how we actually do it.
So what do we do?
Well, we reinvent.
This might sound unfortunate to you, but not quite. As you can imagine, there are many applications of vision that require
functionality and precision that go well beyond what the human visual system is capable of.
While human vision is remarkable in its versatility and is able to cope with many complex real-world situations,
it is more of a qualitative system than a quantitative one.
For instance, if you want to know how long this pencil is in millimeters,
the human visual system can only give you a very rough estimate.
Such estimates are not useful in many domains, such as factory automation or medical imaging.
While no computer vision system has yet been developed that is as versatile as the human one,
there are many computer vision systems in use today
that demonstrate much higher precision and reliability than ours.
In short, for many tasks that require vision,
the human visual system may indeed be the wrong system to emulate.
Furthermore, human vision is more fallible than we like to believe.
You see, when you and I perceive something incorrectly,
we do not have a voice in our head telling us we are wrong.
We see what we see and we believe it to be accurate.
To demonstrate this, let's take a look at some well-known optical illusions.