Why I am interested in Vision
My goal is to design machines that can interact intelligently with humans and the environment.
What does “intelligently” mean? I don't know your definition of intelligence. To me, all humans have it, dogs have plenty of it, and mosquitoes somewhat less. At the very bottom of the scale I place things like chess-playing programs, Eliza, and other chatterbots designed to pass the Turing test: in a survival challenge, they would be the first to go.
What about plants?
Plants have sensors (they measure light, temperature, pressure, etc.), they perform actions (grow, turn towards the sun), and they make decisions (sprout, flower), but they do not travel. Merely being capable of a sensing-and-action cycle (feedback) is not indicative of intelligence, but rather of “reactive” or “automatic” behavior.
As it turns out, the ability to move is fundamental to intelligence
Plants never developed a central nervous system. In nature one can also find curious organisms called tunicates, whose larvae have a rudimentary brain (a ganglion) and are able to move, but then settle on a rock, become stationary, and thereafter digest their own brain.
Another necessary condition for intelligence is the ability to “see”
In general, data analysis (i.e., the process of breaking down the data into pieces by making intermediate decisions that are unrelated to the task at hand) is counter-productive, in the sense that any decision or control action based on such intermediate statistics can do no better than one based on the data themselves (the “Data Processing Inequality”, or the Rao-Blackwell theorem). As my colleague Ying-Nian Wu puts it,
“you cannot create information by torturing the data”.
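The Data Processing Inequality states that if X → Y → Z form a Markov chain (for instance, Z = f(Y) is any statistic computed from the data Y), then I(X;Z) ≤ I(X;Y). A minimal numerical sketch, assuming a hypothetical toy model in which a binary “scene” X is observed through data Y = 2X + N, where N is an independent binary nuisance: every function of Y retains at most the information Y carries about X, and a statistic that discards only the nuisance retains all of it. The names `mutual_information` and `push_forward` are illustrative, not taken from the source.

```python
import math
from typing import Callable, Dict, Tuple

# Joint distribution over pairs (x, y), stored as {(x, y): probability}.
Joint = Dict[Tuple[int, int], float]


def mutual_information(joint: Joint) -> float:
    """I(X;Y) in bits for a discrete joint distribution."""
    px: Dict[int, float] = {}
    py: Dict[int, float] = {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(
        p * math.log2(p / (px[x] * py[y]))
        for (x, y), p in joint.items()
        if p > 0
    )


def push_forward(joint: Joint, f: Callable[[int], int]) -> Joint:
    """Joint distribution of (X, Z) with Z = f(Y): 'processing' the data."""
    out: Joint = {}
    for (x, y), p in joint.items():
        key = (x, f(y))
        out[key] = out.get(key, 0.0) + p
    return out


# Hypothetical toy model: X uniform on {0, 1}, N uniform on {0, 1}, Y = 2*X + N.
joint_xy: Joint = {(x, 2 * x + n): 0.25 for x in (0, 1) for n in (0, 1)}

print(mutual_information(joint_xy))                                       # I(X;Y)
print(mutual_information(push_forward(joint_xy, lambda y: y // 2)))       # keeps X
print(mutual_information(push_forward(joint_xy, lambda y: y % 2)))        # keeps only N
print(mutual_information(push_forward(joint_xy, lambda y: int(y >= 1))))  # lossy
```

In this toy model, `y // 2` is a sufficient statistic and preserves the full 1 bit, `y % 2` keeps only the nuisance (0 bits), and the lossy threshold keeps about 0.31 bits; no choice of `f` exceeds 1 bit.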
However, when a decision has to be made in spite of nuisance factors such as viewpoint, illumination, occlusions, and quantization, there is a benefit in extracting from the data some kind of “representation” that contains all the “information” within it, no more and no less.
What is “information” for the purpose of decision and control?
In [2] we have shown that even if the data (images) had infinite resolution, and the nuisances (viewpoint and illumination) were infinite-dimensional, it is possible to extract an intermediate representation that (a) is invariant to viewpoint and illumination (so it contains only “information”), (b) is a sufficient statistic (so it contains all the “information”), and (c) is discrete (it is supported on a zero-measure subset of the image domain). So one can abstract discrete “symbols” from continuous data and lose nothing when it comes to using them for decision and control. Of course, if one were to use them for compression or transmission, the two tasks implicit in traditional Information Theory, there would certainly be a loss; but our goal is not to store or transmit data, it is to use it for decision and control. In fact, the coding length of this internal representation [2] is what I have suggested as a definition of Actionable Information. This definition follows ideas that date back to J. J. Gibson, and it stands in contrast to the traditional notion of Information as the entropy or coding length of the data, pioneered by Wiener, Shannon, and Kolmogorov.
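The construction in [2] operates on two-dimensional images and is not reproduced here. As a loose one-dimensional analogue, offered only as an illustration under our own assumptions, consider a sampled signal and keep only its interior local extrema, each tagged as a maximum or a minimum together with the rank order of its value among all extrema. This signature does not change under any strictly increasing contrast change, nor when the domain is “stretched” by repeating samples (a crude stand-in for a reparametrization of the domain), and it is supported on only a handful of points. The function `extrema_signature` and its output format are hypothetical, and we do not claim this toy is a sufficient statistic in the sense of [2].

```python
from typing import List, Sequence, Tuple


def _collapse_runs(signal: Sequence[float]) -> List[float]:
    """Merge consecutive equal samples, so stretching the domain by
    repeating samples does not change the result."""
    out: List[float] = []
    for v in signal:
        if not out or v != out[-1]:
            out.append(v)
    return out


def extrema_signature(signal: Sequence[float]) -> Tuple[Tuple[str, int], ...]:
    """Ordered interior local extrema of a 1-D signal, each tagged 'max' or
    'min' and paired with the rank of its value among all extrema (0 = lowest).
    Only the order of values is kept, so any strictly increasing contrast
    change leaves the signature unchanged."""
    x = _collapse_runs(signal)
    extrema = []  # (kind, value) in left-to-right order
    for i in range(1, len(x) - 1):
        if x[i] > x[i - 1] and x[i] > x[i + 1]:
            extrema.append(("max", x[i]))
        elif x[i] < x[i - 1] and x[i] < x[i + 1]:
            extrema.append(("min", x[i]))
    values = sorted({v for _, v in extrema})
    rank = {v: r for r, v in enumerate(values)}
    return tuple((kind, rank[v]) for kind, v in extrema)


signal = [0.1, 0.8, 0.3, 0.9, 0.2, 0.5]
sig = extrema_signature(signal)
contrast = [v ** 3 + 2 * v for v in signal]  # strictly increasing contrast map
stretched = [0.1, 0.1, 0.8, 0.8, 0.8, 0.3, 0.9, 0.9, 0.2, 0.5, 0.5]
print(sig)  # (('max', 2), ('min', 1), ('max', 3), ('min', 0))
print(extrema_signature(contrast) == sig, extrema_signature(stretched) == sig)
```

Four discrete symbols summarize six continuous samples here; in the continuum the extrema form a zero-measure set, which is the flavor of property (c) above.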
But while the effects of viewpoint (away from occlusions) and illumination (away from cast shadows) are invertible, visibility phenomena (occlusions and cast shadows) are not invertible for a passive observer, and therefore one cannot apply the results in [2]: if one is given an image where object “A” is occluded by object “B”, no amount of “torturing” the image will give us back object A. However, occlusions, like quantization, are invertible for an active observer. Want to see object A? Move around B! Want to resolve structure beyond the quantization limit? Move closer! As Gibson put it, “we see in order to move, and we move in order to see.”
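A toy one-dimensional sketch of the occlusion argument, under simplifying assumptions of our own: the background object A is far away and does not shift between viewpoints, the occluder B is near and shifts by one pixel per unit change of viewpoint (parallax), and the observer knows which pixels belong to B. A single, passive view leaves the part of A behind B unknown; fusing views from two viewpoints, as an active observer would, recovers all of A. The functions `render` and `fuse` are hypothetical illustrations, not a model taken from [1] or [2].

```python
from typing import List, Optional, Sequence


def render(background: Sequence[float], occluder_start: int,
           occluder_width: int, viewpoint: int) -> List[Optional[float]]:
    """Image of background A seen from `viewpoint`. Pixels hidden by the
    occluder B are marked None (A is not visible there)."""
    lo = occluder_start + viewpoint  # parallax: only the near occluder shifts
    hi = lo + occluder_width
    return [None if lo <= i < hi else a for i, a in enumerate(background)]


def fuse(views: Sequence[Sequence[Optional[float]]]) -> List[Optional[float]]:
    """Recover A pixel by pixel from whichever view sees it unoccluded."""
    out: List[Optional[float]] = [None] * len(views[0])
    for view in views:
        for i, value in enumerate(view):
            if out[i] is None and value is not None:
                out[i] = value
    return out


scene_a = [float(i) for i in range(10)]
passive = fuse([render(scene_a, 3, 3, viewpoint=0)])
active = fuse([render(scene_a, 3, 3, viewpoint=0),
               render(scene_a, 3, 3, viewpoint=3)])
print(passive)            # pixels 3, 4, 5 of A remain unknown (None)
print(active == scene_a)  # True: moving reveals what B hid
```

No processing of the single passive view can fill in pixels 3 to 5; only the second viewpoint supplies them.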
This helps explain the so-called “signal-to-symbol barrier”, whereby we, living organisms, measure physical phenomena essentially in the continuum and also take actions in the continuum, and yet we postulate the existence of some kind of “internal representation” that is intrinsically discrete and can be manipulated with the rules of logic or probabilistic inference. Were it not for occlusions of the line of sight and our ability to move, I am not aware of any argument other than [1] that could explain why there would be a benefit in having a discrete internal representation.
It is also worth noting that about half of the primate cortex is devoted to processing visual information. So, even when you are absorbed in the most esoteric thoughts, a large part of your brain is busy making sense of what comes from the eyes.
This explains my interest in Vision.
The Link with Control
Sensing and control go together. The notion of Actionable Information is the knot that ties them. Without control, it is not possible to close the gap between Actionable Information and Complete Information [1]. Therefore, active exploration is key to the development of internal representations, and it cannot be done without control.
[1] Actionable Information in Vision. In Proc. of the Intl. Conf. on Comp. Vis., 2009 (UCLA-CSD-090007).
[2] On the Set of Images Modulo Viewpoint and Contrast Changes (with G. Sundaramoorthi, P. Petersen, and V. S. Varadarajan). In Proc. of the IEEE Intl. Conf. on Comp. Vis. and Patt. Recog., 2009.