

1.3 The Development of Computer Vision

Although the content of this book derives from a number of research fields, the field of computer vision is the most relevant large-scale research area. It is a diverse field that integrates ideas and methods from a variety of pre-existing and coexisting areas, such as image processing, statistics, pattern recognition, geometry, photogrammetry, optimization, scientific computing, computer graphics and many others. From the 1960s to the 1980s, Artificial Intelligence was the driving field, seeking to use computers to understand the world in ways that, to varying degrees, corresponded to how humans understand it. This included the interpretation of 3D scenes from images and videos.

The process of scene understanding was thought of as a hierarchy of vision levels, similar to visual perception, with three main levels [38], as follows:

Low-level vision: early 2D vision processes, such as filtering and extraction of local image structures.

Mid-level vision: processes such as segmentation, generation of 2.5D depth, optical flow computation and extraction of regional structures.

High-level vision: semantic interpretation of segments, object recognition and global 3D scene reasoning.

This general approach is still valid, but it was not successful at the first attempt, because researchers underestimated the difficulties of the first two levels and tried to handle high-level vision reasoning directly. In his recent textbook Computer Vision: Algorithms and Applications [49], Rick Szeliski recounts an assignment in which Marvin Minsky, at MIT, asked a group of students to develop a computer vision program that could reason about image content:

According to one well-known story, in 1966, Marvin Minsky at MIT asked his undergraduate student Gerald Jay Sussman to “spend the summer linking a camera to a computer and getting the computer to describe what it saw”.15

Soon, it became clear that Minsky had underestimated this challenge. However, the attempts to resolve the various problems of the three levels proved fruitful to the field of computer vision, and a great many approaches to partial problems at all levels have appeared. Although some vision researchers follow the path of cognitive vision, which is inspired by the workings of the human brain, most techniques today are driven by engineering demands to extract relevant information from images.

Computer vision developed roughly along the above-mentioned three levels of vision. Research in low-level vision has deepened the understanding of local image structures. Digital images can be described, without regard to scanning resolution, by the image scale space [55] and image pyramids [50]. Image content can be described in the image domain or, equivalently, in the frequency (Fourier) domain, leading to a theory of filter design to improve image quality and to reduce noise.

15Szeliski, Computer Vision: Algorithms and Applications, p. 10 [49].


Local structures are defined by their intrinsic dimension16 [4], which leads to interest operators [20] and to feature descriptors [6].
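As an illustration of these ideas, the sketch below classifies the intrinsic dimension of each pixel from the eigenvalues of the local structure tensor, which is also the idea underlying Harris- and Förstner-style interest operators. It assumes NumPy and SciPy are available; the function name and thresholds are illustrative choices, not prescribed by the text.

# Sketch only: classify the intrinsic dimension of each pixel from the
# eigenvalues of the local structure tensor, the idea behind
# Harris/Foerstner-style interest operators. Thresholds are illustrative.
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def intrinsic_dimension_map(image, sigma=2.0, t_low=1e-4, t_ratio=0.05):
    """Return per-pixel labels: 0 (constant), 1 (linear), 2 (point structure)."""
    img = image.astype(np.float64)
    img /= img.max() + 1e-12                  # normalize intensities to [0, 1]

    ix = sobel(img, axis=1)                   # horizontal gradient
    iy = sobel(img, axis=0)                   # vertical gradient

    # Structure tensor entries, averaged over a Gaussian window.
    jxx = gaussian_filter(ix * ix, sigma)
    jyy = gaussian_filter(iy * iy, sigma)
    jxy = gaussian_filter(ix * iy, sigma)

    # Eigenvalues of the 2x2 tensor [[jxx, jxy], [jxy, jyy]].
    trace = jxx + jyy
    delta = np.sqrt((jxx - jyy) ** 2 + 4.0 * jxy ** 2)
    lam1 = 0.5 * (trace + delta)              # larger eigenvalue
    lam2 = 0.5 * (trace - delta)              # smaller eigenvalue

    labels = np.zeros(img.shape, dtype=np.uint8)          # 0D: constant patch
    labels[lam1 > t_low] = 1                               # 1D: one dominant direction
    labels[(lam1 > t_low) & (lam2 > t_ratio * lam1)] = 2   # 2D: point/corner structure
    return labels

Pixels labeled 2 (point structures) are exactly the locations an interest operator would retain as feature candidates.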

Regional relations between local features in an image, or between images, are powerful descriptions for mid-level vision processes, such as segmentation, depth estimation and optical flow estimation. Marr [38] coined the term 2.5D model, meaning that information about scene depth exists for a certain region in an image, but only as viewed from a single viewpoint. Such is the case for range estimation techniques, which include stereo, active triangulation and time-of-flight depth measurement devices, where not a full 3D description is measured but a range image d(u, v) with one distance value per image pixel. This range value, together with some intrinsic parameters of the range sensing device, allows us to invert the image projection and to reconstruct scene surfaces. Full 3D geometry can be reconstructed from multi-view range images if they are suitably fused from different viewpoints.
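As a concrete example, the sketch below inverts the projection of a pinhole-style range sensor, turning a range image d(u, v) into camera-frame 3D points. The intrinsic parameters fx, fy, cx, cy and the assumption that d stores depth along the optical axis are illustrative, not taken from the text.

# Sketch only: back-project a range image d(u, v) into camera-frame 3D points,
# assuming a pinhole model with hypothetical intrinsics fx, fy, cx, cy and
# per-pixel values that store depth along the optical axis (not ray length).
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """depth: (H, W) array of range values; returns an (H, W, 3) array of points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
    x = (u - cx) / fx * depth                        # invert the perspective projection
    y = (v - cy) / fy * depth
    return np.dstack((x, y, depth))                  # X, Y, Z in the camera frame

Fusing such point clouds from several viewpoints then yields the full 3D description mentioned above.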

The special branch of computer vision that deals with viewing a scene from two or more viewpoints and extracting a 3D representation of the geometry of the imaged scene is termed geometric computer vision. Here, the camera can be thought of as a measurement device. Geometric computer vision developed rapidly in the 1990s and 2000s and was influenced strongly by geodesy and photogrammetry. In fact, those disciplines are converging. Many of the techniques well known in photogrammetry have found their way into computer vision algorithms. Most notable is the method of bundle adjustment for optimally and simultaneously estimating camera parameters and 3D point positions from uncertain image features [51].
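To make the idea concrete, the sketch below writes down the reprojection-error objective that bundle adjustment minimizes, for a toy setup with known camera-point correspondences. The parameterization (a 6-DoF pose as rotation vector plus translation, one shared focal length) and all variable names are illustrative assumptions, not the formulation of [51].

# Sketch only: the reprojection-error objective minimized by bundle adjustment,
# for a toy setup with known camera-point correspondences. The parameterization
# (6-DoF pose as rotation vector plus translation, one shared focal length) and
# all names are illustrative assumptions, not the formulation of [51].
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def residuals(params, n_cams, n_pts, cam_idx, pt_idx, observations, focal):
    """Stacked 2D reprojection errors, one pair per observed image feature."""
    poses = params[:n_cams * 6].reshape(n_cams, 6)        # per camera: [rvec | t]
    points = params[n_cams * 6:].reshape(n_pts, 3)        # 3D scene points
    p_cam = (Rotation.from_rotvec(poses[cam_idx, :3]).apply(points[pt_idx])
             + poses[cam_idx, 3:])                        # transform world -> camera
    proj = focal * p_cam[:, :2] / p_cam[:, 2:3]           # pinhole projection to pixels
    return (proj - observations).ravel()

# Jointly refining all camera poses and 3D points from an initial guess x0:
# result = least_squares(residuals, x0,
#                        args=(n_cams, n_pts, cam_idx, pt_idx, observations, focal))

In practice, the Jacobian of this objective is very sparse, which is what makes bundle adjustment tractable for scenes with many cameras and points.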

Combining the geometric properties of scene objects with image-based reflectance measurements allows us to model the visual-geometric appearance of scenes. A strong relationship between computer vision and computer graphics has developed over the last decade. While computer graphics displays computer-defined objects with given surface properties by projecting them into a synthetic camera, vision estimates the surface properties of real objects as seen by a real camera. Hence, vision can be viewed as the inverse problem of graphics. One of the key challenges in computer vision is that, due to the projection of the objects into the camera, the range information is lost and needs to be recovered. This makes the inverse problem of recovering depth from images especially hard and often ill-posed. Today, both disciplines are still converging, for example in the area of image-based rendering in computer graphics and in the exploitation of the computing capabilities of Graphics Processing Units for computer vision tasks.

High-level vision attempts to interpret the observed scene and to assign semantic meaning to scene regions. Much progress has been made in this field recently, moving from simple object detection to object recognition, and from individual objects to object categories. Machine learning is vital for these approaches to work reliably and has been exploited extensively in computer vision over the last decade [43]. The availability of huge amounts of labeled training data from databases and the Web, and advances in high-dimensional learning techniques, are keys to the success

16Intrinsic Image Dimension (IID) describes the local change in the image. Constant image: 0D, linear structures: 1D, point structures: 2D.