This essay deliberately delimited the discussion, since otherwise no reasonable synthesis of the abundant material appeared feasible within the available space. Restricting the discussion to CV made it possible to exclude, for example, microwave radar, ultrasound, and range sensors. Analogously, subjects such as off-road driving, mobile indoor robots, and vision-based guidance of vehicles on railways, of aircraft, or of ships have not been taken into consideration. Restricting the discussion to CV for a DSS also made it possible to exclude the treatment of road surveillance, for example in the context of traffic management systems. This implied that a DSS is considered to be an essentially vehicle-autarchic agent.
An attempt has been made to explicate the assumptions which underlie certain categories of approaches: such assumptions are reformulated as components of a model required for a model-based CV approach. Models for bodies formalize the expectations of the designer of a CV system about what should be searched for, segmented, and tracked in an image sequence. Similarly, motion models formalize expectations about systematic changes of relative pose between a body and the recording camera system - or the vehicle carrying the camera(s).
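To illustrate what it means for a motion model to formalize an expectation about pose changes, the following is a minimal sketch of one simple model, assuming motion on a flat ground plane with constant speed and constant yaw rate over the prediction interval. The names `Pose` and `predict_pose`, the state variables, and the choice of this particular model are illustrative assumptions; the essay does not prescribe a specific motion model.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose:
    """Hypothetical planar pose of a body relative to the camera-carrying vehicle."""
    x: float        # metres
    y: float        # metres
    heading: float  # radians, counter-clockwise from the x axis


def predict_pose(pose: Pose, speed: float, yaw_rate: float, dt: float) -> Pose:
    """Predict the pose after dt seconds, assuming constant speed and yaw rate."""
    if abs(yaw_rate) < 1e-9:
        # Degenerate case: straight-line motion along the current heading.
        return Pose(
            pose.x + speed * dt * math.cos(pose.heading),
            pose.y + speed * dt * math.sin(pose.heading),
            pose.heading,
        )
    # General case: motion along a circular arc of radius speed / yaw_rate.
    new_heading = pose.heading + yaw_rate * dt
    radius = speed / yaw_rate
    return Pose(
        pose.x + radius * (math.sin(new_heading) - math.sin(pose.heading)),
        pose.y - radius * (math.cos(new_heading) - math.cos(pose.heading)),
        new_heading,
    )
```

In a tracking loop, such a prediction would indicate where to search for the body in the next frame; the difference between predicted and measured pose then shows how well the assumed model matches the observed motion.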
Explicating the assumptions underlying a CV system by reformulating them as models of bodies and their properties, of spatiotemporal relations between bodies, and of their change with time prepares for the next abstraction step: the association of geometric results with conceptual descriptions. Explicit, system-internal models for bodies and their movements facilitate the association of spatiotemporal gray value variations with states and state changes of traffic participants recorded in a video stream. This, in turn, requires the introduction of the concept of a scene-agent - a movable body with additional attributes, observable by CV in the recorded scene - which has to be clearly distinguished from the concept of a system-internal agent used to conceive and realize a perspicuous internal structure of a DSS. Scene-agents exhibit visible behavior because they concatenate elementary, movement-related activities in a goal-oriented manner. This implies that the DSS also introduces models of the ways in which different types of movements are concatenated: models for different types of motion, their duration, and their change encode expectations about the behavior of scene-agents.
The abstraction step from geometric results to conceptual descriptions also requires taking uncertainty into account. Such uncertainty arises from sensor and transducer noise, from artifacts of the image evaluation process (caused by, e.g., oversimplified assumptions or numerical inaccuracies), and from the inherent vagueness of the concepts used for the automatically generated description.
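One common way to represent the vagueness of a concept is a graded membership function that maps a geometric estimate to a degree of validity between 0 and 1. The following is a minimal sketch under that assumption. The function name `trapezoid`, the concepts `slow`, `moderate`, and `fast`, and all breakpoints are hypothetical and chosen purely for illustration; the essay does not commit to this representation or to these values.

```python
def trapezoid(value: float, a: float, b: float, c: float, d: float) -> float:
    """Degree of validity in [0, 1]: 0 outside [a, d], 1 on [b, c], linear in between.

    Requires a <= b <= c <= d.
    """
    if value < a or value > d:
        return 0.0
    if value < b:
        return (value - a) / (b - a)   # rising edge
    if value <= c:
        return 1.0                     # plateau
    return (d - value) / (d - c)       # falling edge


# Hypothetical speed concepts with breakpoints in km/h (illustrative values only).
SPEED_CONCEPTS = {
    "slow": (0.0, 0.0, 20.0, 40.0),
    "moderate": (20.0, 40.0, 60.0, 80.0),
    "fast": (60.0, 80.0, 300.0, 300.0),
}


def describe_speed(speed_kmh: float) -> dict:
    """Map an estimated speed to a degree of validity for each concept."""
    return {name: trapezoid(speed_kmh, *bp) for name, bp in SPEED_CONCEPTS.items()}
```

With these breakpoints, an estimated speed of 30 km/h is compatible with both `slow` and `moderate` to degree 0.5, which makes the overlap between neighboring concepts explicit instead of forcing a hard decision.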
The discussion of the preceding Section implicitly relied on a scenario which differentiated vehicle guidance into
A semi-logarithmic plot of some measure of computing power provided by a VLSI CPU versus its first year of availability exhibits a surprisingly linear relationship over two decades, indicating an exponential increase in computing power at a rate of close to 50 % per year. This adds up to an increase of roughly an order of magnitude every five years, at about constant cost, power, and space consumption. If an algorithm has been designed or hand-tuned for a particular CPU or some special-purpose processor, the design decisions on which such efforts have been based must be re-evaluated about every two to three years in order to exploit the increase in computing power accruing from technological innovation. For an algorithm amenable to rational analysis, this effort can be considerably smaller, thus providing a substantial competitive edge in the longer run even if such an algorithm might be somewhat slower initially. This argument becomes relevant as soon as the initial performance begins to reach the threshold requirements for real-time experimentation, as has been the case since the mid-nineties: the computing power of a modern VLSI CPU is becoming comparable with that required for real-time elementary treatment of B/W interlaced video signals (576 lines by 768 pixels, each with 8-bit gray values). `Elementary treatment' implies the detection of local gray value transitions and initial selection steps such as non-maximum suppression in a small pixel environment. This premise formed the basis for the decision to forgo a discussion of - in general oversimplified - approaches such as thresholding a gray value picture, followed by carefully tuned processing of binary images: it is well known that such approaches are usually very brittle.
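The growth and data-rate figures quoted above can be checked with a few lines of arithmetic. The sketch below takes the 50 % annual growth rate and the image format (576 lines by 768 pixels at 8 bit) from the text; the frame rate of 25 full frames per second is an assumption, since the essay does not state it, and the function names are illustrative.

```python
import math


def growth_factor(annual_rate: float, years: float) -> float:
    """Compound growth factor after the given number of years."""
    return (1.0 + annual_rate) ** years


def years_to_multiply(annual_rate: float, factor: float) -> float:
    """Years needed to grow by the given factor at a constant annual rate."""
    return math.log(factor) / math.log(1.0 + annual_rate)


def video_bytes_per_second(lines: int, pixels: int, bits: int, frame_rate: int) -> int:
    """Raw data rate of an uncompressed gray value video signal in bytes per second."""
    return lines * pixels * bits // 8 * frame_rate


if __name__ == "__main__":
    rate = 0.50  # "close to 50 % per year", taken from the text
    print(growth_factor(rate, 5))        # 7.59375, i.e. somewhat below a factor of 10
    print(years_to_multiply(rate, 10))   # about 5.7 years for a full order of magnitude

    # ASSUMPTION: 25 full frames per second; the frame rate is not given in the text.
    print(video_bytes_per_second(576, 768, 8, 25))  # 11059200 bytes/s, about 11 MB/s
```

At exactly 50 % per year the factor after five years is about 7.6, so the order of magnitude per five years holds only approximately; a full factor of ten takes about 5.7 years. Under the assumed frame rate, elementary treatment has to touch roughly 11 million pixels per second.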
In summary, Computer Vision has proven that it provides a reliable basis for solution approaches to be incorporated in Driver Support Systems. Technology has advanced to the point where the emphasis in algorithmic development begins to shift from ad-hoc approaches - essentially enforced previously by the insufficient computing power available onboard a road vehicle, which presented an invitation to cut all feasible (and infeasible) corners - towards analyzable, well-engineered approaches. At the moment, it appears too early to convert such approaches into products for the mass market. This may change soon, however, as an extrapolation of the computing power becoming available within the foreseeable future suggests. In view of the complexity of the overall task, experience is required to properly exploit this computing power. It appears to be high time to prepare if one intends to enter the market. This essay attempts to convince the reader that the scientific basis for such an endeavor has come within grasping distance.