Autonomous Driving: Context and the State-of-the-Art

In this section, the state of the art in Intelligent Vehicles is presented from a vehicle navigation perspective, as these vehicles acquire autonomous navigation capabilities. The section is structured as follows:

  1. Motivation: The motivations for the ongoing transformation of modern vehicles are presented in terms of usage, safety and external factors such as fossil-fuel constraints and pollution.

  2. Vehicle navigation functions: The state-of-the-art review is formulated in terms of vehicle navigation functions, to focus the section on the machine intelligence and decision-making processes that are being developed and introduced to transform modern vehicles into connected platforms with autonomous navigation capabilities. It addresses the issues of autonomy, driver needs and communications. That is, it formulates the vehicle onboard intelligence as a navigation problem and thus defines the functional needs for vehicles to demonstrate autonomous navigation capabilities.

  3. Related vehicle techniques: Current developments have been classified under three different perspectives:

    1. Driver Centric addresses systems that seek to increase the situational awareness of drivers by providing different types of driving assistance that keep the driver in the decision-making process.
    2. Network Centric addresses the use of wireless networks that enable the sharing of information among vehicles and with the infrastructure, giving drivers and machines awareness beyond what standalone vehicle systems could observe.
    3. Vehicle Centric addresses systems that seek to convert vehicles into fully autonomous vehicles, with the driver outside the control loop.

    These different perspectives will be defined, presenting current developments in academia and industry.

  4. Future developments: A perspective is provided on future developments and on how these technologies could be adopted, taking into account cost, legal and societal constraints.

Societal, Technological and Economical Motivators

Vehicle Intelligence and the Navigation Functions

From the driver's perspective, and whatever the level of computer-controlled functions in use (e.g., for comfort, safety, and networking), vehicle navigation can be characterized as consisting of four basic functions: Mapping, Localisation, Motion, and Interaction. These answer four basic navigation questions: Where am I? Where can I move? How can I do it? How do I interact?


Vehicles traverse road networks that are shared with other entities, such as pedestrians and other vehicles, that are expected to obey a set of pre-agreed traffic rules. Road networks are environments where unpredictable events might occur, where different actors have different levels of driving skill, and where human errors are the cause of most accidents. If an accident is caused by motion without driver intervention, then the vehicle manufacturer might be liable; by contrast, if the driver remains in the control loop, the liability will be with the driver. This is very important in terms of reliability, safety and overall system integrity; it explains the reluctance shown by vehicle OEMs to automate certain driving tasks.

Localisation. This function can be defined as knowing the vehicle pose (position and orientation) with respect to an absolute or relative reference coordinate frame, that is, determining the whereabouts of the vehicle. Global coordinates such as those used in Global Navigation Satellite Systems (GNSS) like GPS provide absolute information. The relative location of a vehicle could be expressed with respect to a frame at a road intersection, or with respect to other vehicles. Absolute location estimates rely on weak radio signals from constellations of GNSS satellites, which can be easily occluded and are subject to errors due to noise and disturbances. To compensate for these errors, different dead reckoning, fault detection and fusion algorithms are used to estimate the vehicle location by fusing data from GPS receivers with data from exteroceptive and proprioceptive sensors. While good solutions exist, these rely on costly equipment like navigation-grade Inertial Measurement Units (IMUs) or external RF corrections like RTK, and thus deployable solutions on passenger vehicles remain a challenge.
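The fusion of dead reckoning with GNSS fixes described above can be illustrated with a minimal sketch: a one-dimensional Kalman-style filter in which wheel-speed integration predicts the position and a GNSS fix corrects it. The 1D road frame, the names `Estimate`, `predict` and `correct`, and all numerical values are assumptions made for illustration; a deployed system would estimate the full pose and include fault detection.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    position: float  # metres along the road (hypothetical 1D frame)
    variance: float  # uncertainty of the position estimate, in m^2


def predict(est: Estimate, speed: float, dt: float, process_var: float) -> Estimate:
    """Dead-reckoning step: integrate the measured speed; uncertainty grows."""
    return Estimate(est.position + speed * dt, est.variance + process_var)


def correct(est: Estimate, gnss_position: float, gnss_var: float) -> Estimate:
    """Fuse a GNSS fix, weighting it by the relative uncertainties."""
    gain = est.variance / (est.variance + gnss_var)
    position = est.position + gain * (gnss_position - est.position)
    return Estimate(position, (1.0 - gain) * est.variance)


# Illustrative values only: 10 m/s for 1 s, then a GNSS fix at 12 m.
predicted = predict(Estimate(0.0, 1.0), speed=10.0, dt=1.0, process_var=1.0)
fused = correct(predicted, gnss_position=12.0, gnss_var=2.0)
```

When the GNSS signal is occluded, only `predict` is called and the variance keeps growing, which is why dead reckoning alone cannot bound the localisation error.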

The "where am I" question represents a fundamental requirement for vehicle navigation. Knowing the coordinates of a vehicle position is insufficient for vehicle navigation. It is necessary to know the context, that is, to project location information onto a map that gives the vehicle driver or vehicle intelligence the ability to relate the whereabouts of the vehicle to the road network. Currently, digital maps are commercially available and used extensively in onboard vehicle navigation systems. These maps hold road geometries and attributes associated with the links and nodes representing the road network. The projection of the vehicle onto the digital maps uses map-matching techniques that take into account errors in the location estimates as well as in the digital maps.
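As a rough illustration of map matching, the sketch below projects a position estimate onto the nearest link of a small road network. It is purely geometric: the link identifiers, coordinates and function names are hypothetical, and the handling of location and map errors mentioned above is deliberately left out.

```python
import math
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]
Link = Tuple[Point, Point]  # a road link as a straight segment between two nodes


def project_on_segment(p: Point, a: Point, b: Point) -> Tuple[Point, float]:
    """Return the closest point to p on segment a-b and its distance to p."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        t = 0.0  # degenerate link: both nodes coincide
    else:
        t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
        t = max(0.0, min(1.0, t))  # stay within the segment
    q = (a[0] + t * dx, a[1] + t * dy)
    return q, math.hypot(p[0] - q[0], p[1] - q[1])


def map_match(p: Point, links: Dict[str, Link]) -> Optional[Tuple[str, Point, float]]:
    """Pick the link whose projection is closest to the position estimate."""
    best = None
    for link_id, (a, b) in links.items():
        q, dist = project_on_segment(p, a, b)
        if best is None or dist < best[2]:
            best = (link_id, q, dist)
    return best


# Hypothetical network: two parallel links 5 m apart.
links = {"A": ((0.0, 0.0), (10.0, 0.0)), "B": ((0.0, 5.0), (10.0, 5.0))}
matched = map_match((3.0, 1.0), links)
```

Nearest-link matching of this kind becomes ambiguous on parallel roads and at intersections, which is one reason practical map matchers also use heading, link connectivity and the uncertainty of the position estimate.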

Locating a vehicle for autonomous navigation in all conditions is a challenge, as it depends on the absolute location estimates, the quality of the digital maps and the map-matching algorithms. For autonomous vehicle guidance it is thus not only a localisation problem but also one of determining with certainty the context in which the vehicle evolves. Digital models of the environment are fundamental for autonomous vehicle navigation.

Vehicles can be localised at the same time as maps used for navigation are built, using the Simultaneous Localisation and Map Building (SLAM) approach, which minimises or bypasses the need for GNSS signals. New approaches consider maps as probability distributions over environment properties rather than fixed representations of the environment at a snapshot in time. The environment is modelled as a probabilistic grid instead of a spatial grid, an approach that allows for a reduction in uncertainty. By storing maps as probability models and not just expected values, it is possible to describe any environment much better. In contrast to digital navigation maps that have been built for driver guidance, navigation maps that are built concurrently as the vehicle localises itself represent the likely path that the vehicle will follow and thus are closer to the expected vehicle position.

Mapping. Vehicle guidance entails understanding the spatio-temporal relationship between the vehicle and its environment. This can be regarded as modelling and understanding the world from the perspective of the driver or the computer controller. Human or machine perception provides information about the vicinity of the vehicle. This is stored and represented in an abstract map and used at different granularity levels. The environment model is then used to gain an understanding of the vehicle's relationship with its environment and to decide "where can I move?" A driver perceives the environment to be travelled and builds a mental model, incorporating stored information about similar situations as well as subconscious recall of driving legislation. These are used to gain an understanding of the situation and to decide what action to take to manoeuvre the vehicle safely. A driving assistance system or autonomous navigation system will follow a similar process. A simplified representation of this process when it is performed by a machine is shown in Fig. 50.4. The world model allows the decision-making process to occur. The constraints reside in the machine representation of the world, which is constructed by a perception process based on the observations made by a series of onboard vehicle sensors, digital maps and the vehicle state (position, heading, speed, etc.).


The basis for building a world model is the set of signals from the vehicle onboard sensors. Different algorithms are applied to these signals in order to extract information on features and on other entities sharing the road network. The models are built to take into account limits in the perception process, the uncertainty associated with the data used and the temporal properties. These can be represented in the form of occupancy grids, using a probabilistic tessellated representation of spatial information. That is, a grid that corresponds to stochastic estimates of the occupancy states of the cells in a spatial lattice. For this purpose, probabilistic sensor models are used extensively. The grids can be computed at the lowest level, as in the case of the disparity space for stereo vision systems; this information is then transformed into a Cartesian-space occupancy grid that forms part of the world model. Within the robotics community there is a particular approach to building a world model, known as the Simultaneous Localisation and Map Building (SLAM) problem, in which the robotic device builds a map of its environment as it moves while localising itself. It is possible to combine this approach with the use of digital maps to facilitate the construction of the map and the localisation estimate.
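The idea of a grid holding stochastic estimates of cell occupancy can be sketched with the common log-odds form of the binary Bayes filter. This is one widely used formulation, assumed here for illustration rather than taken from the text; the inverse sensor model values (0.7 for a hit, 0.3 for a miss) and the grid size are hypothetical.

```python
import math
from typing import List


def logit(p: float) -> float:
    """Convert a probability into log-odds."""
    return math.log(p / (1.0 - p))


def probability(log_odds: float) -> float:
    """Convert log-odds back into a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def update_cell(log_odds: float, p_occupied_given_measurement: float) -> float:
    """Binary Bayes update: add the evidence of one measurement (prior 0.5)."""
    return log_odds + logit(p_occupied_given_measurement)


def new_grid(rows: int, cols: int) -> List[List[float]]:
    """Grid of log-odds values; 0.0 means 'unknown' (probability 0.5)."""
    return [[0.0 for _ in range(cols)] for _ in range(rows)]


grid = new_grid(4, 4)
# Hypothetical inverse sensor model: a return in a cell suggests P(occupied) = 0.7.
grid[1][2] = update_cell(grid[1][2], 0.7)
grid[1][2] = update_cell(grid[1][2], 0.7)
p_after_two_hits = probability(grid[1][2])  # higher than after a single hit
```

Working in log-odds turns each update into an addition and keeps repeated evidence from saturating numerically, which matters when every cell is refreshed at sensor rate.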

A world model and the location of the subject vehicle within it are basic requirements for autonomous vehicle navigation.

Motion. This function can be defined as a series of tasks (including path planning to reach the destination, obstacle avoidance, and vehicle control) that enable the platform to move safely and efficiently. Determining the vehicle trajectory comprises two tasks.

  • Local path planning relates to the immediate motion of the vehicle for obstacle avoidance.
  • Global path planning determines the path that the vehicle is to follow from its current position to its destination, using stored information on the road network and its associated attributes.
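Global path planning over stored road-network information can be sketched as a shortest-path search on a graph of nodes and links. The example below uses Dijkstra's algorithm, chosen here for illustration only; the node names and link lengths are hypothetical, and a real planner would also weigh attributes such as speed limits or turn restrictions.

```python
import heapq
from typing import Dict, List, Optional, Tuple

RoadGraph = Dict[str, Dict[str, float]]  # node -> {neighbour: link length in metres}


def shortest_route(graph: RoadGraph, start: str, goal: str) -> Optional[Tuple[float, List[str]]]:
    """Dijkstra search returning (total length, list of nodes) or None."""
    queue: List[Tuple[float, str, List[str]]] = [(0.0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, length in graph.get(node, {}).items():
            if neighbour not in visited:
                heapq.heappush(queue, (cost + length, neighbour, path + [neighbour]))
    return None  # destination not reachable


# Hypothetical road network with link lengths in metres.
roads: RoadGraph = {
    "A": {"B": 100.0, "C": 250.0},
    "B": {"C": 100.0, "D": 300.0},
    "C": {"D": 80.0},
    "D": {},
}
route = shortest_route(roads, "A", "D")
```

The local planner would then refine the next few metres of this route against the obstacles currently perceived.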

The results of the local planner are used to actuate the vehicle. The motion function in vehicle navigation answers the question: "How can I do it?" It is important to note that determining the vehicle motion depends on the capability of the system to perceive as far ahead as possible in order to anticipate situations that might represent a risk. This is difficult if only onboard vehicle sensors are used, due to limits in their field of view and layout.

Vehicle actuation is difficult due to safety constraints and cost. The vehicle heading and speed need to be controlled to create a path that avoids any obstacle perceived by the onboard sensors.

  • Currently, Electronic Stability Programmes (ESP) are the most widely used systems for controlling vehicle stability when sudden accelerations occur, so as to avoid slippage.
  • Longitudinal speed control, included as part of Adaptive Cruise Control (ACC), which allows a vehicle to follow another at a safe distance, is an example of computer-controlled motion.
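To make the ACC example concrete, the following sketch implements a constant time-gap following law, one common textbook formulation and not necessarily what any OEM deploys. The gains, time gap, standstill distance and acceleration limits are all hypothetical values chosen for illustration.

```python
def acc_acceleration(
    gap_m: float,
    ego_speed: float,
    lead_speed: float,
    time_gap_s: float = 1.8,     # hypothetical desired time gap
    standstill_m: float = 2.0,   # hypothetical minimum spacing at rest
    k_gap: float = 0.2,          # hypothetical gain on spacing error
    k_speed: float = 0.6,        # hypothetical gain on relative speed
    a_min: float = -3.0,         # hypothetical comfort braking limit, m/s^2
    a_max: float = 1.5,          # hypothetical acceleration limit, m/s^2
) -> float:
    """Commanded longitudinal acceleration for following a lead vehicle."""
    desired_gap = standstill_m + time_gap_s * ego_speed
    command = k_gap * (gap_m - desired_gap) + k_speed * (lead_speed - ego_speed)
    return max(a_min, min(a_max, command))  # saturate to the allowed range


# Following at exactly the desired gap and the same speed: no correction needed.
steady = acc_acceleration(gap_m=38.0, ego_speed=20.0, lead_speed=20.0)
# Too close to a slower lead vehicle: the command saturates at the braking limit.
braking = acc_acceleration(gap_m=20.0, ego_speed=20.0, lead_speed=15.0)
```

Because the desired gap grows with the ego speed, the spacing widens automatically at higher speeds, which is what makes the distance "safe" in a time-based sense.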

Currently, vehicle actuation is more and more under computer control, making the automation of vehicles more likely. Typical examples are the automated parking systems commercialised by several vehicle OEMs.

The Localisation, Mapping and Actuation functions have a high level of interaction. At the vehicle level, they form an interdependent complex system evolving in a highly unpredictable environment. Two observations can be made. First, a model of the world and its understanding is what determines the level of intelligence that can be embedded for decision-making purposes. The construction of the model is limited by the size of the area that can be sensed, that is, by the limited field of view of the onboard vehicle sensors. Second, an architecture for vehicle navigation that is centred on a representation of the world that the vehicle is to traverse is one of the basic requirements for automating vehicle navigation tasks. If the environment is known over large areas, it will be possible to anticipate what the vehicle could expect and plan accordingly.

A systematic representation that could respond to these requirements was first proposed in the 4-D/RCS architecture by J. Albus. It provides a layered representation of the world where each layer has different characteristics in terms of size, access time, granularity of the information, etc. Figure 50.5 shows a representation of this architecture and the manner in which the layers are distributed. Each layer represents different features such as road geometric primitives, object tracks, object groups, etc. The lowest layer is the closest to the vehicle: a small area with a fine granularity. In general, all the data captured by the exteroceptive sensors is written into this area.


The decision-making mechanism will scan this area at rapid intervals; the granularity should allow for the representation of vulnerable road users (pedestrians). The underlying structure for this layer is given by the geometry and attributes found in standard digital maps. Higher-level layers will have larger zones of interest where objects will be identified and attributes associated with them. The refresh rates will be slower and the resolution coarser. Information from other vehicles or from the infrastructure will in general be written to the upper layers so as to extend the situational awareness of the vehicle.
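The layered world model can be pictured as a small data structure in which each layer trades extent against resolution and refresh rate. The sketch below is an illustration only: the layer names, radii, cell sizes and refresh periods are hypothetical and are not taken from the 4-D/RCS specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Layer:
    name: str
    radius_m: float     # extent of the zone of interest around the vehicle
    cell_size_m: float  # granularity of the stored information
    refresh_s: float    # how often the decision-making mechanism scans it
    objects: List[str] = field(default_factory=list)


def build_world_model() -> List[Layer]:
    """Layers ordered from the finest (closest) to the coarsest (widest)."""
    return [
        Layer("immediate", radius_m=50.0, cell_size_m=0.2, refresh_s=0.05),
        Layer("intermediate", radius_m=300.0, cell_size_m=1.0, refresh_s=0.2),
        Layer("extended", radius_m=2000.0, cell_size_m=10.0, refresh_s=1.0),
    ]


def layer_for(layers: List[Layer], distance_m: float) -> Optional[Layer]:
    """Return the finest layer whose zone of interest covers the distance."""
    for layer in layers:
        if distance_m <= layer.radius_m:
            return layer
    return None  # outside the modelled world


world = build_world_model()
# A pedestrian seen by onboard sensors lands in the fine-grained layer;
# a hazard reported over the network lands in an upper layer.
layer_for(world, 12.0).objects.append("pedestrian")
layer_for(world, 900.0).objects.append("hazard reported by infrastructure")
```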


The concept of structuring the world model as formulated by J. Albus has been applied in a landmark project on cooperative vehicle safety applications, namely SafeSpot, as part of the Local Dynamic Map concept. The latter today forms part of the discussion on the standards being developed for V2V and V2I applications. The technical committee ITS STF404 is addressing the standardisation of the Local Dynamic Map (LDM).

The different applications and technologies related to autonomous navigation rely on a representation of the human process that makes possible an understanding of the vehicle situation with respect to its immediate environment. Anything that is not represented in this model will be ignored by the decision-making process and would thus lead to errors. Perception is a complex process that is limited by the physics of the sensors used, which leads to undefined areas, uncertainty in the measurements, and delays. One of the challenges in Intelligent Vehicle research is the construction of such a model and its interpretation in real time under all types of driving conditions. The manner in which the first layer is represented is very important, as actuation of the vehicle depends on decisions taken on the knowledge of the environment defined in this layer. _Occupancy grids_ are used for this purpose, as they encapsulate the multidimensional information stored in them in order to represent uncertainty. Earlier work in this area considered the road network to be static; it was assumed that obstacles did not move. However, in traffic conditions this is not the case. Today, as vehicles have begun to move in cluttered environments, the dynamics of the obstacles and the limits of the perception systems are being incorporated. Concepts such as the probabilistic velocity obstacle (PVO) approach applied to a dynamic occupancy grid are being used in order to infer the likelihood of collision when uncertainty in the position, shape and velocity of the obstacles, occlusions, and limited sensor range constrain calculations.
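The effect of position and velocity uncertainty on collision likelihood can be illustrated with a simple Monte Carlo estimate. This is a stand-in for illustration, not the PVO method itself: it assumes constant-velocity motion, circular shapes, Gaussian noise and a hypothetical collision radius, and it ignores occlusions.

```python
import random
from typing import Tuple

Vec = Tuple[float, float]


def min_distance(rel_pos: Vec, rel_vel: Vec, horizon_s: float) -> float:
    """Closest approach of an obstacle (relative to the ego vehicle) within the horizon."""
    px, py = rel_pos
    vx, vy = rel_vel
    speed_sq = vx * vx + vy * vy
    t = 0.0 if speed_sq == 0.0 else -(px * vx + py * vy) / speed_sq
    t = max(0.0, min(horizon_s, t))  # only look ahead, and only up to the horizon
    return ((px + vx * t) ** 2 + (py + vy * t) ** 2) ** 0.5


def collision_probability(
    rel_pos: Vec,
    rel_vel: Vec,
    pos_sigma: float,
    vel_sigma: float,
    radius_m: float,
    horizon_s: float,
    samples: int = 2000,
    seed: int = 0,
) -> float:
    """Fraction of sampled obstacle states that come within radius_m of the ego vehicle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        p = (rng.gauss(rel_pos[0], pos_sigma), rng.gauss(rel_pos[1], pos_sigma))
        v = (rng.gauss(rel_vel[0], vel_sigma), rng.gauss(rel_vel[1], vel_sigma))
        if min_distance(p, v, horizon_s) <= radius_m:
            hits += 1
    return hits / samples


# Hypothetical scene: obstacle 20 m ahead, offset 3 m laterally, closing at 10 m/s.
risk = collision_probability((20.0, 3.0), (-10.0, 0.0), pos_sigma=0.5,
                             vel_sigma=1.0, radius_m=2.0, horizon_s=5.0)
```

With perfect knowledge this obstacle would pass 3 m away and never enter the 2 m radius; with noisy estimates a fraction of the sampled trajectories do, which is the kind of graded risk a probabilistic approach exposes to the decision-making process.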


Interaction. Different entities share the same road networks; these include vulnerable road users, powered vehicles, powered two-wheelers and bicycles. Their behaviour is determined by their interaction and the constraints imposed by traffic rules. That is, the interaction represents the spatio-temporal relationship between all entities, with the underlying objectives of avoiding collisions, reducing driver anxiety and ensuring traffic flow.

As vehicles move in a road network, they interact with each other and with other entities. This can be considered a social phenomenon dependent on the emotional state and physical condition of the drivers, the weather conditions and the layout of the road network. Thus interaction depends on the context. Interaction occurs with pedestrians, other powered vehicles, powered two-wheelers and bicycles. Statistics have demonstrated that the interaction of pedestrian and driver demographic factors with road geometry, traffic and environmental conditions is closely related to the conditions leading to accidents. Much work has been done in this area, with results being incorporated into new vehicles, as in the case of Mobileye's Pedestrian Detection systems. However, the question lies not only in pedestrian detection, but rather in the manner in which pedestrians interact with vehicles, how they move and react. Once pedestrians are detected, their future paths are difficult to predict, yet it is necessary to estimate the collision probability so as to prevent any physical interaction with the vehicle. This is a compound problem: when vehicles are close to pedestrians, it is likely that close gesture interaction occurs. For example, a driver, by watching the eyes and direction of observation of a pedestrian, can understand that the latter is aware of the vehicle's presence. This level of interaction is very difficult to achieve using current perception systems.

Driving a vehicle implies taking decisions continuously, based on the current awareness of the vehicle situation and its likely evolution. Therefore the ability to infer the intentions of the actors in a scene from the available cues is essential. When estimating a driver's manoeuvre intention, it is necessary to account for interactions between vehicles. Indeed, the context in which a driver performs a certain action completely changes the interpretation that should be made of that action. For example, if a vehicle is changing lanes, information about other vehicles, their relative speeds, accelerations, etc. should facilitate the inference of the driver's intentions. When this information is associated with the context, it should enable a better inference of the intentions of other vehicles and thus improve the inference of risk situations. Because of the high number of possible scenarios at road intersections, and the complexity of interactions between vehicles in these areas, driver intentions influence the decision-making process.

  • If a vehicle, as it enters an intersection, positions itself in the right-hand lane and has activated its indicators, the computer controlling the observer vehicle will infer that it is highly likely that the driver intends to turn right; accordingly, its behaviour will differ from what it would be without any inference of the driver's intention.
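The inference in the intersection example above can be sketched as a naive Bayes update over candidate manoeuvres. The cue names, the prior and the likelihood values are hypothetical, and treating the cues as conditionally independent is a simplifying assumption made for illustration.

```python
from typing import Dict, List


def manoeuvre_posterior(
    prior: Dict[str, float],
    likelihoods: Dict[str, Dict[str, float]],
    observed_cues: List[str],
) -> Dict[str, float]:
    """Posterior over manoeuvres given observed cues (naive Bayes)."""
    unnormalised = {}
    for manoeuvre, p in prior.items():
        for cue in observed_cues:
            p *= likelihoods[manoeuvre][cue]  # P(cue | manoeuvre)
        unnormalised[manoeuvre] = p
    total = sum(unnormalised.values())
    return {m: p / total for m, p in unnormalised.items()}


# Hypothetical values: how likely each cue is under each intended manoeuvre.
prior = {"turn_right": 1 / 3, "go_straight": 1 / 3, "turn_left": 1 / 3}
likelihoods = {
    "turn_right": {"in_right_lane": 0.80, "right_indicator_on": 0.90},
    "go_straight": {"in_right_lane": 0.50, "right_indicator_on": 0.05},
    "turn_left": {"in_right_lane": 0.05, "right_indicator_on": 0.02},
}
belief = manoeuvre_posterior(prior, likelihoods, ["in_right_lane", "right_indicator_on"])
```

With these illustrative numbers, observing both cues concentrates most of the belief on the right turn, while the lane position alone would leave going straight as a serious alternative.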

Thus if manoeuvres of other vehicles can be predicted independently, it will be possible to estimate the collision risk and predict the manoeuvres that would avoid or reduce it.

  • This is being applied to road intersection safety when using wireless communications technologies that enable the sharing of information amongst road actors.

Vehicles can be regarded as social entities; as such, interaction is central to their behaviour, and compromises are part of the decision-making process.

Classification of Technological Advances in Vehicle Technology

Vehicle navigation comprises the control of the mobile platform as it moves from its original position to its desired destination in a safe manner while traversing an infrastructure built for human driving. Given the state of the art in sensing, computing and decision making, a mobile platform today would be able to cross most road networks if it were the only user. The major difficulty resides in sharing the infrastructure with other entities, such as other powered mobile platforms, vulnerable road users, etc. The behaviour of these entities is unpredictable: despite the existence of traffic rules and law-enforcement mechanisms, driver errors occur, leading to a high number of road accidents. The major difficulty in deploying autonomous vehicles therefore lies in finding solutions that enable them to share the same workspace with other entities.

The rationale for studying vehicle navigation technologies applied to passenger vehicles is that driver centric, network centric and vehicle centric developments are all contributing to the development of autonomous vehicles.

Driver Centric

Today, the transport industry, universities and government R&D centers are developing Intelligent Vehicles from different perspectives. The car industry, for example, is deploying onboard vehicle technologies that facilitate the usage of a car by drivers and improve safety. From this perspective, advances in current mass-produced Intelligent Vehicles can first be defined as being Driver Centric.

Driver centric approaches enable the understanding of situations that autonomous vehicles will encounter when deployed in real traffic. They show that the techniques used on human-controlled systems to increase safety are similar to those that autonomous vehicles will need.

Network Centric

The introduction of communications technologies onboard passenger vehicles enables the sharing of information between vehicles (Vehicle to Vehicle, V2V) or between vehicles and the infrastructure (Vehicle to Infrastructure, V2I). This has led to different types of vehicles whose functionality relies on their integration into a communications network that provides V2V and V2I wireless links; these are known as Cooperative Vehicles. In a functional manner, these types of vehicles are regarded as Network Centric.

Network centric solutions are providing the means to share information among all actors in road networks. The network can then accumulate and analyse data prior to their broadcast to the networks. For autonomous vehicles this is a very important contribution as it means that the autonomous vehicles of the future do not need to be stand-alone systems. They are to be nodes that move in cooperation with other mobile nodes.

Vehicle Centric

A different approach consists in automating the vehicle navigation functions as much as possible. The vehicle comes under computer control and the role of the driver is reduced until the driver is no longer within the vehicle control loop; the vehicles are autonomous. The architectures replicate the functions necessary to navigate a vehicle in an autonomous manner, and thus all the system design is centered on the vehicle functions. These types of vehicle can be regarded as Vehicle Centric.

The Vehicle Centric perspective concerns the realm of autonomous vehicles; it considers the most salient experimental platforms developed so far and the associated technologies. This panorama provides an overview of the technologies that are being developed for autonomous vehicles.

Evolution of Vehicle Architectures

From a vehicle controllability perspective, if the driver has 100% control, then no vehicle navigation function is under computer control. Most applications today are centered on informing the driver first and then letting the driver act on the vehicle; they are driver centric. Applications in which there is direct machine control are few, though these are being introduced gradually, as in the case of Lane Keeping Support. Vehicle control is gradually being shifted away from the driver to computers.

From a different perspective, computer-controlled vehicles are gaining in autonomy, from simple functions such as obstacle detection and obstacle avoidance to situational awareness and mission planning, with computers ultimately controlling the vehicle through various adaptive behaviors; these are vehicle centric. This shift is shown in Fig. 50.6, where the transition between driver and computer (machine) control can be observed.


Driver Centric Technologies

In this category the overall control of the vehicle remains with the driver. Different functions are built to enhance the situational awareness of the driver and the driver's safety, either by improving the perception of the environment or by maintaining the stability and controllability of the vehicle. All the perception functions are directed towards informing the driver about what occurs within the immediate environment. The emphasis is on the fusion of data from proprioceptive and exteroceptive sensors, which allows for the building of a model of the vehicle state and of its spatio-temporal relationship with its immediate environment. This model is then used to infer situational information. Decisions are made all the time in every aspect of driving. As stated previously, accidents occur mainly due to driver error, when the wrong decisions are taken.

The emphasis is on the acquisition and association of data to provide the most suitable means of enhancing the driver's situational awareness and hence facilitating decision making. In most cases this consists of perceiving what occurs within the immediate space in front of the vehicle or detecting the vehicle's response. A typical example is the use of vision systems or laser scanners to detect pedestrians and either inform drivers or reduce the vehicle speed. This function is currently being implemented as part of new generations of vehicles. The main difficulties reside in the perception systems, due to the plethora of situations that might arise and to the layout of sensors and their capabilities, with cost being a determining factor for deployment.


The overall structure of a Driver Centric architecture could be built around a model representation of the world surrounding the subject vehicle. Figure 50.7 shows the architecture of a vehicle centered on a driver.


Driver Centric architectures would lead to two ultimate functions: that of operating a vehicle remotely or under the supervision of a central server, as occurs with automated guided vehicles (AGVs), and that of Indirect Driving.

Vehicle Tele-operation.

Indirect Vehicle Driving.
