Paper Daily: Indoor Scene Understanding in 2.5/3D for Autonomous Agents: A Survey

A Taxonomy of Problems

Image Classification

Image recognition is one of the most basic and fundamental tasks in visual scene understanding. Knowing the scene or object category helps with more sophisticated tasks such as scene segmentation and object detection. Classification algorithms are used in diverse areas such as medical imaging, self-driving cars and context-aware devices.


Important challenges for image classification include:

  • 2.5/3D data can be represented in multiple ways as discussed above. The challenge is then to choose the data representation that provides maximum information with minimum computational complexity.
  • A key challenge is to distinguish between fine-grained categories and appropriately model intra-class variations.
  • Designing algorithms that can handle illumination changes, background clutter and 3D deformations.
  • Designing algorithms that can learn from limited data.
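
To make the representation trade-off concrete, here is a minimal sketch of converting one 2.5D representation (a depth map) into a 3D one (a point cloud) by back-projecting each pixel through a pinhole camera model. The function name and the intrinsics `fx`, `fy`, `cx`, `cy` are illustrative assumptions, not something taken from the survey.

```python
def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth map (rows of depth values) into (x, y, z) points.

    Assumes a pinhole camera with focal lengths fx, fy and principal
    point (cx, cy). Pixels with depth <= 0 are treated as missing.
    """
    points = []
    for v, row in enumerate(depth):          # v: pixel row
        for u, z in enumerate(row):          # u: pixel column
            if z <= 0:
                continue                     # skip missing depth readings
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 2x2 depth map with one missing pixel (hypothetical values).
depth = [
    [0.0, 2.0],
    [1.0, 4.0],
]
cloud = depth_to_point_cloud(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

The depth map keeps the regular image grid, which is convenient for 2D convolutions, while the point cloud drops missing pixels and exposes metric 3D structure at the cost of an unordered, irregular layout.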

Methods Overview

Object Detection

Object detection deals with recognizing object instances and categories. Usually, an object detection algorithm outputs both the location and the class of an object. This task is highly significant for applications such as self-driving cars and augmented and virtual reality. However, in applications such as robot navigation, we need so-called ‘amodal object detection’, which tries to find an object’s location as well as its complete shape and orientation in 3D space when only a part of it is visible.


Key challenges for object detection are as follows:

  • Real-world environments can be highly cluttered, and identifying objects in such environments is very challenging.
  • Detection algorithms should also be able to handle viewpoint and illumination variations as well as deformations.
  • In many scenarios, it is necessary to understand the scene context to successfully detect objects.
  • Object categories have a long-tail (imbalanced) distribution, which makes it challenging to model the infrequent classes.
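
As an illustration of the long-tail issue in the last point, one common mitigation (not necessarily the one used by the methods surveyed) is to weight each class inversely to its frequency when computing the training loss. The sketch below assumes hypothetical class names and counts.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return a weight per class that grows as the class gets rarer.

    Uses the heuristic n_samples / (n_classes * class_count), so a
    perfectly balanced dataset gives every class a weight of 1.0.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical imbalanced label set: many chairs, few lamps.
labels = ["chair"] * 6 + ["table"] * 3 + ["lamp"] * 1
weights = inverse_frequency_weights(labels)
```

Here the rare class (`lamp`) receives the largest weight, so errors on it contribute more to the loss than errors on the frequent class.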

Methods Overview

  1. “Amodal Detection of 3D Objects: Inferring 3D Bounding Boxes from 2D Ones in RGB-Depth Images”
  2. “3D Attention-Driven Depth Acquisition for Object Identification”

Semantic Segmentation

This task consists of labeling each pixel in an image with its semantically meaningful category. Applications of semantic segmentation include domestic robots, content-based retrieval, self-driving cars and medical imaging. Efforts to address the semantic segmentation problem have come a long way, from hand-crafted, data-specific features to automatic feature learning techniques.


Despite being an important task, segmentation is highly challenging because:

  • Pixel-level labeling requires both local and global information, and the challenge is to design algorithms that can incorporate this wide contextual information together.
  • The difficulty increases considerably for instance segmentation, where the same class is segmented into different instances.
  • Obtaining dense pixel-level predictions, especially close to object boundaries, is challenging due to occlusions and confusing backgrounds.
  • Segmentation is also affected by appearance, viewpoint and scale changes.
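
To make per-pixel labeling concrete, the sketch below turns per-class score maps into a label map by taking the highest-scoring class at each pixel, then compares it with a ground-truth map using per-class intersection-over-union (IoU), a widely used segmentation metric. The function names and the toy scores are assumptions for illustration only.

```python
def argmax_labels(scores):
    """Convert scores[c][y][x] into a label map [y][x] of winning classes."""
    n_classes = len(scores)
    height = len(scores[0])
    width = len(scores[0][0])
    return [
        [max(range(n_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


def per_class_iou(pred, target, n_classes):
    """Return {class: IoU} for classes present in pred or target."""
    ious = {}
    for c in range(n_classes):
        inter = union = 0
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                inter += (p == c and t == c)
                union += (p == c or t == c)
        if union:                      # skip classes absent from both maps
            ious[c] = inter / union
    return ious


# Toy 2x2 image with two classes (hypothetical scores).
scores = [
    [[0.9, 0.2], [0.6, 0.1]],   # class 0
    [[0.1, 0.8], [0.4, 0.9]],   # class 1
]
target = [[0, 1], [1, 1]]

pred = argmax_labels(scores)
ious = per_class_iou(pred, target, n_classes=2)
mean_iou = sum(ious.values()) / len(ious)
```

A single mislabeled pixel (bottom-left) lowers the IoU of both classes, which is why errors near object boundaries weigh heavily on this metric.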

Methods Overview

  1. “Semantic Segmentation of RGBD Images with Mutex Constraints”
CNN as local feature extractors
  1. “Multimodal Neural Networks: RGB-D for Semantic Segmentation and Object Detection”
  2. “FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-Based Architecture”
Point cloud as input
  • “PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation”
  • “PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space”
  • “3DContextNet: K-d Tree Guided Hierarchical Learning of Point Clouds Using Local Contextual Cues”
  • “SEGCloud: Semantic Segmentation of 3D Point Clouds”
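
A point cloud is an unordered set, so a network that consumes it directly should give the same output for any ordering of the points. PointNet addresses this by applying a shared function to every point and aggregating with a symmetric operation (max pooling). The sketch below illustrates only that idea: `point_features` is a hand-written stand-in for the learned shared MLP, not the actual network.

```python
def point_features(point):
    """Toy per-point feature map (assumption: stands in for a learned shared MLP)."""
    x, y, z = point
    return (x + y + z, x * x + y * y + z * z, abs(x))


def global_feature(points):
    """Apply the same function to each point, then max-pool every feature dimension."""
    per_point = [point_features(p) for p in points]
    return tuple(max(dim) for dim in zip(*per_point))


# The same three points in two different orders (hypothetical coordinates).
cloud = [(0.0, 1.0, 2.0), (-1.0, 0.5, 0.0), (3.0, -2.0, 1.0)]
shuffled = [cloud[2], cloud[0], cloud[1]]

feature = global_feature(cloud)
feature_shuffled = global_feature(shuffled)
```

Because the maximum does not depend on the order of its inputs, both orderings yield the same global feature.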
