Place perception from the fusion of different image representation

  • Pei Li
  • Xinde Li
  • Xianghui Li
  • Hong Pan
  • M. O. Khyam
  • Md Noor-A-Rahim
  • Shuzhi Sam Ge

Research output: Contribution to journal › Article › peer-review

Abstract

Inspired by the human way of understanding places, we present a novel indoor place perception network that overcomes two limitations of existing methods: 1) they rely only on the image features of object regions to recognize an indoor place, and 2) they give insufficient consideration to semantic information about object attributes and states. By utilizing multi-modal information comprising images and natural language, the proposed method can comprehensively express the attributes, states, and relationships of objects, which benefits indoor place understanding and recognition. Specifically, we first present a natural language generation framework based on a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) to imitate the process of place understanding. Next, a Convolutional Auto-Encoder (CAE) and a mixed CNN-LSTM are proposed to extract image features and semantic features, respectively. Then, two fusion strategies, namely feature-level fusion and object-level fusion, are designed to integrate the different types of features and the features from different objects. The category of the indoor place is finally recognized from the fused information. Comprehensive experiments on public datasets verify the effectiveness of the proposed place perception method based on linguistic cues.
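The two fusion strategies named in the abstract can be sketched in a minimal, hypothetical form: feature-level fusion joins the image feature (e.g. from the CAE) and the semantic feature (e.g. from the CNN-LSTM) of each object into one vector, and object-level fusion then aggregates the per-object vectors into a single place descriptor. The function names, dimensions, and the use of concatenation and mean pooling are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def feature_level_fusion(image_feat, semantic_feat):
    # Hypothetical feature-level fusion: concatenate the two
    # modalities of one object into a single feature vector.
    return np.concatenate([image_feat, semantic_feat])

def object_level_fusion(fused_feats):
    # Hypothetical object-level fusion: pool the fused vectors of
    # all detected objects (mean pooling here) into one
    # place-level descriptor for the final classifier.
    return np.mean(np.stack(fused_feats), axis=0)

# Toy example: two objects, 4-dim image features, 3-dim semantic features.
img = [np.ones(4), np.zeros(4)]
sem = [np.zeros(3), np.ones(3)]
per_object = [feature_level_fusion(i, s) for i, s in zip(img, sem)]
place_descriptor = object_level_fusion(per_object)
print(place_descriptor.shape)  # (7,)
```

In practice a learned classifier would map `place_descriptor` to an indoor place category; the sketch only shows how the two fusion stages compose.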

Original language: English
Article number: 107680
Journal: Pattern Recognition
Volume: 110
DOIs
Publication status: Published - Feb 2021

Keywords

  • CNN
  • Convolutional auto-encoder
  • Indoor place perception
  • LSTM
  • Natural language
