Sunday, August 11, 2013

My New Research Plan (2013-08)


To create a rover that explores the environment, learns from it, and communicates with human beings in a human language.
The cognitive model of the rover shall be based on association. The real purpose of the research is to verify the feasibility of cognitive models based on association.

The cognitive model based on association

In the model, generative patterns are represented as series of associations.
Patterns can be visual, auditory, tactile, and linguistic.

Recognition of situations

The pattern representing a situation could be obtained by integrating pattern recognition results from various modalities within a time series.
An associative series of the situation is a "semantic network", which gives the semantics for the linguistic function.
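
As a rough illustration of the idea above, here is a minimal sketch that integrates recognition results from several modalities within a time window and links co-occurring results into a small association network. The event format, the function names (`integrate_window`, `associate`) and the co-occurrence counting are assumptions made for this example, not part of the plan itself.

```python
from collections import defaultdict
from itertools import combinations

def integrate_window(events, start, end):
    """Collect the (modality, label) results recognized within [start, end)."""
    return frozenset(
        (modality, label)
        for t, modality, label in events
        if start <= t < end
    )

def associate(situations):
    """Count how often two recognition results co-occur in one situation."""
    links = defaultdict(int)
    for situation in situations:
        for a, b in combinations(sorted(situation), 2):
            links[(a, b)] += 1
    return dict(links)

# Hypothetical recognition results: (time in seconds, modality, label).
events = [
    (0.2, "visual", "face"),
    (0.5, "auditory", "hello"),
    (1.3, "visual", "ball"),
    (1.6, "auditory", "ball"),
    (1.8, "tactile", "bump"),
]

# Two one-second windows yield two situation patterns.
situations = [
    integrate_window(events, 0.0, 1.0),
    integrate_window(events, 1.0, 2.0),
]
network = associate(situations)
```

Here a situation is simply the set of results seen in a window, and the "network" is a table of pairwise link counts; a real model would presumably use learned, weighted associations instead.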


The learning mechanism will be neural networks (in a broad sense).
Learning covers the recognition of external things, episodic memory and the categorization of situations. Categorization may be done by supervised and/or unsupervised learning.
The rover will learn from its voluntary actions.


The interface between syntax (parsing/generation) and semantics (association of situations) shall be learned (acquired), by means of association-based cognitive models.
While morpheme dictionaries may be given initially, vocabulary acquisition will be an issue in the future.
The reason for the linguistic interface is that language will be key to realizing human-level intelligence; it is also an effective means of communicating with human beings.
The rover will have basic language features such as the following:

  • Description of the scene (verbalization of situation recognition)
  • Response to (human) questions
  • Response to (human) instructions


Human (animal) infants tend to have the innate ability to recognize other individuals and communicate with them. The rover will be given certain recognition abilities such as face recognition, motion capture, speech recognition and gaze recognition.

Core cognitive architecture

  • Time-series pattern recognizers having motor commands and sensory input as their input
  • Attention and situation assessment
  • Episodic memory (← pattern recognizers, situation assessment & attention)
  • Backtracking (parsing and planning with certain evaluation functions)

 cf. A Figure of Human Cognitive Function
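
To make the episodic memory component in the list above a little more concrete, the following is a minimal sketch under stated assumptions: an episode is stored as a time stamp, a set of situation elements and the attended item, and recall returns the stored episode most similar to a cue (Jaccard similarity). The class and field names are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field

@dataclass
class Episode:
    time: float
    situation: frozenset   # elements delivered by the pattern recognizers
    attention: str         # the element attended to at that time

@dataclass
class EpisodicMemory:
    episodes: list = field(default_factory=list)

    def store(self, time, situation, attention):
        self.episodes.append(Episode(time, frozenset(situation), attention))

    def recall(self, cue):
        """Return the episode whose situation overlaps most with the cue."""
        cue = frozenset(cue)

        def similarity(episode):
            union = episode.situation | cue
            return len(episode.situation & cue) / len(union) if union else 0.0

        # max() returns None when nothing has been stored yet.
        return max(self.episodes, key=similarity, default=None)

memory = EpisodicMemory()
memory.store(1.0, {"face", "hello"}, attention="face")
memory.store(2.0, {"ball", "bump"}, attention="ball")

recalled = memory.recall({"ball"})
```

Set overlap stands in here for associative retrieval; the plan leaves the actual mechanism open.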

System configuration


  • Locomotive function
    Roomba / Kobuki, etc.
  • Visual function
    Early stage: Kinect
    Later stage: adding saccades
  • Acceleration sensor
    (works also as collision detector)
  • Audio function
  • On board PC (notebook)
  • Wireless connection (WiFi / Bluetooth) for monitoring


  • OS
    ROS, etc.
  • Visual data processing
    Peripheral vision, central visual field, optical flow, depth recognition
    Object detection, tracking, motion capture, facial recognition, gaze recognition
    Kinect software, OpenCV, etc.
  • Speech recognition
    HARK, etc.
  • Speech synthesis
    Vocaloid, etc.
  • Learning module
    SOINN, k-means, SOM, SVM, DeSTIN, HMM, etc.
    ※To be used as plug-ins depending on the purpose.
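
One possible reading of the plug-in idea is a common interface that any learning algorithm can implement. The sketch below assumes a hypothetical `Learner` interface with `fit` and `predict`, and fills it with a small pure-Python k-means as one example plug-in; the interface and all names are illustrative assumptions, not an existing API.

```python
from typing import Protocol, Sequence

Vector = Sequence[float]

class Learner(Protocol):
    """Hypothetical plug-in interface for learning modules."""
    def fit(self, data: Sequence[Vector]) -> None: ...
    def predict(self, x: Vector) -> int: ...

def _dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

class KMeans:
    def __init__(self, k, iterations=10):
        self.k = k
        self.iterations = iterations
        self.centroids = []

    def fit(self, data):
        # Assumption: the first k points serve as initial centroids, which
        # keeps the example deterministic (len(data) >= k is required).
        self.centroids = [list(p) for p in data[: self.k]]
        for _ in range(self.iterations):
            clusters = [[] for _ in range(self.k)]
            for p in data:
                clusters[self.predict(p)].append(p)
            for i, members in enumerate(clusters):
                if members:  # keep the old centroid if a cluster is empty
                    self.centroids[i] = [
                        sum(c) / len(members) for c in zip(*members)
                    ]

    def predict(self, x):
        """Return the index of the nearest centroid."""
        return min(range(self.k), key=lambda i: _dist2(x, self.centroids[i]))

def categorize(learner: Learner, data, query):
    """Train any plug-in on the data and categorize one query vector."""
    learner.fit(data)
    return learner.predict(query)

data = [(0.0, 0.0), (10.0, 10.0), (0.5, 0.2), (9.5, 10.1)]
near_origin = categorize(KMeans(2), data, (1.0, 1.0))
far_corner = categorize(KMeans(2), data, (9.0, 9.0))
```

Because `categorize` depends only on the interface, a SOM or an HMM-based recognizer could be swapped in, provided it exposes the same two methods.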

Tentative Research Steps

Phase I: Kinect + OpenCV + HARK + Vocaloid (preliminary exercises)

  • Checking for visual functions (facial recognition, motion capture)
  • Checking for speech recognition function
  • Checking for speech synthesis function
  • Implementing a conventional linguistic (speech) interface
  • Experimenting on visual experience reports with a conventional linguistic interface

Phase II: Pattern Recognition

  • Selection and implementation of time-series pattern recognizers
  • Visual pattern recognition experiment
  • Experiments on pattern recognition reports with a conventional linguistic interface

Phase III: Episodic memory

  • Defining the situations to be remembered
  • Implementing episodic memory and attentional mechanism
  • Experimenting on episode reports with a conventional linguistic interface

Phase IV: Eye movement

  • Kinect may be put on a movable (controllable) stage (w. an acceleration sensor)
  • Human tracking
  • Behavior control (extending a conventional language generation mechanism) 
  • Gaze induction by instruction with a conventional linguistic interface
  • Q & A with a conventional linguistic interface

Phase V: Roaming (Roomba / Kobuki)

  • Coupling vision and roaming (reflexive) 
  • Defining the relation between attention and roaming ("curiosity")
  • 3D object learning/recognition via roaming
  • Instruction of motion with a conventional linguistic interface

Phase VI: Design and implementation of non-conventional (associative) linguistic information processing
