Purpose
To create a rover that explores its environment, learns from it, and communicates with human beings in a human language.
The rover's cognitive model shall be based on association; the underlying purpose of the research is to verify the feasibility of association-based cognitive models.
In the model, generative patterns are represented as series of associations.
Patterns can be visual, auditory, tactile, and linguistic.
Recognition of situations
The pattern representing a situation can be obtained by integrating pattern-recognition results from multiple modalities over a time series.
The associative series of a situation forms a "semantic network", which provides the semantics for the linguistic function; a minimal sketch of such a network follows.
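As a rough illustration (not the actual design), an associative series might be stored as a graph of weighted associations between recognized patterns. All class and pattern names below are hypothetical:

```python
from collections import defaultdict

class AssociativeNetwork:
    """Minimal sketch of a semantic network built from associations.

    Nodes are recognized patterns (e.g. "ball", "red", "bounce");
    edges carry association strengths reinforced by co-occurrence.
    Everything here is illustrative, not part of the actual design.
    """

    def __init__(self):
        self.assoc = defaultdict(lambda: defaultdict(float))

    def associate(self, a, b, strength=1.0):
        # Reinforce a symmetric association between two patterns.
        self.assoc[a][b] += strength
        self.assoc[b][a] += strength

    def recall(self, cue, top_n=3):
        # Return the patterns most strongly associated with the cue.
        links = self.assoc[cue]
        return sorted(links, key=links.get, reverse=True)[:top_n]

# Integrating co-occurring recognition results from two modalities:
net = AssociativeNetwork()
net.associate("ball", "red")      # visual
net.associate("ball", "bounce")   # auditory
print(net.recall("ball"))         # ['red', 'bounce']
```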
Learning
The learning mechanism will be neural networks (in a broad sense).
Learning includes the recognition of external objects, episodic memory, and the categorization of situations. Categorization may be done by supervised and/or unsupervised learning (see the sketch below).
The rover will learn from its voluntary actions.
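Since k-means appears among the candidate learning modules listed under Software, a minimal sketch of unsupervised situation categorization could look like the following. The feature vectors and cluster count are placeholders, not project data:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical situation feature vectors: each row integrates
# recognition results from several modalities at one time step.
situations = np.array([
    [0.9, 0.1, 0.0],   # e.g. bright, quiet, stationary
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.8],   # e.g. dark, noisy, moving
    [0.2, 0.8, 0.9],
])

# Unsupervised categorization of situations (cluster count is a guess).
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(situations)
print(model.labels_)   # category index assigned to each situation
```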
Language
The interface between syntax (parsing/generation) and semantics (association of situations) shall be learned (acquired) by means of the association-based cognitive model.
While morpheme dictionaries may be given initially, vocabulary acquisition will be an issue in the future.
The reason for the linguistic interface is that language is likely to be key to realizing human-level intelligence; it is also an effective means of communicating with human beings.
The rover will have basic language features such as the following (a minimal sketch follows the list):
- Description of the scene (verbalization of situation recognition)
- Response to (human) questions
- Response to (human) instructions
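A minimal sketch of such a conventional (template-based) linguistic interface, covering scene description and naive question answering. The scene dictionary and templates are purely illustrative:

```python
# Conventional (template-based) linguistic interface sketch.
# Scene contents and templates are hypothetical placeholders.

scene = {"object": "ball", "color": "red", "motion": "rolling"}

def describe(scene):
    # Description of the scene (verbalization of situation recognition).
    return f"I see a {scene['color']} {scene['object']} {scene['motion']}."

def answer(question, scene):
    # Very naive keyword-based response to human questions.
    if "color" in question.lower():
        return f"It is {scene['color']}."
    if "what" in question.lower():
        return f"It is a {scene['object']}."
    return "I do not understand."

print(describe(scene))                      # I see a red ball rolling.
print(answer("What color is it?", scene))   # It is red.
```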
Sociality
Human (and animal) infants tend to have innate abilities to recognize other individuals and to communicate with them. The rover will likewise be given certain recognition abilities, such as face recognition, motion capture, speech recognition, and gaze recognition.
Core cognitive architecture
- Time-series pattern recognizers that take motor commands and sensory input as their input
- Attention and situation assessment
- Episodic memory (← pattern recognizers, situation assessment & attention)
- Backtracking (parsing and planning with certain evaluation functions)
(cf. figure: Human Cognitive Function; a skeleton of the loop implied above is sketched below)
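Read as a processing loop, the architecture above might be skeletonized as follows. Every function here is a hypothetical placeholder, not an implementation:

```python
# Skeleton of the loop implied by the architecture above:
# time-series pattern recognition -> attention / situation assessment
# -> episodic memory.  Backtracking is left out.  Every function is
# a hypothetical placeholder, not part of the actual design.

def recognize_patterns(motor_command, sensory_input):
    # Time-series pattern recognizers take motor commands and
    # sensory input as their input (stub).
    return {"patterns": [], "motor": motor_command, "sense": sensory_input}

def assess_situation(recognized):
    # Attention selects the salient part of the recognized patterns (stub).
    return {"salient": recognized["patterns"][:1]}

episodic_memory = []

def cognitive_step(motor_command, sensory_input):
    recognized = recognize_patterns(motor_command, sensory_input)
    situation = assess_situation(recognized)
    # Episodic memory is fed by the pattern recognizers plus
    # situation assessment & attention.
    episodic_memory.append((recognized, situation))
    return situation

cognitive_step(motor_command="forward", sensory_input=[0.1, 0.9])
```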
System configuration
Hardware
- Locomotive function
Roomba / Kobuki, etc.
- Visual function
Early stage: Kinect
Later stage: saccades added (via a movable stage; see Phase IV)
- Acceleration sensor
(also works as a collision detector)
- Audio function
- On-board PC (notebook)
- Wireless connection (Wi-Fi / Bluetooth) for monitoring
Software
- OS
ROS, etc.
- Visual data processing
Peripheral vision, central visual field, optical flow, depth recognition
Object detection, tracking, motion capture, facial recognition, gaze recognition
Kinect software, OpenCV, etc.
- Speech recognition
HARK, etc.
- Speech synthesis
Vocaloid, etc.
- Learning module
SOINN, k-means, SOM, SVM, DeSTIN, HMM, etc.
Note: these are to be used as plug-ins depending on the purpose (a possible common interface is sketched below).
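The plug-in note suggests a common interface behind which SOINN, k-means, SOM, SVM, etc. could be swapped. One possible (assumed) shape, with a scikit-learn k-means wrapper as a concrete example:

```python
from abc import ABC, abstractmethod

class LearningModule(ABC):
    """Hypothetical common interface so that SOINN, k-means, SOM,
    SVM, DeSTIN, HMM, etc. can be swapped in as plug-ins."""

    @abstractmethod
    def fit(self, samples):
        """Update the module from a batch of feature vectors."""

    @abstractmethod
    def predict(self, sample):
        """Return a category / state for one feature vector."""

class KMeansPlugin(LearningModule):
    # Thin wrapper over scikit-learn's k-means as one concrete plug-in.
    def __init__(self, n_clusters=4):
        from sklearn.cluster import KMeans
        self._model = KMeans(n_clusters=n_clusters, n_init=10)

    def fit(self, samples):
        self._model.fit(samples)

    def predict(self, sample):
        return int(self._model.predict([sample])[0])
```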
Tentative Research Steps
Phase I: Kinect + OpenCV + HARK + Vocaloid (preliminary exercises)
- Checking for visual functions (facial recognition, motion capture; a minimal detection check is sketched after this phase)
- Checking for speech recognition function
- Checking for speech synthesis function
- Implementing a conventional linguistic (speech) interface
- Experimenting on visual experience reports with a conventional linguistic interface
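A minimal check of the facial-recognition item could use OpenCV's stock Haar cascade. The camera index and cascade path assume a standard OpenCV installation, and this detects faces rather than identifying them:

```python
import cv2

# Minimal Phase I check of face detection (not identification) with
# OpenCV's stock Haar cascade.  Camera index 0 and the cascade path
# assume a standard OpenCV installation.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture(0)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("faces", frame)
    if cv2.waitKey(1) == 27:   # Esc quits the check
        break

cap.release()
cv2.destroyAllWindows()
```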
Phase II: Pattern Recognition
- Selection and implementation of time-series pattern recognizers (a toy HMM example follows this phase)
- Visual pattern recognition experiment
- Experiments on pattern recognition reports with a conventional linguistic interface
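Since HMMs are among the candidate modules, a toy time-series pattern-recognition experiment could use hmmlearn. The sensor stream below is synthetic:

```python
import numpy as np
from hmmlearn import hmm

# Synthetic 1-D sensor stream: a slow phase followed by a fast phase.
rng = np.random.default_rng(0)
slow = rng.normal(0.0, 0.3, size=(50, 1))
fast = rng.normal(3.0, 0.3, size=(50, 1))
stream = np.vstack([slow, fast])

# Fit a 2-state Gaussian HMM as a time-series pattern recognizer.
model = hmm.GaussianHMM(n_components=2, n_iter=50, random_state=0)
model.fit(stream)

# Decode the hidden state sequence: the switch between the two
# regimes should be recovered near sample 50.
states = model.predict(stream)
print(states[:5], states[-5:])
```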
Phase III: Episodic memory
- Defining the situations to be remembered
- Implementing episodic memory and an attentional mechanism (sketched after this phase)
- Experimenting on episode reports with a conventional linguistic interface
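A minimal sketch of an attention-gated episodic store. The field names, salience threshold, and cue-overlap retrieval rule are all assumptions for illustration:

```python
from dataclasses import dataclass, field
from time import time

@dataclass
class Episode:
    # One remembered situation; fields are illustrative placeholders.
    situation: dict
    salience: float
    timestamp: float = field(default_factory=time)

class EpisodicMemory:
    """Sketch: store attended situations, retrieve by cue overlap."""

    def __init__(self, salience_threshold=0.5):
        self.episodes = []
        self.salience_threshold = salience_threshold

    def store(self, situation, salience):
        # The attentional mechanism gates what gets remembered.
        if salience >= self.salience_threshold:
            self.episodes.append(Episode(situation, salience))

    def recall(self, cue):
        # Return episodes whose situation contains all cue entries.
        return [e for e in self.episodes
                if cue.items() <= e.situation.items()]

mem = EpisodicMemory()
mem.store({"object": "ball", "place": "hall"}, salience=0.9)
mem.store({"object": "wall"}, salience=0.2)    # ignored: low salience
print(mem.recall({"object": "ball"}))          # the first episode
```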
Phase IV: Eye movement
- Kinect may be put on a movable (controllable) stage (with an acceleration sensor)
- Human tracking (a minimal pan-control sketch follows this phase)
- Behavior control (extending a conventional language generation mechanism)
- Gaze induction by instruction with a conventional linguistic interface
- Q & A with a conventional linguistic interface
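Human tracking with a movable stage could start as simple proportional control that pans the Kinect to keep a detected face centered. The gain, frame width, and sign convention are assumptions:

```python
# Sketch of human tracking for a Kinect on a controllable stage:
# proportional control that pans toward a detected face.  The gain,
# frame width, and sign convention are assumptions.

FRAME_WIDTH = 640   # assumed Kinect RGB frame width in pixels
GAIN = 0.002        # rad/s of pan per pixel of error (a guess)

def pan_command(face_x, face_w):
    """Return a pan rate that centers the face horizontally."""
    face_center = face_x + face_w / 2.0
    error = face_center - FRAME_WIDTH / 2.0
    # Convention (assumed): positive command pans right, and a face
    # right of center yields positive error, so we pan toward the face.
    return GAIN * error

# A face box at x=400 with width 80 has center 440, error +120 px:
print(pan_command(400, 80))   # 0.24 -> pan right toward the face
```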
Phase V: Roaming (Roomba / Kobuki)
- Coupling vision and roaming (reflexive; see the sketch after this phase)
- Defining the relation between attention and roaming ("curiosity")
- 3D object learning/recognition via roaming
- Instruction of motion with a conventional linguistic interface
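The reflexive vision-roaming coupling and "curiosity" might begin as a rule like the following. The thresholds and the (linear, angular) velocity interface are assumptions:

```python
import random

# Sketch of a reflexive coupling between vision and roaming:
# novelty in the visual field attracts the rover ("curiosity"),
# a collision signal triggers retreat.  All thresholds and the
# (linear, angular) velocity interface are assumptions.

def roam_step(novelty, novelty_direction, collided):
    """Return a (linear, angular) velocity command for one step."""
    if collided:
        return (-0.1, 0.0)                 # back off after a bump
    if novelty > 0.5:
        # Curiosity: turn toward and approach the novel stimulus.
        return (0.2, 0.5 * novelty_direction)
    # Nothing interesting: wander with a small random turn.
    return (0.1, random.uniform(-0.3, 0.3))

print(roam_step(novelty=0.8, novelty_direction=-1.0, collided=False))
```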
Phase VI: Design and implementation of non-conventional (associative) linguistic information processing
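As one conceivable direction (not the actual design, which is the subject of this phase), an associative linguistic process could generate an utterance by walking the strongest associations in the semantic network:

```python
# Hypothetical sketch: utterance generation by following an
# associative series.  Associations are given here as a plain dict
# mapping each pattern to its neighbors, ordered by strength.
assoc = {"ball": ["red"], "red": ["ball", "rolling"], "rolling": ["red"]}

def verbalize(assoc, start, length=3):
    """Emit the patterns visited along the strongest unvisited
    associations as a crude utterance."""
    words, current = [start], start
    for _ in range(length - 1):
        nxt = [p for p in assoc.get(current, []) if p not in words]
        if not nxt:
            break
        current = nxt[0]
        words.append(current)
    return " ".join(words)

print(verbalize(assoc, "ball"))   # "ball red rolling"
```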