Friday, June 27, 2014

Research Plan Updated (2014-06)

I streamlined the plan: I dropped three-dimensional concept learning and, with it, the use of a robotics simulator, to focus more on language acquisition.
While robotics/embodiment is essential for the phenomenology of artifacts, it incurs too much technical complication, so it is put off until the relevant tools are readily available.  The new plan also drops the emphasis on the recognition of Spelke's objects, which would require action during perception.



Purpose

Creating a system that performs language acquisition (symbol grounding) with pattern recognizers and association networks, to verify my associationist hypotheses of cognition.

Language acquisition

The system will follow the human (infant) language acquisition process: it shall associate linguistic expressions with the shapes, colors, movements and relations of perceived objects, while creating adequate internal syntactic/semantic representations.

Core cognitive architecture

The core cognitive architecture shall have the following functions.
In designing and implementing the cognitive architecture, generic mechanisms shall be (re)used whenever possible.
  • Time-series pattern learner/recognizer
    having motor commands and sensory input as its input
  • Attention and situation assessment
    on what will be learned and which action will be taken.
  • Cognitive model based on association
    to memorize and recollect (temporal) generative patterns as associative sequences.
    Linguistic competence will be realized with this model.
    It contains a backtracking mechanism based on the function of attention and situation assessment mentioned above.
  • Episodic memory
    Patterns (the representation of situations -- combinations of abstract patterns created by (non-supervised) learning) positively assessed by the attention and situation assessment will be memorized.
 cf. A Figure of Human Cognitive Function

Platform

  • Visual data processing
    OpenCV, SimpleCV, etc.
  • Learning module
    BESOM, DeSTIN, k-means, SOM, SVM, HMM, SOINN, etc.
    (To be used as plug-ins depending on the purpose)
  • Publicly available cognitive architectures may be surveyed
    e.g., OpenCog and LIDA


Tentative Research Steps


Phase I: Survey on robot simulators (done)

Research on robot simulators such as SIGVerse and V-REP, and a trial on attitude control.

Phase II: Survey on Spelke's Object recognition (done)

Proper recognition of Spelke's objects, i.e., coherent, solid and inert bundles of features of a certain dimension that persist over time, would require optical flow processing associated with the action of the perceiver.

The following part is largely unchanged from the previous plan.

Phase III: Labeling

  • Basic Ideas
    • The system shall relate specific types of figures it looks at with linguistic labels.
    • Figures get the system's attention by force (exterior instruction).
    • Labels may be nominals representing shapes and adjectives representing features such as colors.
      Types of objects may be learned in a supervised manner with labels, or may already have been categorized by non-supervised learning.
    • The system shall utter labels on recognizing types after learning association between the labels and types.
    • The system shall recognize/utter the syntactic pattern 'adjective + noun'.
  • Determining the recognition method & implementation
  • Designing and implementing a mechanism for handling syntactic structure.
  • Incorporating episodic memory at this stage is to be considered.
  • Labeling experiment
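
The label/type association in the basic ideas above could be prototyped with plain co-occurrence counting. The following is only a minimal sketch under that assumption, not the recognition method the plan will settle on: the class name `LabelAssociator` and the feature names such as `COLOR_RED` are hypothetical, and the perceptual types are assumed to have been recognized already.

```python
from collections import defaultdict


class LabelAssociator:
    """Toy cross-situational learner: counts feature/word co-occurrences."""

    def __init__(self):
        # counts[feature][word] = number of scenes in which both appeared
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, features, words):
        """Record one scene: the perceived features and the words heard with it."""
        for feature in features:
            for word in words:
                self.counts[feature][word] += 1

    def label(self, feature):
        """Return the word most often heard with this feature, or None."""
        words = self.counts.get(feature)
        if not words:
            return None
        return max(words, key=words.get)

    def utter(self, color_feature, shape_feature):
        """Produce an 'adjective + noun' phrase for a perceived figure."""
        adjective = self.label(color_feature)
        noun = self.label(shape_feature)
        if adjective is None or noun is None:
            return None
        return f"{adjective} {noun}"


learner = LabelAssociator()
# Hypothetical training scenes: (perceived features, words presented with them)
learner.observe({"COLOR_RED", "SHAPE_CIRCLE"}, ["red", "circle"])
learner.observe({"COLOR_BLUE", "SHAPE_CIRCLE"}, ["blue", "circle"])
learner.observe({"COLOR_BLUE", "SHAPE_SQUARE"}, ["blue", "square"])
learner.observe({"COLOR_GREEN", "SHAPE_SQUARE"}, ["green", "square"])
learner.observe({"COLOR_GREEN", "SHAPE_TRIANGLE"}, ["green", "triangle"])
learner.observe({"COLOR_RED", "SHAPE_TRIANGLE"}, ["red", "triangle"])

# A red square was never shown, but both of its features have labels.
print(learner.utter("COLOR_RED", "SHAPE_SQUARE"))  # red square
```

Counting is the simplest possible stand-in for an association network; it breaks ties arbitrarily and the adjective/noun word order is hard-coded in `utter` rather than learned.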

Phase IV: Relation Learning

  • Basic Ideas
    • The system shall learn labels for
      • object locomotion such as moving up/down, right/left and closer/away
      • orientational relations between objects such as above/below, right/left and hither/thither
    • Objects should get the system's attention by force (programming) or by a certain preprogrammed mechanism of attention (such as attention to moving objects).
  • Designing & implementing the mechanism
  • Experiments
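
Before labels for locomotion and relations can be learned, the percepts have to be turned into something the labels can attach to. One hedged way to do that is to derive qualitative predicates from tracked positions, as sketched below. The function names, the image-coordinate convention (y grows downward) and the use of apparent size as a cue for closer/away are all assumptions made for illustration.

```python
def spatial_relations(a, b):
    """Describe where object a is relative to object b.

    Positions are (x, y) image coordinates; y grows downward (assumption).
    """
    (ax, ay), (bx, by) = a, b
    relations = []
    if ay < by:
        relations.append("above")
    elif ay > by:
        relations.append("below")
    if ax < bx:
        relations.append("left")
    elif ax > bx:
        relations.append("right")
    return relations


def locomotion(previous, current):
    """Describe motion between two observations (x, y, apparent_size)."""
    (px, py, psize), (cx, cy, csize) = previous, current
    moves = []
    if cy < py:
        moves.append("up")
    elif cy > py:
        moves.append("down")
    if cx > px:
        moves.append("right")
    elif cx < px:
        moves.append("left")
    # A growing apparent size is taken as a cue for approaching (assumption).
    if csize > psize:
        moves.append("closer")
    elif csize < psize:
        moves.append("away")
    return moves


print(spatial_relations((1, 1), (3, 5)))      # ['above', 'left']
print(locomotion((5, 5, 10), (7, 5, 12)))     # ['right', 'closer']
```

The resulting predicates could then be paired with heard words in the same associative manner as the shape and color labels of Phase III.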

Phase V: Linguistic Interaction

  • Basic Ideas
    • The system shall answer questions using labels learned in Phase III & Phase IV.
    • The system shall respond to requests on its behavior.
    • The system shall utter clarification questions.
  • Designing & implementing a mechanism for linguistic interaction
  • Experiments

Phase VI: Episodic memory

  • Basic Ideas
    • Episodes (situations) to be memorized are the appearance of objects and changes in relations among them.
    • The memory of novel objects and situations is prioritized.
  • Designing & implementing episodic memory and attentional mechanism
  • Designing & implementing episodic recollection & linguistic report.
  • Experiments

Phase VII: More complicated syntax

  • Basic Idea
    The system shall understand/utter linguistic expressions having nested phrase structure.
  • Designing & implementing nested phrase structure.
  • Experiments

Saturday, June 21, 2014

Looking for Spelke's Object with Computer Vision: First Try

While reading the SimpleCV textbook, I found that there were functions to find 'interesting parts' in motion pictures.  So I tested one of them, hoping that it could be used to detect Spelke's objects (ref. my current research plan).
For video input, I used a scene from the V-REP simulator where a camera moves along a quasi-elliptical path tracking a cube inside the orbit.
I fed it to the RunningSegmentation function of SimpleCV and the result was what I had expected: parts of the floor were 'as interesting as' the cube to the algorithm, so the cube alone was not chosen as the interesting object (see the Fig. below).
Fig.  The part enclosed with the red lines 'interested' the algorithm.
(It changes dynamically over time.)
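
Why the floor ends up 'interesting' can be illustrated with a toy model. The sketch below is a generic running-average background subtraction written in plain Python; it is not SimpleCV's implementation, and the function names, the `alpha` and `threshold` parameters and the one-row synthetic scene are assumptions for illustration only.

```python
def running_segmentation(frames, alpha=0.5, threshold=30):
    """Return one foreground mask per frame from a running-average background.

    frames: list of 2-D lists of grey levels (0-255).
    alpha: how quickly the background model absorbs the newest frame.
    threshold: minimum absolute difference for a pixel to count as 'interesting'.
    """
    background = None
    masks = []
    for frame in frames:
        if background is None:
            # The first frame only initialises the background model.
            background = [[float(p) for p in row] for row in frame]
            masks.append([[False] * len(row) for row in frame])
            continue
        masks.append([[abs(p - b) > threshold for p, b in zip(row, brow)]
                      for row, brow in zip(frame, background)])
        background = [[(1 - alpha) * b + alpha * p for p, b in zip(row, brow)]
                      for row, brow in zip(frame, background)]
    return masks


def make_frame(shift, width=8):
    """One-row scene: a striped floor that scrolls as the camera moves, plus a
    'cube' that keeps its image position because the camera tracks it."""
    row = [200 if ((x + shift) // 2) % 2 == 0 else 50 for x in range(width)]
    row[3] = row[4] = 120  # pixels covered by the tracked cube
    return [row]


masks = running_segmentation([make_frame(shift) for shift in range(3)])
print(masks[1][0])  # flags in the second frame; positions 3 and 4 are the cube
```

In this toy scene only floor pixels are flagged, because the tracked cube never changes its grey level. In a real rendering the cube's appearance also changes as the camera orbits, so both the floor and the cube would be flagged.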

In my blog post mentioned above, I wrote that I would try figure-ground separation algorithms and optical flow detectors to find Spelke's objects.  I have now tried a very rudimentary border-ownership detection algorithm (for figure-ground separation) and, this time, an optical-flow-type algorithm, and I do not feel that either approach alone is promising.  Regular figure-ground separation does not take motion into account, and regular optical flow approaches would not filter out the spurious flow caused by the agent's own motion.
As the cognitive mechanisms of infants for animate objects and still-life objects may differ, I might start again from static figure-ground separation for still-life objects and try to figure out how to develop the concept of 3D objects via kinesthetic interaction between the agent and objects...

Thursday, June 12, 2014

A(G)I Approach Comparison

In this post, I try to compare some A(G)I approaches in terms of their foci.
The foci or features/characteristics of approaches I chose are: 1) Biomimetic, 2) Connectionist, 3) Bayesian, 4) Developmental, 5) Embodiment, 6) Task-Oriented, 7) Symbolic, 8) Linguistic, 9) Social and 10) Theoretical. Here, 6) Task-Oriented means that the goal of an approach is to create systems that perform certain tasks rather than to address structural, cognitive-functional or theoretical concerns. On the other hand, a 10) Theoretical approach would be concerned with theoretical aspects of cognitive systems. These features are by no means mutually independent.

The approaches to be compared here are:
a) AIXI, b) OpenCog (CogPrime), c) Soar, d) ACT-R, e) BESOM, f) Whole Brain Architecture (BICA), g) RoboCup@Home, h) a developmental robotics approach, i) a social robotics approach, j) a developmental linguistic robotics approach, and k) my humble project.
The approaches a) through f) fall under cognitive architectures and g) through k) under robotics approaches.
f) Whole Brain Architecture (WBA, hereafter) is an approach proposed by researchers in Japan, which basically tries to mimic the functions of brain organs to realize human-level AGI. You may simply regard it as a biologically inspired cognitive architecture (BICA).
h) Developmental Robotics
As there are many projects taking this approach, for the comparison I had in mind an approach at Osaka University that tries to simulate the early development of children.
i) Social Robotics
Social robotics may involve verbal/non-verbal communication.
j) Developmental Linguistic Robotics
Generally, it is concerned with language acquisition by robots.  My humble project (k) also falls under this category.  For the comparison, I had a particular, somewhat unique approach (Symbol Emergence in Robotics) in mind.

Below is the comparison table I made. The scores (0-10) are subjective and tentative; I think people wouldn't agree on such scoring anyway (you can make a similar table for yourself).
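Feature       | a) AIXI | b) OpenCog | c) Soar | d) ACT-R | e) BESOM | f) WBA | g) RoboCup@Home | h) Dev. Robotics | i) Social Robotics | j) Dev. Ling. Robotics | k) Rondelion
Biomimetic    | 0       | 3          | 0       | 5        | 8        | 9      | 1               | 8                | 5                  | 5                      | 4
Connectionist | 0       | 2          | 0       | 3        | 10       | 10     | 0               | 3                | 3                  | 3                      | 3
Bayesian      | 10      | 7          | 2       | 5        | 10       | 7      | 0               | 3                | 3                  | 10                     | 3
Developmental | 0       | 7          | 1       | 3        | 0        | 4      | 2               | 10               | 5                  | 6                      | 8
Embodiment    | 0       | 3          | 1       | 1        | 0        | 3      | 10              | 10               | 10                 | 7                      | 8
Task-Oriented | 0       | 3          | 3       | 3        | 0        | 2      | 10              | 5                | 7                  | 7                      | 6
Symbolic      | 5       | 8          | 10      | 5        | 3        | 2      | 5               | 0                | 6                  | 8                      | 5
Linguistic    | 0       | 2          | 1       | 1        | 0        | 2      | 7               | 1                | 7                  | 9                      | 8
Social        | 0       | 3          | 0       | 0        | 0        | 2      | 7               | 6                | 10                 | 5                      | 3
Theoretical   | 10      | 5          | 5       | 5        | 5        | 3      | 1               | 3                | 5                  | 8                      | 3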

The following are radar charts generated from the table (note that the maximal score differs depending on the chart).

The area of a radar chart does not mean much, as it depends on the ordering of features.  The length of the feature vector may be a better indicator of scope:
Approach               | Extent
a) AIXI                | 15.0
b) OpenCog             | 15.2
c) Soar                | 11.9
d) ACT-R               | 11.4
e) BESOM               | 17.3
f) WBA                 | 16.7
g) RoboCup@Home        | 18.1
h) Dev. Robotics       | 18.8
i) Social Robotics     | 20.7
j) Dev. Ling. Robotics | 22.4
k) Rondelion           | 17.5

Finally, here is a graph showing the proximity of the approaches above.
The edges are made for cosine similarity larger than 0.71 (ref. awk script) and the visualization was done with Gephi.  There should be better ways to visualize proximity, though...
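
For readers who want to redo the numbers, here is a minimal Python sketch of both calculations, the vector length (Extent) and the cosine-similarity edges. It is not the awk script referred to above. The scores are copied from the comparison table, while the function names and the edge-list format are assumptions.

```python
import math
from itertools import combinations

# Feature order: Biomimetic, Connectionist, Bayesian, Developmental, Embodiment,
# Task-Oriented, Symbolic, Linguistic, Social, Theoretical
SCORES = {
    "AIXI":                [0, 0, 10, 0, 0, 0, 5, 0, 0, 10],
    "OpenCog":             [3, 2, 7, 7, 3, 3, 8, 2, 3, 5],
    "Soar":                [0, 0, 2, 1, 1, 3, 10, 1, 0, 5],
    "ACT-R":               [5, 3, 5, 3, 1, 3, 5, 1, 0, 5],
    "BESOM":               [8, 10, 10, 0, 0, 0, 3, 0, 0, 5],
    "WBA":                 [9, 10, 7, 4, 3, 2, 2, 2, 2, 3],
    "RoboCup@Home":        [1, 0, 0, 2, 10, 10, 5, 7, 7, 1],
    "Dev. Robotics":       [8, 3, 3, 10, 10, 5, 0, 1, 6, 3],
    "Social Robotics":     [5, 3, 3, 5, 10, 7, 6, 7, 10, 5],
    "Dev. Ling. Robotics": [5, 3, 10, 6, 7, 7, 8, 9, 5, 8],
    "Rondelion":           [4, 3, 3, 8, 8, 6, 5, 8, 3, 3],
}


def extent(vector):
    """Euclidean length of a feature vector."""
    return math.sqrt(sum(v * v for v in vector))


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    return sum(a * b for a, b in zip(u, v)) / (extent(u) * extent(v))


def similarity_edges(scores, threshold=0.71):
    """List (approach, approach, similarity) for pairs above the threshold."""
    edges = []
    for a, b in combinations(scores, 2):
        similarity = cosine(scores[a], scores[b])
        if similarity > threshold:
            edges.append((a, b, round(similarity, 2)))
    return edges


for name, vector in SCORES.items():
    print(f"{name}: {extent(vector):.1f}")
for edge in similarity_edges(SCORES):
    print(edge)
```

The printed lengths agree with the Extent values listed above. A threshold of 0.71 corresponds to an angle of roughly 45 degrees between two feature vectors.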