Saturday, October 11, 2014

Simple Word Segmentation Experiment

I ran a simple word segmentation experiment using the word candidates created by the method in the previous post.  The segmentation logic is a quasi-Viterbi search whose cost function is the number of segments.  When no word candidate is found, a single character is used as a segment instead.  Here is a part of the result:

^lecirculoesrubie$
% le circulo e s rubie $
# le circulo es rubie $
^tomasuncirculo$
% tomas un circulo $ 
# tomas un circulo $ 
^lerectanguloesrubie$
% le rectangulo e s rubie $ 
# le rectangulo e s rubie $ 
^illoesrubie$
% il loesrubie $ 
# illo es rubie $ 
^ilfacefrigide$
% ilfacefrigide $ 
# il face frigide $
^tomasilloesblau$
% tomasilloes blau $ 
# tomas illo es blau $ 
* lines starting with "^" are strings to be segmented.
* lines starting with "%" are the segmentation result.
* lines starting with "#" are intended segmentation.
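
The segmentation logic described at the top of this post can be sketched in Python as follows. This is a minimal sketch, not the code used in the experiment: the function name `segment` and the tie-breaking rule (the first minimum found wins) are assumptions, so strings with more than one minimum-cost segmentation, such as "illoesrubie", may be split differently by another implementation.

```python
# Word candidates from the list in this post (frequencies omitted).
LEXICON = {
    "lo", "il", "loes", "illo", "illoes", "le", "angulo", "au", "un", "ta",
    "as", "ilface", "blau", "rubie", "verde", "esun", "triangulo",
    "loesrubie", "circulo", "rectangulo", "loesblau", "loesverde",
    "anguloes", "tomas", "lor", "ascolta", "ilfacecalor", "ilfacefrigide",
    "illoesun", "ilesunpaucobscur", "reguarda", "nonne", "uloblau",
    "tomasilloes",
}


def segment(text, lexicon):
    """Split text into the fewest segments; a segment is either a word
    candidate or, as a fallback, a single character."""
    n = len(text)
    max_len = max((len(w) for w in lexicon), default=1)
    # best[i] = (number of segments, start of last segment) for text[:i]
    best = [(0, 0)] + [(float("inf"), 0)] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in lexicon or end - start == 1:
                cost = best[start][0] + 1
                if cost < best[end][0]:  # assumption: first minimum wins
                    best[end] = (cost, start)
    # Trace the stored back-pointers from the end of the string.
    segments = []
    end = n
    while end > 0:
        start = best[end][1]
        segments.append(text[start:end])
        end = start
    return segments[::-1]


print(" ".join(segment("tomasilloesblau", LEXICON)))
```

Because the only cost is the segment count, a long spurious candidate such as "tomasilloes" always beats the intended "tomas illo es", which is why the result depends so heavily on the candidate list.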

Of course, this all depends on the extracted word candidates (see below).
Children who are learning a language can use cues for extracting words other than the statistical nature of the given strings, e.g., semantic cues (cf. the preliminary experiment) or accents (I guess accents should be very important for perceiving words).  In any case, to make a language model more sensible, grammatical categories (classes) must be introduced and semantics (symbol grounding) should be considered again...

Word candidates used in the experiment above :
        lo,757
        il,537*
      loes,440
      illo,291*
    illoes,260
        le,256*
    angulo,251*
        au,251
        un,250
        ta,205
        as,196
    ilface,170
      blau,167*
     rubie,165*
     verde,154*
      esun,152
 triangulo,133*
 loesrubie,130
   circulo,129*
rectangulo,118*
  loesblau,117
 loesverde,117
  anguloes,115
     tomas,109*
       lor,108
   ascolta,87*
ilfacecalor,86
ilfacefrigide,84
  illoesun,76
ilesunpaucobscur,76
  reguarda,59*
     nonne,39*
   uloblau,37
tomasilloes,36
* Intended words are marked with '*'.
* The numbers are the frequency of strings in the corpus of 1000 utterances.
* Candidates occurring fewer than 30 times were not used.

Friday, October 10, 2014

Word candidates

Children learn language without any lexicon explicitly given, so they have to learn words from perceived utterances.  As machine learning solutions for word segmentation, there are methods such as Nested Pitman-Yor Language Modeling.  However, if you just want a rough idea of the word candidates, you can use a simpler method, perhaps similar to the one used for keyboard input auto-completion.  The basic idea here is that a word string may have substrings whose frequencies are similar to the frequency of the word itself.  For example, the frequency of "xample" may be similar to the frequency of "example," as they almost always occur together.
The result of a simple experiment with an artificial corpus (modified from the previous labeling experiment, adding one-word utterances such as 'illo' and sentences about the ambience such as 'ilfacefrigide' ("It's cold.")):
        lo,778
        il,531*
      loes,456
      illo,301*
        le,271*
        un,268*
    illoes,262
    angulo,261*
        au,242
        ta,204
        re,188
        as,176
     verde,171*
      esun,160
      blau,157*
     rubie,156*
    ilface,153
   circulo,143*
rectangulo,132*
 loesverde,131
 triangulo,129*
  anguloes,125
  loesblau,122
 loesrubie,120
       lor,104
     tomas,104*
  illoesun,83
ilfacefrigide,80
  ....
* Intended words are marked with '*'.
* The numbers are the frequency of strings in the corpus of 1000 utterances.
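
The basic idea can be sketched in Python as follows. The post gives only the idea, so the details are assumptions: substrings are counted exhaustively, and a substring is dropped when some one-character extension of it is almost as frequent as the substring itself. The names `word_candidates`, `ratio` and `min_count` are hypothetical.

```python
from collections import Counter


def substring_counts(utterances, min_len=2, max_len=16):
    """Count every substring occurrence with min_len <= length <= max_len."""
    counts = Counter()
    for u in utterances:
        for i in range(len(u)):
            for j in range(i + min_len, min(len(u), i + max_len) + 1):
                counts[u[i:j]] += 1
    return counts


def word_candidates(utterances, min_count=30, ratio=0.9, min_len=2, max_len=16):
    """Keep frequent substrings that are not 'absorbed' by a longer string.

    Assumption: a substring is absorbed when a one-character extension
    (to the left or right) occurs at least ratio times as often, as with
    "xample" inside "example".
    """
    counts = substring_counts(utterances, min_len, max_len)
    best_extension = Counter()  # highest count among one-char extensions
    for s, c in counts.items():
        if len(s) > min_len:
            for shorter in (s[:-1], s[1:]):
                best_extension[shorter] = max(best_extension[shorter], c)
    return {
        s: c
        for s, c in counts.items()
        if c >= min_count and best_extension[s] < ratio * c
    }


# Toy corpus: "abc" always occurs as a unit in varying contexts.
print(word_candidates(["xabcx", "yabcy", "zabcz"], min_count=2))
```

With a strict `ratio`, fragments such as "ab" and "bc" disappear because "abc" is just as frequent, while "abc" survives because none of its extensions is.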

Tuesday, October 7, 2014

Research Plan Updated (2014-10)

I'm streamlining my research plan further.  I'm dropping computer vision to concentrate on language acquisition.  Instead, I'll use a simplified internal representation of the kind that could presumably be obtained by abstracting data from vision systems.  The internal representation would contain information on the features of objects jointly attended with putative 'care-takers,' orientational relations among objects, and their motion.


Purpose

Creating a system that performs language acquisition (symbol grounding) with a robot simulator, pattern recognizers and association networks, to verify my associationist hypotheses of cognition.

Language acquisition

The system will follow the human (infant) language acquisition process: it shall associate linguistic expressions with internal representations of the shapes, colors, movements and relations of perceived objects.

Core cognitive architecture

The core cognitive architecture shall have the following functions.
In designing and implementing the cognitive architecture, generic mechanisms shall be (re)used whenever possible.
  • Time-series pattern learner/recognizer
    having motor commands and sensory input as their input
  • Attention and situation assessment
    on what will be learned.
  • Cognitive model based on association
    to memorize and recollect (temporal) generative patterns as associative sequences.
    Linguistic competence will be realized with this model.
    It contains a backtracking mechanism based on the function of attention and situation assessment mentioned above.
  • Episodic memory
    Patterns (the representation of situations -- combinations of abstract patterns created by (non-supervised) learning) positively assessed by the attention and situation assessment will be memorized.
 cf. A Figure of Human Cognitive Function

Platform

  • Visual data processing
    OpenCV, SimpleCV, etc.
  • Learning module
    BESOM, DeSTIN, k-means, SOM, SVM, HMM, SOINN, etc.
    (To be used as plug-ins depending on the purpose)

    * Currently, the use of learners with distributed representation is avoided.
  • Publicly available cognitive architectures may be surveyed
    e.g., OpenCog, MicroPSI and LIDA
    (✔Done)


Tentative Research Steps

The following part doesn't have much change from the previous plan.
Changes in Phase III are indicated by blue letters.


Phase I: Survey on robot simulators (✔Done)

Research on robot simulators such as SigVerse and V-rep and trial on attitude control.

Phase II: Survey on Spelke's Object recognition (✔Done)

Proper recognition of Spelke's objects, having coherent, solid & inert bundle of features of a certain dimension that continues over time, would require optical flow processing associated with the action of the perceiver.

Phase III: Labeling (Currently Underway)

  • Basic Ideas
    • The system shall relate specific types of figures it looks at with linguistic labels.
    • Figures get the system's attention by force (exterior instruction)
    • Labels may be nominals representing shapes and adjectives representing features such as colors.
      Types of objects may be learned in a supervised manner with labels or have been categorized by non-supervised learning.
    • The system shall utter labels on recognizing types after learning association between the labels and types.
    • The system shall recognize/utter the syntactic pattern 'adjective + noun'.
  • Determining the recognition method & implementation
  • Designing and implementing mechanism for handling syntactic structure.
    (A Viterbi/chart analyzer may be used.)
  • Incorporating episodic memory at this stage is to be considered.
  • Labeling experiment (cf. preliminary experiment)

Phase IV: Relation Learning (Coming Soon?)

  • Basic Ideas
    • The system shall learn labels for
      • object locomotion such as moving up/down, right/left and closer/away
      • orientational relations between objects such as above/below, right/left and short/thither
    • Objects should get the system's attention by force (programming) or by a certain preprogrammed mechanism of attention (such as attention to moving objects).
  • Designing & implementing the mechanism
  • Experiments

Phase V: Linguistic Interaction

  • Basic Ideas
    • The system shall answer questions using labels learned in Phase III & Phase IV.
    • The system shall respond to requests on its behavior.
    • The system shall utter clarification questions.
  • Designing & implementing mechanism for linguistic interaction
  • Experiments

Phase VI: Episodic memory

  • Basic Ideas
    • Episodes (situations) to be memorized are the appearance of objects and changes in relations among them.
    • The memory of novel objects and situations is prioritized.
  • Designing & implementing episodic memory and attentional mechanism
  • Designing & implementing episodic recollection & linguistic report.
  • Experiments

Phase VII: More complicated syntax

  • Basic Idea
    The system shall understand/utter linguistic expressions having nested phrase structure.
  • Designing & implementing nested phrase structure.
  • Experiments

Wednesday, October 1, 2014

Requirements and Choices in Approaches for AGI

In this article, a few requirements for the realization of AGI are listed and choices in related approaches are discussed (Japanese version).
The purpose of this article is to give a simple overview toward the realization of AGI.

Requirements for AGI

  1. Cognitive architecture capable of planning and plan execution
  2. Emergent conceptual learning (or representation learning)
  3. Associative memory based on statistics (necessary for coping with the frame problem)
  4. Linguistic Functions (as for more general function for handling symbols, conditions such as those discussed in Fodor and Pylyshyn (1988) should be met.)
  5. Embodiment (in case cognitive development matters)
  6. Language acquisition (if cognitive development matters, items 5 and 2 are required)
* The list is not meant to be exhaustive; obviously, there are other necessary cognitive functions such as episodic memory, reinforcement learning and information binding. The items are chosen for the sake of discussion.

Choices of approaches

While there are many approaches towards AGI, not so many deal with all of the requirements above. Below, alternatives for the classification of approaches are discussed.
  • Embodiment
    If an approach deals with embodiment, it should be involved in (cognitive) robotics. A cognitive robotics approach that also deals with language (acquisition) can be found here, which might be said to be one of the most comprehensive approaches for AGI.
  • Distributed representation vs. symbolic representation
    As perception processing such as computer vision is largely distributive, many approaches adopt distributed representation in this domain.
    Approaches vary depending on whether they adopt distributed representation in other domains (cf. the requirements listed above).
  • Kinds of distributed representation
    Such as neural network models and Bayesian networks.
    Neural models vary depending on how close they are to real neural networks (cf. BICA below).
    Some approaches are hybrids of neural and Bayesian networks (e.g., BESOM, which combines SOM and the Bayesian network).
  • Symbolic representation
    While there are classical knowledge representations such as frames, scripts and production rules, it is empirically known that at least some statistical method/representation must be used alongside them to solve real-world problems. Besides approaches incorporating probabilistic elements in classical representation such as production rules (e.g., ACT-R), various probabilistic inference systems (such as probabilistic programming languages, NARS and PLN@OpenCog) have been proposed.
  • Emergent conceptual learning
    A well-known example is the neural-network-inspired SOM.
    Emergent conceptual learning may be regarded as a kind of hidden variable estimation. In the area of deep learning, hidden variables are estimated with auto-encoders and restricted Boltzmann machines. Other approaches may adopt HMM-like models or statistical methods such as LDA for hidden variable estimation.
  • Associative memory
    Supervised learning can be seen as associative memory in a broad sense. There are approaches using distributed representation such as neural/Bayesian network models, and approaches using statistical methods such as LDA (most recent document search methods fall under the latter).
  • BICA (Biologically-Inspired Cognitive Architecture)
    Approaches for AGI vary depending on whether or to what extent they mimic living organisms (notably the brain). While WBE (whole brain emulation) is the extreme case, there are more abstract approaches such as Leabra, which mimics brain organs, SPAUN/Nengo, based on the spiking neuron model, BESOM, which uses the Bayesian network while mimicking brain organs, and HTM, which mimics the neocortex (only SPAUN/Nengo and Leabra may currently realize cognitive architectures). DeSTIN, which incorporates reinforcement learning with hierarchical temporal learning, is loosely inspired by the neocortex and basal ganglia. Psi/MicroPsi is a more psychological model though it is also inspired by neural networks. In general, the more an approach is inspired by the brain, the more it tends to adopt emergent distributed representation as the brain does.

Tuesday, September 23, 2014

Preliminary experiment on Lexicon Acquisition #1

As part of Phase III experiments, I conducted a preliminary experiment on lexicon acquisition, where a simple learner associates strings within given sentence strings with signals that represent things in the environment.

Setting:
  • Representation of things in the environment
    There is (always) an object in the environment having one of three kinds of shapes (rectangle, triangle, circle) and three kinds of colors (red, blue, green), and the system is given two symbols respectively representing the shape and color of the object.
  • Sentential strings
    The system is given a sentence string also representing the situation (shape and color).  The task for the system is to extract lexical entries (strings) that would represent the shapes and colors.
    The sentences are in Interlingua.  The grammar for this experiment is as follows (in a pseudo BNF):
    S ⇒ "tomas"? "reguarda"? S' "nonne"?
    S' ⇒ "illoes"? "un" Shape Color?
    S' ⇒ {"illo" | "le" Shape} "es" Color
    Shape ⇒ {"rectangulo" | "triangulo" | "circulo"}
    Color ⇒ {"rubie" | "blau" | "verde"}
    where "tomas" is the name of the system being addressed and "reguarda" means "look!," "illo" "that," "es" "is," and "nonne" "isn't it," respectively.  '?' represents 0 or 1 occurrence.
    The following are sample sentences randomly generated from the grammar above:
3 1 illoesrubie
1 2 tomasilloesblaunonne
1 1 unrectangulo
2 1 letrianguloesrubie
1 3 reguardalerectanguloesverde
2 1 reguardailloesrubienonne
Two numbers at the beginning of sentences represent the shape and color.
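
A generator for such sentences can be sketched in Python as follows. It is a minimal sketch under stated assumptions: each optional element is included with probability 0.5 and the two S' rules are chosen with equal probability, neither of which is stated in the post. The numeric codes follow the samples above (shapes: 1 rectangle, 2 triangle, 3 circle; colors: 1 red, 2 blue, 3 green). The names `generate` and `maybe` are hypothetical.

```python
import random

SHAPES = {1: "rectangulo", 2: "triangulo", 3: "circulo"}
COLORS = {1: "rubie", 2: "blau", 3: "verde"}


def maybe(rng, word):
    """Return word or nothing ('?' in the grammar); 0.5 is an assumption."""
    return word if rng.random() < 0.5 else ""


def generate(rng):
    """Return (shape code, color code, sentence without spaces)."""
    shape, color = rng.randint(1, 3), rng.randint(1, 3)
    if rng.random() < 0.5:
        # S' => "illoes"? "un" Shape Color?
        core = maybe(rng, "illoes") + "un" + SHAPES[shape] + maybe(rng, COLORS[color])
    else:
        # S' => {"illo" | "le" Shape} "es" Color
        subject = "illo" if rng.random() < 0.5 else "le" + SHAPES[shape]
        core = subject + "es" + COLORS[color]
    sentence = maybe(rng, "tomas") + maybe(rng, "reguarda") + core + maybe(rng, "nonne")
    return shape, color, sentence


rng = random.Random(2014)
for _ in range(5):
    print(*generate(rng))
```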
Algorithm:
The system collects all the ngrams in given sentences and calculates tf*idf with the shapes and colors in the environment.  Basically, ngrams with the highest scores with particular colors or shapes are supposed to represent the features.
First try:
Blue     :       bla,1.098066
Red      :        bi,1.087502
Blue     :        au,1.065256
Triangle :    iangul,1.053565
Red      :      oesr,1.021488
Blue     :   loesbla,1.017333
Green    :      verd,1.000361
Rectangle: ectangulo,0.987110
Circle   :     rculo,0.954782
Green    :     sverd,0.945333 
# of Sentences: 1,000
The left-hand column shows the features to be represented by the ngrams in the right-hand column.
At first glance, it looks like a disaster, but if you look closely, the extracted strings are mostly parts of the shape/color words.
Second try:
Instead of tf*idf, tf*idf*ngram.length was used.  The result seems better.
Sentence Count=50
Rectangle:rectangulo,2.628845
Triangle : triangulo,2.613976
Blue     : uloesblau,2.477903
Red      : loesrubie,2.287229
Triangle :rianguloes,2.263571
Rectangle:nrectangul,2.250493
Triangle : letriangu,2.159996
Rectangle: unrectang,2.147516
Red      :uloesrubie,2.109837
Circle   :lecirculoe,1.920132
Sentence Count=100
Circle   :   circulo,2.277660
Red      : loesrubie,2.239706
Blue     :  loesblau,2.206975
Green    : loesverde,2.143210
Red      :uloesrubie,2.090366
Rectangle:rectangulo,2.048243
Blue     :illoesblau,2.045903
Triangle : triangulo,2.039147
Green    :illoesverd,1.920606
Circle   : uncirculo,1.905309
Sentence Count=500
Rectangle:rectangulo,2.330337
Triangle : triangulo,2.217422
Red      : loesrubie,2.180421
Blue     :  loesblau,2.163493
Green    : loesverde,2.127100
Circle   :   circulo,1.923614
Rectangle:nrectangul,1.856743
Triangle :rianguloes,1.801742
Rectangle: unrectang,1.771783
Blue     :illoesblau,1.755595
Sentence Count=1000
Triangle : triangulo,2.314919
Rectangle:rectangulo,2.272904
Red      : loesrubie,2.244439
Blue     :  loesblau,2.115484
Green    : loesverde,2.077109
Circle   :   circulo,1.857920
Triangle :ntriangulo,1.785375
Red      :lloesrubie,1.773095
Triangle :rianguloes,1.710290
Green    :illoesverd,1.709630
It seems it cannot extract color terms properly with the given setting.  Presumably, it must learn other terms such as "illo" and lexical items may be determined by fitting them into sentences (looking for best segmentation). 
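
The scoring used in the two tries can be sketched in Python as follows. The post does not give the exact definitions of tf and idf, so this sketch assumes that tf is the fraction of sentences labeled with a feature that contain the ngram, and that idf is the log of the number of feature values divided by the number of values whose sentences contain the ngram. The function names and the `length_weight` flag are hypothetical, and the scores will not reproduce the numbers listed above.

```python
import math
from collections import defaultdict


def ngrams(sentence, min_n=2, max_n=10):
    """Set of all character n-grams of the sentence."""
    return {
        sentence[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(sentence) - n + 1)
    }


def score_ngrams(samples, length_weight=False):
    """samples: list of (feature, sentence); returns {(feature, ngram): score}."""
    sentence_count = defaultdict(int)
    containing = defaultdict(lambda: defaultdict(int))
    for feature, sentence in samples:
        sentence_count[feature] += 1
        for g in ngrams(sentence):
            containing[feature][g] += 1
    # df[g] = number of feature values whose sentences contain g
    df = defaultdict(int)
    for counts in containing.values():
        for g in counts:
            df[g] += 1
    scores = {}
    for feature, counts in containing.items():
        for g, c in counts.items():
            tf = c / sentence_count[feature]
            idf = math.log(len(containing) / df[g])
            weight = len(g) if length_weight else 1  # second try
            scores[(feature, g)] = tf * idf * weight
    return scores


samples = [
    ("red", "illoesrubie"),
    ("red", "lecirculoesrubie"),
    ("blue", "illoesblau"),
    ("green", "illoesverde"),
]
weighted = score_ngrams(samples, length_weight=True)
best_red = max((g for f, g in weighted if f == "red"),
               key=lambda g: weighted[("red", g)])
print(best_red)
```

Even on this toy input, the length-weighted score prefers a string with a prefix glued on to the color word, which is the same kind of failure as in the results above.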

Tuesday, September 9, 2014

Phase III: Labeling / Plan I

This article describes an experimental design for a part of my research plan (on language acquisition), in which phonetic labels and visual patterns are to be associated by machine learning.

  • Basic Ideas
    • Goal: The system relates figure images with linguistic labels (phoneme strings).
    • Labels represent the shapes or colors of figures.
    • Labels and shapes/colors of figures shall be learned in a non-supervised manner by presenting labels and figures together.
    • The system should be able to associate labels with shapes/colors of figures after learning.
  • Tools
    • Computer Vision: OpenCV/SimpleCV
    • Machine Learning Tool: TBD
      Requirement:
      • Clustering (non-supervised learning)
        Hopefully, non-parametric for the number of clusters.
      • Association
        Auto-complete-like function for learned patterns. 
  • Steps
    1. Learning patterns
      • Input patterns (in two modalities)
        • Figure images: rectangles, circles, triangles of various sizes and colors
        • Label pattern: 2D array of (phoneme position × phoneme type)
      • Image processing: monochromatizing, edge detection, SIFT?
      • Learning
        Co-occurrence patterns of figures and labels, figure shapes, colors, and labels shall be clustered with the following configuration of learners:
        [the following has been modified on 2014-09-14]
        • Label learner: label candidates are selected from frequent phoneme N-grams.
        • Shape learner: assign a non-supervised learner for figure shapes.
        • Color learner: assign a non-supervised learner for figure colors.
          Colors are learned with color label candidates as teacher signals.
        • Cross-modal (label sense) learner: for a label (candidate) to have a sense, it must have a statistical correlation with a 'cluster' in shape/color learner.
    2. Association
      Hopefully, the non-supervised learners used in 1 have an associative, auto-complete-like function for learned patterns.  Otherwise, some external mechanism should be added to associate labels (phoneme patterns) with shapes/colors and vice versa.
      In any case, as learned category nodes do not have complete information to construct particular lower input patterns, some external mechanism should be added to visualize the association.
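
The label pattern mentioned in the steps above, a 2D array of (phoneme position × phoneme type), can be sketched in Python as a one-hot grid. This is only an illustration under assumptions: letters stand in for phonemes, and the function name `encode_label`, the alphabet and the maximum length are all hypothetical.

```python
def encode_label(label, alphabet, max_len):
    """One-hot grid: rows are phoneme positions, columns are phoneme types."""
    index = {phoneme: i for i, phoneme in enumerate(alphabet)}
    grid = [[0] * len(alphabet) for _ in range(max_len)]
    for position, phoneme in enumerate(label[:max_len]):
        grid[position][index[phoneme]] = 1
    return grid


ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # assumption: letters as phonemes
grid = encode_label("blau", ALPHABET, max_len=6)
for row in grid:
    print("".join(str(cell) for cell in row))
```

Positions beyond the end of the label stay all-zero, so labels of different lengths share one fixed-size input pattern.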

Friday, July 11, 2014

AGI in Japan, 2014

As interest in artificial general intelligence (AGI) is emerging in Japan, I will try to report the situation here (mostly in chronological order).

As far as I know, the first public presentation on AGI in Japan was a symposium on the Technological Singularity held by Fujitsu Co., in September 2012.  The content of the symposium was translated into Japanese and published in the May 2013 issue of the Journal of the Japanese Society for Artificial Intelligence (JSAI).  The same issue reports the AGI & AGI Impact conferences held in 2012.

Meanwhile, a book entitled ‘The Year 2045 Problem (2045年問題)’ on the Singularity was published on January 1, 2013 for the general public.

Toward the summer of 2013, some members of JSAI interested in AGI organized a reading group for the book Artificial General Intelligence (2006).  The first meeting was held in Tokyo in July 2013 and meetings have been held monthly since then (Dr. Ben Goertzel was invited to the second meeting).  The group (currently with about 20 members) is now a special interest group (not officially recognized by JSAI yet) and its major activities will be reading articles on cognitive architectures and preparing a prospective book on AGI.  It has a web page and a Facebook group (both in Japanese) now.

Another important ‘movement’ on AGI, called ‘The Whole Brain Architecture,’ is emerging in Japan.  This movement is basically BICA (biologically inspired cognitive architecture).  It is dubbed ‘whole brain’ because it emphasizes drawing inspiration not from one brain part but from the entire brain (note that some BICA models are inspired only by the cerebral cortex or hippocampus) to create a cognitive architecture that will realize AGI (the soonest way, as they say).  The organizers of the Whole Brain Architecture movement are grassroots (again JSAI members) and have been holding monthly open seminars since December 2013.  Recent seminars in Tokyo gather about 200 persons.  They also have a web page and a Facebook group (in Japanese).

As one would expect, there is also cognitive robotics research related to AGI in Japan:  there are developmental cognitive robotics labs such as Asada Lab, ISI Lab & Iwahashi Lab, the special interest group for SocioIntelliGenesis that focuses on communication with/among robots, and the community of symbol emergence robotics (e.g., see here), which focuses on the symbol grounding issue (their main weaponry is LDA, or Latent Dirichlet Allocation :-).

The May 2014 issue of the JSAI journal featured AGI.  The feature articles include an introduction to AGI by Dr. Goertzel (in Japanese translation), the translation of the 2012 AI magazine article “Mapping the Landscape of Human-Level Artificial General Intelligence,” an introduction to AIXI, an article on the year 2045 problem, among others.

Almost at the same time as its publication, the JSAI annual convention was held in May, with an organized session on AGI (Part I & Part II), which gathered 50+ people (the convention had 14 tracks and gathered 1000+ people altogether).  The English translation of the paper presented at its tutorial presentation can be found here.

The convention also held a panel discussion on AGI and the film Transcendence, an event held in collaboration with the film distributor, which also gathered a few dozen people.  Both the panel at the organized session and the one at the Transcendence session discussed social issues of AGI. Moreover, the Japan Singularity Institute has also been launched recently.

In sum, AGI is getting recognized around JSAI in Japan.  As for the research side, the Whole Brain Architecture and robotic research are would-be prospects.  AGI research in Japan is still grassroots and has not obtained major funding (from the government or the major companies).

Friday, June 27, 2014

Research Plan Updated (2014-06)

I streamlined the plan: I dropped three dimensional concept learning and (therefore) the use of robotics simulator, to focus more on language acquisition.
While the robotics/embodiment is essential for phenomenology of artifacts, as it incurs too much technical complication, it is put off until relevant tools are readily available.  The new plan also drops the emphasis on the recognition of Spelke's objects, which would require action during perception.



Purpose

Creating a system that performs language acquisition (symbol grounding) with a robot simulator, pattern recognizers and association networks, to verify my associationist hypotheses of cognition.

Language acquisition

The system will follow the human (infant) language acquisition process: it shall associate linguistic expressions with the shapes, colors, movements and relations of perceived objects, while creating adequate internal syntactic/semantic representations.

Core cognitive architecture

The core cognitive architecture shall have the following functions.
In designing and implementing the cognitive architecture, generic mechanisms shall be (re)used whenever possible.
  • Time-series pattern learner/recognizer
    having motor commands and sensory input as their input
  • Attention and situation assessment
    on what will be learned and which action will be taken.
  • Cognitive model based on association
    to memorize and recollect (temporal) generative patterns as associative sequences.
    Linguistic competence will be realized with this model.
    It contains a backtracking mechanism based on the function of attention and situation assessment mentioned above.
  • Episodic memory
    Patterns (the representation of situations -- combinations of abstract patterns created by (non-supervised) learning) positively assessed by the attention and situation assessment will be memorized.
 cf. A Figure of Human Cognitive Function

Platform

  • Visual data processing
    OpenCV, SimpleCV, etc.
  • Learning module
    BESOM, DeSTIN, k-means, SOM, SVM, HMM, SOINN, etc.
    (To be used as plug-ins depending on the purpose)
  • Publicly available cognitive architectures may be surveyed
    e.g., OpenCog and LIDA


Tentative Research Steps


Phase I: Survey on robot simulators (done)

Research on robot simulators such as SigVerse and V-rep and trial on attitude control.

Phase II: Survey on Spelke's Object recognition (done)

Proper recognition of Spelke's objects, having coherent, solid & inert bundle of features of a certain dimension that continues over time, would require optical flow processing associated with the action of the perceiver.

The following part doesn't have much change from the previous plan.

Phase III: Labeling

  • Basic Ideas
    • The system shall relate specific types of figures it looks at with linguistic labels.
    • Figures get the system's attention by force (exterior instruction)
    • Labels may be nominals representing shapes and adjectives representing features such as colors.
      Types of objects may be learned in a supervised manner with labels or have been categorized by non-supervised learning.
    • The system shall utter labels on recognizing types after learning association between the labels and types.
    • The system shall recognize/utter the syntactic pattern 'adjective + noun'.
  • Determining the recognition method & implementation
  • Designing and implementing mechanism for handling syntactic structure.
  • Incorporating episodic memory at this stage is to be considered.
  • Labeling experiment

Phase IV: Relation Learning

  • Basic Ideas
    • The system shall learn labels for
      • object locomotion such as moving up/down, right/left and closer/away
      • orientational relations between objects such as above/below, right/left and short/thither
    • Objects should get the system's attention by force (programming) or by a certain preprogrammed mechanism of attention (such as attention to moving objects).
  • Designing & implementing the mechanism
  • Experiments

Phase V: Linguistic Interaction

  • Basic Ideas
    • The system shall answer questions using labels learned in Phase III & Phase IV.
    • The system shall respond to requests on its behavior.
    • The system shall utter clarification questions.
  • Designing & implementing mechanism for linguistic interaction
  • Experiments

Phase VI: Episodic memory

  • Basic Ideas
    • Episodes (situations) to be memorized are the appearance of objects and changes in relations among them.
    • The memory of novel objects and situations is prioritized.
  • Designing & implementing episodic memory and attentional mechanism
  • Designing & implementing episodic recollection & linguistic report.
  • Experiments

Phase VII: More complicated syntax

  • Basic Idea
    The system shall understand/utter linguistic expressions having nested phrase structure.
  • Designing & implementing nested phrase structure.
  • Experiments

Saturday, June 21, 2014

Looking for Spelke's Object with Computer Vision: First Try

While reading the SimpleCV textbook, I found that there are functions for finding 'interesting parts' in motion pictures.  So I tested one of them, hoping that it could be used to detect Spelke's objects (ref. my current research plan).
For video input, I used a scene from the V-Rep simulator where a camera moves along a quasi-elliptical path tracking a cube inside the orbit.
I fed it to the RunningSegmentation function of SimpleCV and the result was what I had expected: parts of the floor were 'as interesting as' the cube for the algorithm, so that the cube alone was not chosen as the interesting object (see the figure below).
Fig.  The part enclosed with the red lines 'interested' the algorithm.
(It changes dynamically over time.)

In my blog post mentioned above, I wrote that I would try figure-ground separation algorithms and optical flow detectors to find Spelke's objects.  I tried a very rudimentary border ownership detection algorithm (for figure-ground separation) and an optical flow type algorithm (this time), and do not feel either approach alone is promising.  Regular figure-ground separation does not take motion into account and regular optical flow approaches would not leave out spurious flow caused by agent motion.
As the cognitive mechanisms of infants for animate objects and still-life objects may differ, I might start again from static figure-ground separation for still-life objects and try to figure out how to develop the concept of 3D objects via kinesthetic interaction between the agent and objects...

Thursday, June 12, 2014

A(G)I Approach Comparison

In this post, I try to compare some A(G)I approaches with their foci.
The foci or features/characteristics of approaches I chose are, 1) Biomimetic, 2) Connectionist, 3) Bayesian, 4) Developmental, 5) Embodiment, 6) Task-Oriented, 7) Symbolic, 8) Linguistic, 9) Social and 10) Theoretical. Here, 6) Task-Oriented means that the goal of an approach is to create systems to perform certain tasks rather than structural, cognitive functional or theoretical concerns. On the other hand, a 10) Theoretical approach would be concerned about theoretical aspects of cognitive systems. These features are by no means mutually independent.

The approaches to be compared here are:
a) AIXI, b) OpenCog (CogPrime), c) Soar, d) ACT-R, e) BESOM, f) Whole Brain Architecture (BICA), g) RoboCup@Home, h) a developmental robotics approach, i) Social robotics approach, j) a developmental linguistic robotics approach, and k) my humble project.
The approaches a) through f) would fall within cognitive architectures and g) through k) robotics approaches.
f) Whole Brain Architecture (WBA, hereafter) is an approach proposed by researchers in Japan, which basically tries to mimic the functions of brain organs to realize human-level AGI. You may just assume this as a biologically inspired cognitive architecture (BICA).
h) Developmental Robotics
As there are many projects in this approach, for the comparison I had in mind an approach at Osaka University that tries to simulate the early development of children.
i) Social Robotics
Social robotics may involve verbal/non-verbal communication.
j) Developmental Linguistic Robotics
Generally, it is concerned with language acquisition by robots.  My humble project (k) also falls under this category.  For the comparison, I had a somewhat unique, particular approach (Symbol Emergence in Robotics) in mind.

Below is the comparison table I made. The scores (0-10) are subjectively and tentatively given; I think people wouldn't agree on such scoring anyway (you can make a similar table for yourself).
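Feature | a) AIXI | b) OpenCog | c) Soar | d) ACT-R | e) BESOM | f) WBA | g) RoboCup@Home | h) Dev. Robotics | i) Social Robotics | j) Dev. Ling. Robotics | k) Rondelion
Biomimetic | 0 | 3 | 0 | 5 | 8 | 9 | 1 | 8 | 5 | 5 | 4
Connectionist | 0 | 2 | 0 | 3 | 10 | 10 | 0 | 3 | 3 | 3 | 3
Bayesian | 10 | 7 | 2 | 5 | 10 | 7 | 0 | 3 | 3 | 10 | 3
Developmental | 0 | 7 | 1 | 3 | 0 | 4 | 2 | 10 | 5 | 6 | 8
Embodiment | 0 | 3 | 1 | 1 | 0 | 3 | 10 | 10 | 10 | 7 | 8
Task-Oriented | 0 | 3 | 3 | 3 | 0 | 2 | 10 | 5 | 7 | 7 | 6
Symbolic | 5 | 8 | 10 | 5 | 3 | 2 | 5 | 0 | 6 | 8 | 5
Linguistic | 0 | 2 | 1 | 1 | 0 | 2 | 7 | 1 | 7 | 9 | 8
Social | 0 | 3 | 0 | 0 | 0 | 2 | 7 | 6 | 10 | 5 | 3
Theoretical | 10 | 5 | 5 | 5 | 5 | 3 | 1 | 3 | 5 | 8 | 3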

The following are radar charts generated from the table (note that the maximal score differs from chart to chart).

The area of a radar chart does not mean much, as it depends on the ordering of features.  The length of the feature vector may be a better indication of the scope:
Measure | a) AIXI | b) OpenCog | c) Soar | d) ACT-R | e) BESOM | f) WBA | g) RoboCup@Home | h) Dev. Robotics | i) Social Robotics | j) Dev. Ling. Robotics | k) Rondelion
Extent | 15.0 | 15.2 | 11.9 | 11.4 | 17.3 | 16.7 | 18.1 | 18.8 | 20.7 | 22.4 | 17.5
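
The Extent row can be recomputed from the score table. The following is a minimal sketch, assuming that "feature vector length" means the Euclidean norm of each approach's column of scores; under that assumption the rounded values match the row above. The variable and function names are illustrative only.

```python
import math

# Column order a) through k), as in the comparison table.
APPROACHES = ["AIXI", "OpenCog", "Soar", "ACT-R", "BESOM", "WBA",
              "RoboCup@Home", "Dev. Robotics", "Social Robotics",
              "Dev. Ling. Robotics", "Rondelion"]

# Scores copied from the comparison table, one row per feature.
SCORES = {
    "Biomimetic":    [0, 3, 0, 5, 8, 9, 1, 8, 5, 5, 4],
    "Connectionist": [0, 2, 0, 3, 10, 10, 0, 3, 3, 3, 3],
    "Bayesian":      [10, 7, 2, 5, 10, 7, 0, 3, 3, 10, 3],
    "Developmental": [0, 7, 1, 3, 0, 4, 2, 10, 5, 6, 8],
    "Embodiment":    [0, 3, 1, 1, 0, 3, 10, 10, 10, 7, 8],
    "Task-Oriented": [0, 3, 3, 3, 0, 2, 10, 5, 7, 7, 6],
    "Symbolic":      [5, 8, 10, 5, 3, 2, 5, 0, 6, 8, 5],
    "Linguistic":    [0, 2, 1, 1, 0, 2, 7, 1, 7, 9, 8],
    "Social":        [0, 3, 0, 0, 0, 2, 7, 6, 10, 5, 3],
    "Theoretical":   [10, 5, 5, 5, 5, 3, 1, 3, 5, 8, 3],
}


def extent(column):
    """Euclidean length of the feature vector in the given table column."""
    return math.sqrt(sum(row[column] ** 2 for row in SCORES.values()))


extents = {name: round(extent(i), 1) for i, name in enumerate(APPROACHES)}
for name, value in extents.items():
    print(f"{name:20s} {value}")
```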

Finally, here is a graph showing the proximity of the approaches above.
Edges are drawn between approaches whose cosine similarity is larger than 0.71 (ref. awk script), and the visualization was done with Gephi.  There should be better ways to visualize proximity, though...
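
The edge list for such a graph can be sketched as follows. This is written in Python rather than the awk script referred to above, which is not reproduced here, so it should be read as an illustration of the thresholding idea and not as the original script. The `Source,Target,Weight` header is an assumption about a convenient CSV layout for importing an edge list into Gephi.

```python
import math
from itertools import combinations

# One feature vector per approach (columns of the comparison table), in the
# feature order Biomimetic, Connectionist, Bayesian, Developmental, Embodiment,
# Task-Oriented, Symbolic, Linguistic, Social, Theoretical.
VECTORS = {
    "AIXI":                [0, 0, 10, 0, 0, 0, 5, 0, 0, 10],
    "OpenCog":             [3, 2, 7, 7, 3, 3, 8, 2, 3, 5],
    "Soar":                [0, 0, 2, 1, 1, 3, 10, 1, 0, 5],
    "ACT-R":               [5, 3, 5, 3, 1, 3, 5, 1, 0, 5],
    "BESOM":               [8, 10, 10, 0, 0, 0, 3, 0, 0, 5],
    "WBA":                 [9, 10, 7, 4, 3, 2, 2, 2, 2, 3],
    "RoboCup@Home":        [1, 0, 0, 2, 10, 10, 5, 7, 7, 1],
    "Dev. Robotics":       [8, 3, 3, 10, 10, 5, 0, 1, 6, 3],
    "Social Robotics":     [5, 3, 3, 5, 10, 7, 6, 7, 10, 5],
    "Dev. Ling. Robotics": [5, 3, 10, 6, 7, 7, 8, 9, 5, 8],
    "Rondelion":           [4, 3, 3, 8, 8, 6, 5, 8, 3, 3],
}
THRESHOLD = 0.71  # value stated in the post


def cosine(u, v):
    """Cosine similarity of two equal-length score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Keep only the pairs whose similarity exceeds the threshold.
edges = []
for a, b in combinations(VECTORS, 2):
    similarity = cosine(VECTORS[a], VECTORS[b])
    if similarity > THRESHOLD:
        edges.append((a, b, similarity))

print("Source,Target,Weight")
for a, b, similarity in edges:
    print(f"{a},{b},{similarity:.2f}")
```

With these scores, BESOM and WBA end up connected, while AIXI and RoboCup@Home, which share almost no features, do not.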

Saturday, May 3, 2014

Simple Trial for Border Ownership Detection

I implemented a simple algorithm to determine border ownership in a visual receptive field.  The idea is based on the physiological finding that there are neurons in the visual system that are excited by nearby cross-directional edges and inhibited by nearby iso-directional edges.  Researchers such as Sakai and Nishimura (2006) have proposed models of border ownership coding based on this finding.
What I have done is:
  • Apply Gabor filters to find 0° and 90° edges.
  • Sum up excitatory and inhibitory effects for cross and iso-directional nearby edges.
  • Determine which side is more excitatory for each edge.
(Coding was done with OpenCV Java API.)
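
The second and third steps can be illustrated with a small sketch. It is written in Python, not with the OpenCV Java API used for the experiment, and it takes hand-made edge maps as input in place of Gabor filter output. The square window on each side of the edge, the unweighted sum, and the names `border_owner` and `radius` are all assumptions made for illustration; the actual implementation may weight and combine the effects differently.

```python
def border_owner(pixel, orientation, edges, radius=3):
    """Vote on which side of an edge pixel owns the border.

    pixel:       (row, col) of the edge pixel under consideration
    orientation: "v" for a vertical edge, "h" for a horizontal edge
    edges:       {"v": set of (row, col), "h": set of (row, col)}
    Cross-orientation edges near a side excite it (+1 each) and
    iso-orientation edges near a side inhibit it (-1 each).
    """
    r, c = pixel
    iso = edges[orientation]
    cross = edges["h" if orientation == "v" else "v"]
    scores = []
    for sign in (-1, +1):  # -1: left/above, +1: right/below
        total = 0
        for along in range(-radius, radius + 1):
            for away in range(1, radius + 1):
                # Step `away` pixels across the edge and `along` pixels along it.
                if orientation == "v":
                    cell = (r + along, c + sign * away)
                else:
                    cell = (r + sign * away, c + along)
                total += (cell in cross) - (cell in iso)
        scores.append(total)
    names = ("left", "right") if orientation == "v" else ("above", "below")
    if scores[0] == scores[1]:
        return "undecided"
    return names[0] if scores[0] > scores[1] else names[1]


# Outline of a square occupying rows 4..11 and columns 4..11.
square = {
    "v": {(r, c) for r in range(4, 12) for c in (4, 11)},
    "h": {(r, c) for r in (4, 11) for c in range(4, 12)},
}
print(border_owner((6, 4), "v", square))   # left edge of the square
print(border_owner((4, 6), "h", square))   # top edge of the square
```

For the square outline, each sampled edge pixel is assigned to the side facing the inside of the square, because only that side has cross-directional edges within the window.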

The following are sample pictures:
Fig1: the input
Fig2: Gabor filter applied (0°) (There are two edge lines for the right side due to, perhaps, the phase setting of Gabor filter.)
Fig3: Gabor filter applied (90°)
Fig4: Border ownership detected (indicated with gray area in either side of edges (Fig2 + Fig))

Afterthoughts

The experiment was part of my attempt to find an algorithm for identifying Spelke's objects.  While the algorithm I tried above may be a heuristic used in the brain, it does not seem straightforward to use it to represent the coherence of Spelke's objects, such as local color coherence over time.  So, I leave the result above as tentative and will explore other algorithms.  Besides, to make the algorithm efficient, I would have to hack around the code (it took 15 seconds to process the Lenna picture without a sensible result).


Saturday, February 15, 2014

Phase II Plan

I modified the plan from the previous post.
  • In the previous post, I thought I'd use the depth map for object recognition, but in this plan I will use regular optical maps, since the depth map and the optical map are different modalities and it is harder to imagine what is going on in depth perception from the human (phenomenological) point of view.
  • In this plan, I dropped machine learning and will use more hard-wired approaches, as I would like to avoid unpredictable aspects.


Again recapitulating the Phase II part of my research plan here:

Phase II: Recognizing Spelke's Objects

  • Basic Ideas
    • Spelke's Object: coherent, solid and inert bundle of features of a certain dimension that continues over time.
      Features: colors, shapes (jagginess), texture, visual depth, etc.
    • While recognition of Spelke's objects may be preprogrammed, recognized objects become objects of categorization by means of unsupervised learning.  In this process, hierarchical (deep) learning would be done from the categorization of primitive features to the re-categorization of categorized patterns.
    • Object recognition will be carried out within spontaneous actions of the robot.
    • The robot shall gather information preferentially on 'novel' objects (curiosity-driven behavior) ('novelty' to be defined).
The following is the new plan for Phase II.

Robot Basics
  • Fish-like robot swimming in a 3D space
    Experiments will be done with the SigVerse robot simulator.
Environment
  • Keep the robot from wandering away by making it attracted to objects on the floor (see below).
  • There are passively movable objects on the ground.
Robot Vision
  • Line of Sight and 2D depth sensors (SigVerse)
  • Static images
  • Optical flow (temporal Δ)
  • Line of sight depth sensor (to avoid collision)
Central visual field (CVF)
  • CVF (gaze) moves randomly (saccades) in the fixed visual field.
  • CVF is attracted to information-dense areas.
  • Information density is measured by the density of extracted features such as line segments and optical flow.
  • CVF becomes 'bored' with each attractor as time passes.
  • CVF allows high-resolution feature extraction.
Peripheral visual field
  • Information density is measured with low-resolution feature extraction.
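
The interplay of attraction and boredom can be sketched as below. This is only an illustration under assumptions: the plan does not say how boredom is quantified, so the exponential discount, the `gain` and `decay` parameters, and the function names are hypothetical, and the random saccades are left out to keep the example deterministic.

```python
import math


def pick_gaze_target(density, boredom):
    """Return the location whose information density, discounted by boredom, is largest."""
    return max(density,
               key=lambda loc: density[loc] * math.exp(-boredom.get(loc, 0.0)))


def update_boredom(boredom, target, gain=1.0, decay=0.5):
    """Raise boredom for the attended location and let it fade elsewhere."""
    updated = {loc: level * decay for loc, level in boredom.items()}
    updated[target] = boredom.get(target, 0.0) + gain
    return updated


# Hypothetical information densities (e.g. feature counts) at three locations.
density = {"A": 5.0, "B": 3.0, "C": 1.0}
boredom = {}
trace = []
for _ in range(4):
    target = pick_gaze_target(density, boredom)
    trace.append(target)
    boredom = update_boredom(boredom, target)
print(trace)
```

In this toy run the gaze starts at the densest location and then alternates with the second densest one, since dwelling on an attractor lowers its appeal for a while.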
Feature extraction
  • Line segments (using, e.g., SIFT or Gabor filters)
  • Border ownership/Figure-Ground separation (see below)
Basic activities (hard-wired)
  • Randomly change direction
  • Direction change is attracted by line of sight
  • If reward increases by direction change, then accelerate till next direction change.
  • If locomotion decreases reward, then decelerate (with water resistance) (and change direction).
  • If depth sensor predicts collision, then decelerate (with water resistance) and change direction.
Rewards
  • Increase in information density in CVF gives a positive reward (curiosity; aesthetics)
  • Concussion by collision will give a negative reward.
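
A rough sketch of how the hard-wired activities and rewards could fit together in one control step is given below. It is an illustration in Python, not a SigVerse controller (which would have to be written in C++), and it simplifies the plan: the heading is a single angle instead of a 3D direction, the attraction of direction changes to the line of sight is omitted, and `accel`, `drag` and `turn_prob` are hypothetical parameters.

```python
import random
from dataclasses import dataclass


@dataclass
class State:
    heading: float  # degrees; a single angle for brevity
    speed: float


def step(state, reward_delta, collision_predicted, rng,
         accel=0.2, drag=0.5, turn_prob=0.1):
    """Apply one round of the hard-wired rules and return the new state."""
    heading, speed = state.heading, state.speed
    if collision_predicted or reward_delta < 0:
        speed *= drag                       # decelerate (water resistance)
        heading = rng.uniform(0.0, 360.0)   # and change direction
    elif reward_delta > 0:
        speed += accel                      # reward is increasing: accelerate
    elif rng.random() < turn_prob:
        heading = rng.uniform(0.0, 360.0)   # occasional random direction change
    return State(heading, speed)


rng = random.Random(0)
state = State(heading=0.0, speed=1.0)
state = step(state, reward_delta=0.5, collision_predicted=False, rng=rng)
print(state)
```

Here `reward_delta` stands for the change in reward since the previous step, e.g. the change in information density in the CVF, with a collision contributing a negative value.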
Spelke's object detection
  • Uses figure-ground separation algorithms inspired by visual information processing in the brain.
  • Spelke's objects are recognized as figure-like lumps detected by figure-ground separation algorithms.
  • Optical flow may also be used for figure-ground separation.


Sunday, February 2, 2014

Phenomenology of Artefacts

In this memorandum, I propose a new scientific endeavor that I call “phenomenology of artefacts.”  The gist of the endeavor is that (human) epistemology would be emulated by constructing artefacts with epistemic functions similar to those of human beings.  While it is a kind of epistemology, it is inspired by (Husserlian) phenomenology, as the enterprise of Husserl was to construct a rigorous science (strenge Wissenschaft) where objective knowledge is established.  The basic idea of phenomenology of artefacts is a parallel: while Husserl started with introspection of his own mind to obtain knowledge, the 'mental state' of an artificial intelligence (AI) can be inspected by external observers (modified 2015-07-12 from the original wording, “an artificial intelligence (AI) can inspect its own ‘mental states’”; see the footnote* below).  Thus, I believe, AI researchers can draw inspiration from the ideas of Husserlian phenomenology.  I’ll give a few points in this regard in the following.

Transcendence

Husserl regarded his phenomenology as transcendental.  Since one starts from the phenomenal (internal) world, one could never reach or be sure about external objects.  Yet Husserl’s attempt was to transcend this barrier and to obtain objective knowledge of the external world.  The situation can be similar for an AI.  Though its perceptual states could be causally explained as sensory information processing, it is not clear at all how an AI could construct knowledge of the world from perceptual imagery.  This is especially true when it is not provided with prior knowledge of the world and has to learn from scratch.  For example, how can an AI tell that there are 3D physical objects out there from transient perceptual patterns given as part of its internal states?  When it establishes a model of the 3D physical world and becomes ‘sure’ about external objects, it transcends its internal perceptual imagery.

Perception and kinesthesis

Husserl emphasized the role of perception (especially the visual one) in obtaining objective knowledge.  He also emphasized the relation of perception to our motion.  When we perceive an object while moving, the perceptual images change over time in a certain way.  The way of change can be learned and becomes predictable.  To put it very simply, we learn about the external 3D world as we move and perceive.  Husserl used the term kinesthesis, which in a narrow sense means bodily sensation as we move, but may refer to any motion-related sensation (including visual perception).

Time-consciousness

“According to Husserl, the most fundamental consciousness, presupposed in all other forms and structures of consciousness, is the consciousness of time.” [Bernet et al.] (p.101)   If knowledge of the world is obtained through kinesthetic interaction with the world, this apparently holds for phenomenology of artefacts as well, but this aspect has not been fully explored by AI researchers.
  Husserl distinguished three moments of time-consciousness, namely, 1) premodal sensation corresponding to the now-moment, 2) retention or ‘a comet tail of memory’ and 3) protention or expectation.   Putting philosophical considerations aside, these moments may be understood from a neural perspective.  The cortex (brain) can be regarded as a ‘recurrent neural network’ learning temporal patterns.  At any moment, it retains information on its state in the immediate past (retention) and expects its state in the immediate future (protention) as it learns temporal patterns.

Concluding remarks

I often encounter useful insights when I read books on Husserl, and there may be more inspiration from Husserl for AI researchers.  However, even if there is no more inspiration there, I believe phenomenology of artefacts could stand by itself and could even replace Husserl’s attempt.  Human introspection is notoriously vague/opaque and does not seem to be suited to the rigorous pursuit of science.  Meanwhile, artefacts can be designed definitely and observed without opaqueness.  Thus, I advocate phenomenology of artefacts as a way to the rigorous science Husserl envisaged…

Reference

[Bernet et al.] An Introduction to Husserlian Phenomenology, Rudolf Bernet, Iso Kern & Eduard Marbach, Northwestern University Press (1993).
[Gallagher et al.] The Phenomenological Mind (2nd edition), Shaun Gallagher & Dan Zahavi, Routledge (2012).
[J.J.Gibson] The Ecological Approach to Visual Perception, James J. Gibson, Routledge (1986).

Footnote

* Modified after the comment by Shogo Tanaka that inspection by a machine would lead to the idea of 'consciously' reflecting on the internal representation of the world.  Phenomenology avoids such an idea and stresses pre-reflective consciousness.  I fully agree with this comment.  Machine learning-based emergent AI does not normally make symbolic representations of what it is perceiving or doing.

The PDF version of this content (version 2014-02-02) is available here.

Monday, January 20, 2014

Phase I ⇒ Phase II

I spent too much time on attitude control.  It's time to move on.
As for attitude control, I have done:
  • 1-dimensional rotation control by simple learning algorithms
    (experimenting in Java)
  • porting the algorithms to C++
    (SigVerse controllers must be written in C++.)
  • implementing a simple attitude control mechanism (3D but without learning) with SigVerse.
OK, attitude control doesn't require learning, and learning C++ and setting up the environment took a lot of time...

Last September, I wrote on my research plan:

Phase II: Recognizing Spelke's Objects

  • Basic Ideas
    • Spelke's Object: coherent, solid and inert bundle of features of a certain dimension that continues over time.
      Features: colors, shapes (jagginess), texture, visual depth, etc.
    • While recognition of Spelke's objects may be preprogrammed, recognized objects become objects of categorization by means of unsupervised learning.  In this process, hierarchical (deep) learning would be done from the categorization of primitive features to the re-categorization of categorized patterns.
    • Object recognition will be carried out within spontaneous actions of the robot.
    • The robot shall gather information preferentially on 'novel' objects (curiosity-driven behavior) ('novelty' to be defined).
The following is a bit more concrete specification for Phase II.
Experiments will be done with the SigVerse robot simulator.

Robot Basics
  • Fish-like robot swimming in a 3D space
Environment
  • The 'aquarium' is a cube/cuboid enclosed by walls.
    The robot cannot see transparent walls.
    The observer cannot see inside opaque walls.
  • Keep the robot from wandering away by making it attracted to objects on the floor.
  • There are passively movable objects on the ground.
Robot Vision
  • Line of Sight and 2D distance sensors (SigVerse)
Basic activities
  • If reward is under a threshold, then change direction.
  • Change direction for reward increase.
  • Accelerate towards the direction of maximal reward.
  • Accelerate for reward increase. Accelerate inversely for reward decrease.
Rewards
  • Complexity in the (variance of) 2D distance (depth) patterns will give a positive reward (motivation of reaching for objects; curiosity; aesthetics)
  • Concussion by collision will give a negative reward.
  • Getting bait will give a positive reward. 
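
The reward terms above can be put together in a small sketch. It assumes that "complexity" is measured as the population variance of the depth values in one 2D depth frame, as the parenthetical suggests; the weights `collision_penalty` and `bait_bonus` and the function names are hypothetical.

```python
from statistics import pvariance


def curiosity_reward(depth_map):
    """Variance of the depth values in a 2D depth map (list of rows)."""
    values = [depth for row in depth_map for depth in row]
    return pvariance(values)


def total_reward(depth_map, collided, got_bait,
                 collision_penalty=10.0, bait_bonus=5.0):
    """Combine the three reward sources listed in the plan."""
    reward = curiosity_reward(depth_map)
    if collided:
        reward -= collision_penalty   # concussion by collision
    if got_bait:
        reward += bait_bonus          # getting bait
    return reward


flat_wall = [[2.0, 2.0, 2.0, 2.0]] * 3      # nothing but a wall at constant depth
cluttered = [[1.0, 3.0], [1.0, 3.0]]        # an object in front of a far background
print(curiosity_reward(flat_wall), curiosity_reward(cluttered))
```

A uniform wall yields no curiosity reward, while a scene with objects at different depths yields a positive one, which is the intended pull toward objects.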
Learning behaviors
  • Learning by rewards (reinforcement learning)
  • For example, the robot may learn to bump objects away to get bait.
Learning sensory patterns
  • Sensory input
    • 2D distance depth (including optical flow)
    • Acceleration and rotation (kinesthetic input)
    • Concussion (large acceleration)
  • Clustering sensory input
  • Of course, feature selection (manual or automatic) is the key to successful learning.
Spelke's Objects
Check if the sensory pattern learning above yields the recognition of Spelke's objects.
If it does not, then add built-in mechanisms.