Saturday, October 11, 2014

Simple Word Segmentation Experiment

I ran a simple word segmentation experiment using the word candidates created by the method in the previous post.  The segmentation logic is a quasi-Viterbi search whose cost function is the number of segments.  When no word candidate is found, a single character is used as a segment instead.  Here is part of the result:

^lecirculoesrubie$
% le circulo e s rubie $
# le circulo es rubie $
^tomasuncirculo$
% tomas un circulo $ 
# tomas un circulo $ 
^lerectanguloesrubie$
% le rectangulo e s rubie $ 
# le rectangulo es rubie $ 
^illoesrubie$
% il loesrubie $ 
# illo es rubie $ 
^ilfacefrigide$
% ilfacefrigide $ 
# il face frigide $
^tomasilloesblau$
% tomasilloes blau $ 
# tomas illo es blau $ 
* Lines starting with "^" are the strings to be segmented.
* Lines starting with "%" are the segmentation results.
* Lines starting with "#" are the intended segmentations.
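
The search described above can be sketched roughly as follows. This is a minimal illustration, not the code used in the experiment: the function name `segment`, the small `CANDIDATES` set (a hand-picked subset of the candidate list) and the tie-breaking behavior are all assumptions. It uses dynamic programming to minimize the number of segments, allowing a single character as a segment wherever no candidate fits.

```python
def segment(text, candidates):
    """Split text into the fewest segments (assumed cost: 1 per segment).

    A segment is either a word candidate or, as a fallback, one character.
    Ties between equally short segmentations are broken arbitrarily.
    """
    n = len(text)
    max_len = max((len(w) for w in candidates), default=1)
    inf = float("inf")
    cost = [0] + [inf] * n   # cost[i]: fewest segments covering text[:i]
    back = [0] * (n + 1)     # back[i]: start index of the last segment
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in candidates or end - start == 1:
                if cost[start] + 1 < cost[end]:
                    cost[end] = cost[start] + 1
                    back[end] = start
    # Follow the back pointers from the end of the string.
    segments = []
    pos = n
    while pos > 0:
        segments.append(text[back[pos]:pos])
        pos = back[pos]
    return segments[::-1]


# Hypothetical subset of the candidates, for illustration only.
CANDIDATES = {"le", "il", "illo", "un", "tomas", "circulo", "rubie", "blau"}

if __name__ == "__main__":
    print(" ".join(segment("tomasuncirculo", CANDIDATES)))    # tomas un circulo
    print(" ".join(segment("lecirculoesrubie", CANDIDATES)))  # le circulo e s rubie
```

Because "es" is not among the candidates, the second string falls back to the single characters "e" and "s", as in the result above. With the full candidate list several segmentations can have the same cost, so the output then depends on how ties are broken.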

Of course, this all depends on the extracted word candidates (see below).
Children learning a language can use cues for extracting words other than the statistical properties of the strings they hear, e.g., semantic cues (cf. the preliminary experiment) or accents (I guess accents are very important for perceiving words).  In any case, to make the language model more sensible, grammatical categories (classes) must be introduced and semantics (symbol grounding) should be considered again...

Word candidates used in the experiment above:
        lo,757
        il,537*
      loes,440
      illo,291*
    illoes,260
        le,256*
    angulo,251*
        au,251
        un,250
        ta,205
        as,196
    ilface,170
      blau,167*
     rubie,165*
     verde,154*
      esun,152
 triangulo,133*
 loesrubie,130
   circulo,129*
rectangulo,118*
  loesblau,117
 loesverde,117
  anguloes,115
     tomas,109*
       lor,108
   ascolta,87*
ilfacecalor,86
ilfacefrigide,84
  illoesun,76
ilesunpaucobscur,76
  reguarda,59*
     nonne,39*
   uloblau,37
tomasilloes,36
* Intended words are marked with '*'.
* The numbers are the frequency of strings in the corpus of 1000 utterances.
* Candidates occurring fewer than 30 times were not used.

Friday, October 10, 2014

Word candidates

Children learn language without being given any explicit lexicon, so they have to learn words from the utterances they perceive.  Machine learning solutions for word segmentation include methods such as Nested Pitman-Yor Language Modeling.  However, if you just want to get an idea of the word candidates, you can use a simpler method, perhaps similar to one used for keyboard input auto-completion.  The basic idea is that a word may have substrings whose frequencies are similar to the frequency of the word itself.  For example, the frequency of "xample" may be similar to that of "example," as the two almost always occur together.
Here is the result of a simple experiment with an artificial corpus (modified from the previous labeling experiment by adding one-word utterances such as 'illo' and sentences about the ambience such as 'ilfacefrigide' ("It's cold.")):
        lo,778
        il,531*
      loes,456
      illo,301*
        le,271*
        un,268*
    illoes,262
    angulo,261*
        au,242
        ta,204
        re,188
        as,176
     verde,171*
      esun,160
      blau,157*
     rubie,156*
    ilface,153
   circulo,143*
rectangulo,132*
 loesverde,131
 triangulo,129*
  anguloes,125
  loesblau,122
 loesrubie,120
       lor,104
     tomas,104*
  illoesun,83
ilfacefrigide,80
  ....
* Intended words are marked with '*'.
* The numbers are the frequency of strings in the corpus of 1000 utterances.
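
One way to turn the basic idea into code is sketched below. It is an assumption-laden illustration rather than the procedure that produced the list above: the function names, the `min_count` and `ratio` parameters, and the rule of dropping a substring when a one-character-longer string occurs almost as often are all hypothetical.

```python
from collections import Counter


def substring_counts(utterances, min_len=2, max_len=16):
    """Count every occurrence of every substring of length min_len..max_len."""
    counts = Counter()
    for u in utterances:
        n = len(u)
        for i in range(n):
            for j in range(i + min_len, min(n, i + max_len) + 1):
                counts[u[i:j]] += 1
    return counts


def word_candidates(utterances, min_count=2, ratio=0.9):
    """Keep frequent substrings not 'absorbed' by a longer string.

    Assumed rule: a substring is dropped when some string that is one
    character longer occurs at least ratio times as often, as "xample"
    is absorbed by "example".
    """
    counts = substring_counts(utterances)
    # For each string, the highest count among its one-character extensions.
    best_ext = {}
    for t, ct in counts.items():
        for s in (t[1:], t[:-1]):
            if ct > best_ext.get(s, 0):
                best_ext[s] = ct
    return {
        s: c
        for s, c in counts.items()
        if c >= min_count and best_ext.get(s, 0) < ratio * c
    }


if __name__ == "__main__":
    toy = ["anexample", "example", "myexample"]
    for s, c in sorted(word_candidates(toy).items(), key=lambda x: -x[1]):
        print(f"{s},{c}")
```

On the toy corpus, "xample" is dropped because "example" occurs just as often, while "example" itself survives because its extensions are much rarer. A stricter or looser `ratio` changes how many overlapping strings such as 'lo' and 'loes' remain.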

Tuesday, October 7, 2014

Research Plan Updated (2014-10)

I'm streamlining my research plan further.  I'm dropping computer vision to concentrate on language acquisition.  Instead, I'll use a simplified internal representation of the kind presumably obtained by abstracting data from vision systems.  The internal representation would contain information on the features of objects jointly attended to with putative 'caretakers,' the orientational relations among objects, and their motion.


Purpose

Creating a system that performs language acquisition (symbol grounding) with a robot simulator, pattern recognizers and association networks, to verify my associationist hypotheses of cognition.

Language acquisition

The system will follow the human (infant) language acquisition process: it shall associate linguistic expressions with internal representations of the shapes, colors, movements and relations of perceived objects.

Core cognitive architecture

The core cognitive architecture shall have the following functions.
In designing and implementing the cognitive architecture, generic mechanisms shall be (re)used whenever possible.
  • Time-series pattern learner/recognizer
    taking motor commands and sensory input as its input
  • Attention and situation assessment
    regarding what is to be learned.
  • Cognitive model based on association
    to memorize and recollect (temporal) generative patterns as associative sequences.
    Linguistic competence will be realized with this model.
    It contains a backtracking mechanism based on the attention and situation assessment function mentioned above.
  • Episodic memory
    Patterns (representations of situations, i.e., combinations of abstract patterns created by unsupervised learning) that are positively assessed by the attention and situation assessment function will be memorized.
 cf. A Figure of Human Cognitive Function

Platform

  • Visual data processing
    OpenCV, SimpleCV, etc.
  • Learning module
    BESOM, DeSTIN, k-means, SOM, SVM, HMM, SOINN, etc.
    (To be used as plug-ins depending on the purpose)

    * Currently, the use of learners with distributed representation is avoided.
  • Publicly available cognitive architectures may be surveyed
    e.g., OpenCog, MicroPSI and LIDA
    (✔Done)


Tentative Research Steps

The following part has not changed much from the previous plan.
Changes in Phase III are indicated by blue letters.


Phase I: Survey on robot simulators (✔Done)

Research on robot simulators such as SigVerse and V-rep, and trials of attitude control.

Phase II: Survey on Spelke's Object recognition (✔Done)

Proper recognition of Spelke's objects, i.e., coherent, solid & inert bundles of features of a certain dimension that persist over time, would require optical flow processing associated with the actions of the perceiver.

Phase III: Labeling (Currently Underway)

  • Basic Ideas
    • The system shall relate specific types of figures it looks at with linguistic labels.
    • Figures get the system's attention by force (external instruction).
    • Labels may be nominals representing shapes and adjectives representing features such as colors.
      Types of objects may be learned in a supervised manner with labels or may already have been categorized by unsupervised learning.
    • The system shall utter labels on recognizing types after learning the association between the labels and types.
    • The system shall recognize/utter the syntactic pattern 'adjective + noun'.
  • Determining the recognition method & implementation
  • Designing and implementing a mechanism for handling syntactic structure.
    (A Viterbi/chart analyzer may be used.)
  • Incorporating episodic memory at this stage is to be considered.
  • Labeling experiment (cf. preliminary experiment)

Phase IV: Relation Learning (Coming Soon?)

  • Basic Ideas
    • The system shall learn labels for
      • object locomotion such as moving up/down, right/left and closer/away
      • orientational relations between objects such as above/below, right/left and short/thither
    • Objects should get the system's attention by force (programming) or by a certain preprogrammed mechanism of attention (such as attention to moving objects).
  • Designing & implementing the mechanism
  • Experiments

Phase V: Linguistic Interaction

  • Basic Ideas
    • The system shall answer questions using labels learned in Phase III & Phase IV.
    • The system shall respond to requests on its behavior.
    • The system shall utter clarification questions.
  • Designing & implementing mechanism for linguistic interaction
  • Experiments

Phase VI: Episodic memory

  • Basic Ideas
    • Episodes (situations) to be memorized are the appearance of objects and changes in relations among them.
    • The memory of novel objects and situations is prioritized.
  • Designing & implementing episodic memory and attentional mechanism
  • Designing & implementing episodic recollection & linguistic report.
  • Experiments

Phase VII: More complicated syntax

  • Basic Idea
    The system shall understand/utter linguistic expressions having nested phrase structure.
  • Designing & implementing nested phrase structure.
  • Experiments

Wednesday, October 1, 2014

Requirements and Choices in Approaches for AGI

In this article, a few requirements for the realization of AGI are listed and the choices among related approaches are discussed (Japanese version).
The purpose of this article is to give a simple overview toward the realization of AGI.

Requirements for AGI

  1. Cognitive architecture capable of planning and plan execution
  2. Emergent conceptual learning (or representation learning)
  3. Associative memory based on statistics (necessary for coping with the frame problem)
  4. Linguistic functions (as for the more general function of handling symbols, conditions such as those discussed in Fodor and Pylyshyn (1988) should be met)
  5. Embodiment (in case cognitive development matters)
  6. Language acquisition (if cognitive development matters, 5 and 2 are required)
* The list is not meant to be exhaustive; obviously, there are other necessary cognitive functions such as episodic memory, reinforcement learning and information binding. The items are chosen for the sake of discussion.

Choices of approaches

While there are many approaches towards AGI, not so many deal with all of the requirements above. Below, alternatives for the classification of approaches are discussed.
  • Embodiment
    If an approach deals with embodiment, it should involve (cognitive) robotics. A cognitive robotics approach that also deals with language (acquisition) can be found here, which might be said to be one of the most comprehensive approaches to AGI.
  • Distributed representation vs. symbolic representation
    As perception processing such as computer vision is largely distributive, many approaches adopt distributed representation in this domain.
    Approaches vary depending on whether they adopt distributed representation in other domains (cf. the requirements listed above).
  • Kinds of distributed representation
    Examples include neural network models and Bayesian networks.
    Neural models vary in how closely they resemble real neural networks (cf. BICA below).
    Some approaches are hybrids of neural and Bayesian networks (e.g., BESOM, which combines SOM and the Bayesian network).
  • Symbolic representation
    While there are classical knowledge representations such as frames, scripts and production rules, it is empirically known that at least some statistical method/representation must be used alongside them to solve real-world problems. Besides approaches incorporating probabilistic elements into classical representations such as production rules (e.g., ACT-R), various probabilistic inference systems (such as probabilistic programming languages, NARS and PLN@OpenCog) have been proposed.
  • Emergent conceptual learning
    A well-known example is the neural-network-inspired SOM.
    Emergent conceptual learning may be regarded as a kind of hidden variable estimation. In the area of deep learning, hidden variables are estimated with auto-encoders and restricted Boltzmann machines. Other approaches may adopt HMM-like models or statistical methods such as LDA for hidden variable estimation.
  • Associative memory
    Supervised learning can be seen as associative memory in a broad sense. There are approaches using distributed representation such as neural/Bayesian network models, and approaches using statistical methods such as LDA (most recent document search methods fall under the latter).
  • BICA (Biologically-Inspired Cognitive Architecture)
    Approaches to AGI vary depending on whether, or to what extent, they mimic living organisms (notably the brain). While WBE (whole brain emulation) is the extreme case, there are more abstract approaches such as Leabra, which mimics brain organs; SPAUN/Nengo, based on the spiking neuron model; BESOM, which uses the Bayesian network while mimicking brain organs; and HTM, which mimics the neocortex (only SPAUN/Nengo and Leabra may currently realize cognitive architectures). DeSTIN, which incorporates reinforcement learning with hierarchical temporal learning, is loosely inspired by the neocortex and basal ganglia. Psi/MicroPsi is a more psychological model, though it is also inspired by neural networks. In general, the more an approach is inspired by the brain, the more it tends to adopt emergent distributed representation, as the brain does.