Tuesday, September 23, 2014

Preliminary experiment on Lexicon Acquisition #1

As part of Phase ||| experiments, I conducted a preliminary experiment on lexicon acquisition, where a simple learner associates strings within given sentence strings with signals that represent things in the environment.

  • Representation of things in the environment
    There is (always) an object in the environment having one of three kinds of shapes (rectangle, triangle, circle) and three kinds of colors (red, blue, green), and the system is given two symbols respectively representing the shape and color of the object.
  • Sentential strings
    The system is given a sentence string also representing the situation (shape and color).  The task for the system is to extract lexical entries (strings) that would represent the shapes and colors.
    The sentences are in Interlingua.  The grammar for this experiment is as follows (in a pseudo BNF):
    S ⇒"tomas"?"reguarda"?S'"nonne"?
    S'⇒"illoes"?"un"Shape Color?
    where "tomas" is the name of the system being addressed and "reguarda" means "look!," "illo" "that," "es" "is," and "nonne" "isn't it," respectively.  '?' represents 0 or 1 occurrence.
    The following is sample sentences randomly generated from the grammar above:
3 1 illoesrubie
1 2 tomasilloesblaunonne
1 1 unrectangulo
2 1 letrianguloesrubie
1 3 reguardalerectanguloesverde
2 1 reguardailloesrubienonne
Two numbers at the beginning of sentences represent the shape and color.
The system collects all the ngrams in given sentences and calculates tf*idf with the shapes and colors in the environment.  Basically, ngrams with the highest scores with particular colors or shapes are supposed to represent the features.
First try:
Blue     :       bla,1.098066
Red      :        bi,1.087502
Blue     :        au,1.065256
Triangle :    iangul,1.053565
Red      :      oesr,1.021488
Blue     :   loesbla,1.017333
Green    :      verd,1.000361
Rectangle: ectangulo,0.987110
Circle   :     rculo,0.954782
Green    :     sverd,0.945333 
# of Sentence: 1,000
The left-hand is the features to be represented by the ngrams in the right hand.
At first glance, it looks like a disaster, but if you look closely, the extracted strings are mostly parts of the shape/color words.
Second try:
Instead of tf*idf, tf*idf*ngram.length was used.  The result seems better.
Sentence Count=50
Triangle : triangulo,2.613976
Blue     : uloesblau,2.477903
Red      : loesrubie,2.287229
Triangle :rianguloes,2.263571
Triangle : letriangu,2.159996
Rectangle: unrectang,2.147516
Red      :uloesrubie,2.109837
Circle   :lecirculoe,1.920132
Sentence Count=100
Circle   :   circulo,2.277660
Red      : loesrubie,2.239706
Blue     :  loesblau,2.206975
Green    : loesverde,2.143210
Red      :uloesrubie,2.090366
Blue     :illoesblau,2.045903
Triangle : triangulo,2.039147
Green    :illoesverd,1.920606
Circle   : uncirculo,1.905309
Sentence Count=500
Triangle : triangulo,2.217422
Red      : loesrubie,2.180421
Blue     :  loesblau,2.163493
Green    : loesverde,2.127100
Circle   :   circulo,1.923614
Triangle :rianguloes,1.801742
Rectangle: unrectang,1.771783
Blue     :illoesblau,1.755595
Sentence Count=1000
Triangle : triangulo,2.314919
Red      : loesrubie,2.244439
Blue     :  loesblau,2.115484
Green    : loesverde,2.077109
Circle   :   circulo,1.857920
Triangle :ntriangulo,1.785375
Red      :lloesrubie,1.773095
Triangle :rianguloes,1.710290
Green    :illoesverd,1.709630
It seems it cannot extract color terms properly with the given setting.  Presumably, it must learn other terms such as "illo" and lexical items may be determined by fitting them into sentences (looking for best segmentation). 

Tuesday, September 9, 2014

Phase III: Labeling / Plan I

This article is of an experimental design for a part of my research plan (on language acquisition), where phonetic labels and visual patterns are to be associated with machine learning. 

  • Basic Ideas
    • Goal: The system relates figure images with linguistic labels (phoneme strings).
    • Labels represent the shapes or colors of figures.
    • Labels and shapes/colors of figures shall be learned in a non-supervised manner by presenting labels and figures together.
    • The system should be able to associate labels with shapes/colors of figures after learning.
  • Tools
    • Computer Vision: OpenCV/SimpleCV
    • Machine Learning Tool: TBD
      • Clustering (non-supervised learning)
        Hopefully, non-parametric for the number of clusters.
      • Association
        Auto-complete-like function for learned patterns. 
  • Steps
    1. Learning patterns
      • Input patterns (in two modalities)
        • Figure images: rectangles, circles, triangles of various sizes and colors
        • Label pattern: 2D array of (phoneme position × phoneme type)
      • Image processing: monochromatizing, edge detection, SIFT?
      • Learning
        Co-occurrence patterns of figures and labels, figure shapes, colors, and labels shall be clustered with the following configuration of learners:
        [the following has been modified on 2014-09-14]
        • Label learner: label candidates are selected from frequent phoneme N-grams.
        • Shape learner: assign a non-supervised learner for figure shapes.
        • Color learner: assign a non-supervised learner for figure colors.
          Colors are learned with color label candidates as teacher signals.
        • Cross-modal (label sense) learner: for a label (candidate) to have a sense, it must have a statistic correlation with a 'cluster' in shape/color learner.
    2. Association
      Hopefully, the non-supervised leaners used in 1 have associative, 
      auto-complete-like function for learned patterns.  Otherwise, some external mechanism should be added to associate labels (phoneme patterns) with shapes/colors and vice versa.
      In any case, as learned category nodes do not have complete information to construct particular lower input patterns, some external mechanism should be added to visualize the association.