Monday, January 20, 2014

Phase I ⇒ Phase II

I spent too much time on attitude control.  It's time to move on.
On attitude control, I have done the following:
  • One-dimensional rotation control with simple learning algorithms
    (experiments in Java)
  • Porting the algorithms to C++
    (SigVerse controllers must be written in C++.)
  • Implementing a simple attitude control mechanism (3D, but without learning) in SigVerse.
OK, attitude control turned out not to require learning after all, and learning C++ and setting up its development environment took a lot of time...

Last September, I wrote in my research plan:

Phase II: Recognizing Spelke's Objects

  • Basic Ideas
    • Spelke's Object: a coherent, solid, and inert bundle of features of a certain dimension that persists over time.
      Features: colors, shapes (jagginess), texture, visual depth, etc.
    • While the recognition of Spelke's objects may be preprogrammed, recognized objects become targets of categorization by means of unsupervised learning.  In this process, hierarchical (deep) learning would proceed from the categorization of primitive features to the re-categorization of categorized patterns.
    • Object recognition will be carried out through the robot's spontaneous actions.
    • The robot shall gather information preferentially on 'novel' objects (curiosity-driven behavior); 'novelty' is yet to be defined, but a first attempt follows this list.
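One candidate definition of 'novelty', as a working assumption: the distance from the current sensory vector to the nearest prototype learned so far, so that inputs unlike anything seen before score high.  A minimal C++ sketch of this idea (the function name and vector representation are mine, not part of the plan):

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <limits>
    #include <vector>

    // Novelty of a sensory vector: distance to the nearest learned
    // prototype (a working definition, not settled in the plan).
    // Assumes all vectors share the same dimension.
    double novelty(const std::vector<double>& input,
                   const std::vector<std::vector<double>>& prototypes) {
        double best = std::numeric_limits<double>::infinity();
        for (const std::vector<double>& p : prototypes) {
            double d2 = 0.0;
            for (std::size_t i = 0; i < input.size(); ++i) {
                const double diff = input[i] - p[i];
                d2 += diff * diff;
            }
            best = std::min(best, std::sqrt(d2));
        }
        return best;  // infinity when nothing has been learned yet
    }
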
The following is a somewhat more concrete specification for Phase II.
Experiments will be done with the SigVerse robot simulator.

Robot Basics
  • A fish-like robot swimming in 3D space
Environment
  • The 'aquarium' is a cube/cuboid enclosed by walls.
    If the walls are transparent, the robot cannot see them; if they are opaque, the observer cannot see inside.
    The robot is kept from wandering away by being attracted to objects on the floor.
  • There are passively movable objects on the floor.
Robot Vision
  • Line-of-sight and 2D distance sensors (SigVerse)
Basic activities
  • If the reward is below a threshold, change direction.
  • Change direction so as to increase the reward.
  • Accelerate in the direction of maximal reward.
  • Accelerate when the reward increases; decelerate when it decreases.
    (A control-loop sketch follows this list.)
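A minimal sketch of this reward-following loop, assuming a scalar reward signal; reward(), turnRandomly(), and thrust() are hypothetical placeholders, not actual SigVerse API calls:

    // Hypothetical placeholders, not actual SigVerse API calls.
    double reward();          // current scalar reward
    void   turnRandomly();    // pick a new heading
    void   thrust(double a);  // positive to accelerate, negative to brake

    const double REWARD_THRESHOLD = 0.1;  // assumed value, to be tuned

    // One control step implementing the rules above.
    void step(double& prevReward) {
        const double r = reward();
        if (r < REWARD_THRESHOLD)
            turnRandomly();      // reward too low: change direction
        else if (r > prevReward)
            thrust(+1.0);        // reward increasing: keep accelerating
        else if (r < prevReward)
            thrust(-1.0);        // reward decreasing: decelerate
        prevReward = r;
    }
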
Rewards
  • Complexity in the 2D distance (depth) patterns, measured as their variance, will give a positive reward (motivation for reaching objects; curiosity; aesthetics).
  • Concussion caused by collision will give a negative reward.
  • Getting bait will give a positive reward.
    (A sketch combining these reward terms follows this list.)
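A sketch of how these three terms might combine into one scalar reward; the weights are assumed values to be tuned, and the variance of the depth image stands in for 'complexity':

    #include <numeric>
    #include <vector>

    // Hypothetical weights, to be tuned by experiment.
    const double W_COMPLEXITY = 1.0;
    const double W_CONCUSSION = 5.0;
    const double W_BAIT       = 10.0;

    // Variance of the 2D depth image as a crude complexity measure.
    // Assumes a non-empty depth image.
    double depthVariance(const std::vector<double>& depth) {
        const double mean =
            std::accumulate(depth.begin(), depth.end(), 0.0) / depth.size();
        double var = 0.0;
        for (double d : depth) var += (d - mean) * (d - mean);
        return var / depth.size();
    }

    double reward(const std::vector<double>& depth,
                  bool concussion, bool gotBait) {
        double r = W_COMPLEXITY * depthVariance(depth);  // curiosity term
        if (concussion) r -= W_CONCUSSION;               // collision penalty
        if (gotBait)    r += W_BAIT;                     // bait bonus
        return r;
    }
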
Learning behaviors
  • Learning by rewards (reinforcement learning)
  • For example, the robot may learn to bump objects away to get bait.
    (A minimal Q-learning sketch follows.)
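As a first cut at the reinforcement learning, a tabular Q-learning update would do, assuming the sensory input and actions are discretized; the sizes and parameters below are assumptions:

    #include <algorithm>

    // Tabular Q-learning over discretized states and actions.
    // All sizes and parameters below are assumptions.
    const int NUM_STATES  = 64;   // coarse discretization of sensory input
    const int NUM_ACTIONS = 4;    // e.g. turn left/right, accelerate, brake
    const double ALPHA = 0.1;     // learning rate
    const double GAMMA = 0.9;     // discount factor

    double Q[NUM_STATES][NUM_ACTIONS] = {};

    // Standard Q-learning update after taking action a in state s,
    // receiving reward r and arriving in state sNext.
    void update(int s, int a, double r, int sNext) {
        const double best =
            *std::max_element(Q[sNext], Q[sNext] + NUM_ACTIONS);
        Q[s][a] += ALPHA * (r + GAMMA * best - Q[s][a]);
    }
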
Learning sensory patterns
  • Sensory input
    • 2D distance/depth (including optical flow)
    • Acceleration and rotation (kinesthetic input)
    • Concussion (large acceleration)
  • Clustering the sensory input (a sketch follows this list)
  • Of course, feature selection (manual or automatic) is the key to successful learning.
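For the clustering, a simple online scheme may be enough as a starting point: move the nearest prototype toward each input, and spawn a new prototype when nothing is close (which also connects to the novelty measure above).  A sketch, with the learning rate and threshold as assumed values:

    #include <cmath>
    #include <cstddef>
    #include <limits>
    #include <vector>

    // Online clustering of sensory vectors: nudge the nearest prototype
    // toward each input, or spawn a new prototype when nothing is close.
    struct Clusterer {
        std::vector<std::vector<double>> prototypes;
        double learningRate     = 0.05;  // assumed
        double noveltyThreshold = 1.0;   // assumed

        void observe(const std::vector<double>& x) {
            std::size_t best = 0;
            double bestDist = std::numeric_limits<double>::infinity();
            for (std::size_t k = 0; k < prototypes.size(); ++k) {
                double d2 = 0.0;
                for (std::size_t i = 0; i < x.size(); ++i) {
                    const double diff = x[i] - prototypes[k][i];
                    d2 += diff * diff;
                }
                const double d = std::sqrt(d2);
                if (d < bestDist) { bestDist = d; best = k; }
            }
            if (prototypes.empty() || bestDist > noveltyThreshold) {
                prototypes.push_back(x);  // novel input: new cluster
            } else {
                for (std::size_t i = 0; i < x.size(); ++i)
                    prototypes[best][i] +=
                        learningRate * (x[i] - prototypes[best][i]);
            }
        }
    };
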
Spelke's Objects
Check if the sensory pattern learning above yields the recognition of Spelke's objects.
If it does not, then add built-in mechanisms.