Abstract: As many animals, including humans, make behavioral decisions based on visual information, a cognitive model of the visuomotor system would serve as a basis for intelligence research, including AGI. This article reports on the implementation of a relatively simple system: a virtual environment that displays shapes and cursors, and an agent that performs gaze shifts and cursor control based on information from the environment. The visual system is modeled after the human one, with central and peripheral fields of view, and the agent architecture is based on the structure of the brain.
1. Introduction
This article reports on the implementation of a simple environment and agent architecture for decision making based on visual information, which could serve as part of more generic cognitive models and architectures. It also addresses human ‘active vision,’ in which visual information is collected and integrated through gaze shifts.
This work adopts a strategy of starting with a relatively simple model. The implemented two-dimensional visual environment displays simple figures and cursors. Figures and a cursor can be moved (dragged) by instructions from the agent.
As for the agent, the following features were modeled and implemented, imitating the human visual system:
1) distinction between central and peripheral vision,
2) gaze shift based on salience in the peripheral vision [1],
3) unsupervised learning of shapes captured in the central vision,
4) reinforcement learning of cursor movement and dragging,
5) “surprise” due to changes in the environment caused by actions and habituation due to learning,
6) reward based on “surprise”.
Here 3), 4), and 5) involve learning and are provided with learning models. The agent's actions consist of gaze shifts and cursor movement with dragging. Gaze shift is not learned in the model and is driven by salience.
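To make the interplay among these functions concrete, a minimal Python sketch of one perception-action cycle is given below. The module names mirror the headings of Section 3, but the dictionary-of-modules interface and the method names (compute, select, encode, act, predict) are illustrative assumptions rather than the actual implementation.

```python
import numpy as np

def agent_step(obs, modules, prev_prediction):
    """One cycle: salience-driven gaze shift plus surprise-rewarded cursor control.

    obs: dict with "periphery" and "fovea" bitmaps; modules: dict of the Section 3
    modules (interfaces assumed for illustration); prev_prediction: the foveal image
    predicted on the previous cycle.
    """
    periphery, fovea = obs["periphery"], obs["fovea"]

    # 2) Gaze shift driven by salience in the peripheral visual field (not learned).
    salience_map = modules["Periphery2Saliency"].compute(periphery)
    gaze_shift = modules["PriorityMap2Gaze"].select(salience_map)

    # 3) Unsupervised learning of the shape captured in the central visual field.
    object_code = modules["ObjectRecognizer"].encode(fovea)

    # 5) "Surprise" taken here as the error between the predicted and the observed
    #    foveal image; habituation arises as the predictor learns and the error shrinks.
    error = np.asarray(prev_prediction, dtype=float) - np.asarray(fovea, dtype=float)
    surprise = float(np.mean(error ** 2))

    # 6) Intrinsic reward based on surprise drives
    # 4) reinforcement learning of cursor movement and dragging.
    reward = modules["SurpriseReward"].compute(surprise)
    cursor_move, grab = modules["CursorActor"].act(object_code, reward)

    # Predict the next foveal image given the chosen actions (used on the next cycle).
    next_prediction = modules["FoveaDiffPredictor"].predict(fovea, gaze_shift,
                                                            cursor_move, grab)
    return gaze_shift, cursor_move, grab, next_prediction
```

In this sketch, habituation emerges because the prediction error, and hence the surprise-based reward, decreases as the change-prediction module learns the consequences of the agent's actions.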
2. Environment
The environment has a screen divided into an N × N grid (Figure 1). The center of the screen is a "stage" consisting of an M × M grid (M < N), whose edges are marked with border lines. M different shapes are displayed on the stage. The visual information presented to the agent is a color bitmap of the field of view (an M × M grid) centered on the gaze. The gaze is located at the center of a grid cell on the stage and is shifted when the environment is given a gaze shift signal (a vector whose components range over [-M, +M]); it never moves off the stage. Two cursors of different colors are displayed on the stage. When the environment is given a cursor movement signal (a vector whose components range over [-1, +1]), one of the cursors may move, though it never moves off the stage. If the cursor is superimposed on a figure and the environment is given a nonzero cursor move and grab signal, the figure is moved in the same direction and by the same distance as the cursor (i.e., it is dragged). Figure 1 shows an example display.
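For concreteness, the following is a minimal Python sketch of the dynamics just described, restricted to a single agent-controlled cursor. The class name, the default grid sizes, and the placeholder rendering of the field of view are assumptions for illustration, not the actual implementation.

```python
import numpy as np

class VisuoMotorEnv:
    def __init__(self, N=15, M=7, n_shapes=3, rng=None):
        self.N, self.M = N, M
        self.rng = rng or np.random.default_rng()
        lo = (N - M) // 2                        # the stage is the central M x M block
        self.stage = (lo, lo + M - 1)            # inclusive bounds of stage cells
        self.gaze = np.array([N // 2, N // 2])   # gaze starts at the screen center
        self.cursor = self._random_stage_cell()  # agent-controlled cursor
        self.shapes = [self._random_stage_cell() for _ in range(n_shapes)]

    def _random_stage_cell(self):
        lo, hi = self.stage
        return self.rng.integers(lo, hi + 1, size=2)

    def _clip_to_stage(self, pos):
        lo, hi = self.stage
        return np.clip(pos, lo, hi)

    def step(self, gaze_shift, cursor_move, grab):
        # Gaze shift signal: components in [-M, +M]; the gaze never leaves the stage.
        self.gaze = self._clip_to_stage(self.gaze + np.clip(gaze_shift, -self.M, self.M))
        # Cursor movement signal: components in [-1, +1]; the cursor never leaves the stage.
        new_cursor = self._clip_to_stage(self.cursor + np.clip(cursor_move, -1, 1))
        delta = new_cursor - self.cursor
        # Dragging: if the cursor sits on a shape and a nonzero move + grab signal is given,
        # the shape moves in the same direction and by the same distance as the cursor.
        if grab and delta.any():
            for i, shape in enumerate(self.shapes):
                if np.array_equal(shape, self.cursor):
                    self.shapes[i] = self._clip_to_stage(shape + delta)
                    break
        self.cursor = new_cursor
        return self._field_of_view()

    def _field_of_view(self):
        # Color bitmap of the M x M field of view centered on the gaze
        # (placeholder; rendering of shapes, cursors, and border lines is omitted).
        return np.zeros((self.M, self.M, 3), dtype=np.uint8)
```

A single call to step() thus combines the three signals the agent emits per cycle: the gaze shift, the cursor movement, and the grab flag.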
3. Agent
3.1 Salience Calculation Module (Periphery2Saliency)
3.2 Gaze Shift Module (PriorityMap2Gaze)
3.3 Object Recognition Module (ObjectRecognizer)
3.4 Central Visual Field Change Prediction Module (FoveaDiffPredictor)
3.5 Surprise-Reward Calculation Module (SurpriseReward)
3.6 Cursor Control Module (CursorActor)
4. Implementation and Test
4.1 Environment
4.2 Agent
Salience Calculation Module (Periphery2Saliency)
Gaze Shift Module (PriorityMap2Gaze)
Object Recognition Module (ObjectRecognizer)
Central Visual Field Change Prediction Module (FoveaDiffPredictor)
Cursor Control Module (CursorActor)
4.3 Experiments (Tests)
Object Recognition Module
Central Visual Field Change Prediction Module
Surprise-Reward Calculation Module
Cursor Control Module
5. Conclusion
References
[1] Veale, R., et al.: How is visual salience computed in the brain? Insights from behaviour, neurobiology and modelling, Phil. Trans. R. Soc. B, 372(1714) (2017). https://doi.org/10.1098/rstb.2016.0113
[2] Hiraki, K.: Detecting contingency: A key to understanding development of self and social cognition, Japanese Psychological Research, 48(3) (2006). https://doi.org/10.1111/j.1468-5884.2006.00319.x
[3] Ferrera, V. and Barborica, A.: Internally Generated Error Signals in Monkey Frontal Eye Field during an Inferred Motion Task, Journal of Neuroscience, 30(35) (2010). https://doi.org/10.1523/JNEUROSCI.2977-10.2010
[4] Takahashi, K., et al.: A Generic Software Platform for Brain-inspired Cognitive Computing, Procedia Computer Science, 71 (2015). https://doi.org/10.1016/j.procs.2015.12.185
[5] Arakawa, N.: Implementation of a Model of the Cortex Basal Ganglia Loop, ArXiv (2024). https://doi.org/10.48550/arXiv.2402.13275
[6] Leibo, J., et al.: Psychlab: A Psychology Laboratory for Deep Reinforcement Learning Agents, ArXiv (2018). https://doi.org/10.48550/arXiv.1801.08116
[7] Hoang, K., et al.: Active vision: on the relevance of a bio-inspired approach for object detection, Bioinspiration & Biomimetics, 15(2) (2020). https://doi.org/10.1088/1748-3190/ab504c
[8] McBride, S., Huelse, M., and Lee, M.: Identifying the Computational Requirements of an Integrated Top-Down-Bottom-Up Model for Overt Visual Attention within an Active Vision System, PLoS ONE, 8(2) (2013). https://doi.org/10.1371/journal.pone.0054585
[9] Oudeyer, P.Y., Kaplan, F., and Hafner, V.: Intrinsic Motivation Systems for Autonomous Mental Development, IEEE Transactions on Evolutionary Computation, 11(2) (2007). https://doi.org/10.1109/TEVC.2006.890271
[10] Schmidhuber, J.: Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990–2010), IEEE Transactions on Autonomous Mental Development, 2(3) (2010). https://doi.org/10.1109/tamd.2010.2056368
[11] Cangelosi, A., et al.: Developmental Robotics: From Babies to Robots, MIT Press (2015). https://doi.org/10.7551/mitpress/9320.001.0001
[12] Fiore, V., et al.: Instrumental conditioning driven by neutral stimuli: A model tested with a simulated robotic rat, in Proceedings of the Eighth International Conference on Epigenetic Robotics (2008).
[13] Santucci, V.G., et al.: Biological Cumulative Learning through Intrinsic Motivations: A Simulated Robotic Study on the Development of Visually-Guided Reaching, in Proceedings of the Tenth International Conference on Epigenetic Robotics (2010).