I implemented a simple salience-saccade model; see the repository for details. The model can be used as a building block for any (active) vision-based agent.
In 2021, I wrote about Visual Task and Architectures. The current implementation covers the where path, the saliency map, and active vision (gaze control) described in that post. As for the what path, I did a rudimentary implementation in 2021. In 2022, I implemented a cortico-thalamo-BG control algorithm. This year, I also worked on a non-visual match-to-sample task (see the previous post).
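To make the idea concrete, here is a minimal sketch of a saliency-driven saccade loop: compute a center-surround saliency map, pick the next gaze target by winner-take-all, and apply inhibition of return so the gaze moves on. This is only an illustration under generic assumptions (Gaussian center-surround saliency, the function names, and the parameter values are mine), not the code in the repository.

```python
# Minimal illustrative sketch of a saliency-map / saccade loop.
# Not the repository implementation; names and parameters are assumptions.
import numpy as np
from scipy.ndimage import gaussian_filter

def saliency_map(image):
    """Crude center-surround saliency: difference of two Gaussian blurs."""
    intensity = image.astype(float)
    center = gaussian_filter(intensity, sigma=2)
    surround = gaussian_filter(intensity, sigma=8)
    sal = np.abs(center - surround)
    return sal / (sal.max() + 1e-8)

def next_saccade(sal, inhibition):
    """Winner-take-all target selection on the inhibited saliency map."""
    return np.unravel_index(np.argmax(sal * (1.0 - inhibition)), sal.shape)

def inhibit(inhibition, target, radius=10, decay=0.9):
    """Inhibition of return: suppress the attended location, decay old traces."""
    inhibition *= decay
    yy, xx = np.ogrid[:inhibition.shape[0], :inhibition.shape[1]]
    mask = (yy - target[0]) ** 2 + (xx - target[1]) ** 2 <= radius ** 2
    inhibition[mask] = 1.0
    return inhibition

# Example gaze loop over a random "scene"
rng = np.random.default_rng(0)
scene = rng.random((64, 64))
inhibition = np.zeros_like(scene)
for step in range(5):
    sal = saliency_map(scene)
    gaze = next_saccade(sal, inhibition)
    inhibition = inhibit(inhibition, gaze)
    print(f"saccade {step}: gaze -> {gaze}")
```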
While I might move on to experiments on minimal visual word acquisition, I should add the what path (object recognition) to the current model in any case.