Friday, April 18, 2025

A Simple Environment for Language Acquiring Agents

I made a Gymnasium environment for agents that acquire language. (GitHub)

In the environment, one or two objects (card suits) move around in a scene. The environment outputs a scene representation map and a text description of the scene as the observation. The scene representation map consists of object features (shape and color, each represented as a one-hot vector) embedded in a 2D map. The text is in Interlingua. The verbs include pausa (pauses), va (goes), colpa (hits), and passa (passes). Adjectives indicate the colors of objects, and adverbs indicate the direction of movement.
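To make the idea of the scene representation map concrete, here is a minimal sketch of how one-hot shape and color features could be embedded in a 2D map. The grid size, the vocabulary order, the channel layout (shape first, then color), and the helper names `one_hot` and `make_scene_map` are all assumptions made for illustration; the actual code on GitHub may differ.

```python
# Hypothetical vocabularies, taken from the words in the sample text.
# Their order (and therefore the one-hot indices) is an assumption.
SHAPES = ["Trifolio", "Diamante", "Spada", "Corde"]
COLORS = ["verde", "blau", "rubie", "jalne"]


def one_hot(index, size):
    """Return a list of length `size` with a 1 at `index` and 0 elsewhere."""
    return [1 if i == index else 0 for i in range(size)]


def make_scene_map(objects, height, width):
    """Build a height x width map whose cells hold a feature vector.

    Each object is a (shape, color, row, col) tuple. Empty cells hold
    all zeros; an occupied cell holds the shape one-hot followed by
    the color one-hot.
    """
    depth = len(SHAPES) + len(COLORS)
    scene = [[[0] * depth for _ in range(width)] for _ in range(height)]
    for shape, color, row, col in objects:
        scene[row][col] = (
            one_hot(SHAPES.index(shape), len(SHAPES))
            + one_hot(COLORS.index(color), len(COLORS))
        )
    return scene


scene = make_scene_map(
    [("Trifolio", "verde", 0, 1), ("Diamante", "rubie", 2, 2)],
    height=3,
    width=3,
)
print(scene[0][1])  # [1, 0, 0, 0, 1, 0, 0, 0]
```

In this layout, an agent can read off an object's shape and color from the cell it occupies, and its motion from how the occupied cell changes between observations.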

An agent that acquires (learns) language from this environment receives the observation as input. It is expected to associate object descriptions in the text with the object representations in the scene, to learn the motion and interaction of objects, and to associate the learned activity representations with predicates in the text.


Sample text in the observation

Trifolio pausa
Trifolio va sub
Trifolio verde colpa Diamante
Trifolio va sup
Trifolio verde va sup
Spada va dextre con Corde
Spada blau va dextre con Corde
Diamante rubie passa Corde
Diamante rubie va sub sinistre
Diamante colpa le muro
Corde jalne passa Diamante
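
The sentences above follow a regular pattern: an object name, an optional color adjective, a verb, and then direction adverbs and/or another object. The sketch below splits a sentence into these parts, which can be handy for inspecting the text side of the observation. The word lists contain only the words that appear in the samples, and the function name `parse` and the output keys are hypothetical; the environment's full vocabulary may be larger.

```python
# Word lists collected from the sample sentences only (an assumption:
# the real environment may use more words than these).
VERBS = {"pausa", "va", "colpa", "passa"}
COLORS = {"verde", "blau", "rubie", "jalne"}
DIRECTIONS = {"sub", "sup", "dextre", "sinistre"}


def parse(sentence):
    """Split a description into subject, color, verb, directions, and rest."""
    tokens = sentence.split()
    # The first token that is a known verb separates subject from predicate.
    verb_pos = next(i for i, t in enumerate(tokens) if t in VERBS)
    before, after = tokens[:verb_pos], tokens[verb_pos + 1:]
    return {
        "subject": before[0],
        "color": next((t for t in before[1:] if t in COLORS), None),
        "verb": tokens[verb_pos],
        "directions": [t for t in after if t in DIRECTIONS],
        # Whatever is left over, e.g. ["con", "Corde"] or ["le", "muro"].
        "rest": [t for t in after if t not in DIRECTIONS],
    }


print(parse("Diamante rubie va sub sinistre"))
print(parse("Spada va dextre con Corde"))
```

A learning agent would not be given such a parser, of course; it has to discover these regularities from the paired scene and text.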