Tuesday, November 22, 2022

Remaining Issues with AGI as of 2022

Abstract

This article reviews the definition of AGI and discusses unrealized functions of human-like AGI as of 2022, which include fluid intelligence, generative rule handling with case-based AI, dealing with the real world, social intelligence, language acquisition, and mathematics.  (The article is an English version of a proceedings article in Japanese for a local workshop on AGI [2022-11-22].)

1. AGI and General Intelligence

General intelligence, part of the term Artificial General Intelligence (AGI), is a psychological term originally postulated as one or a few general problem-solving factors in the measurement of human intelligence [1].  Factors of intelligence are determined by statistically processing the results of intelligence tests.  The CHC (Cattell-Horn-Carroll) model is an attempt to enumerate the factors comprehensively.

While AGI has not been unanimously defined in the community, it is generally considered to be an attempt to provide artifacts with problem-solving abilities that can deal with problems beyond those assumed at the time of design.  (AI that solves only the problems assumed at the time of design is called "narrow AI" as opposed to AGI.)

While, as indicated above, human general intelligence and the general intelligence required for AGI differ by definition, this article gives examples of what current AI has not achieved, from the standpoint that AGI should achieve "at least" human intelligence or human problem-solving abilities.

2. Fluid Intelligence

Fluid intelligence, posited as one of the intelligence factors, is "the ability to solve novel, abstract problems that do not depend on task-specific knowledge" [2] and is often regarded as a central part of human intelligence.  By this definition, fluid intelligence is closely related to the "problem-solving ability to deal with problems beyond design assumptions" required for AGI. (Note: A more general discussion of fluid intelligence as "policy generation" is given in [3] (Chapter 12)).

Fig.1 Raven Progressive Matrix
CC BY-SA 3.0 Life of Riley @Wikimedia

In a matrix reasoning task, subjects are presented with a matrix, where a cell in the last row is blank.  Subjects discover a rule from the pattern shown in the other rows and apply the rule to the last row to fill in the blank cell.

Tasks to measure fluid intelligence require the ability to conduct an internal search from one or a small number of presented examples to find a solution while generating hypotheses (cf. my blog article).  The Raven Progressive Matrices (RPMs; see Fig.1) are typical intelligence test tasks that measure fluid intelligence.  A review article [4] summarizes attempts to solve RPMs using deep learning, and describes the problem of insufficient generalization with deep learning.  Humans solve such tasks without being given a large number of task examples in advance, as they discover the regularities/rules while dealing with the tasks.  Thus, to realize fluid intelligence in AGI, it would be important to implement the ability to discover rules (see my previous post).
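
As a toy illustration (not an actual RPM solver), the following sketch shows the kind of hypothesize-and-verify loop meant here: a rule is hypothesized from a single complete example row and then verified on and applied to the incomplete row.  Cells are reduced to a single numeric attribute (e.g., the number of shapes in a cell); all names are hypothetical.

    # Toy illustration: hypothesize a rule from one example row and apply it
    # to complete an incomplete row.  Cells are reduced to a single numeric
    # attribute (e.g., the number of shapes in the cell).

    def hypothesize_rules(row):
        """Generate candidate rules (next-value functions) from one complete row."""
        candidates = []
        diffs = {b - a for a, b in zip(row, row[1:])}
        if len(diffs) == 1:                       # constant difference across the row
            d = diffs.pop()
            candidates.append(lambda last, d=d: last + d)
        ratios = {b / a for a, b in zip(row, row[1:]) if a != 0}
        if len(ratios) == 1:                      # constant ratio across the row
            r = ratios.pop()
            candidates.append(lambda last, r=r: last * r)
        return candidates

    def fill_blank(example_row, incomplete_row):
        """Try each hypothesis; return the first answer consistent with the visible cells."""
        for rule in hypothesize_rules(example_row):
            if all(rule(a) == b for a, b in zip(incomplete_row, incomplete_row[1:])):
                return rule(incomplete_row[-1])
        return None

    print(fill_blank([1, 2, 3], [2, 3]))   # -> 4 (the "add one" rule is discovered and applied)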

3. A Theoretical Problem: Generative Rules and Case-Based AI

Current mainstream machine learning-based AI is basically case-based: it tries to solve problems with a large number of examples.  Case-based AI cannot, in principle, solve problems that do not exist in the examples or in their generalization.  Meanwhile, human languages use generative rules, which can generate an infinite number of patterns from a finite set of rules and vocabulary.  A finite set of cases cannot, in principle, cover the infinity of patterns generated by rules.  Besides natural languages, computer languages, logic, and mathematics are examples of systems based on generative rules.
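
A minimal sketch of what "generative rules" means here: the toy grammar below (S -> "ab" | "a" S "b") has a finite description yet generates a^n b^n for every n >= 1, so no finite set of stored cases can cover all of its outputs.

    # Toy generative grammar: S -> "ab" | "a" S "b"
    # Finite rules and vocabulary, yet it generates a^n b^n for every n >= 1.

    def generate(depth):
        """Yield all strings derivable from S within the given recursion depth."""
        if depth == 0:
            return
        yield "ab"                          # rule 1: S -> "ab"
        for inner in generate(depth - 1):   # rule 2: S -> "a" S "b"
            yield "a" + inner + "b"

    print(list(generate(4)))   # ['ab', 'aabb', 'aaabbb', 'aaaabbbb']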

The inability of case-based AI to cover generative rule-based phenomena does not mean that AI in general cannot handle them; "good old" symbolic AI often handled generative rules.  Given the success of case-based AI, it will be important to incorporate generative rule handling into case-based AI.

Notes: For a discussion favoring the symbolic approach over case-based AI, see [5]. cf. related conference: The Challenge of Compositionality for AI and a recent talk.
For a successful example of combining deep learning and symbolic search, see MuZero [6].

4. Dealing with the Real World

Intelligent robots that work in the real world like humans are not yet available.  For example, we do not yet have the robot proposed by Wozniak, which can make coffee in a kitchen it enters for the first time.  While the current mainstream ML-based AI is case-based as pointed out above, it lacks sufficient experience (cases) of the real world.  While data for learning is often collected from the Internet, data from the interaction of agents with the real (3D) world, i.e., "lived experience", is scarce.  Note that research on real-world interactions of artificial agents has been conducted in the field of cognitive (developmental) robotics [7] [8].

5. Social Intelligence

Humans begin to infer the intentions of others as infants [9] and often acquire a "theory of mind" before reaching school age.  Such intelligence has not been realized in AI.  Because society is also part of the real world, lived experience is required to learn social intelligence.  While data for social intelligence can be collected in cognitive developmental robotics and cognitive psychology, human social intelligence may require genetically wired mechanisms (or prior knowledge), which are studied in the broader cognitive sciences, including neuroscience.

6. Language Acquisition

Linguistic competence is the ability to appropriately handle the phonological, morphological, syntactic, semantic, and pragmatic aspects of a language.  As grammar is a set of generative rules, handling it appropriately requires the ability to handle generative rules (see above) [10].  Case-based AI can handle "meaning" hidden in the distribution of words and in associations between words and images appearing in data sources (corpora).  Since the meaning of a complex linguistic expression such as a sentence is synthesized from the meanings of its components by generative rules, the ability to handle generative rules is also necessary to handle compositional semantics.  Meanwhile, "lived experience" (see above) is required to handle semantics grounded in real-world experience (cf. the symbol grounding problem has been partially solved [11]).  Pragmatic competence is social intelligence acquired through the practice of linguistic exchange (language games) with others; so, again, lived experience is necessary.  Linguistic competence thus requires the ability to handle generative rules and the lived experience of language practice, neither of which has yet been fully integrated into current AI.

Human language acquisition begins in infancy.  Infants are assumed to have an innate ability to handle generative rules in addition to statistical learning.  Infants are also able to infer the intention of their caregivers to understand the relationship between words and their referents (see social intelligence above).  Given these facts, AI's acquisition of linguistic abilities would profit from research on human language acquisition.

7. Mathematics

According to mathematical logic, mathematics can be viewed as a system of "generative rules" (see above).  In fact, case-based AI cannot even handle addition reliably beyond its training range [12][13].  On the other hand, the part of mathematics formulated in first-order predicate logic can be handled by good old symbolic AI (e.g., quantifier elimination solvers).
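
To make the contrast concrete, here is a toy sketch: addition defined by two generative (Peano-style) rules is evaluated exactly for inputs of any size by symbolic rule application, whereas a case-based learner only interpolates over the number ranges it was trained on (the extrapolation failures discussed in [12][13]).

    # Addition as two generative rewrite rules rather than stored cases:
    #   n + 0       = n
    #   n + succ(m) = succ(n + m)

    def add(n, m):
        """Peano-style addition on non-negative integers by rule application."""
        if m == 0:                     # rule 1
            return n
        return 1 + add(n, m - 1)       # rule 2

    print(add(2, 3))       # 5
    print(add(123, 456))   # 579 -- the same finite rules cover inputs never seen before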

If AI is to imitate human mathematical abilities, cognitive scientific research on human mathematical abilities (handling numbers and quantity) would be necessary (cf. an area pioneered by J. Piaget and others).

8. Summary

This article discussed the unrealized functions of current AI compared to human intelligence.  Specifically, case-based AI cannot handle generative rules, so it cannot handle the syntax and compositional semantics of language, nor mathematics.  It was also pointed out that current AI suffers from a paucity of lived experience.

As classical symbolic AI handled generative rules, it is important to make case-based AI handle generative rules (philosophically, it is a synthesis of empiricism and rationalism).

It was suggested that cognitive robotics research will be important to address the issue of lived experience for AI.

Finally, it is noted that the insights of cognitive science in general will be important for AGI research in terms of learning from human intelligence.

References

[1] Spearman, C.: General Intelligence, Objectively Determined and Measured, The American Journal of Psychology, Vol.15, No.2, pp.201–292, doi:10.2307/1412107 (1904)

[2] Kievit, R.A., et al.: A watershed model of individual differences in fluid intelligence, Neuropsychologia, Vol.91, pp.186–198 (2016) doi:10.1016/j.neuropsychologia.2016.08.008

[3] Hernández-Orallo, J.: The Measure of All Minds: Evaluating Natural and Artificial Intelligence, The Cambridge University Press (2017)

[4] Małkiński, M., et al.: Deep Learning Methods for Abstract Visual Reasoning: A Survey on Raven’s Progressive Matrices, arXiv, doi:10.48550/arXiv.2201.12382 (2022)

[5] Marcus, G.: The Next Decade in AI: Four Steps Towards Robust Artificial Intelligence, arXiv, doi:10.48550/arXiv.2002.06177 (2020)

[6] Schrittwieser, J. et al.: Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model, arXiv, doi:10.48550/arXiv.1911.08265 (2020)

[7] Pfeifer, R., Bongard, J.: How the Body Shapes the Way We Think: A New View of Intelligence, MIT Press (2006)

[8] Cangelosi, A., et al.: Developmental Robotics: From Babies to Robots, MIT Press (2015)

[9] Gergely, G., Bekkering, H. & Király, I.: Rational imitation in preverbal infants. Nature 415, 755 (2002). doi: 10.1038/415755a

[10] Delétang, G. et al.: Neural Networks and the Chomsky Hierarchy, arXiv, doi:10.48550/arXiv.2207.02098 (2022)

[11] Steels, L.: The symbol grounding problem has been solved, so what’s next?, in Symbols and Embodiment: Debates on meaning and cognition, doi: 10.1093/acprof:oso/9780199217274.003.0012 (2008)

[12] Brown, T., et al.: Language Models are Few-Shot Learners, ArXiv, doi: 10.48550/arXiv.2005.14165 (2020)

[13] Fujisawa, I., et al.: Logical Tasks for Measuring Extrapolation and Rule Comprehension, ArXiv, doi: 10.48550/arXiv.2211.07727 (2022)

Monday, November 21, 2022

A Model of Fluid Intelligence based on Examining Experienced Sequences

Abstract

This article proposes a model of rule/policy discovery based on examining experienced sequences.  Fluid intelligence, as measured by intelligence tests, can be viewed as the ability to discover policies for solving problems from one or a small number of examples.  Agents with fluid intelligence examine a small number of experienced time series to discover common rules.  If the sequence is no longer being presented, memory recall (replay) would be used.  The proposed model "goes over" experienced sequences and extracts elements such as attributes, relationships among the input elements, and agent actions, to generate hypothetical policies.

1. Introduction

AGI as engineering is an attempt to give artifacts a general problem-solving capability beyond design.  General intelligence was originally postulated as a factor of one or a few general problem-solving capabilities in the measurement of human intelligence [1].  Fluid intelligence was postulated as one of the factors that make up general intelligence.  While the original definition of fluid intelligence [2] was "the ability to recognize relationships," definitions in the community vary.  Kievit et al. summarize fluid intelligence as "the ability to solve novel, abstract problems that do not depend on task-specific knowledge" [3].  More generally, Hernández-Orallo [4] (Chapter 12) addresses fluid intelligence as a policy-generating capability.  In an intelligence test, the subject is required to find a policy for solving the problem from one or a few examples.  This requires the ability to conduct an internal search, generate multiple hypotheses, and find a solution, which would be the central ability of fluid intelligence.  In the following, fluid intelligence is regarded as the ability to discover policies for problem solving from one or a few examples.  Note that while there are attempts to solve fluid intelligence tasks such as Raven's Progressive Matrices (see Appendix) with deep learning methods [5], if they have learned with ample task data similar to the task to be tested, they are using crystallized intelligence rather than fluid intelligence to solve it.

In intelligence test-like tasks (see "Appendix: Assessment Tasks"), abstraction is necessary, for the same situation is not normally repeated. The abstract elements of the solution include the attributes of the input elements, the relationships between the attributes, and the actions of the agent.  Policy discovery, including abstraction, is a process of induction.  While machine learning is also inductive, a difference lies in the number of samples.  Fluid intelligence in intelligence testing requires finding common structures from a small number of samples.  This ability is useful in devising solutions to problems encountered in a variety of situations.  In the following, a model of rule (policy) discovery based on examining experienced time series is proposed.

2. Model of the Discovery Process

Discovery of rules (policies) from experienced series is done by "going over" the series (a minimal code sketch follows the list):

  • If the entire problem is not presented to the agent at once, a replay is performed, otherwise the agent goes over the presented scene.
  • Elements (attributes of input elements, relations among the attributes, and actions of agents) are extracted from the success series to form a hypothetical policy series.
  • Various heuristics can be used to determine which elements are prioritized in the hypothetical policy. (e.g., Emphasis is placed on elements in close temporal and spatial proximity and the relationships associated with them.)
  • Elements in the failed series are discounted.
  • The hypothetical policy is verified with one or more series.
  • Hypothetical policies that fail verification are stored as rejected policies so that they will not be tested/used again.
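
The following is a minimal sketch of the loop above, under simplifying assumptions (all names are hypothetical): a series is a list of steps, each step a set of candidate policy elements already extracted (attributes, relations, actions); a hypothetical policy picks one element per step; element priority is given as a sort key standing in for the heuristics; rejected policies are remembered and skipped.

    import itertools

    def propose_policies(success_series, priority):
        """Enumerate hypothetical policies, trying high-priority elements first."""
        ranked = [sorted(step, key=priority) for step in success_series]
        return itertools.product(*ranked)

    def matches(policy, series):
        """A policy matches a series if each of its elements occurs at the same step."""
        return all(elem in step for elem, step in zip(policy, series))

    def discover(success_series, other_successes, failures, priority):
        rejected = set()
        for policy in propose_policies(success_series, priority):
            if policy in rejected:
                continue                                   # never retest a rejected policy
            if any(matches(policy, f) for f in failures):  # elements of failed series are discounted
                rejected.add(policy)
                continue
            if all(matches(policy, s) for s in other_successes):
                return policy                              # verified against all other series
            rejected.add(policy)
        return None

    # Two successful trials share the relation "same_shape" at step 0 and the
    # action "pick_matching" at step 1; a failed trial rules out other candidates.
    s1 = [{"same_shape", "red"}, {"pick_matching", "move_left"}]
    s2 = [{"same_shape", "blue"}, {"pick_matching", "move_right"}]
    fail = [{"red"}, {"move_left"}]
    print(discover(s1, [s2], [fail], priority=str))   # ('same_shape', 'pick_matching')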

3. Required Mechanisms

  • Mechanism to go over spatial scenes by gaze (eye) movement for problems presented visually
  • Mechanism to go over a temporal sequence – replay mechanism which recalls memorized sequences for policy generation and validation
  • Mechanism to generate policy elements;  e.g., attribute extraction (abstraction) and discovery of relationships between elements
  • Mechanism to give preference: preferences are useful for the search process to select policy elements.
  • Mechanism to create a hypothetical policy series by adopting policy elements
  • Mechanism to store hypothetical policies
  • Mechanism to determine whether a hypothetical policy can be applied to a spatial scene or temporal (replayed) series
  • Mechanism not to reuse rejected policies
  • Working memory – required for various tasks
  • Mechanism to control the process as a whole

4. Policy Generation, Verification, and Application

Based on the required mechanisms, the process of policy generation, verification, and application can be summarized as follows:

  • Policy generation
    • Go over the successful series and create a series that reproduces the input elements.
    • Attention is given preferentially to a specific attribute or relation in the sequence.
    • Generate a hypothetical policy series from the sequence of attributes or relations extracted with the given attention.
  • Policy Verification
    • A series may be made by trial runs or by memory recall (replay).
    • Recall the hypothetical policy from the (trial or recalled) series, and try to apply it (see below).
    • If a (trial or recalled) success series matches the policy, retain it for further validation with other success series.
    • If a (trial or recalled) failure series matches the policy, reject the policy.
  • Applying policy to a series
    • Apply the policy to the sequence, starting with the first element in the sequence and checking for a match to each recalled policy sequence element in turn.
    • If the application of a policy element fails, then the policy fails.

5. Implementation with (Artificial) Neural Networks

If the elements (attributes and relations) are entirely symbolized and provided, the mechanism above could be implemented by a symbolic (GOFAI) algorithm.  If the elements are not clearly defined, it would be difficult to create a symbolic algorithm, and implementation would require fuzzy pattern matching and learning functions as found in (artificial) neural networks.  Note that, even when learning is required, the problems must be solved without prior exposure to similar tasks.  In the following, hints for implementation with (artificial) neural networks are presented in line with the Required Mechanisms.

  • Mechanism to go over spatial scenes
    The mechanism of saccade control by the brain can be imitated.
  • Mechanism to go over temporal sequences
    Experienced sequences are recalled from other sequences and used for policy generation and validation.  Since no generalization is needed in the memory of series, a simple non-learning storage device could be used.
    Since replay is believed to occur in the hippocampus in the brain, the hippocampal mechanism can be imitated.  Meanwhile, as the phonological loop (working memory for speech) [6] is assumed to be located in the cortex, extra-hippocampal cortical circuits may also have replay-like functions.
  • Mechanism to generate policy elements
    • Attribute extraction (Abstraction)
      It is known that abstraction occurs in artificial neural networks through learning.
    • Discovery of relations between elements
      Relations (e.g., co-occurrence of attributes) among elements can be extracted with artificial neural networks.  In order for a neural network to recognize a transformation (such as rotation) of a figure, it must have learned the transformation.
    • When policy elements are created during replay, it would be better to have a mechanism to control the timing of replay to create a time margin for processing. [Note]
  • Mechanism to give preferences
    Preferences such as for spatial proximity can be incorporated into the structure of a neural network.
  • Mechanism to create a hypothetical policy series by adopting policy elements/Mechanism to store hypothetical policies
    • Policy elements are recalled and adopted by attention.  A certain mechanism (e.g., winner-takes-all) would be needed to select an element for attention.
    • The series formed in the system can be stored in a mechanism similar to replay.
    • Policy elements are pairs of an attribute or relation to be selected and the attention given to it.
  • Mechanism to determine whether a hypothetical policy can be applied to a spatial scene or replayed temporal series / Mechanism not to use rejected series
    • Matching a hypothetical policy with memorized series could be implemented with the pattern matching function of a neural network.
    • Policies that match the failed series are classified as rejected, and will not be used.
  • Working memory – networks such as a bi-stable network could be used.
  • Mechanism to control the process as a whole
    The process is repeated until a policy consistent with all the presented series is generated.
Note: Policy elements become the object of attention (i.e., are brought into awareness) when they are added to the policy.  In this sense, policy generation involves System 2 in dual-process theory [7], which also makes policy verbalization possible.  However, other processes are not necessarily brought to attention.

6. Conclusion

This article has only suggested a model.  Future work would include its psychological validation and/or software implementation.  A literature survey on brain regions and functions corresponding to the model will be necessary to support it from the neuroscientific viewpoint.  Since policies discovered by the model include the actions (operations) of the agent, the mechanism is to discover at least one class of algorithms.  By examining how general the class of algorithms it discovers, it will be possible to evaluate it as a model of general intelligence.

References

[1] Spearman, C.: General Intelligence, Objectively Determined and Measured, The American Journal of Psychology, Vol.15, No.2, pp.201–292, doi:10.2307/1412107 (1904)

[2] Cattell, R.B.: The measurement of adult intelligence, Psychol. Bull., Vol.40, pp.153–193, doi:10.1037/h0059973 (1943)

[3] Kievit, R.A., et al.: A watershed model of individual differences in fluid intelligence, Neuropsychologia, Vol.91, pp.186–198 (2016)
doi:10.1016/j.neuropsychologia.2016.08.008

[4] Hernández-Orallo, J.: The Measure of All Minds: Evaluating Natural and Artificial Intelligence, The Cambridge University Press (2017)

[5] Małkiński, M., et al.: Deep Learning Methods for Abstract Visual Reasoning: A Survey on Raven’s Progressive Matrices, arXiv, doi:10.48550/arXiv.2201.12382 (2022)

[6] Baddeley, A.D., Hitch, G.J.: Working Memory, In G.A. Bower (Ed.), Recent advances in learning and motivation (Vol. 8, pp. 47-90), New York: Academic Press (1974)

[7] Kahneman, D.: A perspective on judgement and choice, American Psychologist, Vol.58, No.9, pp.697–720, doi:10.1037/0003-066x.58.9.697 (2003)

[8] Joyner, A., et al.: Using Human Computation to Acquire Novel Methods for Addressing Visual Analogy Problems on Intelligence Tests, ICCC (2015) [PDF]

[9] Carpenter, P.A., Just, M.A., Shell, P.: What one intelligence test measures: A theoretical account of the processing in the Raven Progressive Matrices Test, Psychological Review, Vol.97, No.3, doi:10.1037/0033-295X.97.3.404 (1990)

Friday, March 18, 2022

Attempts to implement biologically plausible combined learners

 This article is a summary of the past three articles (A, B, C) and a translation of an article in Japanese.

Abstract

Brain-inspired cognitive architecture is one of the options for the realization of AGI.  While backpropagation has been widely used in artificial neural networks, it is not considered to be used in the brain, especially across brain regions.  This paper reports the implementation of the following cognitive architectures consisting of multiple learners without backpropagation between modules:

  1. A minimal architecture consisting of multiple learners without inter-module backpropagation

  2. A model of cortical, basal ganglia, and thalamic loops

  3. A model of a working memory with an attention mechanism.


Introduction

In recent years, progress has been made in the field of machine learning using artificial neural networks (ANNs).  In agent learning with ANNs, end-to-end deep reinforcement learning with backpropagation is the mainstream.  Meanwhile, it is hoped that cognitive architectures inspired by animal brains will serve as an approach to AGI.  Since backpropagation may not occur in the animal brain (at least not across brain regions), I have attempted to implement three cognitive architectures consisting of learners without inter-module backpropagation.


In the implementations, modules modeled after the cerebral cortex and basal ganglia were incorporated.  The cortex is thought to process input into forms useful for survival-related decision making, and to produce predictions and control outputs.  The basal ganglia are thought to control the timing of the cortex's external outputs with reinforcement learning.


The cortex is composed of multiple regions, and there is a hierarchy among the regions, especially in sensory areas.  In "deep learning," learning is performed hierarchically in a way that mimics this hierarchy, on the assumption that errors are propagated backward across regions.  In the animal brain, however, there is no known way for errors to propagate across regions.  A more biologically plausible hypothesis is that each region learns to minimize its own prediction errors (predictive coding).
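
As a minimal sketch of learning without inter-region backpropagation (not the reported implementation), the toy example below stacks two "regions," each trained only on its own local prediction (reconstruction) error; detaching the lower region's output keeps gradients from crossing regions.

    import torch
    import torch.nn as nn

    # Two stacked "regions"; each minimizes its own local error, and
    # .detach() prevents gradients from propagating between regions.
    region1 = nn.Sequential(nn.Linear(8, 4), nn.Tanh(), nn.Linear(4, 8))
    region2 = nn.Sequential(nn.Linear(8, 2), nn.Tanh(), nn.Linear(2, 8))
    opt1 = torch.optim.Adam(region1.parameters(), lr=1e-2)
    opt2 = torch.optim.Adam(region2.parameters(), lr=1e-2)
    loss_fn = nn.MSELoss()

    for _ in range(100):
        x = torch.randn(32, 8)                 # stand-in for sensory input
        h1 = region1(x)
        loss1 = loss_fn(h1, x)                 # local (reconstruction) error of region 1
        opt1.zero_grad()
        loss1.backward()
        opt1.step()

        h1_detached = h1.detach()              # no error signal crosses the region boundary
        loss2 = loss_fn(region2(h1_detached), h1_detached)   # local error of region 2
        opt2.zero_grad()
        loss2.backward()
        opt2.step()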


Regular (deep) reinforcement learning assumes that a single learner controls task-specific outputs.  In the brain, multiple cortical areas, each with corresponding parts of the basal ganglia, are in charge of controlling movements and other brain areas.  Thus, brain-inspired cognitive architectures would have to incorporate multiple reinforcement learners.


With the consideration above, this article reports the three implementations:

  1. a minimal architecture incorporating multiple learners

  2. the cortico-BG-thalamic loop

  3. working memory with attention mechanism


Implementation frameworks

The following frameworks were used.

Environment framework

OpenAI Gym, a widely used environment framework, was used.

Cognitive Architecture Description Framework

BriCA (Brain-inspired Computing Architecture), a computing platform for developing brain-inspired software, was used to describe the cognitive architecture (agent definition) [1]. BriCA has a mechanism for modules to exchange numerical vector signals in a token passing manner.  Modules can also be nested.

Machine learning framework

PyTorch was used as the ANN implementation framework, and TensorForce was used as the reinforcement learning framework.  With TensorForce, internal environments are easily set up.


Implementation attempt

Verifying the working of a minimal architecture with multiple learners

(main article)

An architecture was implemented with BriCA, consisting of two modules: a module containing a simple autoencoder (with PyTorch) as the cortical part, and a module containing a reinforcement learner (with TensorForce) with an internal environment set up as the basal ganglia part.  The Cart-Pole task of the Gym (input is a 4-dimensional numerical vector) was used for testing.  The autoencoder was pre-trained with environment input obtained from random output.

Since the token-passing cycle of BriCA is not the same as that of the external environment, tokens corresponding to the cycles of the external environment were set in the internal implementation so that internal processing proceeds according to the cycles of the external environment.  Specifically, internal processing is performed each time the value of the token increases by one.
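
The synchronization idea, stripped of the actual BriCA API (the sketch below is generic and hypothetical), is simply that an internal module only processes when a token counter, incremented once per external-environment step, has advanced:

    class TokenClock:
        """Counter incremented once per external-environment step."""
        def __init__(self):
            self.token = 0
        def tick(self):
            self.token += 1

    class InternalModule:
        """Runs its internal processing only when it sees a new token value."""
        def __init__(self, clock):
            self.clock = clock
            self.last_seen = 0
        def step(self):
            if self.clock.token > self.last_seen:
                self.last_seen = self.clock.token
                return True          # internal processing would happen here
            return False             # no new token: do nothing this cycle

    clock = TokenClock()
    module = InternalModule(clock)
    clock.tick()
    print(module.step(), module.step())   # True False -- one internal step per environment step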

See the GitHub page for the code and results.


Fig. 1

Implementation of the cortico-BG-thalamic loop

(main article)

The implementation is an attempt to realize the action selection and execution thought to be realized by the cortico-BG-thalamic circuit with minimal mechanisms. The mechanisms to be implemented are based on the following hypotheses.

Hypotheses

The cortex selects (predicts) actions. The basal ganglia make judgments about whether the cortex's choices are suitable for execution (and disinhibit the thalamus).  The basal ganglia, as a reinforcement learner, receives information on the input to the cortex (State) and the choice being made (Action), selects Go/NoGo, and then receives a reward to learn the policy of State+Action ⇒ Go/NoGo.  Meanwhile, the cortex learns from the executed State-Action pairs and predicts (selects) the appropriate State ⇒ Action.

The following provide background that corroborates the hypotheses.

  • On the fact that the basal ganglia do not make action choices:

    • The thalamic matrix subject to basal ganglia inhibition may not be "fine grained" enough to accommodate choices (e.g., mini-columns).

    • The number of GPi cells in the basal ganglia that inhibit (disinhibit) the thalamus may be fewer than the number of minicolumns in the cortex (whether the minicolumns are coding choices is arguable).

  • The basal ganglia are said to control the timing of action initiation.

  • Reinforcement learning in the basal ganglia is necessary to cope with delayed reward.

  • The hypotheses reconcile the role of reinforcement learning in the basal ganglia with prediction in the cortex.

Implementation

The outline of the implementation is described below.
See the GitHub page for the code and results.

Cerebral cortex part

Receives observation input and outputs action selection (prediction).

Output prediction layer: learns to predict an action from the observed input, using the executed action (the one gated Go by the basal ganglia) as the teacher signal.  A three-layer perceptron was used for implementation.

Output adjustment layer: calculates outputs using the outputs from the output prediction layer and noise as inputs.  The noise creates random output.  The contribution of noise decreases as the output prediction layer learns to make correct predictions (the task correctness rate was used as the probability of using output prediction).

Output gating layer: The largest of the output adjustment layer outputs was selected (winner-take-all) and gated with a Go/NoGo signal from the basal ganglia.

Thalamus part

In this implementation, the Go/NoGo signal from the basal ganglia was directly transmitted to the output gating layer in the cortex, so a separate thalamus implementation was omitted.

Basal ganglia part

Reinforcement learning of the Go/NoGo decision was performed using the observation input plus the output-layer output (binarized by a threshold) as the state.
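
The actual code is on the GitHub page; as a rough schematic of a single decision step of this loop (all names hypothetical, NumPy only), one might write:

    import numpy as np

    rng = np.random.default_rng(0)

    def cortex_propose(predictor, obs, correct_rate, n_actions):
        """Output prediction + adjustment: use the learned prediction with a
        probability equal to the current task correctness rate, otherwise noise."""
        if rng.random() < correct_rate:
            scores = predictor(obs)            # output prediction layer (e.g., a 3-layer perceptron)
        else:
            scores = rng.random(n_actions)     # random exploration
        return int(np.argmax(scores))          # output gating layer: winner-take-all

    def bg_gate(go_nogo_policy, obs, proposed_action):
        """Basal ganglia part: Go/NoGo decision on the (state, proposed action) pair."""
        state = (tuple(obs.round().astype(int)), proposed_action)   # binarized state
        return go_nogo_policy.get(state, False)                     # default NoGo

    def step(predictor, go_nogo_policy, obs, correct_rate, n_actions):
        action = cortex_propose(predictor, obs, correct_rate, n_actions)
        if bg_gate(go_nogo_policy, obs, action):
            return action     # Go: the action is executed and later serves as a teacher signal
        return None           # NoGo: nothing is emitted this cycle

    dummy_predictor = lambda obs: np.array([0.2, 0.8])
    print(step(dummy_predictor, {}, np.array([0.1, 0.9]), correct_rate=1.0, n_actions=2))  # None (empty policy: NoGo)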

Overall Architecture

Figure 2 shows the implementation outline and the connections between parts.

Figure 2

Test task

A minimal delayed reward task was set up in which the reward was given a few steps after the cue-observed action selection.

Learning in the Basal Ganglia part 

Though TensorForce reinforcement learning algorithms (Vanilla Policy Gradient, Actor-Critic, and Proximal Policy Optimization) were tried, learning was not stable or did not occur.  Thus, a frequency-based learning algorithm was implemented, which resulted in learning convergence after about 1,000 episodes.
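
The exact frequency-based algorithm is in the repository; purely as an illustration of the general idea (this is a guess at the approach, not the reported code), a count-based Go/NoGo learner could keep reward statistics per (state, action) pair and emit Go when the observed reward rate is high enough:

    from collections import defaultdict

    class FrequencyGoNoGo:
        """Count-based Go/NoGo decisions per (state, action) pair (illustrative only)."""
        def __init__(self, threshold=0.5):
            self.rewarded = defaultdict(int)
            self.tried = defaultdict(int)
            self.threshold = threshold

        def decide(self, state, action):
            key = (state, action)
            if self.tried[key] == 0:
                return True                    # unexplored pairs get a Go
            return self.rewarded[key] / self.tried[key] >= self.threshold

        def update(self, state, action, reward):
            # For delayed reward, the episode's reward is attributed to the
            # (state, action) pairs that were gated Go earlier in the episode.
            key = (state, action)
            self.tried[key] += 1
            self.rewarded[key] += int(reward > 0)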

Learning in the Cortex part

A three-layer perceptron was used to learn, from the external environmental input, the cortical outputs for which the basal ganglia part produced a Go.  The correct response rate for the task was used as the probability of using predictions as the output candidate on which the basal ganglia part makes a Go/NoGo decision (if predictions are not used, the output candidate is determined randomly).  Loss values (cross-entropy) converged around 0.25 after about 1,000 episodes (presumably because randomness remains in the selection of actions in this setting).

Implementation of working memory with attention mechanism

(main article)

Remembering items in a task (working memory) is necessary for cognitive tasks.  Since animals (including humans) can only memorize a limited number of items at a time, it is thought that attention is used to narrow down the number of items.  Here an implementation of working memory was made with a minimal form of attention mechanism using a simplified delayed match-to-sample task.

The implementation here is based on the following hypotheses about working memory.

  1. The decision of which input the working memory should retain is an action selected by the PFC (prefrontal cortex) (action that directs attention to a specific item).

  2. Which input the working memory should retain is selected by the PFC⇔Th loop (PFC-Th-BG loop), which is controlled by the BG as in other actions, and is then put into action.

  3. The selection of which working memory to keep is exclusive, as in other actions, and only one is executed at a time in the dlPFC (dorso-lateral PFC).

  4. Working memories are retained for a certain period of time by the activity of bi-stable neurons or other neurons in the PFC, and for longer retention, "reselection" by the PFC-Th-BG loop is required.

  5. The frontal cortex uses the information retained in working memory and the sensory information to predict and execute actions to solve the task.

While PBWM [2] is known as a model of working memory, it was not adopted here as it may not be so biologically plausible:

  • It states that the basal ganglia gate sensory input, but it is the cortical output to the thalamic loop that is gated by the basal ganglia.

  • It states that working memory is retained during NoGo, which is the default state for the basal ganglia, but if that is the case, working memory retention becomes the default.

Specific mechanism

The implementation consists of three modules: an input register, a register controller, and an action determiner.  The input register is the core part of working memory that does not learn. The other two modules learn.

See the GitHub page for details on the code and results.

Input register

Its input is an observation input and input holding signal (attention signal).  When an input holding signal is given, the specified portion of the observation input is held for a specified period of time, and the output is whether the specified portion of new observation input matches the held content (recognition).

The implementation is simply a "register" with a comparator added.  It corresponds to short-term memory in the prefrontal cortex.

Register controller

It determines, from the observation input, which attributes the input register should retain, and sends an attention signal to the corresponding input register (only once per episode).

The cortico-BG-thalamic loop implementation described in the previous section was used for learning.

Action determiner

It determines the action to be output to the external environment (only once per episode) from the recognition signal.

The cortico-BG-thalamic loop implementation described in the previous section was used for learning.

Architecture

Figure 3

Learning

Learning based on rewards from the external environment took place at two locations: the action determiner and the register controller.

It was hoped that the two learnings would increase task success in a bootstrapping fashion, but since this did not work, curriculum learning was attempted.  Specifically, one learner was first replaced with a "stub" that outputs the correct solution in order to train the other learner; the trained learner was then used to train the former learner.  The reward per action and the percentage of correctly set working memories barely exceeded 0.5, whether the action determiner or the register controller was used as the stub first.

The number of episodes required for learning exceeded 10,000.

Discussion

Here, attempts to implement architectures that incorporate multiple learners that learn in a non-end-to-end fashion were reported.

Three implementations of an architecture encompassing multiple learners with BriCA were examined.  A tool that wraps the somewhat cumbersome BriCA coding is currently being prepared.

The current implementation of the hypothesis of working memory control by attention did not perform adequately.  Since more complex tasks require a larger search space, it would be better to think of a way in which learning occurs over a biologically plausible range of episodes.

The search space should be reduced to improve learning efficiency.  For diachronic multiple-action learning, a (biologically plausible) mechanism that memorizes a series of observations/actions that led to success and uses them as hypotheses to test similar cases should be devised.

A mechanism such as the one attempted here will be incorporated into a whole-brain architecture that follows the connections between regions of the whole brain (connectome), and the above issues must be resolved to guarantee its working.

References

[1] Takahashi, K., et al. (2015) A Generic Software Platform for Brain-inspired Cognitive Computing, Procedia Computer Science, Vol. 71, DOI: 10.1016/j.procs.2015.12.185

[2] O'Reilly, R., Frank, M. (2006) “Making working memory work: a computational model of learning in the prefrontal cortex and basal ganglia.” Neural Comput., DOI: 10.1162/089976606775093909

Saturday, February 12, 2022

Report on the Implementation of a Minimal Working Memory Agent

This article reports an attempt to implement working memory, a basic cognitive function essential for AGI, in a minimal form, taking biological plausibility into account.

The implementation is based on the following hypotheses:

  1. The choice of a part of the input to be retained in the working memory is driven by attention, which is an action carried out by the PFC (prefrontal cortex).

  2. The attending action is made by the PFC⇔Th loop controlled by BG (PFC-Th-BG loop) like any other actions.

  3. The attending action is exclusive as in any other action, and only one of them is executed in dlPFC at a time.

  4. Working memory is retained for a certain period of time with the activity of neurons such as bi-stable neurons in the PFC, and WM should be "reselected" by the PFC-Th-BG loop for longer retention.

  5. The PFC uses the information stored in the working memory and sensory information to predict and execute the action to cope with tasks.

The Task

The task used in the implementation is a Match-to-Sample task simplified as much as possible, whose input is a low-dimensional binary vector consisting of the following:

  • Choice of the attribute to be used for comparing the sample to the target (task switch)
    (a one-hot vector whose dimension is the number of attributes)

  • Attribute list (each attribute is represented as a two-dimensional binary vector.)

  • Control vector (sample presentation phase [1, 0], target presentation phase [1, 1], others [0, 0])

The task consists of five phases: sample presentation, pause (delay), target presentation, pause (delay), and reward presentation.  The task switch is given only during the sample presentation phase.  In the target presentation phase, if the attributes of the sample and the target specified by the task switch match, the correct answer is 2 ([0, 1]); if they do not match, the correct answer is 1 ([1, 0]).  If the answer is correct, the task returns reward 1, otherwise reward 0, after a delay.

In executing the task, the following assumptions were made on the agent side:

  • No prior knowledge of the composition of the input vector is given.

  • It does not memorize all the attributes.
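
For concreteness, the sketch below shows how a single observation vector could be composed under the layout above (a hypothetical helper with two attributes; the actual encoding is defined in the repository):

    # Layout: [task switch (one-hot)] + [attribute list (2 bits each)] + [control vector]
    N_ATTRIBUTES = 2

    def make_observation(task_switch, attributes, phase):
        switch = [1 if i == task_switch else 0 for i in range(N_ATTRIBUTES)]
        flat_attrs = [bit for attr in attributes for bit in attr]
        control = {"sample": [1, 0], "target": [1, 1]}.get(phase, [0, 0])
        return switch + flat_attrs + control

    # Sample phase: the task switch says "compare attribute 0"; attributes are [0,1] and [1,0].
    print(make_observation(0, [[0, 1], [1, 0]], "sample"))      # [1, 0, 0, 1, 1, 0, 1, 0]
    # Delay phase: no task switch is shown and the control vector is [0, 0].
    print(make_observation(None, [[0, 0], [0, 0]], "delay"))    # [0, 0, 0, 0, 0, 0, 0, 0]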

How it works

Input Register

Input:
    • observation input
    • attention signal: instructs the attribute (a part of the observation input) to be retained 

Output: whether the retained observation input attribute and the corresponding attribute of a new observation input match (recognition)

Function: When an attention signal is given, the specified portion of the observation input is retained for a certain period of time; when a new observation input comes in, it outputs whether the corresponding portion of the new input matches the retained content.

Implementation: Simple register + comparator

Neural Correspondence: the short-term memory layer of the PFC
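
A minimal sketch of this "register + comparator" (a hypothetical class; the actual implementation is on the GitHub page) could look like the following:

    class InputRegister:
        """Holds a slice of the observation for a while and reports matches."""
        def __init__(self, retention_steps=10):
            self.retention_steps = retention_steps
            self.portion = None        # which part of the observation is held
            self.held = None
            self.age = 0

        def step(self, observation, attention=None):
            """attention: (start, end) indices of the portion to hold, or None.
            Returns 1 if the held portion matches the new observation, else 0."""
            if attention is not None:                   # attention signal: latch the portion
                self.portion = slice(*attention)
                self.held = list(observation[self.portion])
                self.age = 0
            elif self.held is not None:
                self.age += 1
                if self.age > self.retention_steps:     # retention period elapsed
                    self.held = None
            if self.held is None:
                return 0
            return int(list(observation[self.portion]) == self.held)   # recognition output

    reg = InputRegister()
    reg.step([1, 0, 0, 1], attention=(2, 4))   # hold bits 2..3 of the sample
    print(reg.step([0, 1, 0, 1]))              # 1 -- the held attribute matches the target
    print(reg.step([0, 1, 1, 0]))              # 0 -- mismatch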

Register Controller

Input: observation input

Output: attention signal

Function: determines the attribute to be retained in the input, and sends an attention signal to the input register (once per episode).

Implementation: 👉 A Minimal Cortex-Basal Ganglia Architecture

Neural Correspondence: (dl) PFC-Th-BG loop

Action Determiner

Input: observation input + recognition signal (from the Register)

Output: action selection

Function: determines the action to be output from the input (once per episode).

Implementation:  👉 A Minimal Cortex-Basal Ganglia Architecture

Neural Correspondence: PFC-Th-BG loop

Learning

Reward-based learning took place in two locations, i.e., the action determiner and the register controller.  Both were implemented with the Minimal Cortex-Basal Ganglia Architecture.

Though it was expected that the two learnings would bootstrap the task successes, it did not work, and it was found that curriculum learning barely worked (see the Results section for details).

Architecture

Fig. 1

Implementation

Frameworks used

  • Cognitive Architecture Description Framework: BriCA (Brain-inspired Computing Architecture), a computational platform for brain-based software development, was used.
  • Environment description framework: OpenAI Gym, a widely used framework for agent learning environments, was used.

Delayed reward learner

A Minimal Cortex-Basal Ganglia Architecture was used.  BriCA is also used in the learner.  It also uses PyTorch to model the cortex.  For delayed reward learning (the basal ganglia part), a frequency-based learning algorithm was used.

Results

Though the above implementation was used to train on the task, performance did not improve even after 400,000 episodes.  This was given up, and curriculum learning was tried instead: one learner was trained while the other was replaced with a "stub" that outputs the correct answer, and the learning data of the stub-trained learner was subsequently used to train the agent without the stub (i.e., with the two learners).

When a stub is used for the Register Controller

Fig. 2 Training Action Determiner when a stub is used for Register Controller

Horizontal axis: episodes (x 100; 40,000 episodes learned)

avr. reward: average reward

reward per go: reward per action output


Fig. 3 Training the two learners with the learning data of Action Determiner
learned with a stub for Register Controller

Horizontal axis: episodes (x 100)

avr. reward: average reward

reward per go: reward per action output

go in sample phase: Register Controller output performed in the sample phase

correct wm: correctly set working memory

go in target phase: actions performed in the target phase


The reward per action output and the percentage of correctly set working memories barely exceeded 0.5.  Phase selection was relatively successful.

When a stub is used for Action Determiner
Fig. 4 Training Register Controller when a stub is used for Action Determiner

Horizontal axis: episodes (x 100)

avr. reward: average reward

reward per go: reward per action output

go in sample phase: Register Controller output performed in the sample phase

correct wm: correctly set working memory

go in target phase: actions performed in the target phase

The proportion of correctly set working memories did not approach 1, because even if the working memory is not set correctly, the agent can still be rewarded through the random selection of the Action Determiner.


Training the two learners with the learning data of Register Controller learned with a stub for Action Determiner

Horizontal axis: episodes (x 100)

avr. reward: average reward

reward per go: reward per action output

go in sample phase: Register Controller output performed in the sample phase

correct wm: correctly set working memory

go in target phase: actions performed in the target phase

The reward per action output and the percentage of correctly set working memories barely exceeded 0.5.  Meanwhile, the percentage of correctly set working memories was on a downward trend over 10,000 episodes.  Phase selection was relatively successful.

Discussions

Existing model

While PBWM is known as a model of working memory, it was not adopted because of the following points of "unnaturalness" regarding biological plausibility:

  • While it assumes that the basal ganglia (BG) gate sensory input, they are supposed to gate the cortical output-thalamus loop.
    cf. Benarroch, E. (2008) The midline and intralaminar thalamic nuclei Fig.2

  • While it assumes working memory is retained during NoGo, the default state for the BG, if this is the case, then working memory retention becomes the default state.

Meanwhile, it would be worthwhile to try the 1-2-AX working memory task used in the PBWM paper for comparison.

Learning Performance

The two choices, attentional selection of the attribute to be remembered and action selection based on the working memory, could not be learned from scratch in this implementation.  The original expectation was that a correct attentional choice would provide more information for the correct action choice via the working memory, so that the performance of the action choice would improve and, with it, the learning of the correct attentional choice would improve in a "bootstrapping" way.  However, such bootstrapping was not observed.  While it was not proved that this strategy would not work in other implementations, since the search space is larger in more complex tasks, it would be better to consider other strategies that allow learning to take place in a biologically reasonable number of episodes.

Though the curriculum learning "barely" worked this time, the results were not satisfactory.  It took about 10,000 episodes for the stub-based learning to converge, which is also beyond the biologically reasonable range.  This point may be improved by changing parameters in the Minimal Cortex-Basal Ganglia Architecture.

Prospects

In order to improve learning efficiency, the search space should be smaller.  For diachronic learning of multiple actions, it would be worthwhile to devise a (biologically plausible) mechanism that memorizes series of observations/actions that lead to success and examines similar cases while regarding them as hypotheses.