Tuesday, October 3, 2023

AutoEncoder-based Predictor (implementation)

I have been 'playing around' with autoencoder implementations to realize 'a predictor,' as the principal function of the neocortex is supposed to be prediction.  I tried a simple autoencoder and a sparse autoencoder from a cerenaut repository, and a β-VAE implementation from a project repository of Princeton University (see the explanatory article).  I chose the β-VAE because I will use it to model the association cortex, where the use of a CNN may not be appropriate (the β-VAE uses only Linear layers, no CNN), and because the simple autoencoder may not be potent enough.

I constructed a predictor from the encoder, decoder, and autoencoder factory in the repository, with a single modification to the decoder setting.  Namely, the predictor differs only in the decoder output setting: while the autoencoder reconstructs the encoder input, the predictor predicts a different input (the prediction target).
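The following is a minimal sketch of the idea (the class and function names are mine, not those of the repository): a β-VAE-style model built from Linear layers only, where the reconstruction loss is taken against a separate prediction target rather than the encoder input.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AEPredictor(nn.Module):
        """Hypothetical sketch: a beta-VAE-style predictor using Linear layers only."""
        def __init__(self, in_dim=784, hidden=400, z_dim=20):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
            self.mu = nn.Linear(hidden, z_dim)
            self.logvar = nn.Linear(hidden, z_dim)
            self.dec = nn.Sequential(nn.Linear(z_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, in_dim), nn.Sigmoid())

        def forward(self, x):
            h = self.enc(x)
            mu, logvar = self.mu(h), self.logvar(h)
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
            return self.dec(z), mu, logvar

    def predictor_loss(pred, target, mu, logvar, beta=4.0):
        # The only change from a plain beta-VAE: the decoder output is compared
        # with the prediction target (e.g., the rotated image), not the encoder input.
        recon = F.binary_cross_entropy(pred, target, reduction='sum')
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return recon + beta * kl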

The implementation is found here: https://github.com/rondelion/AEPredictor

A test result on MNIST rotation (predicting rotated images) after 100 epochs of training is shown below:


Sunday, July 9, 2023

Basic Salience - Saccade model

I implemented a simple salience-saccade model.  Visit the repository for details.  The model can be used for building any (active) vision-based agent.

In 2021, I wrote about Visual Task and Architectures.  The current implementation covers the where path, saliency map, and active vision (gaze control) described in that post.  As for the what path, I made a rudimentary implementation in 2021.  I implemented a cortico-thalamo-BG control algorithm in 2022.  I also worked on a non-visual match-to-sample task this year (previous post).

While I might go for experiments on minimal visual word acquisition, I should add the what path (object recognition) to the current model in any case.

Monday, April 24, 2023

Solving a Delayed Match-to-Sample Task with Sequential Memory

Introduction

This report presents a solution to and an implementation of a delayed Match-to-Sample task using episode sequences.  (See my post for the importance of the M2S task in AGI research.)

A delayed Match-to-Sample task requires determining whether a presented (target) pattern is the same as another one (the sample) presented earlier in the session.  In a graphic-based task, either the shape or the color of the presented graphic can serve as the matching attribute.  In this report, a cue (task switch) is presented before the sample to specify the matching attribute.  Both the cue and the patterns to be matched are low-dimensional binary vectors for the sake of simplicity.

Working memory is required to solve a DM2S task.  The agent needs to remember the cue (task switch), select the part of the sample pattern that corresponds to the attribute specified by the cue, remember that part, and compare it with the corresponding attribute of the target pattern presented later.  Because of this need for working memory, it is assumed that simple reinforcement learning cannot solve the problem.

In this report, the agent memorizes the input-output sequences appearing in all task episodes (long-term memory) and solves the task by finding a past sequence that would lead to success in the current episode (short-term memory).  The implementation has shown that, in the simplest setting, the agent can solve the task after experiencing several hundred episodes in most cases.

The Method

Sequence Memory

The agent memorizes the entire input-output sequence of each experienced episode.  The memory has a tree structure whose root corresponds to the end of an episode.  The tree branches according to input-output pairs, and each node holds the number of successes and the number of experiences.
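A minimal sketch of this structure in Python (the class and method names are illustrative, not those of the actual implementation): episodes are stored backwards from their end, and each node keeps the two counts described above.

    class SeqNode:
        def __init__(self):
            self.children = {}   # key: (input, output) step; value: SeqNode
            self.successes = 0
            self.experiences = 0

    class SequenceMemory:
        def __init__(self):
            self.root = SeqNode()  # the root corresponds to the end of an episode

        def register(self, episode, success):
            """episode: list of (input, output) steps; stored from the end backwards."""
            node = self.root
            node.experiences += 1
            node.successes += int(success)
            for step in reversed(episode):
                node = node.children.setdefault(step, SeqNode())
                node.experiences += 1
                node.successes += int(success)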

Using the Sequence Memory

The agent remembers the input-output sequence of the current episode and searches the ‘long-term’ sequence memory for sequences that match the current sequence and lead to success.  The sequence memory is indexed with partial observation sequences as keys so that the longest match can be found.  Among the sub-sequences matched by the index, the one whose initial node has the highest value (success rate x number of successes) is used (the number of successes is included to eliminate sequences that succeeded by a fluke), and the action is decided by following the remainder of that sub-sequence.  The per-episode sequence memory corresponds to working memory, and the ‘long-term’ sequence memory corresponds to the policy in reinforcement learning.
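A sketch of the selection rule (assuming nodes carry the success and experience counts from the tree sketch above; the helper name is mine, not the implementation's):

    def choose_subsequence(candidates):
        """candidates: (node, remaining_steps) pairs returned by the index for the
        longest match against the current episode."""
        def value(node):
            if node.experiences == 0:
                return 0.0
            success_rate = node.successes / node.experiences
            return success_rate * node.successes  # discounts 'fluke' successes
        best_node, best_rest = max(candidates, key=lambda c: value(c[0]))
        return best_rest  # the agent follows these remaining steps to decide outputs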

Architecture


Fig.1 Architecture

The agent consists of the Gate, Episodic Memory, and Action Chooser.

Gate

The Gate attends to a part of the observation and outputs the gated observation (non-attended parts are masked).  It also outputs whether the observation has changed (obs. change).
Attention is determined by the salience of the environmental input and the attention signal from Episodic Memory: if Episodic Memory provides a definite attention target and that target is salient, it is selected; otherwise, one of the salient parts is selected as the target of attention with equal probability.  If there is no salient part in the observation (i.e., it is a 0 vector), no attention is paid and the attention output is a 0 vector.
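A sketch of this attention rule (names are illustrative; salience is assumed to be a binary vector over observation parts):

    import numpy as np

    def select_attention(salience, em_attention=None):
        if em_attention is not None and salience[em_attention] > 0:
            target = em_attention               # follow Episodic Memory if its target is salient
        else:
            salient = np.flatnonzero(salience)
            if salient.size == 0:
                return np.zeros_like(salience)  # no salient part: attention output is a 0 vector
            target = np.random.choice(salient)  # otherwise pick a salient part uniformly
        attention = np.zeros_like(salience)
        attention[target] = 1
        return attention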

Episodic Memory

It receives the gated observation, attention, obs. change, and reward from the Gate, and outputs an attention instruction to the Gate and an action instruction to the Action Chooser.
At the end of each episode, Episodic Memory registers the input-output sequence of the episode in the sequence memory.
If a (sub-)sequence of gated observations matches a successful sequence in the memory, Episodic Memory determines its outputs according to the rest of that sequence.  Episodic Memory also receives information about the attentional and action choices actually made ('efferent copy') from the Gate and the Action Chooser, respectively, to be recorded in the sequence memory.
For the two steps immediately after a change in the observation (obs. change), Episodic Memory chooses only ‘attentions.’  This lets the agent check the situation before outputting to the external environment (it also narrows the search space).
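A sketch of this attention-only rule (illustrative; the real module interleaves it with the sequence matching described above):

    class EpisodicMemoryGating:
        """Illustrative sketch: suppress action instructions for the two steps
        immediately after an observation change."""
        def __init__(self):
            self.steps_since_change = 2

        def step(self, obs_changed, attention_instr, action_instr):
            if obs_changed:
                self.steps_since_change = 0
            else:
                self.steps_since_change += 1
            if self.steps_since_change < 2:
                return attention_instr, None   # attention only: observe before acting
            return attention_instr, action_instr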

Action Chooser

It receives an action instruction (probability vector) from Episodic Memory, performs action selection, and passes the results to the environment and Episodic Memory.
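A sketch of the Action Chooser's sampling step (illustrative helper, not the actual module code):

    import numpy as np

    def choose_action(action_probs, rng=None):
        """Sample an action from the probability vector given by Episodic Memory.
        The chosen action goes to the environment and back to Episodic Memory."""
        rng = rng or np.random.default_rng()
        return int(rng.choice(len(action_probs), p=action_probs))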

Implementation and Experimental Results

Environment/Task

Phases

The task has the following phases:
{task switch presentation, blank, sample presentation, blank, target presentation, blank}

Input/Output

The output from the environment (observation) is a binary sequence consisting of {task switch, attribute sequence, control switch}.
The dimensionality of the attribute sequence is the number of attributes x the attribute dimension.  Each attribute is a one-hot vector of the attribute dimension.
A task switch is a one-hot vector of the attribute dimension that specifies the attribute to be extracted (for implementation convenience, attribute dimension ≥ number of attributes).
The control switch is also a binary vector of the attribute dimension, with the first column being 1 in the sample presentation phase, the second column being 1 in the target presentation and response phase, and all columns being 0 otherwise.  The output in the blank phases is a 0 vector.
Reward values are either 0 (failure) or 1 (success).
There are three types of inputs (actions) from the agent: {0, 1, 2}.
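One way to lay out such an observation vector (a sketch with made-up helper names, using the minimal setting of 2 attributes with attribute dimension 2):

    import numpy as np

    def make_observation(task_switch, attributes, control, n_attr=2, attr_dim=2):
        """task_switch: attribute index or None; attributes: list of value indices or None;
        control: 'sample', 'target', or None (blank phase)."""
        ts = np.zeros(attr_dim)
        if task_switch is not None:
            ts[task_switch] = 1                    # task switch: one-hot of the attribute dimension
        seq = np.zeros(n_attr * attr_dim)
        if attributes is not None:
            for i, v in enumerate(attributes):
                seq[i * attr_dim + v] = 1          # each attribute is one-hot
        ctrl = np.zeros(attr_dim)
        if control == 'sample':
            ctrl[0] = 1
        elif control == 'target':
            ctrl[1] = 1
        return np.concatenate([ts, seq, ctrl])     # blank phases yield the 0 vector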

Success Conditions

The environment gives success only when the attribute specified by the task switch matches between the sample and the target and the agent's input in the target presentation phase is 2, or when that attribute does not match between the sample and the target and the agent's input in the target presentation phase is 1.
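Stated as code (an illustrative helper, assuming attributes are indexed by the task switch):

    def is_success(sample_attrs, target_attrs, task_switch, action):
        """sample_attrs/target_attrs: attribute value indices; task_switch: attribute index."""
        match = sample_attrs[task_switch] == target_attrs[task_switch]
        return (match and action == 2) or (not match and action == 1)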

Implementation

Python and OpenAI Gym are used for the environment.
The agent was implemented with Python and BriCA (Brain-inspired Computing Architecture), a platform for building brain-inspired agents, in which information is passed between modules through defined connections at each time step.

Experimental Setup

Length (steps) of the phases

Task switch presentation: 2, Blank: 1, Sample presentation: 2, Target presentation and response: 3
Number of attributes and attribute dimension: 2 and 2, or 3 and 3
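The same settings written as a configuration sketch (key names are illustrative, not the repository's actual parameters):

    EXPERIMENT = {
        "phase_lengths": {"task_switch": 2, "blank": 1, "sample": 2, "target_and_response": 3},
        "settings": [
            {"n_attributes": 2, "attribute_dim": 2},
            {"n_attributes": 3, "attribute_dim": 3},
        ],
    }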

Perplexity of inputs and actions (size of the search space)

The number of distinct input-output combinations that can appear is given below; all of them must be experienced in order to gain full knowledge.  Since the environment is stochastic, there is no guarantee that a complete experience can be obtained in a finite number of trials.
Formula: task switches x (attribute values x attention destinations) x (attribute values x (attention destinations + action types))
Number of attributes 2, attribute dimension 2: 2 x (4 x 3) x (4 x 5) = 480
Number of attributes 3, attribute dimension 3: 3 x (8 x 4) x (8 x 6) = 3,024

Results


Fig. 2 Experimental results
Vertical axis: average reward, horizontal axis: episodes x 100
Blue line: number of attributes: 2, attribute dimension: 2;
Red line: number of attributes: 3, attribute dimension: 3

The learning curves differ according to the number of attributes and attribute dimension settings. In the setting with a minimum complexity (blue line – number of attributes: 2, attribute dimension: 2), the task is solved in a few hundred trials in most cases.

Comparison with Reinforcement Learning

I examined whether reinforcement learning agents (vpg, a2c, and ppo from TensorForce) can learn the task.  The results are shown in the graph below; proper learning does not appear to occur.

Fig. 3 Experimental results of reinforcement learning
Vertical axis: average reward; horizontal axis: episodes x 100
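For reference, such a baseline can be run roughly as follows with Tensorforce's act/observe loop; the environment id 'DM2S-v0' and the hyperparameters are placeholders, not the ones actually used:

    from tensorforce import Agent, Environment

    # Placeholder Gym environment id and hyperparameters; the actual comparison
    # ran vpg, a2c, and ppo agents on the DM2S environment.
    environment = Environment.create(environment='gym', level='DM2S-v0')
    agent = Agent.create(agent='ppo', environment=environment, batch_size=10)

    for episode in range(10000):
        states = environment.reset()
        terminal = False
        while not terminal:
            actions = agent.act(states=states)
            states, terminal, reward = environment.execute(actions=actions)
            agent.observe(terminal=terminal, reward=reward)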

Discussion

Comparison with Reinforcement Learning

The proposed system can generally solve the task once it has enough experience to tell that matched sequences are not ‘flukes.’  Given the perplexity of the task (see above), it is assumed that the problem is solved with a near-minimum number of trials.
While a reinforcement learner may also maintain a graph of the ‘Markov’ series leading to the reward (e.g., a Bellman backup tree), sub-sequences are not normally memorized and used for matching.  In this implementation, the number of successes is also stored to screen out ‘fluke’ sequences, whereas normal RL stores only probabilities and reward/value estimates.

Related Research

[McCallum 1995] uses case trees for problem solving and refers to further works in the context of reinforcement learning.
My post in 2022 proposed a “model of fluid intelligence based on examining experienced sequences,” a mechanism that allows agents to discover the conditions on sequences required by the task.  In the real world, it is not possible to know in advance how far back from the reward the agent should remember, so the strategy proposed there could be applied: start with a sequence near the reward and extend the policy sequence if it does not work.
I also reported in another post in 2022 on an attempt to solve a delayed Match-to-Sample task with a brain-inspired working memory architecture, which did not store sequences and learned to select attention and action independently; it could not identify overall successful sequences.

Biological Plausibility

While the current implementation is not biologically plausible in that it does not use artificial neural networks (or other neural mimicking mechanisms), its design was inspired by the information processing mechanisms of the brain.
Gate incorporates the mechanisms of attention and salience maps in the mammalian visual system.  If attention is thought of as eye movements, it can also be understood as the mechanism of active vision.
In the brain, episodic memory is believed to be held in the hippocampus.  If so, it is conceivable that episodic memory can be recalled from partial input-output sequences and used for action selection (see [Sasaki 2018][Dragoi 2011] for the discussion of hippocampal use of sequential memory).
In the current implementation, a single module (Episodic Memory) was used to manage the control of both attention and action; it might be better to implement modules separately because they differ in terms of timing (Gate runs before Episodic Memory while Action Chooser runs after).

Information Compression

In this implementation, the environmental input is a low-dimensional vector; even so, the number of cases becomes quite large if all of the input-output pattern sequences are to be searched (see above on perplexity).  When dealing with real environments, it would be necessary to compress information with deep learning or other methods to reduce the search space.  The pattern matching method implemented in this study is based on perfect (strict) matching; with analog data from real environments, the use of a more flexible matching method would be a must.  For this purpose, it would also be desirable to use artificial neural networks.
In the current implementation, Episodic Memory stores the masked environmental input (gated observation) as it is; if recognition of the attended attribute and the choice of attention were used for action selection, the attribute itself would not need to be remembered, which would reduce the perplexity.

Future Directions

Future directions may include: validation with other intelligence test tasks (e.g., analogy tasks), search for more biologically plausible architectures, tasks using image (see the information compression section above), and search for "causal inference" capabilities such as those performed by human infants.

References

[McCallum 1995] McCallum, R.A.: Instance-Based Utile Distinctions for Reinforcement Learning with Hidden State, Proceedings of the Twelfth International Conference on Machine Learning (1995)
[Sasaki 2018] Takuya Sasaki, et al.: Dentate network activity is necessary for spatial working memory by supporting CA3 sharp-wave ripple generation and prospective firing of CA3 neurons, Nature Neuroscience vol. 21 (2018) https://doi.org/10.1038/s41593-017-0061-5
[Dragoi 2011] George Dragoi and Susumu Tonegawa: Preplay of future place cell sequences by hippocampal cellular assemblies, Nature 469 (7330) (2011)



Thursday, January 26, 2023

Memo: AGI Research Map 2023-01

This memo gives an overview of AGI research with Fig. 1 "AGI Research Map 2023-01," shown below.

Fig. 1 AGI Research Map 2023-01

1. The Choice of Approaches

Fig. 1.1
The upper left portion of the figure shows the approach choices; each choice is Yes unless labeled No.

If you don’t go after human cognitive functions, you’d obtain an AGI that is not (necessarily) human-like (e.g., General Problem Solver or AIXI).
Note: "Not-human-like” AGI may not efficiently process tasks that are efficiently processed by humans.

If you go after human cognitive functions, you have a choice of whether to pursue human modeling (i.e., cognitive science).  If you don’t go after human modeling, you may go after a functionally human-like (but structurally not human-like) AGI (a rather engineering-oriented approach).  If you go after human modeling, you have a choice of whether to mimic the brain.  If you don’t go after mimicking the brain, then you would go after (cognitive) psychological modeling.  You can go after mimicking the brain and still take an engineering approach (reverse engineering).

If you go after human cognitive functions, you would also go after embodiment (in 3D space) and implementing episodic memory.

2. Problem Solving

The upper right portion of the figure is a classification of problem-solving capabilities. There are two broad categories there: statistical problem solving and constraint satisfaction, both of which AGI should use.

In statistical problem solving, predictions and decisions are made based on statistics.  Machine learning is a type of statistical problem solving.

Constraint satisfaction requires finding a solution that satisfies given conditions (constraints).  Logic (deduction) and GOFAI generally belong to it.  In constraint satisfaction, statistical information can be used for efficiency.

Mathematics is a deductive system, so its operation requires constraint satisfaction.

Statistics uses mathematics (but may not use deduction while in action).

Causal inference uses both statistics and constraint satisfaction.

While hypothesis generation (abduction) is constraint satisfaction in nature, statistical information helps hypothesis generation.  In mathematics, hypotheses are created to be proved by deduction.

Algorithm generation (programming) is a kind of constraint satisfaction and is a key element for self-improving superintelligence.

Human beings have all the problem solving capabilities mentioned here.

Scientific practice is a (social) activity in which all of the problem-solving capabilities are put to use.

3. Human-specific Cognitive Capabilities

The bottom center of the figure lists human-specific cognitive capabilities (i.e., non-human animals do not have them).  If you go after human cognitive functions, you have to realize these capabilities.

Linguistic functions have been considered the hallmark of human intelligence (cf. Turing Test).
Certain social intelligence, such as intention understanding and theory of mind, is also considered to be unique to humans.
According to [Tomasello 2009], causal thinking is also unique to humans (humans always ask about causes).
As human children grow, they also develop a concept of quantity that is not found in other animals (mathematical intuition).

With regard to language, the subfields of linguistics, i.e., syntax, semantics, and pragmatics, are listed (phonology is omitted).  If you are for generative grammar, you would go for constructive semantics as well.  Meanwhile, the semantics successfully used in machine learning is distributional semantics (and embedding).  Since constructive semantics is necessary for precise interpretation of sentence meaning, these semantics would have to be integrated.

If you go after development, you would go after language acquisition as well, where a system acquires language by interacting with existing language speakers in the environment (as human infants do); it learns the meaning of linguistic expressions by inferring the intent of others to use language.  If you don’t go after development, you might go after systems that learn from corpora (as current large-scale language models do).

4. Essential Elements and Development Priorities

All the capabilities listed in the problem-solving section are required for AGI.

Some human-specific cognitive capabilities are optional when you pursue a not-necessarily-human-like AGI; for example, an AGI agent that communicates with humans in logical formulae may need neither human social intelligence nor human language acquisition capabilities.

The arrows in the figure show which functions are used by which.  You would have to develop the functions that are used before the functions that use them.  For example, mathematical capability would require implementing a deductive engine beforehand.

5. Capability Settings and Testability

In designing an artifact, you have to specify its capabilities (functional specifications) in advance.  While the settings of capabilities in AGI design must be specific enough for designing tasks to test them, the tasks must be "large enough" to cover functional generality.  The trade-off between specificity and generality is subject to discussion with regard to the definition of generality in AGI research.

Reference

[Tomasello 2009] Tomasello, M.: The Cultural Origins of Human Cognition, Harvard University Press (2009).