Saturday, November 16, 2024

Implementation of Neural Sequence Memory

I forgot to report on the neural sequence memory I implemented in May.

GitHub: https://github.com/rondelion/SequenceMemory

Animals (including human beings) can keep things in (short-term) memory while performing tasks.  This 'working memory' includes sequence memory: we can memorize sequences of events.  It may seem that this could be implemented with an associative memory that associates each input with the next, but that is not the case, because different inputs may follow the same input: e.g., A⇒B and A⇒C in ABAACCDAB…  A proper sequence memory must therefore have 'latent' states to represent the positions within a sequence.

The specifications of the implementation are as follows:

  • A latent state is represented as a one-hot vector, which guarantees the independence among the states.  The number of states corresponds to the number of events to be memorized.
  • Latent states have mutual associative links with the input.
  • Latent states have forward and backward associative links among themselves to represent a sequence.
  • It memorizes a sequence from a single exposure, by instantly reinforcing the associative links (as in the 'short-term potentiation' of synapses).
  • It can ‘replay’ a sequence with an input stimulus.
  • Latent states have decaying activation so that the least activated state can be ‘recycled.’

The idea here is similar to the competitive queuing model (see Bullock, 2003; Houghton, 1990; Burgess, 1999).
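A minimal sketch of the mechanism described above (not the repository code; the class, the toy dimensions, and the simplifications are assumptions for illustration, and the backward links and decay-based recycling are omitted):

import numpy as np

class SequenceMemorySketch:
    """Illustrative toy: one-hot latent states bound to inputs in one shot."""

    def __init__(self, n_states, input_dim):
        self.n_states = n_states
        self.W_in2st = np.zeros((n_states, input_dim))  # input -> latent state
        self.W_st2in = np.zeros((input_dim, n_states))  # latent state -> input
        self.W_fwd = np.zeros((n_states, n_states))     # state t -> state t+1
        self.next_free = 0                              # stand-in for decay-based recycling

    def memorize(self, inputs):
        """Bind a sequence of one-hot input vectors in a single pass."""
        prev = None
        for x in inputs:
            s = np.zeros(self.n_states)
            s[self.next_free % self.n_states] = 1.0     # allocate a fresh one-hot latent state
            self.next_free += 1
            self.W_in2st += np.outer(s, x)              # mutual state <-> input links
            self.W_st2in += np.outer(x, s)
            if prev is not None:
                self.W_fwd += np.outer(s, prev)         # forward link used for replay
            prev = s

    def replay(self, cue, steps):
        """Recall the latent state bound to the cue, then follow the forward links."""
        s = np.zeros(self.n_states)
        s[np.argmax(self.W_in2st @ cue)] = 1.0
        outputs = [self.W_st2in @ s]
        for _ in range(steps - 1):
            nxt = np.zeros(self.n_states)
            nxt[np.argmax(self.W_fwd @ s)] = 1.0
            s = nxt
            outputs.append(self.W_st2in @ s)
        return outputs

# e.g., the same input A occupies two different latent states in A, B, A, C
A, B, C = np.eye(4)[0], np.eye(4)[1], np.eye(4)[2]
mem = SequenceMemorySketch(n_states=8, input_dim=4)
mem.memorize([A, B, A, C])
print(mem.replay(A, steps=4))   # reproduces A, B, A, C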

The figure below shows an input sequence (top) and the remembered sequence (bottom):

Thursday, November 7, 2024

CBOW and Part-of-Speech Clustering

Word embeddings, introduced in Word2Vec, are known to represent semantic clusters of words.  While their semantic aspect has received most of the attention, the distributional hypothesis on which word embeddings are based is 'syntactic' in the sense that it is concerned only with the formal features (distribution) of words, so the embeddings should represent parts-of-speech (POS) as well.  I therefore ran the experiment described below (similar experiments have probably been done elsewhere, but anyway).

  1. made a small CFG (see below) and generated sample sentences.
  2. created embeddings with continuous bag-of-words (CBOW) learning.
  3. clustered the embeddings and compared the clusters with the 'ground truth' (the word–POS correspondence in the grammar).

Set-up

Number of words (voc. size): 20
Number of POS: 7
Number of sentences: 500 (100 was too small)
CBOW learning and the embedding: set up a simple perceptron-like predictor that predicts a word from its two adjacent words.  The weights that predict a word from the hidden layer (number of cells: 10) were used as the embeddings.
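The following is a minimal NumPy sketch of such a predictor (not the actual code; how the two adjacent words are combined and the hyper-parameters are assumptions):

import numpy as np

rng = np.random.default_rng(0)
V, H, lr = 20, 10, 0.1               # vocabulary size, hidden cells, learning rate
W1 = rng.normal(0.0, 0.1, (H, V))    # input (bag of the two neighbours) -> hidden
W2 = rng.normal(0.0, 0.1, (V, H))    # hidden -> output; its rows are used as embeddings

def cbow_step(left_id, target_id, right_id):
    """One update: predict the middle word from its two adjacent words."""
    global W1, W2
    x = np.zeros(V)
    x[left_id] += 1.0
    x[right_id] += 1.0               # bag of the two adjacent words
    h = W1 @ x                       # hidden layer (10 cells)
    z = W2 @ h
    p = np.exp(z - z.max())
    p /= p.sum()                     # softmax over the vocabulary
    d = p.copy()
    d[target_id] -= 1.0              # gradient of the cross-entropy loss w.r.t. z
    grad_h = W2.T @ d
    W2 -= lr * np.outer(d, h)        # update hidden -> output weights
    W1 -= lr * np.outer(grad_h, x)   # update input -> hidden weights

# after an epoch of cbow_step calls over the corpus, W2[i] is the 10-dim embedding of word i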

Clustering

The figure shows an Isomap clustering of embeddings.  Words are clustered according to their parts of speech.
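Such a plot can be produced with scikit-learn roughly as follows (the 'embeddings' array and 'pos_labels' list are placeholders for the learned embeddings and the ground-truth POS tags):

import matplotlib.pyplot as plt
from sklearn.manifold import Isomap

# 'embeddings': (20, 10) array, one row per word; 'pos_labels': the POS of each word
proj = Isomap(n_neighbors=5, n_components=2).fit_transform(embeddings)
for tag in sorted(set(pos_labels)):
    idx = [i for i, t in enumerate(pos_labels) if t == tag]
    plt.scatter(proj[idx, 0], proj[idx, 1], label=tag)
plt.legend()
plt.show()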

I also tried neural-network-based clustering methods.  As a sparse autoencoder did not work for this purpose, I tried a SOM-like method and got the following (number of cells: 14; the same training data as for the CBOW training: 500 sentences / 2,099 words; one epoch).

Adv   7 [0.   0.   0.   0.   0.   0.   0.13 0.   0.   0.   0.   0.   0.   0.  ]
PN   10 [0.   0.   0.   0.   0.   0.   0.   0.   0.06 0.13 0.   0.   0.   0.  ]
IV    6 [0.   0.   0.   0.   0.   0.11 0.   0.   0.   0.   0.   0.   0.   0.  ]
Adj   2 [0.   0.09 0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.  ]
Det   1 [0.18 0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.  ]
TV    4 [0.   0.   0.   0.13 0.   0.   0.   0.   0.   0.   0.   0.   0.   0.  ]
N    13 [0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.06 0.12 0.  ]

This shows the correspondence between the cells and the parts-of-speech (the second column is the index of the most correlated cell).
Though the clustering does not always work (it depends on the initial weights), it confirms that CBOW embeddings generally represent parts-of-speech in this set-up.

SOM-like learning code:

# snippet from the update step (numpy imported as np; self.weights: cells x embedding dim)
min_index = np.argmin(((self.weights - feature) ** 2).sum(axis=1))   # best-matching cell
for i in range(-2, 3):                           # neighbourhood of radius 2 around the winner
    j = min_index + i
    if i != 0 and 0 <= j < len(self.weights):    # skip the winner itself; stay within bounds
        self.weights[j] += self.alpha * (feature - self.weights[j]) / abs(i)
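In context, this snippet would sit inside the update method of a SOM-like clusterer; a hypothetical calling loop (the class and method names are assumptions, not the actual code) would be:

som = SomLikeClusterer(n_cells=14, dim=10, alpha=0.1)   # hypothetical constructor
for word_id in corpus_word_ids:        # one epoch over the 2,099 word tokens
    som.update(embeddings[word_id])    # 'feature' above is the word's CBOW embedding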

Grammar

S : NP + VP
NP : Det + N1
N1 : Adj + N
N1 : N
NP : PN
VP : Adv + VP
VP : IV
VP : TV + NP
Det - le, un
N - n1, n2, n3
PN - Pn1, Pn2, Pn3
Adj - aj1, aj2, aj3
Adv - av1, av2, av3
IV - iv1, iv2, iv3
TV - tv1, tv2, tv3
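For reference, sample sentences can be generated from this grammar with a small recursive expander like the one below (uniform random choice among productions is an assumption; the actual generator may differ):

import random

# the grammar above as production rules; symbols not in the table are terminals
rules = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N1"], ["PN"]],
    "N1":  [["Adj", "N"], ["N"]],
    "VP":  [["Adv", "VP"], ["IV"], ["TV", "NP"]],
    "Det": [["le"], ["un"]],
    "N":   [["n1"], ["n2"], ["n3"]],
    "PN":  [["Pn1"], ["Pn2"], ["Pn3"]],
    "Adj": [["aj1"], ["aj2"], ["aj3"]],
    "Adv": [["av1"], ["av2"], ["av3"]],
    "IV":  [["iv1"], ["iv2"], ["iv3"]],
    "TV":  [["tv1"], ["tv2"], ["tv3"]],
}

def generate(symbol="S"):
    """Expand a symbol with a random production until only terminals remain."""
    if symbol not in rules:
        return [symbol]
    expansion = random.choice(rules[symbol])
    return [word for s in expansion for word in generate(s)]

sentences = [" ".join(generate()) for _ in range(500)]   # 500 sample sentences, as in the set-up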