# How does a neural Turing machine work

## Neural Turing machine - how does switching work?

### Introduction to deep learning


I was reading the Neural Turing Machines paper (page 9) and got stuck on one confusing spot.

I can't understand how this part works:

Each head outputs a shift weighting $\vec{s_t}$, which defines a normalised distribution over the allowed integer shifts (the authors mean that the entries of $\vec{s_t}$ sum to 1). For example, if shifts between -1 and 1 are allowed, $\vec{s_t}$ contains three elements corresponding to the degree to which shifts of -1, 0, and 1 are performed.

If we index the $N$ locations from 0 to $N-1$, the rotation applied by $\vec{s_t}$ to $\vec{w_t^g}$ can be expressed as the following circular convolution:

$$ w_{new}(i) \leftarrow \sum_{j=0}^{N-1} w^g_t(j)\, s_t(i-j) $$

where all index arithmetic is computed modulo $N$.
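Read literally, the formula treats $s_t$ as a length-$N$ vector indexed modulo $N$. Here is a minimal NumPy sketch of the circular convolution exactly as written (the function name is mine, not from the paper, and $s$ is assumed to already have length $N$):

```python
import numpy as np

def circular_shift(w, s):
    """Compute w_new(i) = sum_j w(j) * s((i - j) mod N).

    w: weighting over N memory locations (sums to 1).
    s: shift weighting, already expanded to length N
       (nonzero only at the allowed shift offsets).
    """
    N = len(w)
    return np.array([sum(w[j] * s[(i - j) % N] for j in range(N))
                     for i in range(N)])

# A hard +1 shift: all shift mass on offset +1, i.e. s[1] = 1.
w = np.array([0.0, 0.0, 1.0, 0.0, 0.0])   # focus on location 2
s = np.array([0.0, 1.0, 0.0, 0.0, 0.0])   # pure shift of +1
print(circular_shift(w, s))               # mass moves to location 3
```

With a soft $s$ (mass spread over several offsets), the result is a blur of shifted copies of $w$ rather than a hard rotation.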

The authors use the notation $w_t(i)$ to denote the $i$-th element of the vector.

I can't understand how $s_t$ is applied to $w^g_t$, or what makes this actually perform a shift, for example by +1.

Suppose we are working on the $i$-th element, at index 3, and $w^g_t$ has 5 entries.

As in the paper, let $s$ represent shifts of -1, 0, or 1, so $s$ has three entries. Now I want to shift every element of $w$ forward by +1.

I feel like the indices will go out of bounds, and the modulo won't help much. We can unfold the calculation for the 3rd element:

$$ w_{new}(3) = w^g_t(0)\, s_t(3) + w^g_t(1)\, s_t(2) + w^g_t(2)\, s_t(1) + w^g_t(3)\, s_t(0) + w^g_t(4)\, s_t(-1) $$

Simplified, with the indices taken modulo 5, this means:

$$ w_{new}(3) = w(0)\, s[3] + w(1)\, s[2] + w(2)\, s[1] + w(3)\, s[0] + w(4)\, s[4] $$

As you can see, I end up with s[3] and, at the end, s[4]... but we just agreed $s$ only has 3 entries, for shifts {-1, 0, 1}, so this looks like an out-of-bounds error.
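To make the index arithmetic concrete, this snippet just lists the shift indices $(i - j) \bmod N$ that appear in the sum for $i = 3$ and $N = 5$:

```python
# Shift indices requested from s when computing w_new(3) with N = 5.
N, i = 5, 3
indices = [(i - j) % N for j in range(N)]
print(indices)  # -> [3, 2, 1, 0, 4]
```

Indices 3 and 4 do appear, which is exactly the apparent out-of-bounds problem when $s$ is stored with only 3 entries.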

And I can't really see how doing this for all 5 entries (above we only did it for the 3rd entry) would "shift" the entire $w^g$ into $w_{new}$. Can someone give me the intuition for that too?

• Your problem is not specifically about this paper. Read en.wikipedia.org/wiki/Circular_convolution or a DSP textbook.
• My concern is mainly the term $w(j)\, s(i-j)$, where it seems like we cross boundaries when $w$ is 10-dimensional and $s$ is only 3-dimensional. For example, if $i = 0$ and $j = 1$, we wrap around modulo 10 but end up with an index much larger than the dimension of $s$.


OK, it makes sense once we stop thinking in terms of the formula and look at a picture instead.

Take a look at the bottom half of the following slide, taken from Kiho Suh's Neural Turing Machines presentation, 19 June 2017.
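In code, the usual reading (and, I believe, what the slide illustrates) is that the length-3 shift weighting is first embedded into a length-$N$ vector, with offset $k$ placed at index $k \bmod N$; the out-of-range entries of $s$ in the unfolded sum are then just zeros. A sketch under that interpretation (the helper names are mine, not from the paper):

```python
import numpy as np

def expand_shift(s_small, N):
    """Embed a shift weighting over offsets {-1, 0, +1} into a
    length-N vector indexed modulo N: offset k goes to index k % N.
    All other positions are zero, so s[2], s[3], ... never add mass.
    """
    s = np.zeros(N)
    s[(-1) % N] = s_small[0]  # shift -1 -> index N-1
    s[0]        = s_small[1]  # shift  0 -> index 0
    s[1]        = s_small[2]  # shift +1 -> index 1
    return s

def rotate(w, s):
    """w_new(i) = sum_j w(j) * s((i - j) mod N): circular convolution."""
    N = len(w)
    return np.array([sum(w[j] * s[(i - j) % N] for j in range(N))
                     for i in range(N)])

w = np.array([0.1, 0.0, 0.8, 0.1, 0.0])
s = expand_shift(np.array([0.0, 0.0, 1.0]), len(w))  # pure +1 shift
print(rotate(w, s))  # each entry of w moves forward by one position
```

With a pure +1 shift the result equals `np.roll(w, 1)`: every entry of $w$, including the one at the last index, rotates forward by one, which is exactly the "shift" the paper describes.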
