
Neural Turing Machines - how does shifting work?



I was reading the Neural Turing Machines paper (page 9) and got stuck on one annoying point.

I can't understand how this shift operation works:

Each head emits a shift weighting $\vec{s_t}$ that defines a normalised distribution over the allowed integer shifts (the authors mean that the entries of $\vec{s_t}$ sum to 1). For example, if shifts between -1 and 1 are allowed, $\vec{s_t}$ has three elements corresponding to the degree to which shifts of -1, 0 and 1 are performed.

If we index the $N$ locations from $0$ to $N-1$, the rotation applied by $\vec{s_t}$ to $\vec{w_t^g}$ can be expressed as the following circular convolution:

$$ w_{new}(i) \leftarrow \sum_{j=0}^{N-1} w_t^g(j)\, s_t(i-j) $$

where all index arithmetic is computed modulo $N$.

The authors use the notation $w_t(i)$ to denote the $i$-th element of the vector.
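Taking the formula literally, $s_t$ would need $N$ entries so that $s_t(i-j)$ is always defined. In NumPy (my own sketch, not from the paper) the equation reads as:

```python
import numpy as np

def circular_convolve(w, s):
    """NTM shift: w_new(i) = sum_j w(j) * s((i - j) mod N), with s of length N."""
    N = len(w)
    w_new = np.zeros(N)
    for i in range(N):
        for j in range(N):
            w_new[i] += w[j] * s[(i - j) % N]
    return w_new

# Reading s as length N: index 0 = shift 0, index 1 = shift +1,
# index N-1 = shift -1; all other entries are zero.
w = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
s_plus1 = np.array([0.0, 1.0, 0.0, 0.0, 0.0])  # a "pure" +1 shift
print(circular_convolve(w, s_plus1))  # -> [0.1, 0.1, 0.2, 0.4, 0.2]
```

With this one-hot $s_t$, every entry of $w$ moves forward by one position (circularly) - but the paper says $s_t$ only has three entries, which is exactly where I get lost.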

I can't understand how $s_t$ is applied to $w_t^g$, and what makes it actually perform a shift, for example by +1.


Suppose we are working on the $i$-th element at index $i = 3$, and $w_t^g$ has 5 entries.

As in the paper, let $s$ allow shifts of -1, 0 or 1, so $s$ has three entries. Now I want to shift every element of $w$ forward by +1.

I feel like I will go out of bounds, and the modulo doesn't seem to help much. We can unfold this calculation for the 3rd element:

$$ w_{new}(3) = \sum_{j=0}^{4} w_t^g(j)\, s_t(3-j) $$

Simplified, this means:

$$ w_{new}(3) = w_t^g(0)\,s_t(3) + w_t^g(1)\,s_t(2) + w_t^g(2)\,s_t(1) + w_t^g(3)\,s_t(0) + w_t^g(4)\,s_t(4) $$

(the last index is $3-4 = -1 \equiv 4 \pmod 5$). As you can see, I end up with $s_t(3)$ and at the end $s_t(4)$... but we just agreed that $s$ only has 3 entries, for shifts $\{-1, 0, 1\}$, which looks like an out-of-bounds error.

And I can't really see how doing this for all 5 entries would "shift" the entire $w^g$ into $w_{new}$ (above we only did it for the 3rd entry). Can someone give me the intuition for that as well?

  • Your problem is not specific to this paper. Read en.wikipedia.org/wiki/Circular_convolution or a DSP textbook.
  • My concern is mainly the term $w(j)\,s(i-j)$, where it seems we cross boundaries when $w$ is 10-dimensional and $s$ is only 3-dimensional. For example, if $i = 0$ and $j = 1$, we wrap around modulo 10 but end up with an index much larger than the dimension of $s$.


OK, it makes sense if we set the formula aside for a moment.

Take a look at the bottom half of the following slide, from the Neural Turing Machines presentation by Kiho Suh (19 June 2017).
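The key point: $s$ never needs to be indexed beyond its three entries. Substituting $k = i - j$ in the convolution, the sum over all $N$ positions collapses to a sum over the three allowed shifts, $w_{new}(i) = \sum_{k \in \{-1,0,1\}} s(k)\, w_t^g(i-k)$, with only the index into $w$ wrapping modulo $N$. A quick NumPy sketch of this reading (my own notation, not from the slide):

```python
import numpy as np

def shift(w, s3):
    """Apply a 3-entry shift weighting s3 = [s(-1), s(0), s(+1)] to w.

    w_new(i) = s(-1)*w(i+1) + s(0)*w(i) + s(+1)*w(i-1), indices mod N,
    so only w is ever indexed circularly - s3 stays 3-dimensional.
    """
    N = len(w)
    w_new = np.zeros(N)
    for i in range(N):
        for k, sk in zip((-1, 0, 1), s3):
            w_new[i] += sk * w[(i - k) % N]
    return w_new

w = np.array([0.0, 0.0, 1.0, 0.0, 0.0])  # all focus on memory slot 2
print(shift(w, [0.0, 0.0, 1.0]))          # pure +1 shift: focus moves to slot 3
```

Because each $w_{new}(i)$ is built the same way, doing this for every $i$ rotates the whole weighting at once; a soft $s$ like $[0.1, 0.1, 0.8]$ mostly shifts the focus by +1 but blurs it slightly over the neighbours.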
