Abstract: Systolic arrays of processing elements are widely used to massively parallelise neural network layers. However, the execution of traditional convolutional and fully-connected layers on such ...