
11/16/21, by artyom; Posted in: Internals

Lots of deep learning operations can be implemented as simple element-wise operations over several tensors, using numpy-style broadcasting, followed by a reduction. For example:

Adding a bias of shape `[C]` to a `[B,C,H,W]` image can be written in numpy as:

`````` x + bias.reshape((C,1,1))
``````

Gradient of bias can be calculated as:

`````` np.sum(dy,axis=(0,2,3))
``````
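Both snippets can be checked end-to-end in plain numpy; the shapes below are arbitrary illustrative values:

```python
import numpy as np

# Illustrative shapes.
B, C, H, W = 2, 3, 4, 5
x = np.random.rand(B, C, H, W)
bias = np.random.rand(C)

# Forward: bias [C] is reshaped to [C,1,1] so it broadcasts over B, H and W.
y = x + bias.reshape((C, 1, 1))

# Backward: the bias gradient reduces dy over every axis except the channel axis.
dy = np.ones_like(y)
dbias = np.sum(dy, axis=(0, 2, 3))  # shape (C,)
```

With `dy` set to all ones, each channel's gradient is simply the number of elements reduced per channel, `B*H*W`.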

These are simple reduction operations. Computing the mean and variance for batch normalisation requires summing `x` and `x*x` over all dims but `C`.
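In numpy terms, the batch-norm statistics follow directly from those two sums (small illustrative shapes; the biased variance is the one batch norm uses):

```python
import numpy as np

B, C, H, W = 2, 3, 4, 5  # illustrative shapes
x = np.random.rand(B, C, H, W)
n = B * H * W  # elements reduced per channel

# Sums of x and x*x over all dims but C.
s1 = np.sum(x, axis=(0, 2, 3))
s2 = np.sum(x * x, axis=(0, 2, 3))

# Per-channel mean and biased variance, derived from the two sums.
mean = s1 / n
var = s2 / n - mean * mean
```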

Observing this, I implemented a `broadcast`/`reduce` template API to simplify development: http://dlprimitives.org/docs/pointwise_8hpp_source.html

The idea is as follows:

• You provide input tensors and scalar parameters
• You define the operation to be performed on each operand
• You provide reduction operation

The OpenCL kernel code is auto-generated for you. For example, computing the sums of `x` and `x*x` over all dims but channels would look like:

`````` auto op = dlprim::core::PointwiseOperationBroadcastReduce::create(
    ctx,
    {X.specs()},{Xsum.specs(),X2sum.specs()},
    0,dlprim::float_data,
    "y0=x0; y1=x0*x0;",                // operations
    "reduce_y0 = 0; reduce_y1 = 0",    // reduce init
    "reduce_y0 += y0; reduce_y1 += y1" // reduce step
);
op->enqueue({X},{Xsum,X2sum},s,{},{1,1},{0,0},q);
``````

So the first output is the sum of `x` and the second is the sum of `x*x`. If you provide `X` of shape `[B,C,H,W]` and `Xsum`, `X2sum` of shape `[C,1,1]`, which is broadcastable to `X`, you get the sums you need without writing custom reduction code or manually writing kernels.
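A numpy sketch of what the generated kernel computes, including why the `[C,1,1]` output shape is convenient (shapes here are illustrative):

```python
import numpy as np

B, C, H, W = 2, 3, 4, 5  # illustrative shapes
X = np.random.rand(B, C, H, W)

# Same results as the kernel above: outputs of shape [C,1,1],
# reduced over all dims but the channel dim.
Xsum = np.sum(X, axis=(0, 2, 3)).reshape((C, 1, 1))
X2sum = np.sum(X * X, axis=(0, 2, 3)).reshape((C, 1, 1))

# Because [C,1,1] broadcasts against [B,C,H,W], the sums can be used
# directly in follow-up element-wise operations, e.g. centering X:
centered = X - Xsum / (B * H * W)
```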

This vastly simplified the implementation of many operators, especially ones that are expected to support numpy-style broadcasting in pytorch.