I’m not very good at math or numpy, and while I am an experienced coder, I am still a beginner at Python.

I have this function that takes a small np.array (< 100 elements) and is called 100,000+ times.

It takes 23~25% of total CPU time with numpy (whole-program profiling in PyCharm):

```
def call(self, _input):
    output = np.array([_input[i] * self.weights[i] for i in range(len(self.weights))])  # <- 4.5% on <listcomp>
    output += self.bias
    output = np.array([np.sum(output[:, i]) for i in range(len(self.weights[0]))])  # <- slowest (11% total, 9.8% on sum)
    output = np.array([self.activation(val) for val in output])  # <- 2.9% on <listcomp>, it's fine.
    self.output = output
    return output
```

It takes 17~19% CPU time without numpy (sum takes comparatively more CPU than np.sum, but it’s still a win):

```
def call(self, _input):
    output = [_input[i] * self.weights[i] for i in range(len(self.weights))]
    output = [row + self.bias for row in output]  # += on a list would extend it instead of adding the bias
    output = [sum(row[i] for row in output) for i in range(len(self.weights[0]))]  # plain lists can't be indexed with [:, i]
    output = [self.activation(val) for val in output]
    self.output = output
    return output
```

The problem with my numpy version is probably that I take an np.array, turn it into a list via a comprehension, build an np.array again, then repeat the whole round trip for the next step. Converting back and forth like that on such a tiny array isn’t worth it.

Can the numpy version be optimized? The code is messy. There might be some specialized numpy map/reduce function for exactly this.
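For what it’s worth, here is a sketch of what a fully vectorized version might look like, written as a free function so it’s self-contained (the name `call_vectorized` and the explicit parameters are illustrative, and it assumes the activation works elementwise on arrays, e.g. a NumPy ufunc like `np.tanh`):

```
import numpy as np

def call_vectorized(_input, weights, bias, activation):
    # broadcast: each input element scales its matching weight row,
    # replacing the first list comprehension
    out = _input[:, np.newaxis] * weights
    out += bias                    # bias broadcasts across every row, as before
    out = out.sum(axis=0)          # column sums, replacing the slow inner comprehension
    return activation(out)         # assumes a vectorized activation
```

Since multiply-then-column-sum is just a matrix product, `_input @ weights` computes the same thing as the multiply and sum steps in one call; note, though, that the original adds the bias to every row before summing, so the `@` form would need `len(_input) * bias` to stay numerically identical.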

Source: Python Questions