Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér.

When we talk about deep learning, we are talking about neural networks that have tens, sometimes hundreds of layers, and hundreds of neurons within these layers. This is an enormous number of parameters to train, and clearly, there should be some redundancy, some duplication in the information within. This paper tries to throw out many of these neurons without affecting the network's accuracy too much. This process is called pruning, and it helps create neural networks that are faster and smaller. The accuracy term I used typically means a score on a classification task, in other words, how good this learning algorithm is at telling what an image or video depicts.

This particular technique is specialized for pruning convolutional neural networks, where the neurons are endowed with a small receptive field and are better suited for images. These neurons are also commonly referred to as filters.

So here, we have to provide a good mathematical definition of a proper pruning. The authors propose a definition where we can specify a maximum accuracy drop that we deem acceptable, which will be denoted with the letter "b" in a moment. The goal is to prune as many filters as we can without going over this accuracy loss budget. The pruning process is controlled by an accuracy term and an efficiency term, and the goal is to strike some sort of balance between the two.

To get a more visual understanding of what is happening: here, the filters outlined with a red border are kept by the algorithm, and the rest are discarded. As you can see, the algorithm is not as trivial as many previous approaches that simply prune away the filters with the weakest responses.

Here you see the table with the b values. Initial tests reveal that around a quarter of the filters can be pruned with an accuracy loss of only 0.3%, and with a higher b, we can prune more than 75% of the filters with a loss of around 3%. This is incredible.

Image segmentation tasks are about finding the regions that different objects inhabit. Interestingly, when we try pruning on this task, it not only introduces a minimal loss of accuracy; in some cases, the pruned version of the neural network performs even better. How cool is that!

And of course, the best part is that we can choose a tradeoff that is appropriate for our application. For instance, if we are looking for a light cleanup, we can use the first option at a minimal penalty, or, if we wish to have a tiny neural network that can run on a mobile device, we can take the more heavy-handed approach and sacrifice just a tiny bit more accuracy. And we have everything in between.

There is plenty more validation for the method in the paper, so make sure to have a look! It is really great to see that new research not only makes neural networks more powerful over time, but also strives to make them smaller and more efficient at the same time. Great news indeed.

Thanks for watching and for your generous support, and I'll see you next time!
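[Editor's note: a minimal NumPy sketch of the general idea discussed above. This is a simple greedy, magnitude-ranked baseline with an accuracy budget b, not the paper's learned accuracy/efficiency trade-off; the function names and the `evaluate` callback are illustrative assumptions.]

```python
import numpy as np

def prune_filters(filters, evaluate, b):
    """Greedily drop whole filters while the accuracy drop stays under b.

    filters  : list of 2-D numpy arrays (one convolutional filter each)
    evaluate : callback returning validation accuracy for a list of kept filters
    b        : maximum tolerated accuracy drop (e.g. 0.03 for 3%)
    """
    baseline = evaluate(filters)
    # Consider the weakest filters first: rank by L1 norm of their weights.
    order = sorted(range(len(filters)), key=lambda i: np.abs(filters[i]).sum())
    kept = set(range(len(filters)))
    for i in order:
        trial = kept - {i}
        candidate = [filters[j] for j in sorted(trial)]
        if baseline - evaluate(candidate) <= b:
            kept = trial  # pruning this filter stays within the budget
    return sorted(kept)  # indices of the surviving filters
```

In a real setting, `evaluate` would run the pruned network on a validation set, and a short fine-tuning pass after each removal typically recovers part of the lost accuracy.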

A great step towards more accessible neural networks

Coming soon: Dynamic growing and pruning neural networks!

this is going into my AI resources archive, next to evolutionary training of sparse neural networks

Great video as always! Just an English tip: the word accuracy has the strong syllable on the first "a". Accuracy, rather than aCCUracy

I have known this one, or something similar, from before the 2 Minute Papers era of deep learning; that's how those mobile deep-learning-derived apps operate.

Good to know that as much as 79% pruning leads to a loss of only 3% in terms of accuracy. That would save on training effort as well.

You pronounced "accuracy" correctly the last time you said it. Your neural networks are remarkable!

It seems there are comparable, but earlier, works that the paper failed to mention.

I wonder if you could make this fast enough to eventually run each frame of a video game through something like NeuralStyleTransfer.

Here is a nice blog on the topic, https://jacobgil.github.io/deeplearning/pruning-deep-learning. I have also played around with the implementation on pyCaffe. If anyone is interested I can point out a couple of github repos, including mine

Great vid as always, but I couldn't help noticing that this channel is becoming "2 minute neural net news"

That's pretty impressive. I wonder if additional training after the pruning step would get some of the lost accuracy back.

Also check out the dense-sparse-dense approach

Not your typical "hey guys" intro

But can you train a bigger neural network and get better accuracy for a fixed size?

There is always exciting work in machine learning being published, but I'm still waiting for the "AlphaZero" or "AlphaGo" of 2018! Can't wait to see what it is

Really cool, man, thumbs up, keep the papers coming! Pruning rocks, I wish to try it asap.

Great video. Amazing paper.

I have a serious question though: how can you keep up with this pace of development and not go crazy?

Do you have an LTC wallet address?

Now this is really exciting. Optimization just makes it easier to run on older tech, for starters.

They should call the pruned networks Bonsai Networks!

See "Pyramid Vector Quantization for Deep Learning":

https://arxiv.org/abs/1704.02681

When Pyramid Vector Quantization (PVQ) is applied to a neural network:

1) Weights are reduced to small integers with a Laplacian distribution

2) Dot product (or tensor product) can be calculated with additions only

3) For particular non-linearities such as ReLU, all of inference is reduced to integer arithmetic with no multiplications

4) The weights are now highly compressible (~1 bit/weight with arithmetic encoding)

5) Loss of accuracy is, in most cases, a few percent
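[Editor's note: a toy sketch of the multiplication-free arithmetic described in points 2 and 3 above. This illustrates only the consequence the comment names, not the PVQ algorithm itself; the function names and values are hypothetical.]

```python
def int_dot(weights, activations):
    """Dot product using only integer additions.

    weights     : small integers (as quantized weights would be)
    activations : integers (e.g. outputs of a previous integer layer)

    Each product w * a is expanded into |w| repeated additions of
    (+a or -a), so no multiply instruction is ever needed.
    """
    total = 0
    for w, a in zip(weights, activations):
        step = a if w >= 0 else -a
        for _ in range(abs(w)):
            total += step
    return total

def relu(x):
    # ReLU keeps the arithmetic in the integers: max(0, x).
    return x if x > 0 else 0
```

With a ReLU after each such dot product, every intermediate value stays an integer, which is why the comment's point 3 holds for this non-linearity.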

Is this basically a really advanced way of optimizing smaller neural networks?

The way I see it, you start with a big neural network, train that, then get rid of nodes to make a smaller network – resulting in a configuration that wouldn't normally be obtainable by gradient descent had we only used the small network to begin with. Am I understanding this right?

It's insane how cutting away 80% of the nodes may result in a

betterperforming network. It really makes you think whether we're designing these networks inefficiently from the very beginning."Acurioacy" xD

A new paper called "Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks" has been released. More improvement, basically, and in some cases, more chance that 40% pruned FLOPs actually causes slight improvements.

Can someone please explain to me how this is different from dropout?