Pruning Makes Faster and Smaller Neural Networks | Two Minute Papers #229

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. When we talk about deep learning, we are talking about neural networks that have tens, sometimes hundreds of layers, and hundreds of neurons within these layers. This is an enormous number of parameters to train, and clearly, there should be some redundancy, some duplication in the information within. This paper tries to throw out many of these neurons without affecting the network's accuracy too much. This process is called pruning, and it helps create neural networks that are faster and smaller. The accuracy term I used typically means a score on a classification task, in other words, how good this learning algorithm is at telling what an image or video depicts.

This particular technique is specialized for pruning Convolutional Neural Networks, where the neurons are endowed with a small receptive field and are better suited for images. These neurons are also commonly referred to as filters.

So here, we have to provide a good mathematical definition of proper pruning. The authors proposed a definition where we can specify a maximum accuracy drop that we deem acceptable, which will be denoted with the letter “b” in a moment. The goal is to prune as many filters as we can without going over the specified accuracy loss budget. The pruning process is controlled by an accuracy term and an efficiency term, and the goal is to strike some sort of balance between the two.

To get a more visual understanding of what is happening, the filters you see outlined with a red border are kept by the algorithm, and the rest are discarded. As you can see, the algorithm is not as trivial as many previous approaches that simply prune away filters with weaker responses.

Here you see the table with the b values. Initial tests reveal that around a quarter of the filters can be pruned with an accuracy loss of 0.3%, and with a higher b, we can prune more than 75% of the filters with a loss of around 3%. This is incredible.

Image segmentation tasks are about finding the regions that different objects inhabit. Interestingly, when trying the pruning on this task, it not only introduces a minimal loss of accuracy; in some cases, the pruned version of the neural network performs even better. How cool is that!

And of course, the best part is that we can choose a tradeoff that is appropriate for our application. For instance, if we are looking for a light cleanup, we can use the first option at a minimal penalty, or, if we wish to have a tiny, tiny neural network that can run on a mobile device, we can take the more heavy-handed approach by sacrificing just a bit more accuracy. And we have everything in between.

There is plenty more validation for the method in the paper, so make sure to have a look! It is really great to see that new research not only makes neural networks more powerful over time, but also strives to make them smaller and more efficient at the same time. Great news indeed.

Thanks for watching and for your generous support, and I’ll see you next time!
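The budget-driven idea described in the video can be sketched in plain Python: greedily drop the least important filters while a measured accuracy drop stays within the budget b. This is a minimal illustrative sketch under assumed inputs (the importance scores and the accuracy oracle are stand-ins), not the paper's actual criterion.

```python
def prune_filters(importances, accuracy_of, b):
    """Greedily drop the least-important filters while the measured
    accuracy drop from the unpruned baseline stays within the budget b.

    importances: dict mapping filter id -> importance score (assumed given)
    accuracy_of: function(set of kept filter ids) -> accuracy in [0, 1]
    b: maximum acceptable accuracy drop
    """
    kept = set(importances)
    baseline = accuracy_of(kept)
    # Try filters from least to most important.
    for f in sorted(importances, key=importances.get):
        trial = kept - {f}
        if baseline - accuracy_of(trial) <= b:
            kept = trial  # pruning f keeps us within the budget
    return kept


# Toy example: accuracy degrades with the total importance removed.
scores = {"f1": 0.05, "f2": 0.10, "f3": 0.60, "f4": 0.90}
total = sum(scores.values())

def toy_accuracy(kept):
    return 0.95 * sum(scores[f] for f in kept) / total + 0.05

kept = prune_filters(scores, toy_accuracy, b=0.10)
print(sorted(kept))  # the two weak filters are pruned: ['f3', 'f4']
```

With a larger b, more filters fall inside the budget and the network shrinks further, which mirrors the tradeoff table discussed above.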

Author: Kevin Mason

27 thoughts on “Pruning Makes Faster and Smaller Neural Networks | Two Minute Papers #229”

  1. Great video as always! Just an English tip: The word accuracy has the strong syllable at the first "a". Accuracy, rather than aCCUracy 🙂

  2. I have known about this or similar techniques from before the Two Minute Papers era of deep learning; that's how those mobile deep learning derived apps operate.

  3. Good to know that as much as 79% pruning leads to a loss of only 3% in terms of accuracy. That would save on training effort as well.

  4. I wonder if you could make this fast enough to eventually run each frame of a video game through something like NeuralStyleTransfer.

  5. Here is a nice blog on the topic, I have also played around with the implementation on pyCaffe. If anyone is interested I can point out a couple of github repos, including mine

  6. That's pretty impressive. I wonder if additional training after the pruning step would get some of the lost accuracy back.

  7. There is always exciting work in machine learning being published, but I'm still waiting for the "alpha zero" or "alpha go" of 2018! Can't wait to see what it is.

  8. Great video. Amazing paper.
    I have a serious question though: How can you keep up with this pace of progress without going crazy?

  9. See "Pyramid Vector Quantization for Deep Learning":
    when Pyramid Vector quantization (PVQ) is applied to neural network:
    1) Weights are reduced to small integers with a Laplacian distribution
    2) Dot product (or tensor product) can be calculated with additions only
    3) For particular non-linearities such as ReLu, all the inference is reduced to integer arithmetic with no multiplications
    4) The weights are now highly compressible (~1 bit/weight with arithmetic encoding)
    5) Loss of accuracy is, in most cases, a few %

  10. Is this basically a really advanced way of optimizing smaller neural networks?
    The way I see it, you start with a big neural network, train that, then get rid of nodes to make a smaller network – resulting in a configuration that wouldn't normally be obtainable by gradient descent had we only used the small network to begin with. Am I understanding this right?

  11. It's insane how cutting away 80% of the nodes may result in a better performing network. It really makes you think whether we're designing these networks inefficiently from the very beginning.

  12. A new paper called "Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks" has been released. Basically more improvement, and in some cases, more chance that 40% pruned FLOPs actually cause slight improvements.
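Several comments touch on implementations of filter pruning. For contrast with the budgeted method in the video, here is a minimal NumPy sketch of the simpler magnitude-based baseline the transcript mentions (pruning filters with weaker responses): rank conv filters by the L1 norm of their weights and keep the strongest fraction. The tensor shape and keep ratio are illustrative assumptions.

```python
import numpy as np

# Random stand-in for a trained conv layer's weights:
# (out_filters, in_channels, kernel_h, kernel_w)
rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 3, 3, 3))

# Per-filter L1 norm: sum of absolute weights over all but the filter axis.
l1 = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)

keep_ratio = 0.5
n_keep = int(round(keep_ratio * weights.shape[0]))
kept_idx = np.argsort(l1)[-n_keep:]   # indices of the strongest filters
pruned = weights[np.sort(kept_idx)]   # smaller layer: half the filters

print(pruned.shape)  # (4, 3, 3, 3)
```

In practice the pruned layer is then fine-tuned for a few epochs, which (as comment 6 suspects) often recovers much of the lost accuracy.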
