Adversarial Attacks on Neural Networks – Bug or Feature?

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. This will be a somewhat non-traditional video where the first half of the episode will be about a paper, and the second part will be about…something else. Also a paper. Well, kind of. You’ll see.

We’ve seen in previous years that neural network-based learning methods are amazing at image classification, which means that after training on a few thousand training examples, they can look at a new, previously unseen image and tell us whether it depicts a frog or a bus. Earlier we have shown that we can fool neural networks by adding carefully crafted noise to an image, which we often refer to as an adversarial attack on a neural network. If done well, this noise is barely perceptible and, get this, can fool the classifier into looking at a bus and thinking that it is an ostrich.

These attacks typically require modifying a large portion of the input image, so when talking about a later paper, we were thinking, what could be the lowest number of pixel changes that we have to perform to fool a neural network? What is the magic number? Based on the results of previous research works, an educated guess would be somewhere around a hundred pixels. A follow-up paper gave us an unbelievable answer by demonstrating the one-pixel attack. You see here that by changing only one pixel in an image that depicts a horse, the AI will be 99.9% sure that we are seeing a frog. A ship can also be disguised as a car, or, amusingly, with a properly executed one-pixel attack, almost anything can be seen as an airplane by the neural network.

And this new paper discusses whether we should look at these adversarial examples as bugs or not, and of course, does a lot more than that! It argues that most datasets contain features that are predictive, meaning that they help a classifier find cats, but also non-robust, which means that they provide a rather brittle understanding that falls apart in the presence of adversarial changes. We are also shown how to find and eliminate these non-robust features from already existing datasets, and that we can build much more robust classifier neural networks as a result. This is a truly excellent paper that sparked quite a bit of discussion.

And here comes the second part of the video with the something else. An interesting new article was published in the Distill journal, a journal where you can expect clearly worded papers with beautiful and interactive visualizations. But this is no ordinary article; this is a so-called discussion article, where a number of researchers were asked to write comments on this paper and create interesting back-and-forth discussions with the original authors. Now, make no mistake, the paper we’ve talked about was peer-reviewed, which means that independent experts have spent time scrutinizing the validity of the results, so this new discussion article was meant to add to it by getting others to replicate the results and clear up potential misunderstandings. Through publishing six of these mini-discussions, each of which was addressed by the original authors, they were able to clarify the main takeaways of the paper, and even added a section of non-claims as well. For instance, it’s been clarified that they don’t claim that adversarial examples arise from software bugs.

A huge thanks to the Distill journal and all the authors who participated in this discussion, and to Ferenc Huszár, who suggested the idea of the discussion article to the journal. I’d love to see more of this, and if you do too, make sure to leave a comment so we can show them that these endeavors to raise the replicability and clarity of research works are indeed welcome. Make sure to click the links to both works in the video description, and spend a little quality time with them. You’ll be glad you did.

I think this was a more-complex-than-average paper to talk about; however, as you have noticed, the usual visual fireworks were not there. As a result, I expect this to get significantly fewer views. That’s not a great business model, but no matter, I made this channel so I can share with you all these important lessons that I learned during my journey. This has been a true privilege and I am thrilled that I am still able to talk about all these amazing papers without worrying too much whether any of these videos will go viral or not. Videos like this one are only possible because of your support on Patreon. If you feel like chipping in, just click the Patreon link in the video description. This is why every video ends with, you know what’s coming… Thanks for watching and for your generous
support, and I’ll see you next time!
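
To make the two kinds of attack from the transcript concrete, here is a minimal sketch on a toy linear classifier. Everything in it is an assumption made for illustration: the 8×8 random "image", the made-up weights, the bias, and the function names are hypothetical and do not come from the papers discussed. The dense attack uses a sign-of-the-gradient step (in the style of the fast gradient sign method), and the one-pixel attack uses a brute-force search that merely stands in for whatever optimizer the one-pixel work used.

```python
import numpy as np

def score(w, b, x):
    """Logit of a toy linear classifier; a positive value means 'class A'."""
    return float(x.ravel() @ w + b)

def dense_sign_attack(w, x, eps):
    """Nudge every pixel by at most eps in the direction that lowers the score.

    For a linear model the gradient of the score w.r.t. the input is just w,
    so stepping against sign(w) is the sign-of-the-gradient step.
    """
    x_adv = x - eps * np.sign(w).reshape(x.shape)
    return np.clip(x_adv, 0.0, 1.0)  # keep pixels in the valid range

def one_pixel_attack(w, b, x):
    """Try setting each single pixel to 0 or 1 and keep the lowest-scoring image."""
    best, best_score = x, score(w, b, x)
    for i in range(x.size):
        for value in (0.0, 1.0):
            candidate = x.copy()
            candidate.flat[i] = value
            s = score(w, b, candidate)
            if s < best_score:
                best, best_score = candidate, s
    return best

rng = np.random.default_rng(0)
side = 8
x = rng.uniform(0.0, 1.0, size=(side, side))  # stand-in "image", pixels in [0, 1]
w = rng.normal(size=side * side)               # made-up weights, not a trained model
b = 0.5 - float(x.ravel() @ w)                 # bias chosen so the clean score is +0.5

x_dense = dense_sign_attack(w, x, eps=0.05)
x_one = one_pixel_attack(w, b, x)

print("clean score:     ", score(w, b, x))
print("dense attack:    ", score(w, b, x_dense),
      "| largest pixel change:", np.abs(x_dense - x).max())
print("one-pixel attack:", score(w, b, x_one),
      "| pixels changed:", int((x_one != x).sum()))
```

In this toy setup the dense attack changes every pixel by a small amount while the one-pixel attack changes a single pixel by a large amount. A real image classifier is nonlinear, so the gradient would have to be computed by backpropagation and the single-pixel search would be far more expensive.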

Author: Kevin Mason

100 thoughts on “Adversarial Attacks on Neural Networks – Bug or Feature?”

  1. Hmm, perhaps some research comparing "n-pixel robustness" with "(n-1) pixel robustness" could shed some light on what is actually going on? Also things like "minimum neural depth for given number of inputs" to achieve 99% accuracy, etc. We need a meta-study of neural networks.

  2. Always thought you were saying in the intro: Dear Fellow Scholars, this is Two Minute Papers with "name" *here*. But it is … this is Two Minute Papers with Károly Zsolnai-Fehér! thx to commentary

  3. Every time I see this, I have my fun analogy: my key has got a very small scratch. It won't work with my house anymore, but worked on someone else's car instead!

    What happens if someone else's key with small scratches actually unlocks my house?! We should have the unlocking system fixed!

  4. This problem may be fixed by varying the size of a pixel in an image. Merging, say, 3 by 3 pixels into 1 pixel across the entire image can help the neural network classify correctly. Or 4 by 4 pixels into 1 pixel. Usually the things we want to classify in an image are bigger than 8 by 8 pixels.
    Multiple training sets will have to be created: the original image, an image where each pixel is 3 by 3 of the original, and another image where each pixel is, say, 5 by 5 of the original.

  5. Honestly, I can see why they were classified as ostriches. I saw the ostrich in the bus picture all the way in the left column at 1:09

  6. This one is very interesting! Could they have naively corrupted the dataset with salt and pepper to address that weakness? That'd probably be more inefficient on training resources and only move the goal post slightly

  7. I prefer interesting conceptual videos like these over "visual fireworks" videos and I'd be very happy if the channel shifted its balance a bit more in this direction… anyone else agree?

  8. If the adversarial features arise from the dataset and can be eliminated after being found, wouldn't it also be possible to do the reverse and poison a dataset with a sort of backdoor?

  9. I wonder if we could reduce the chance of a network getting tricked by these types of attack by adding our own white noise on top of the image before feeding it into the network. I guess that might also reduce the overall accuracy of the network in some cases

  10. The question is: Who uploads noisy cat videos to YouTube to trick the algorithm into recommending me a strange documentary about the history of toilets every few months?

  11. Content-wise: I love the mix you bring. Sometimes icecream for the eyes, sometimes icecream for the mind. I think it's also important to cover AI security, ethics, implications for society. My absolute favorite videos though are when you cover projects where I can download the Python code and put my graphics card to work 😁

  12. If you think about it, a big part of human cognition is those exact non-robust features. All our cognitive and memory biases and a good chunk of our behavior are basically quick hacks our brains have that get in the way of properly abstract reasoning.

  13. Aren't all neural networks technically bugs?

    Bug: An error, flaw, failure or fault in a computer program or system that causes it to produce an incorrect or unexpected result , or to behave in unintended ways.

  14. but if you have two independent networks that are trained to classify images would they fall for the same wrong pixel or would you need to fool them independently? If so can you come up with a noise pattern that fools both networks?

  15. Karoly please keep making videos that interest you and your viewers – I don’t care if it’s lacking the visual “fireworks”, this topic is important

  16. I'm sure most social networks have an aggressive NSFW filter that provides fast feedback. It would be fun to see if it could be cheated using these methods.

  17. Just add a new kernel that decides which pixel will be chosen for pooling instead of pooling directly. The CNN before was not designed to prevent this trick; if they want, they can easily come up with some mechanism to deny this attack…

  18. This idea the paper has about creating mini discussions is crazy awesome! I need to look more into it but it could solve a lot of replication issues

  19. 1. sooo could this be used in a similar way to CAPTCHA? (stopping advanced bots from spamming and stuff)
    2. what about an AI with the goal to fool another generic image recognition AI while making the fewest changes possible?

  20. Karol, how do these noise patterns perform if the image is greyscale and pre-processed to make better contrast between lines and surfaces?

    I noticed, that in all these examples, neural networks work on color images. But human perception has a split between color and shape.

  21. You don't need "visual fireworks" to get me in here every time again. You do splendid work nonetheless. Keep it up! You are an intriguing source for new insights.

  22. Have you ever considered when you sell Machine Intelligence, not to show studies of warfare?

    It's good to see that for ppl that think, the obvious thrust of machine intelligence is for warfare.

    You don't need to add robustness to detect the correct things in a world that's not at war.

    Just like Boston Dynamics engineers wouldn't be trying to kick their robots over if the robots were for peaceful non combat purposes.

    Thanks for murdering me in the future. I thank you now because as Musk said, when the machines come down the street killing everyone it will be too late.

    And yes I know the difference between robotics that progress at linear rates and machine intelligence that progresses at double exponential.

  23. Just came across this channel with this video. Instantly subscribed; going into industry, one doesn't have that much time to read that many papers anymore. Please keep up the good work!

  24. I shouldn't be drunk-commenting, prolly gonna regret this but… the passion, the sheer relentlessness with which this guy engages every single facet of the discipline… brings a tear to me eye. I'll shut up now. Don't do ethanol kids. Thanks Károly.

  25. Sometimes a paper is not the best way to pass forth our knowledge, the structure is very important, its pretty bad to create something good and don't have visualizations , or to create something not that great and be famous, most papers with machine learning should have a link to github or something like that for example.

  26. Wasn't there a paper about how adversarial neural networks encode information in the noise so that they could cheat? Something about satellite images to maps? Because it looks like that got modified in the noise attack.

  27. I'm a high school student researcher affiliated with the lab that published the paper. One would find the "visual firepower" (i.e., applications of robust models) you mentioned in a related work also published by Mądry Lab: (Santurkar et al. 2019). Demos at

  28. While i understand how it works and… still feels amazing… the point we reached with ai.. and how easy it is to manipulate…

  29. I would love to see more journals with a discussion section where other experts can publicly discuss research.
    There are so many unreplicable studies that make it into peer-reviewed journals and deserve to be scrutinized publicly, as flawed research papers waste other researchers' time when they try to use said research!

  30. Google's reCAPTCHA apparently sometimes uses adversarial attacks on their images of cars, traffic lights etc. I noticed some very artificial looking noise on some of the images.

  31. Your name is… next to IMPOSSIBLE… for native English speakers. I have no idea whats even happening there. If I didn't see it written, I would assume its "Kara Jonah Iffa". You need to simplify that name, man. Simplify.

  32. Holy shit ive seen that at 2:44! There’s a great website called Explorable Explanations that may be of interest to you

  33. Dude, you would’ve got way more views on this if you had made the title something like “One Weird Pixel Makes This AI Think Everything is an Ostrich”

  34. What if you have two independently trained classifiers (identical except for their initial state before training)? How hard would it be to fool both with the same alteration?

  35. That is interesting and shows how important a proper set is since the algorithm will go with whatever is the most consistent even if that thing has nothing to do with the actual material.
