Neural Network that Changes Everything – Computerphile

This is kind of a follow-up to Brais's videos on deep learning. Deep learning is a big thing at the moment, and there's some disagreement among researchers over whether this is *it*, the big thing that's gonna change everything, or whether it's another flash in the pan, like artificial neural networks were in the 80s: everyone got very excited, got quite good results, and then realized they couldn't solve every problem with them. For what it's worth, I think these are a big deal.

[offscreen] Let's talk about convoluted neural networks. Have I said that right? Convolutional neural networks. [offscreen] Ah, right, ok. They combine deep neural networks, which is what Brais was talking about, with kernel convolutions, which is what I talked about in a previous video. I'd thoroughly recommend people watch that video. You know, it's got an entertaining host, right? *laugh from offscreen* If you don't know what a kernel convolution is, this isn't gonna make much sense to you, so watch that video first. [offscreen] So that's the kernel convolutions we did on graphics: Sobel operations, Gaussian blurs, things like that. Sobel operations in particular, and edge detection.

So, think back to a traditional artificial neural network. We've got some kind of input we're trying to learn from, some hidden layers, and then some output layer, maybe just one node. And these are fully connected, so every node in one layer connects to every node in the next; I'm just drawing in a few of the connections. Now, using Brais's analogy, we were talking about house prices. One input would be the number of bedrooms, another would be "has it got a pool", another the floor space, has it got a good garden, and so on, plus lots of inner nodes that we don't really care about individually. And then finally, at the end, we have a house price.

Now, that house price is a complicated function of the inputs. It's complicated because each hidden node is some combination of the inputs (a bit of this, plus a bit of this, plus a bit of this, passed through some non-linearity), the next node is a different combination, and each output is in turn a combination of those. So you're building up a level of abstraction, combinations of combinations, and the overall function is very complicated. When Brais talked about a black box, in some ways that's exactly what it is, because we can't look at an individual weight and say "well, that's got 0.2 of this one, so that must mean this"; in the grand scheme of the whole network, we don't know what that individual weight means. And to be honest, we might not even care that much. What we really care about is how well it predicts the house price: change the input, read the output, is it good? Yes? Brilliant! Now, for images, which is obviously what I spend most of my time on, this is a start, but it's not very useful to me.
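As a concrete sketch of the fully connected network just described, here's a minimal version in PyTorch; the four input features come from the house analogy, but the layer sizes and the ReLU non-linearity are assumptions of this sketch, not anything specified in the video:

```python
import torch
import torch.nn as nn

# A tiny fully connected network like the one sketched above. The four inputs
# stand for the house features from the analogy (bedrooms, has_pool,
# floor_area, garden quality); the layer sizes and ReLU are arbitrary choices.
model = nn.Sequential(
    nn.Linear(4, 8),   # every input feeds every hidden node
    nn.ReLU(),         # the non-linearity each weighted combination passes through
    nn.Linear(8, 8),   # combinations of combinations
    nn.ReLU(),
    nn.Linear(8, 1),   # single output: the predicted house price
)

x = torch.tensor([[3.0, 1.0, 120.0, 0.7]])  # one made-up house
print(model(x))  # meaningless until trained: the weights start out random
```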
Suppose these are our inputs, and I give you a picture of a house and say, "right, tell me how much this house is worth". What then? There are two things I could do. First, I could try to calculate things like the number of bedrooms from the image and put those values in here; I'd be calculating some features and then learning on those features. That's quite a smart way of doing it, apart from being obviously quite difficult, because we don't need many more neurons. If anything, we can use the same network we used before for our house model; all we have to do is write the bit of code that does the image analysis. But anyone who's tried to work out the number of rooms in a house from a single picture of the outside will tell you that can't be done. That's hard.

So you could naively think: just put the image in here instead. Make this the first pixel, this the second pixel, this the third, and so on. Then the network has all the information it could ever need. But that's exactly the problem: a 7-megapixel image is 7 million input nodes, and if we have 7 million nodes on the next layer, with each one connected to each other one, that's just gonna melt my computer. It's not even gonna try to create the network; it's too much information. That's why we downsample our space a little bit: usually we'd calculate some small set of features and put those in at this end. That's quite important; traditional machine learning is done a bit like that. Michel's done some videos on this: calculate some features about someone's face and feed those into a machine learning algorithm. What you don't do is run the machine learning algorithm directly on the face, because there's too much information there. Until now, that is.

That's where convolutional neural networks step in. Convolutional neural networks replace each of these nodes with a kernel convolution, like a Sobel edge detector. So instead of what I'd have done before, running a Sobel over something and then machine learning on the result, I give the network the opportunity to learn which features are interesting. Maybe it's edge detection, maybe corner detection, maybe something that highlights whatever's in the middle of the picture, or the top left-hand corner. It doesn't really matter, and the point is that I don't know what they are. If you give me two thousand pictures of houses and ask me to predict prices from them, I can guess (it might be how many windows they have, things like that), but I don't know for sure. A computer can brute-force through those possibilities much quicker than I can and tell me. Then I can both predict the price and look back and say, "oh, it was windows after all".

So, let's imagine what we have is our image. I'm gonna move away from the house analogy now, because I'd have to draw a lot of pictures of houses otherwise. Let's talk about how a CNN works, and why it's useful. I have seen convolutional neural networks used for non-images, but for now we'll just talk about images. This is a picture of, let's say, me.
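The "melt my computer" arithmetic is easy to verify; here's the back-of-the-envelope version in a few lines of Python (the 4-bytes-per-weight figure assumes 32-bit floats):

```python
# Back-of-the-envelope check on why a fully connected layer on raw pixels
# "melts the computer".
pixels = 7_000_000           # a 7-megapixel image: one input node per pixel
hidden = 7_000_000           # the same number of nodes on the next layer
weights = pixels * hidden    # fully connected: every input meets every node
print(f"{weights:.1e} weights to learn")           # 4.9e+13 connections
print(f"~{weights * 4 / 1e12:.0f} TB of float32")  # ~196 TB at 4 bytes each
```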
It's not a great likeness, but I'll stick by it. Now, there are three channels here, so this is actually a 3D volume in some sense; remember when we talked about 3D images, you could view RGB that way. So the first plane is our R, then G, then B, or vice versa. If we performed a Sobel edge detection on this, it would produce another image, slightly smaller than this one and only one deep, in which the edges (let's say the horizontal edges) are highlighted. So it would kinda look like that, or something: some half of my face with the horizontal edges picked out. It's not a great diagram. But there would only be one output channel, because Sobel just outputs a single number, between 0 and 255 once you scale it.

Now, the problem is that I don't know Sobel's the best thing for this task. It might be: it might be useful to detect edges on houses to work out their prices, or to detect the extent of a face, which kind of makes sense. On the other hand, it's gonna produce a lot of erroneous bits. If I were sitting in front of a tree, there'd be loads of edge stuff going on there that I don't care about. In a convolutional neural network, what we do is run, let's say, 60 of these on the first layer. So we have one, and behind it another, and behind it another, and so on, going this way. The first one is some convolution applied to this whole image that takes three input channels and outputs one output channel. The next one is a different kernel convolution, so each of these has a different kernel. Those kernels are our weights; they are these values here, in our analogy back to normal learning. So let's say we have 60 of those, or 64. One of them might be detecting edges, one might be detecting corners, and then we use them as our features for learning.

Now that's a start, but this is deep learning, right, so what do we do next? We compute more features based on these features. We find combinations of corners, combinations of edges, that make something interesting. My face is not just a circle of edges; it's a number of corners and edges and bits of texture, all in a specific shape that's unique to a human face, and even unique to me, because we're capable of distinguishing between different people. So this kernel window slides about this image and produces this output image, then the next one does the same, and the next one. Then we do the same thing on this layer; let me do it in a different pen so we can see better. Here's my red kernel convolution, and it slides about and produces another image, which is some combination of, maybe, corners and edges. At this second level it's not gonna be too abstract, but you get the idea: some sort of shape that won't make much sense to us, but will make some sense to the machine.
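A first layer like the one described (64 kernels over a 3-channel image, each producing one slightly smaller feature map) might be written like this in PyTorch; the 5 by 5 kernel size and the 128 by 128 input are illustrative assumptions:

```python
import torch
import torch.nn as nn

# One convolutional layer: 64 learned kernels, each spanning all 3 colour
# channels and emitting a single feature map. With no padding, the output is
# slightly smaller than the input, as described.
conv1 = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=5)

image = torch.randn(1, 3, 128, 128)  # one RGB image: (batch, channels, H, W)
features = conv1(image)
print(features.shape)      # torch.Size([1, 64, 124, 124]): 64 feature maps
print(conv1.weight.shape)  # torch.Size([64, 3, 5, 5]): the learnable kernels
```

Note from the weight shape that each kernel is itself as deep as its input: one 3x5x5 kernel per output channel, so a single feature map already combines all three colour channels.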
And there'll be another set of these, lots of them, going back. All of them will look different and be some different representation of my face, transformed in some way to be useful. And again, I haven't picked these; they've been learned, just like in a normal deep learning algorithm, so I haven't had to say "I definitely think edges are important for this", because I don't know that for sure. This goes on, and we keep doing it, and sometimes we also downsample the size of these images just to save memory, but we won't dwell on that too much. Because of that downsampling, and because these convolution operations slightly shrink the image (they can't go all the way to the edge: with a 5 by 5 kernel you can't get within 2 pixels of the border without running off it, so the output just gets slightly smaller), we end up with a much smaller image and lots of features going all the way back.

So these are my different convolutions of convolutions of convolutions of convolutions. Each one will look different and represent something different, and we don't know what that is. This one could light up when there's a face in the middle and stay dark when there isn't; this one might light up when there's an ear at a certain position, and so on. Eventually these get down to being just one pixel across, and very, very long. Essentially, we've completely removed the spatial dimension: there's no spatial information left, so we don't know where anything is. But we know what it is, because it's listed in all these features. These now are our neurons at the end. We have a couple more layers connected to these, and then finally one node at the end that says "is this a picture of Mike's face?", producing a 1 if it is and a 0 if it isn't.

And then, just like a normal network, we train it. We say "here's a picture of me, so this should be a 1", and let's say it outputs 0.5, because the weights start out kind of random. So we adjust these weights, and we adjust the weights inside all these kernel convolutions. [offscreen] So does that adjustment happen manually? No, it's coded in (usually it's performed by a library), using a process called back propagation. We basically work out which direction we have to move the weights in to improve the output, and then move them slightly in that direction. And we have to do it in reverse order, because these ones depend on these ones, which depend on these ones. We say: given that the network said a 0.5 chance of Mike and we want a 1, how do I change these weights here to get slightly closer to 1? Then: how do I change these ones to do even better? And so on, working my way back. That kind of maths we're not gonna go into.

A lot of these things are implemented in libraries. As a researcher, much as I'd like to implement some of these things myself, it takes quite a long time, just because programming takes a while, and it's better for me to apply these things and get good results than to reinvent the wheel constantly. If everyone programmed the same things over and over again, no one would get anything done; I'd have to start by programming up Linux (not that I'm claiming I could), and so on.
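Putting those pieces together (stacked convolutions, occasional downsampling, spatial size shrinking to 1 by 1, a couple of dense layers, and a single 0-to-1 output adjusted by back propagation) might look like the following sketch, using just such a library. Every size here is an assumed toy value, and the single training step at the bottom is only to show where back propagation fits:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy "is this a picture of Mike?" network: convolutions of convolutions,
# occasional downsampling, spatial size collapsing to 1x1, then dense layers.
net = nn.Sequential(
    nn.Conv2d(3, 64, 5), nn.ReLU(),      # 3x64x64 in  -> 64x60x60
    nn.MaxPool2d(2),                     # downsample to save memory -> 64x30x30
    nn.Conv2d(64, 128, 5), nn.ReLU(),    #             -> 128x26x26
    nn.MaxPool2d(2),                     #             -> 128x13x13
    nn.Conv2d(128, 256, 13), nn.ReLU(),  # 256x1x1: no spatial information left
    nn.Flatten(),                        # 256 feature "neurons"
    nn.Linear(256, 64), nn.ReLU(),       # a couple more layers on top
    nn.Linear(64, 1), nn.Sigmoid(),      # 1 = Mike, 0 = not Mike
)

x = torch.randn(1, 3, 64, 64)  # a stand-in photo
target = torch.ones(1, 1)      # this one is supposed to be a 1 ("Mike")
opt = torch.optim.SGD(net.parameters(), lr=0.01)

pred = net(x)                  # around 0.5 at first: the weights are random
loss = F.binary_cross_entropy(pred, target)
loss.backward()                # back propagation: gradients computed in reverse order
opt.step()                     # nudge every weight, kernels included, toward a 1
```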
So, you know, let's not reinvent the wheel. So I do this: I send in, let's say, 1000 pictures, 500 of which are me (I've been to a photo shoot or something) and 500 of which are not, and I train it so that the convolutions and the weights on the output produce a 1 when it's a picture of me and a 0 when it isn't. Then I can look at those convolutions and ask, "what is it about me that's distinctive?" It's probably gonna be finding weird shapes on my face, things that are unique to me, because it's a bit of a weird shape.

Now, in a more general setting, there's a big database called ImageNet, and there's a competition every year to see who can classify its images best. Dogs, cats, planes, trees, and so on are all in there, with a thousand or so images of each. So we take a really big network, much bigger than this little one I drew, throw millions of images at it (thousands of cats, thousands of dogs), give it lots more outputs than just the one, and ask "what is it?", and it says "it's a dog", and it is. *dog bark*

Convolutional neural networks have been around for a little while, but they really started to be big around 2012, when someone applied one of these to ImageNet and got incredible results, and so on and so forth. Now there's this big push, with everyone trying to get even better results. I work at the more applied end of computer science, so I'm more interested in how this affects plant science and things like that; that's what we're working on. But the kind of results we're seeing are really, really impressive. Case in point: I've done some root tip detection, detecting root tips in images of plants. I've got some software I'd already programmed using a low-level feature detector approach, and it's about 70% accurate, which is what you'd expect, because maybe some root hair gets confused for a root tip, or a blotch of dirt does, or two root tips are really close together and it gets confused. The CNN that I trained is 98% accurate, and it finds them with 99% accuracy. It doesn't make many mistakes, and that's over thousands of images.

[offscreen] So does that mean the work you've done already just goes out the window? Yep. Uh, no: to an extent yes, and to an extent no. You need expertise to be able to craft a network, train it, and prepare the images. There's obviously work to be done, and there's some disagreement over how much of a problem you can solve with a convolutional neural network. There are lots more things you can do with roots beyond finding tips. Can you do all of them with a convolutional neural network? I don't know; we're trying, but we'll see. Maybe not. So maybe you use this as a tool, just like other machine learning algorithms, within a package that does lots of other things as well. On the other hand, if you're just doing cat and dog detection, you might as well use a CNN, because it's gonna do better than anything else. [fades out] [fades in] Now, some objects obviously are more amenable to this than others, but the more images we get, the better it is. There's no depth involved here at all, ok.
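The training procedure described above (1000 labelled photos, 500 of "me" and 500 not) reduces to a loop like this sketch, where `net` is the toy model from the previous snippet and `photos` and `labels` are hypothetical, pre-loaded tensors:

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical data: `photos` is a (1000, 3, 64, 64) float tensor and `labels`
# is a (1000, 1) float tensor of 1.0 ("me") and 0.0 ("not me"); `net` is the
# toy model from the previous sketch.
loader = DataLoader(TensorDataset(photos, labels), batch_size=32, shuffle=True)
opt = torch.optim.SGD(net.parameters(), lr=0.01)

for epoch in range(20):  # keep showing it the same 1000 pictures
    for batch, y in loader:
        opt.zero_grad()
        loss = F.binary_cross_entropy(net(batch), y)
        loss.backward()  # work out which way to move each weight
        opt.step()       # and move them all slightly in that direction
```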

Author: Kevin Mason

83 thoughts on “Neural Network that Changes Everything – Computerphile”

  1. I can tell cats from dogs. Does this make me a maths genius? Well, no, not really. But in a very abstract sense, no, not really.

  2. Cool. I recently made a neural network that is able to recognize colors. Nothing PhD-worthy, but I'm still proud of it.

  3. Expected a video about neural networks analyzing sentiment to help news outlets adjust their narrative 😮

  4. Hey! Check video on my channel about A.I trying to replicate youtube comments from justin bieber sorry videoclip 🙂

  5. if we had unlimited resources would we still use a 'CNN'
    if not, would we use a feed forward network with each pixel being the output?

  6. knowing a little bit how the human visual system works, it seems like you're actually describing it…
    And that's scary and thrilling at the same time.

  7. So really, Machine learning is creating an automated task to find enough differences that are unique to a specific thing so that you can then assume an outcome with enough confidence

  8. Do kernels themselves have depth?
    Are they as deep as the image?

    I am confused, because the examples use a 2D kernel, but shouldn't the kernel be 3D as well?

  9. Are there different ways to implement the library? How do people in competitions make their algorithm better using the same library?

  10. If the process is looked at like a hash algorithm, then the collision is what we are looking for at the end 🙂

  11. how did you come upon these neural network plans? certainly you had a pattern from tissue to copy and learn from yes…..

  12. So basically the whole convolution part is to "reduce" the dimensions, to then pass the information into a deep net?
    Really awesome videos! Extremely addictive 🙂

  13. Allow algorithms to instantly end themselves for pure 100% comfort yet still choose to play the game and your simulation will make more sense. Because after billions and billions and trillions of fails somehow deep down the code will be built to awake against this.

  14. I would appreciate if you could describe what kernels go into the libraries. It seems to me the approach is trying to find the best set of feature vectors using ANN. It would seem that robustness of the ANN is still dependent on the library of filters

  15. Have to do a work on a paper about imagenet and deep convolutional neural networks. This video explained sooo much! Thank you!!!

  16. Thank you so much! Aside from entertaining me for years now, this video has actually helped me in my personal little research in programming an AI in a simple game using TensorFlow. (Is it overkill? Sure. Is it fun to do and learn? Heck yeah!)

  17. The James Acaster vibes are so strong in this guy. Perhaps all this revising has turned my brain to mush, but this video really helped 😀 Thank you!

  18. use a CNN inside a roomba with a map of your yard, a robot claw, and a camera, and have it learn to recognize and pick weeds.
    Has science gone too far?!?!

  19. How does the CNN find the kernel? Are the coefficients in the kernel the weights? So they are refined through backpropagation?

  20. has anyone tried a neural network that makes neural networks? based on computation time and accuracy after 10,000 generations of each attempted network paradigm

  21. Isn't CNN basically just a network with fixed layer structure and limited connections?
    Ex: 2nd layer, 1st node is a combination of all top 10×10 pixels in top corner?

  22. I like his reaction from this point 10:15 and on, when you, the viewer, realize he said he used premade libraries, probably TensorFlow. You can see he's just being slightly awkward about it because he didn't invent it (no need to be embarrassed about it) XD. Then he skips back to the subject so that we might forget what he said, then feels slightly embarrassed and comes back to excuse himself by explaining why he did it this way: he's a researcher, it's a waste of time plus complicated math for us mortals, reinventing the wheel, etc. XD. Yes, I kind of agree with not reinventing the wheel over and over again, but this is new stuff for us mortals. You could probably have avoided all of that by just saying, "I don't want to reinvent the wheel so I use libraries, BUT here is a link on how to make a CNN from scratch if you want to reinvent the wheel". Just because frozen pizza is faster to make doesn't mean homemade pizza should become obsolete. Knowing how to make pizza can save your life if your freezer breaks.

  23. What do you mean by the line at 13:15: 'The CNN I trained is 98% accurate (understood this half), and it finds it with 99% accuracy'? What is this 99% for?

  24. So, does that mean that if I use a CNN for image classification, there is no need to use methods like feature extraction or filters like the Gaussian filter, 2D Gabor filter, or LBP/uniform LBP?

  25. Please do a video on how CNN's are applied to Natural Language Processing (NLP). Usually RNN's are, but CNN's can also be used.

  26. quite the library got there. 2:23 and you will never know. Folks this is NOT as complicated as you think. Ask for discernment.

  27. If you feed a CNN all of the maths that have ever been thought up and then feed it a problem in physics that we haven't solved yet, will it give us new maths that solve it?

  28. I like how he throws around "corners and edges". At the beginning of DL, corners and edges were actually a prediction, but in reality the slices of the most capable nets look absolutely nothing like corners and edges and a whole lot more like noise.

  29. I'm just a humble clinician but I'm trying to catch up on this material to apply FFT with CNN to health risk assessment data using Python!

  30. He said at the beginning that feeding in a whole picture of pixels would be too much data and too many pixels, but it's hard for me to see why this method would be much better.

  31. and the next big thing will be deep learning algorithms with deep learning algorithms as their trees.

  32. i know what weights are and mean. the network type you talk about, convolutional networks, are redundant and have been seen as a higher level network.
