Introducing convolutional neural networks (ML Zero to Hero, part 3)

♪ (music) ♪ Hi, and welcome to episode three
of Zero to Hero with TensorFlow. In the previous episode,
you saw how to do basic computer vision using a deep neural network that matched the pixels
of an image to a label. So an image like this was matched to a numeric label
that represented it like this. But there was a limitation to that. The image you were looking at
had to have the subject centered in it and it had to be
the only thing in the image. So the code you wrote would
work for that shoe, but what about these? It wouldn’t be able
to identify all of them because it’s not trained to do so. For that we have to use something called
a convolutional neural network, which works a little differently
than what you’ve just seen. The idea behind
a convolutional neural network is that you filter the images before
training the deep neural network. After filtering the images, features within the images
can come to the forefront, and you can then use
those features to identify something. A filter is simply a set of multipliers. So, for example, in this case,
if you’re looking at a particular pixel that has the value 192, and the filter is the values
in the red box, then you multiply 192 by 4.5, and each of its neighbors
by the respective filter value. So its neighbor above
and to the left is zero, so you multiply that by -1. Its upper neighbor is 64, so you
multiply that by zero and so on. Sum up the result, and you get
the new value for the pixel. Now this might seem a little odd, but check out the results
for some filters, like this one, which, when multiplied over
the contents of the image, removes almost everything
except the vertical lines. And this one, which removes almost
everything except the horizontal lines. This can then be combined
with something called pooling, which groups up the pixels in the image
and filters them down to a subset. So, for example, max pooling two by two will group the image
into sets of 2×2 pixels and simply pick the largest. The image will be reduced
to a quarter of its original size but the features can still be maintained. So the previous image after being filtered
and then max pooled could look like this. The image on the right is one quarter
the size of the one on the left, but the vertical line features
were maintained and indeed they were enhanced. So where did these filters come from? That’s the magic
of a convolutional neural network. They’re actually learned. They are just parameters
like those in the neurons of a neural network that
we saw in the last video. So as our image is fed
into the convolutional layer, a number of randomly initialized filters
will pass over the image. The results of these are fed
into the next layer and matching is performed
by the neural network. And over time, the filters
that produce the image outputs giving the best matches
will be learned, and this process
is called feature extraction. Here is an example of how
a convolutional filter layer can help a computer visualize things. You can see across the top row here
that you actually have a shoe, but it has been filtered down
to the sole and the silhouette of a shoe by filters that learned
what a shoe looks like. You’ll run this code for yourself
in just a few minutes. Now, let’s take a look at the code to build a convolutional neural
network like this. So this code is very similar
to what you used earlier. We have a flattened input
that’s fed into a dense layer that in turn is fed into the final
dense layer that is our output. The only difference here is that
I haven’t specified the input shape. That’s because I’ll put a convolutional
layer on top of it like this. This layer takes the input
so we specify the input shape, and we’re telling it to generate
64 filters with this parameter. That is, it will generate 64 filters and multiply each
of them across the image. Then, each epoch, it will figure out
which filters gave the best signals to help match the images to their labels in much the same way it learned
which parameters worked best in the dense layer. The max pooling to compress the image
and enhance the features looks like this, and we can stack convolutional layers
on top of each other to really break down the image and try to learn
from very abstract features like this. With this methodology,
your network starts to learn based on the features of the image instead of just
the raw patterns of pixels. Two sleeves, it’s a shirt.
Two short sleeves, it’s a t-shirt. Sole and laces, it’s a shoe–
that type of thing. Now we are still looking
at just simple images of fashion items at the moment, but the principles will extend
into more complex images, and you’ll see that in the next video. But before going there, try out the notebook to see
convolutions for yourself. I’ve made a link to it
in the description below. Before we get to the next video,
don’t forget to hit that subscribe button. Thank you. ♪ (music) ♪
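The filter arithmetic and 2×2 max pooling walked through in the episode can be sketched in plain NumPy. This is a hand-rolled illustration, not the TensorFlow implementation; the filter values here are a classic vertical-edge kernel chosen for demonstration, not the ones shown in the video:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a filter over the image (no padding): multiply each pixel
    and its neighbors by the corresponding filter values and sum."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

def max_pool_2x2(image):
    """Group the image into 2x2 blocks and keep the largest value,
    reducing it to a quarter of its original size."""
    h, w = image.shape
    return image[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# A filter that responds to vertical edges (example values).
vertical_filter = np.array([[-1, 0, 1],
                            [-2, 0, 2],
                            [-1, 0, 1]])

# A tiny image with a vertical edge down the middle.
image = np.array([[0, 0, 255, 255],
                  [0, 0, 255, 255],
                  [0, 0, 255, 255],
                  [0, 0, 255, 255]])

filtered = convolve2d(image, vertical_filter)
pooled = max_pool_2x2(filtered)
print(filtered)  # every position straddles the edge, so all values are large
print(pooled)
```

The filtered output responds strongly everywhere the vertical edge appears, and pooling shrinks the result while keeping that strong response, which is the "features are maintained, and indeed enhanced" effect described above.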

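The notebook linked in the description has the full code; as a sketch, the architecture described in the episode might be written in tf.keras like this (assuming TensorFlow 2.x, Fashion MNIST's 28×28 grayscale inputs, and the 128-unit dense layer from the earlier episodes):

```python
import tensorflow as tf

# A small CNN for 28x28 grayscale images (e.g. Fashion MNIST),
# following the stack described in the video.
model = tf.keras.models.Sequential([
    # 64 randomly initialized 3x3 filters pass over the image;
    # the ones giving the best signals are learned during training.
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu',
                           input_shape=(28, 28, 1)),
    # 2x2 max pooling: keep the largest of each 2x2 block of pixels.
    tf.keras.layers.MaxPooling2D(2, 2),
    # Stacking a second convolutional layer breaks the image down
    # further so the network can learn from more abstract features.
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # Then the same flatten/dense layers as the earlier episode.
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```

The `(3, 3)` argument is the filter size, and the input shape is `(28, 28, 1)` rather than `(28, 28)` because Conv2D expects a channels dimension: one channel for grayscale.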
Author: Kevin Mason

54 thoughts on “Introducing convolutional neural networks (ML Zero to Hero, part 3)”

  1. I've been following the series and I must say it's an insightful video about CNNs. Could you also shed some light on backpropagation calculus?

  2. Hehe, I already visited the GitHub repo and tried all the upcoming parts of the video 😂 so smart of me… But these videos are really, really simple for anybody to get started with for the first time

  3. I had no idea that convolution and max pooling were actually done like that (technically, or mathematically). I just imagined little sets of little neural networks. But I bet that would be very heavy on processing resources.

    (complete programming noob btw)

  4. Hi Laurence, how do you choose the filters (kernels) in Keras, like the ones in your examples that filter vertical and horizontal lines?

  5. I face one problem. I have 5 categories of images; I gathered a dataset with ground truth, trained a CNN model, and prediction works well. But as soon as there is a new category and I run it through my trained model, it categorizes the new image into one of the 5 existing categories. How can I prevent this false-positive behaviour? It predicts the unseen image with high confidence, which is a big problem.

  6. Please create a playlist of only the Introducing convolutional neural networks (ML Zero to Hero) parts. Otherwise, I like your work; I'm learning a lot from this.

  7. Hello Laurence! First off, thank you so much for these videos, they're really informative and helpful.
    I tried implementing the code shown in this video, but I get this error:
    ValueError: Error when checking input: expected conv2d_input to have 4 dimensions, but got array with shape (60000, 28, 28)
    Please do help.


  8. Laurence sir, I'm enjoying your series very much, but I'm an absolute newbie to ML. I hope you can recommend some beginner guides that I can get my hands on for more understanding. Thanks, regards.

  9. I've got one query: say I have 32 filters in the first CNN layer and 64 filters in the 2nd. When I pass a single image through the first CNN block, I will get 32 outputs from the 32 filters. So will each output go through those 64 filters in the 2nd CNN block?
    That would make the total output from the 2nd CNN block 64×32 = 2048 outputs by the end of the 2nd CNN layer for one single image?

    Help me sort out this issue, Mr. Moroney.

  10. Another great video from Laurence! Thanks for the great lecture! Have a splendid day everyone. Good luck with your journey on conquering the world of Machine Learning!😁👍

  11. Can anybody please tell me what (3,3) is in the Conv2D code, and also why the input shape is (28,28,1) and not just (28,28)?

  12. I do have a follow-up question, and I have spent days trying to figure it out. I used your example to train a model that can recognize pictures in a book. It works perfectly as long as I am in the Python environment. But I can't seem to get it out of there.
    I used, but I have not found a method to convert that data into any format I can use in any other application. I am aiming for TensorFlowSharp, because I want to use it in Unity, but I also failed with TensorFlow.js, where at least I got a comprehensible error message about the shape. Is this approach even meant to do anything outside this environment?

  13. Outstanding explanation!! Clear and effective communication skills.
    Google is really making a huge contribution to improving the human condition by making TensorFlow accessible to everyone.
    Sort of the same way IBM released the FFT to the public!
