We can try to see how K-means can be applied to images using one easily available image dataset, this one. Everybody has probably seen this dataset. What is it? No, it's not a matrix of numbers or anything like that. These are really images: each of them is an image of a handwritten digit, and the dataset is named MNIST, M-N-I-S-T, for Modified National Institute of Standards and Technology. Again, nobody remembers the full name; it's written over there, but it's a mysterious name. You will use it quite often because it's a small dataset, so you can try things out on your CPU without the need for a GPU. This dataset was released by a gentleman named Yann LeCun, and he's one of the prominent names in the field of AI.
How many of you have heard of Yann LeCun? Nobody? Okay. These are the people because of whom we are actually here right now, wanting to learn AI. Yann LeCun is currently the head of AI research at Facebook and a very prominent name in AI; along with Geoffrey Hinton, he is considered one of the fathers of deep learning. Yann LeCun released this dataset long back, and we're going to use it.
What we have here are individual images of the digits 0 through 9. Again, it's very hard to build a program without machine learning in this case, because everybody writes 1, 2, 3 in their own way. You can see this is a 1, this is also a 1, and this is also apparently a 1. I don't know how somebody can write it like this, but we can't criticize. So what kind of human accuracy would there be on this dataset? 99%? More than 90%? It should be high, but there might be a few digits written like old doctors' handwriting that are really hard to read. The best accuracy achieved by a deep learning model on this dataset is around 99.7%, so we're pretty close to what humans can do.
Why will we use this dataset? One thing is that the images are very small, so the dataset is not very big. The second is that it's easily available, and we need data. But you can try these examples on other datasets as well. In this case, the MNIST dataset is labeled, meaning each image has a label, so it's not that you need K-means here. What we're going to do is just try it out with K-means: we are not going to use the labels, and we'll see whether K-means can still figure out what kind of images these are.
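K-means itself only needs the pixel vectors, never the labels. As a minimal NumPy sketch (not the code used in this session), here is Lloyd's algorithm with a farthest-point initialisation chosen only to keep the toy run deterministic. The function names `kmeans` and `init_farthest` and the tiny "dark vs. bright" data are hypothetical stand-ins for flattened MNIST images.

```python
import numpy as np

def init_farthest(X, k, rng):
    # Start from one random point, then repeatedly add the point farthest from
    # all centroids chosen so far (a deterministic cousin of k-means++).
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    return np.array(centroids)

def kmeans(X, k, n_iters=100, seed=0):
    """Lloyd's algorithm on the rows of X; returns (centroids, assignments)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = init_farthest(X, k, rng)
    for _ in range(n_iters):
        # Squared distances via ||x||^2 - 2 x.c + ||c||^2, giving an (n, k) matrix
        # without building an (n, k, d) array.
        dists = ((X ** 2).sum(axis=1)[:, None]
                 - 2 * X @ centroids.T
                 + (centroids ** 2).sum(axis=1)[None, :])
        assignments = dists.argmin(axis=1)
        # Move each centroid to the mean of its points; keep it if its cluster is empty.
        new_centroids = np.array([
            X[assignments == j].mean(axis=0) if np.any(assignments == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, assignments

# Toy data: 20 "dark" and 20 "bright" flattened 2x2 images (values 0..255).
rng = np.random.default_rng(1)
dark = rng.integers(0, 30, size=(20, 4))
bright = rng.integers(220, 256, size=(20, 4))
X = np.vstack([dark, bright])

centroids, assignments = kmeans(X, k=2)
print(assignments)
```

On the real training set you would pass the 60,000 x 784 array with `k=10`, one cluster per digit we hope to find; in practice a library implementation such as scikit-learn's `KMeans` is the usual choice.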
Sometimes it becomes very difficult to know what a cluster represents. As we were discussing yesterday, once we have a cluster, we need to consult the domain experts and ask whether this cluster makes any sense to them. But in this case, because we have the labels and we know what 1, 2, 3 look like, we are the domain experts ourselves, so we don't need to worry about going to anyone else. We will use that. So now the dataset is clear. What do we want to do next? How does an image look to a computer? Pixels, right. Do you want to see how a computer looks at an image? It's not just zeros and ones.
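Because MNIST is labeled, we can play domain expert ourselves. One common approach (an assumption here, not necessarily the method used in this session) is to give each cluster the most frequent true label among its members and then measure how often that matches. A minimal NumPy sketch; `map_clusters_to_labels` and the toy cluster ids and labels are hypothetical:

```python
import numpy as np

def map_clusters_to_labels(assignments, labels, k):
    """For each cluster id, return the most common true label among its members."""
    mapping = np.full(k, -1)          # -1 marks an empty cluster
    for j in range(k):
        members = labels[assignments == j]
        if members.size:
            mapping[j] = np.bincount(members).argmax()
    return mapping

# Toy example: hypothetical cluster ids from K-means and the true digit labels.
assignments = np.array([0, 0, 0, 1, 1, 2, 2, 2])
labels      = np.array([7, 7, 1, 3, 3, 0, 0, 0])

mapping = map_clusters_to_labels(assignments, labels, k=3)
predicted = mapping[assignments]      # label each point with its cluster's label
accuracy = (predicted == labels).mean()
print(mapping, accuracy)              # [7 3 0] 0.875
```

Here cluster 0 is mostly 7s, so the single 1 in it counts as a miss, giving 7 correct out of 8.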
This is a picture, for example. What digit is it? If you move a little back, I think it looks like an 8, right? This is an 8. It appears to us as an 8, but actually these are pixels with values between 0 and 255, and these are grayscale, essentially black-and-white, pictures. So you do not have three RGB channels. If you have colored pictures, you will have three sets of numbers to represent each picture.
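The difference between grayscale and colour comes down to array shape. A small NumPy sketch, assuming the common `uint8` encoding where each value is an intensity from 0 to 255; the arrays `gray` and `rgb` are hypothetical, not loaded from MNIST:

```python
import numpy as np

# Hypothetical 28x28 grayscale image: one intensity value (0..255) per pixel.
gray = np.zeros((28, 28), dtype=np.uint8)
gray[10:18, 12:16] = 255        # a bright bar, just so the image is not blank

# A colour image of the same size needs three numbers (R, G, B) per pixel.
rgb = np.zeros((28, 28, 3), dtype=np.uint8)

print(gray.shape, gray.min(), gray.max())   # (28, 28) 0 255
print(rgb.shape)                            # (28, 28, 3)
print(gray.size, rgb.size)                  # 784 2352
```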
So we have the data, and what we will do is try to see how K-means can identify the different types of images in this dataset. We can then apply the same approach where we may not even have labels. Here we have the labels, so we can verify the results, but you can use the same approach we discussed on images where we do not have labels. All right, let's go ahead with that. We start by loading the data. I think the data is already there, but you can download MNIST in multiple ways. There is a way to download the dataset using the scikit-learn library itself, but somehow it was giving me an issue; maybe the site was down, whatever the case may be. So I had to install this package instead. It's called python-mnist.
If you want, you can install that as well, and then download these two files: the images as well as the labels. Again, we need not have the labels, but we're going to compare our results against them. This mnist package is basically only for reading MNIST data. I've already downloaded these two files, so I'm not going to execute that again. I will go ahead and load the training data; it may take a few seconds. Okay, it's done. I will then convert those images and labels to NumPy arrays.
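The conversion step might look like this. It assumes the loader hands back plain Python sequences (one flattened image per entry, plus a sequence of labels); the three tiny "images" below are made up so the snippet runs without downloading anything, and `images_list` and `labels_list` are hypothetical names.

```python
import numpy as np

# Hypothetical stand-ins for what a loader might return: a list of flattened
# images (each 784 pixel values) and a list of integer labels.
images_list = [[0] * 784, [255] * 784, [128] * 784]
labels_list = [5, 0, 4]

# Convert to NumPy arrays so we can inspect shapes and do vectorised maths.
X = np.array(images_list, dtype=np.uint8)
y = np.array(labels_list, dtype=np.int64)

print(X.shape, X.dtype)   # (3, 784) uint8
print(y.shape)            # (3,)
```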
Once we're done with that, we can check the shape of the images. What information will we get from the shape? How many examples are there? 60,000. And each example has how many features? 784. But how can an image have just one dimension? An image usually has either two or three dimensions: two for black and white and three for a colored image, like the one we saw here. In this case, each example is one-dimensional, but an image has rows and columns, so it should be two-dimensional, because this is black and white. I don't know if you can count, but how many rows and how many columns do we have here?
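Reading the shape: the first number is the count of examples and the second is the count of features. A small sketch with a hypothetical 5-image array standing in for the 60,000-image training set:

```python
import numpy as np

# Hypothetical stand-in for the training matrix: 5 images instead of 60,000,
# each flattened into one row of 784 pixel values.
X = np.zeros((5, 784), dtype=np.uint8)

n_examples, n_features = X.shape
print(n_examples, n_features)   # 5 784
print(X.ndim)                   # 2: rows are examples, columns are pixels
print(X[0].ndim, X[0].shape)    # 1 (784,): a single image is a flat vector
```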
It's hard to count, but these are actually 28 by 28 pictures: 28 rows and 28 columns. They are very, very small pictures. What is the size of the pictures we take with our cameras? They have a lot of pixels, maybe 2000 by 2000 or more. These are very, very small, just 28 by 28 pixels, which is why the dataset is small and you can still work with it on the CPU. So each image should be two-dimensional, but the data we have is one-dimensional for each example: the shape is 60,000 by 784. 60,000, of course, is the number of images, and each image has 784 values, because 28 x 28 = 784. These 784 values are the pixel values. Maybe we can look at some of them.
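Since 28 x 28 = 784, a flattened row can be reshaped back into a picture. This sketch assumes the pixels are stored row by row (row-major), which is also NumPy's default `reshape` order; `flat` and `batch` are hypothetical arrays.

```python
import numpy as np

# A hypothetical flattened image whose values are just 0..783,
# so it is easy to see where each value lands after reshaping.
flat = np.arange(784)
image = flat.reshape(28, 28)      # 28 rows x 28 columns
print(image.shape)                # (28, 28)
print(image[1, 0])                # 28: the first pixel of the second row

# The same idea on a whole (hypothetical) batch of 10 flattened images.
batch = np.zeros((10, 784), dtype=np.uint8)
images = batch.reshape(-1, 28, 28)
print(images.shape)               # (10, 28, 28)

# And back again: flatten each picture into one row of 784 features.
print(images.reshape(len(images), -1).shape)   # (10, 784)
```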