K-Fold Cross Validation

Think of a student preparing for an exam. His teacher told him to study 5 chapters from the math book. He studied them and went to school, but when the exam started he saw that the questions came from the whole book. So he scored poorly.
Now think of another situation: he read the whole book, but the questions came from only those 5 chapters. He scored well.
From these two situations alone we cannot say whether he is a bad student or a good student; we need a fairer way to analyze his result.

Only if the questions cover the full book and he still performs well can we call him a good student.

We know that we split a dataset into a training set and a testing set.
Suppose our dataset contains 100 samples. We divide it into 5 parts:
D1 = 20, D2 = 20, D3 = 20, D4 = 20, D5 = 20
Now D1, D2, D3, D4 form the training set and D5 is the testing set. If the model performs well on D5, is it really good? The answer is no. It looks like my first example: D5 covers only one slice of the data, so patterns that appear only in the training folds are never tested at all.
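Here is a minimal sketch of that single 80/20 hold-out split. scikit-learn and the toy arrays below are assumptions made purely for illustration, not part of the original example.

```python
# Minimal sketch: a single 80/20 hold-out split of 100 samples (assumes scikit-learn).
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(100, 1)   # 100 toy samples, one feature each
y = np.arange(100) % 2               # dummy labels

# D1-D4 (80 samples) become the training set, D5 (20 samples) the testing set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(len(X_train), len(X_test))     # 80 20
```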

This is where cross validation comes in. We divide the data randomly into K equal parts (folds). This is called K-fold cross validation. We train on (K-1) folds and test on the remaining one, and repeat the process K times in a loop so that every fold is used as the test set exactly once. A sketch of this loop is shown below.
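The snippet below is a sketch of that K-fold loop, assuming scikit-learn is available; the iris dataset and LogisticRegression are only stand-ins for your own data and model.

```python
# Sketch of 5-fold cross validation with scikit-learn's KFold.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)                       # stand-in dataset
kf = KFold(n_splits=5, shuffle=True, random_state=42)   # random split into K=5 folds

scores = []
for train_idx, test_idx in kf.split(X):
    model = LogisticRegression(max_iter=1000)           # stand-in model
    model.fit(X[train_idx], y[train_idx])                # train on K-1 folds
    scores.append(model.score(X[test_idx], y[test_idx]))  # test on the remaining fold

print(scores)   # one performance value per fold
```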

Finally, take the average of the K performance values. That average tells you how good or bad your model is.


So, for our dataset it would be:

sum = performance1 + performance2 + performance3 + performance4 + performance5

evaluation score = sum / 5
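As a sketch of that averaging step, scikit-learn's cross_val_score does the split/train/score/average cycle in one call; the iris data and LogisticRegression are again just assumed stand-ins.

```python
# Sketch: compute the five fold scores and their average in one call.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)

print(scores)          # performance1 ... performance5
print(scores.mean())   # evaluation score = sum / 5
```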







