Introduction to Linear Classifiers: Learning and Evaluating Performance !!
Problem classes: There are many different problem classes in ML. They vary according to the input data and the kind of conclusions to be drawn from it. Five standard problem classes are listed below:
- Supervised learning
- Unsupervised learning
- Reinforcement learning
- Sequence learning
- Other settings, such as semi-supervised learning, active learning, etc.
In this blog I will explain what a classification problem is and the simplest algorithm for solving it. Classification falls under the supervised learning class.
Classification problem: This is the task of assigning a category to new data based on knowledge gained from a trained model. In the simplest terms, it is a way of tagging a new data point with a class. For example: predicting whether a mail is Spam/Not Spam, predicting whether today will be rainy or sunny, or tagging a picture as cat or not cat.
The training data set D_n is a set of pairs {(x^(1), y^(1)), . . . , (x^(n), y^(n))}, where x^(i) represents an object to be classified, most typically a d-dimensional vector of real and/or discrete values, and y^(i) is an element of a discrete set of values. y is also called the target variable, and the x^(i) are called the independent variables.
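As a toy illustration of this structure (the feature values and labels below are made up), such a data set might look like this in Python:

```python
# A toy binary-classification data set D_n: each pair holds a
# 2-dimensional feature vector x^(i) and a label y^(i) from {+1, -1}.
D_n = [
    ([1.0, 2.0], 1),
    ([0.5, -1.0], -1),
    ([2.0, 1.5], 1),
]

for x, y in D_n:
    print(x, "->", y)
```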
A classification problem is binary if the target has only two possible outcomes (e.g. Yes/No, Spam/Not Spam); otherwise it is called a multi-class problem.
The goal in a classification problem is ultimately, given a new input value x^(n+1), to predict the value of y^(n+1).
1. Binary Linear Classifier
Before going into the details of linear classifiers, let us define two terms: the hypothesis class H and the learning algorithm. A hypothesis class H is a set of possible classifiers (hypotheses). A learning algorithm is a procedure that takes a data set D_n as input and returns an element h of H.
The choice of learning algorithm has a big impact on test error. In this blog we will discuss linear classifiers: they are relatively easy to understand, simple in their mathematical computation, and powerful in their own right.
A linear classifier in d dimensions is defined by a vector of parameters θ ∈ R^d and a scalar θ0 ∈ R, so the hypothesis class H of linear classifiers in d dimensions is the set of all vectors in R^(d+1). We'll assume that θ is a d × 1 column vector. Given particular values for θ and θ0, the classifier is defined by:

h(x; θ, θ0) = sign(θᵀx + θ0) = +1 if θᵀx + θ0 > 0, and −1 otherwise.
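As a minimal sketch of this prediction rule (the variable names and example values are illustrative), assuming the usual sign-based definition h(x; θ, θ0) = sign(θᵀx + θ0):

```python
import numpy as np

def predict(x, theta, theta_0):
    """Classify x as +1 or -1 depending on which side of the
    hyperplane theta . x + theta_0 = 0 it falls on."""
    return 1 if np.dot(theta, x) + theta_0 > 0 else -1

# Example: a hyperplane in 2 dimensions with normal vector (1, 1)
theta = np.array([1.0, 1.0])
theta_0 = -1.0
print(predict(np.array([2.0, 2.0]), theta, theta_0))  # positive side -> 1
print(predict(np.array([0.0, 0.0]), theta, theta_0))  # negative side -> -1
```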
Remember that we can think of θ, θ0 as specifying a hyperplane, which divides the space into two half-spaces. The half-space on the same side as the normal vector is the positive half-space, and we classify points there as positive; the other side is the negative half-space, and we classify points there as negative, as shown in the figure below.
In Fig 1 we can see a binary classification problem: all points on the same side as the normal vector (V1) are classified as positive, and the points on the opposite side are classified as negative.
2. Learning linear classifiers (Random Linear Classifier)
In this post we will start by considering a very simple learning algorithm: the random linear classifier. You could call this approach "being dumb": the idea is to generate k possible hypotheses by generating their parameter vectors at random, evaluate the training-set error of each hypothesis, and pick the one with the minimum training-set error.
The algorithm for the random classifier is: draw k random parameter vectors, compute each one's training-set error, and return the hypothesis with the lowest error.
Here k is a hyperparameter of the algorithm. A hyperparameter of an ML algorithm is a setting that affects how the algorithm works but is not learned from the data. For example, if we take k = 2, only 2 random hypotheses are evaluated on the data, and with only two candidates we are unlikely to get a well-optimized classifier. If instead k = 1000, we choose among 1000 hypotheses, which will almost certainly give a better result than k = 2.
As we can see in Fig 2, the training error E_n decreases as k increases.
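The random classifier described above can be sketched as follows (a minimal implementation; the function name, the toy data, and the choice of a standard-normal distribution for the random parameters are my own illustrative assumptions):

```python
import numpy as np

def random_linear_classifier(X, y, k, rng=np.random.default_rng(0)):
    """Generate k random hypotheses (theta, theta_0), score each on the
    training set, and return the one with minimum training-set error.
    X is an n x d array of inputs; y is an array of +1/-1 labels."""
    n, d = X.shape
    best, best_err = None, np.inf
    for _ in range(k):
        theta = rng.standard_normal(d)    # random normal vector
        theta_0 = rng.standard_normal()   # random offset
        preds = np.where(X @ theta + theta_0 > 0, 1, -1)
        err = np.mean(preds != y)         # training-set error E_n
        if err < best_err:
            best, best_err = (theta, theta_0), err
    return best, best_err

# Toy data: two positive and two negative points along the diagonal
X = np.array([[1., 1.], [2., 2.], [-1., -1.], [-2., -2.]])
y = np.array([1, 1, -1, -1])
(theta, theta_0), err = random_linear_classifier(X, y, k=100)
print("training error:", err)
```

Because the best of k = 100 draws can never be worse than the first draw alone, increasing k can only lower the training error on a given data set, which matches the trend shown in Fig 2.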
3. Evaluating a learning algorithm
The best method is to measure test error on data that was not used to train the model. Ideally, we would like to execute the following process multiple times:
- Train on a new training data set.
- Evaluate the resulting h on a testing set that does not overlap the training set.
In practice it is not always possible to obtain a fresh training data set each time, so in those cases we usually use cross-validation.
In simple words, the k-fold technique works as follows:
- Pick the number of folds, k. Usually k is 5 or 10, but you can choose any number less than the dataset's length.
- Split the dataset into k equal (if possible) parts, called folds.
- Choose k − 1 folds as the training set. The remaining fold will be the test set.
- Train the model on the training set. On each iteration of cross-validation, you must train a new model independently of the model trained on the previous iteration
- Validate on the test set
- Save the result of the validation
- Repeat steps 3–6 k times. Each time use the remaining fold as the test set. In the end, you should have validated the model on every fold that you have.
- To get the final score, average the results that you got in step 6.
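The steps above can be sketched as follows (a minimal version, assuming the caller supplies `train` and `error` functions; the toy majority-label "model" in the usage example is purely illustrative):

```python
import numpy as np

def k_fold_cross_validate(X, y, k, train, error):
    """Estimate a learning algorithm's performance by averaging the
    validation error over k folds."""
    idx = np.arange(len(X))
    folds = np.array_split(idx, k)              # k (nearly) equal parts
    scores = []
    for i in range(k):
        test_idx = folds[i]                     # fold i is the test set
        train_idx = np.concatenate(folds[:i] + folds[i+1:])
        h = train(X[train_idx], y[train_idx])   # fresh model on each fold
        scores.append(error(h, X[test_idx], y[test_idx]))
    return np.mean(scores)                      # final score: the average

# Toy "algorithm": always predict the majority training label
def train(Xtr, ytr):
    return 1 if np.sum(ytr == 1) >= np.sum(ytr == -1) else -1

def error(h, Xte, yte):
    return np.mean(yte != h)

X = np.arange(10).reshape(10, 1).astype(float)
y = np.array([1] * 8 + [-1] * 2)
print(k_fold_cross_validate(X, y, 5, train, error))  # -> 0.2
```

Note that the data here is not shuffled before splitting; in practice you would usually shuffle the indices first so that each fold is representative.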
It’s very important to understand that cross-validation neither delivers nor evaluates a single particular hypothesis h. It evaluates the algorithm that produces hypotheses.
In this blog we have discussed what a linear classifier is, how to learn one, and the random classifier model.
In the next blog I will share another, cleverer learning algorithm, called the Perceptron.
Stay tuned !! Keep reading !!!
Thanks for reading!