Misclassification: Techniques
M.R
Class1: Cancer
Class2: No Cancer
Two ways to make an error:
1. False positive: finding something that is not there.
2. False negative: not finding something that is there.
Test 1 for cancer
- Tested on 1000 people
                      Classified as
                      Cancer    No cancer
Actual  Cancer          10        40**
        No cancer        5*      945
* type 1 error (false positive)
** type 2 error (false negative)
Misclassification rate: (40 + 5) / 1000 = 0.045
Test 2 for cancer
- Tested on 1000 people
                      Classified as
                      Cancer    No cancer
Actual  Cancer          48         2**
        No cancer       50*      900
Misclassification rate: (2 + 50) / 1000 = 0.052
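
The two rates above can be re-derived with a few lines of Python. This is only a sketch: the function name and the (TP, FN, FP, TN) tuple layout are my own choices for illustration, and the counts are copied from the two tables.

```python
def misclassification_rate(tp, fn, fp, tn):
    """Fraction of all cases that were put in the wrong class."""
    return (fp + fn) / (tp + fn + fp + tn)

# Counts copied from the two tables, in the order (TP, FN, FP, TN).
test1 = (10, 40, 5, 945)
test2 = (48, 2, 50, 900)

rate1 = misclassification_rate(*test1)
rate2 = misclassification_rate(*test2)

print(f"Test 1: {rate1:.3f}")
print(f"Test 2: {rate2:.3f}")
```

Test 1 has the lower misclassification rate even though it misses 40 of the 50 actual cancer cases, so the rate alone does not say which kind of error a test makes.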
RECALL & PRECISION
                    Predicted Class
                    Class=Yes    Class=No
Actual  Class=Yes    a (TP)       b (FN)
        Class=No     c (FP)       d (TN)
Recall = a/(a+b) = TP / (TP+FN) - tells us how well we do on the people who are actually positive
Precision = a/(a+c) = TP / (TP+FP) - tells us how well we do on the people who are predicted positive
Before, we just used accuracy = (a+d) / (a+b+c+d) = (TP+TN) / (TP+TN+FP+FN)
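
As a rough illustration, the three formulas can be applied to the two cancer tests from earlier in these notes. The function names are assumptions made for this sketch, and the counts are the ones from the Test 1 and Test 2 tables.

```python
def recall(tp, fn):
    """TP / (TP + FN): share of actual positives that were found."""
    return tp / (tp + fn)

def precision(tp, fp):
    """TP / (TP + FP): share of predicted positives that are correct."""
    return tp / (tp + fp)

def accuracy(tp, fn, fp, tn):
    """(TP + TN) / total: share of all cases classified correctly."""
    return (tp + tn) / (tp + fn + fp + tn)

# Counts from the two cancer tests: TP, FN, FP, TN.
tests = {
    "Test 1": dict(tp=10, fn=40, fp=5, tn=945),
    "Test 2": dict(tp=48, fn=2, fp=50, tn=900),
}

results = {}
for name, c in tests.items():
    results[name] = (
        recall(c["tp"], c["fn"]),
        precision(c["tp"], c["fp"]),
        accuracy(**c),
    )
    r, p, a = results[name]
    print(f"{name}: recall={r:.3f} precision={p:.3f} accuracy={a:.3f}")
```

Accuracy is nearly the same for both tests (0.955 and 0.948), but recall is 0.20 for Test 1 and 0.96 for Test 2, which is the difference accuracy hides.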
THE F MEASURE
F combines recall and precision into one number
F= 2rp/(r+p) = 2TP / (2TP+FP+FN)
It equals the harmonic mean of recall and precision.
The book calls it the F_1 measure because it weights recall and precision equally.
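
A small check that the two forms of F agree, again using the counts from the two cancer tests in these notes. The helper names are hypothetical and chosen only for this sketch.

```python
def f1_from_rates(r, p):
    """Harmonic mean of recall r and precision p: 2rp / (r + p)."""
    return 2 * r * p / (r + p)

def f1_from_counts(tp, fp, fn):
    """Same quantity written with raw counts: 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)

# Counts from the two cancer tests: (TP, FP, FN).
counts = {"Test 1": (10, 5, 40), "Test 2": (48, 50, 2)}

f1 = {}
for name, (tp, fp, fn) in counts.items():
    r = tp / (tp + fn)   # recall
    p = tp / (tp + fp)   # precision
    f1[name] = (f1_from_rates(r, p), f1_from_counts(tp, fp, fn))
    print(f"{name}: F from rates={f1[name][0]:.3f}, F from counts={f1[name][1]:.3f}")
```

Both forms give about 0.308 for Test 1 and about 0.649 for Test 2.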
The ROC curve
ROC stands for Receiver Operating Characteristic
Since we can "turn up" or "turn down" the number of observations classified as the positive class, we can get many different values of the true positive rate (TPR) and false positive rate (FPR) for the same classifier.
TPR = TP / (TP+FN) = a/(a+b) (the same as recall)
FPR = FP / (FP+TN) = c/(c+d)
The ROC curve plots TPR on the y-axis and FPR on the x-axis
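
One way to picture the "turn up / turn down" idea is to sweep a score threshold and record (FPR, TPR) at each setting. The sketch below assumes the classifier gives every observation a numeric score and calls it positive when the score is at or above the threshold. The scores and labels are made-up illustration data, not from the notes, and `roc_points` is a hypothetical helper.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, from the strictest threshold to the loosest."""
    positives = sum(labels)
    negatives = len(labels) - positives
    # Start above every score (nothing is called positive), then lower the
    # threshold through each distinct score.
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

# Made-up scores and true labels (1 = positive class), for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Plotting these pairs with FPR on the x-axis and TPR on the y-axis traces the curve from (0, 0), where nothing is classified as positive, to (1, 1), where everything is.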