Wednesday, April 22, 2009

Data Mining

Test question #1
Calculate the misclassification rate.
Split on A & B.
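
A rough sketch of the calculation, assuming each branch of a split predicts its majority class and the misclassification rate is the number of minority records over the total. The label lists for the A and B splits are made up for illustration; the real counts come from the test question.

```python
from collections import Counter

def misclassification_rate(groups):
    """groups: one list of class labels per branch of the split."""
    total = sum(len(labels) for labels in groups)
    errors = 0
    for labels in groups:
        if labels:
            # Each branch predicts its majority class; everything else is an error.
            majority_count = Counter(labels).most_common(1)[0][1]
            errors += len(labels) - majority_count
    return errors / total

# Hypothetical data: class labels that land in each branch after the split.
split_a = [["+", "+", "-"], ["-", "-", "-", "+"]]
split_b = [["+", "-"], ["+", "+", "-", "-", "-"]]

rate_a = misclassification_rate(split_a)  # 1 + 1 errors out of 7
rate_b = misclassification_rate(split_b)  # 1 + 2 errors out of 7
print(rate_a, rate_b)
```

With these made-up numbers the split on A would be preferred, since it has the lower rate.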

Test question #2
Numeric variable to split on.
Out of all the possible splits, which one is best?
Sort the values from smallest to biggest,

split at every place in between two values,
then calculate the misclassification rate for each split. The lowest rate is the best split.
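
A minimal sketch of that procedure. Using the midpoint between neighbouring values as the split point is my assumption (the notes only say "in between"), and the values and labels are invented.

```python
from collections import Counter

def branch_errors(labels):
    """Records in a branch that are not in its majority class."""
    if not labels:
        return 0
    return len(labels) - Counter(labels).most_common(1)[0][1]

def best_numeric_split(values, labels):
    """Return (threshold, misclassification rate) of the best split."""
    pairs = sorted(zip(values, labels))  # smallest to biggest
    n = len(pairs)
    best = None
    for i in range(n - 1):
        lo, hi = pairs[i][0], pairs[i + 1][0]
        if lo == hi:
            continue  # no place "in between" equal values
        threshold = (lo + hi) / 2  # assumed: split at the midpoint
        left = [c for v, c in pairs if v <= threshold]
        right = [c for v, c in pairs if v > threshold]
        rate = (branch_errors(left) + branch_errors(right)) / n
        if best is None or rate < best[1]:
            best = (threshold, rate)
    return best

# Hypothetical numeric variable and class labels.
values = [1, 3, 4, 7, 9, 10]
labels = ["-", "-", "-", "+", "+", "-"]
threshold, rate = best_numeric_split(values, labels)
print(threshold, rate)
```

Ties keep the first (smallest) threshold found; a test answer may break ties differently.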

Test question #3
Predicting by reading the Tree output

age= middle
number = 5
start = 10
class = ?

Look at level 1, at node number 2 or 3, then follow the branches down the tree and classify the record.
Whatever word is printed at the node is the majority class.
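
A sketch of following a tree by hand. The tree here is hypothetical (the real splits and class names come from the printed tree output on the test); only the record is from the notes.

```python
def classify(node, record):
    """Walk down from the root until a leaf (a plain string) is reached."""
    while not isinstance(node, str):
        branch = "yes" if node["test"](record) else "no"
        node = node[branch]
    return node  # the leaf holds the majority class

# Hypothetical tree: the split conditions and classes "A"/"B" are made up.
tree = {
    "test": lambda r: r["start"] >= 8,
    "yes": {
        "test": lambda r: r["number"] < 4,
        "yes": "A",
        "no": "B",
    },
    "no": "B",
}

record = {"age": "middle", "number": 5, "start": 10}
predicted = classify(tree, record)
print(predicted)
```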

Test question #4
ROC curve

Model M2
Sort the predicted probabilities from largest to smallest.

True Positive Rate = TP/(TP+FN)
(TP+FN) = number of actual positives

False Positive Rate = FP/(FP+TN)
(FP+TN) = number of actual negatives

P     Class   TPR   FPR
.68   -       0/5
.61   +
.45   +
.38   -
.31   -
.09   +
.05   -
.04   -
.03   +
.01   +
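
A sketch for filling in the TPR and FPR columns from the ten rows above. It assumes a record is predicted positive when its P is greater than or equal to the threshold, with the threshold stepping down through each P in turn. The class may use a different convention, so check before relying on it.

```python
# P and Class columns as listed in the notes.
scores = [0.68, 0.61, 0.45, 0.38, 0.31, 0.09, 0.05, 0.04, 0.03, 0.01]
classes = ["-", "+", "+", "-", "-", "+", "-", "-", "+", "+"]

def roc_points(scores, classes, positive="+"):
    """Return (P, TP, actual positives, FP, actual negatives) per threshold."""
    ranked = sorted(zip(scores, classes), reverse=True)  # largest to smallest
    n_pos = sum(1 for _, c in ranked if c == positive)   # TP + FN
    n_neg = len(ranked) - n_pos                          # FP + TN
    tp = fp = 0
    points = []
    for score, c in ranked:
        # Assumed convention: everything with P >= score is predicted positive.
        if c == positive:
            tp += 1
        else:
            fp += 1
        points.append((score, tp, n_pos, fp, n_neg))
    return points

points = roc_points(scores, classes)
for score, tp, n_pos, fp, n_neg in points:
    print(f"{score:.2f}  TPR = {tp}/{n_pos}  FPR = {fp}/{n_neg}")
```

Plotting FPR on the x-axis against TPR on the y-axis gives the ROC curve for the model.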


Test question #8
a) boosting
b) correct, bagging uses repeated samples
c) also boosting
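
A small sketch of what "repeated samples" means for bagging: each model is trained on a bootstrap sample, drawn with replacement from the training data and the same size as the original. The data and the number of samples are placeholders.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) records with replacement, so repeats are expected."""
    return [rng.choice(data) for _ in data]

rng = random.Random(0)    # fixed seed so the run is repeatable
data = list(range(10))    # placeholder training records
samples = [bootstrap_sample(data, rng) for _ in range(3)]

for s in samples:
    print(sorted(s))
```

Boosting, by contrast, reweights the records after each round instead of drawing independent samples.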

Test question #9
