Monday, March 16, 2009

Data Mining

Misclassification Error
                    Training   Testing
Tree 1 (depth 4)    11.5%      30.8%
Tree 2 (depth 1)    22.3%      28.2%
Tree 3 (depth 6)    0.%        26.7%

-In general, favor the simplest model: the "Principle of Parsimony" (Occam's Razor).

On the training data, a more complicated tree gives better results.

How Are Classification Trees Generated?
*Many algorithms use a version of a "top-down" or "divide-and-conquer" approach known as "Hunt's Algorithm" (page 152):
Use an attribute test to split the data into smaller subsets until only pure nodes remain.

Example: all 30 records in D_t are "metal"
prediction: metal
"Pure" node
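
The recursion described above can be sketched in a few lines of Python. This is only a minimal illustration, not the textbook's pseudocode: the function names (grow_tree, best_split, predict), the dictionary representation of the tree, the numeric-only attributes, and the toy data are all assumptions made for the example. The split is chosen by counting misclassified records, and splitting continues until every node is pure.

```python
from collections import Counter

def majority(labels):
    # Most common class among the records at a node.
    return Counter(labels).most_common(1)[0][0]

def errors(labels):
    # Records that would be misclassified by predicting the majority class.
    return len(labels) - Counter(labels).most_common(1)[0][1]

def best_split(rows, labels):
    # Try every binary test "attribute <= threshold" and keep the one
    # that leaves the fewest misclassified records (an assumed criterion).
    best, best_errors = None, None
    for attr in range(len(rows[0])):
        values = sorted(set(r[attr] for r in rows))
        for threshold in values[:-1]:  # skipping the largest value keeps both sides non-empty
            left = [y for r, y in zip(rows, labels) if r[attr] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[attr] > threshold]
            total = errors(left) + errors(right)
            if best_errors is None or total < best_errors:
                best, best_errors = (attr, threshold), total
    return best

def grow_tree(rows, labels):
    # Pure node: all records share one class, so stop and predict it.
    if len(set(labels)) == 1:
        return {"predict": labels[0]}
    split = best_split(rows, labels)
    if split is None:  # all attribute values identical: cannot split further
        return {"predict": majority(labels)}
    attr, threshold = split
    left = [i for i, r in enumerate(rows) if r[attr] <= threshold]
    right = [i for i, r in enumerate(rows) if r[attr] > threshold]
    return {
        "attr": attr,
        "threshold": threshold,
        "left": grow_tree([rows[i] for i in left], [labels[i] for i in left]),
        "right": grow_tree([rows[i] for i in right], [labels[i] for i in right]),
    }

def predict(tree, row):
    # Follow the attribute tests down to a leaf.
    while "predict" not in tree:
        tree = tree["left"] if row[tree["attr"]] <= tree["threshold"] else tree["right"]
    return tree["predict"]

# Hypothetical toy data: one numeric attribute, two classes.
rows = [(1.0,), (2.0,), (3.0,), (4.0,)]
labels = ["rock", "metal", "metal", "rock"]
tree = grow_tree(rows, labels)
training_predictions = [predict(tree, r) for r in rows]
```

Because the sketch keeps splitting until the nodes are pure, the tree reproduces every training label on this toy data. That says nothing about how it would do on test data.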

*Usually it is done in a "greedy" fashion.
*"Greedy" means that the optimal split is chosen at each stage according to some criterion.
*This may not be optimal at the end, even for the same criterion.
However, the greedy approach is computationally efficient.

How to Apply Hunt's Algorithm

Using the greedy approach, we still have to decide 3 things:
1) What attribute test conditions to consider
2) What criterion to use to select the "best" split
3) When to stop splitting

* For #1 we will consider only binary splits for both numeric and categorical predictors, as discussed on the next slide.
* For #2 we will consider misclassification error, Gini index and entropy.
* #3 is a subtle business involving model selection. It is tricky because we don't want to overfit or underfit.
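
To make #1 and #2 concrete, here is a rough Python sketch of candidate binary splits and the three criteria. The Gini index and entropy are written in their standard forms (1 minus the sum of squared class proportions, and minus the sum of p log2 p). How the candidates are generated is an assumption for illustration only: midpoints between distinct values for a numeric predictor and one category versus the rest for a categorical predictor. All function names are hypothetical.

```python
import math
from collections import Counter

def class_proportions(labels):
    # Fraction of the records at a node that fall in each class.
    n = len(labels)
    return [count / n for count in Counter(labels).values()]

def misclassification(labels):
    return 1 - max(class_proportions(labels))

def gini(labels):
    return 1 - sum(p * p for p in class_proportions(labels))

def entropy(labels):
    # Proportions come from observed counts, so p is never 0 here.
    return -sum(p * math.log2(p) for p in class_proportions(labels))

def numeric_thresholds(values):
    # Assumed candidate set: midpoints between consecutive distinct values.
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]

def categorical_candidates(values):
    # Assumed candidate set: each category versus all the others.
    return sorted(set(values))

def weighted_impurity(left, right, measure):
    # Impurity of a binary split: child impurities weighted by child size.
    n = len(left) + len(right)
    return len(left) / n * measure(left) + len(right) / n * measure(right)

# Hypothetical node: a numeric predictor and two classes.
x = [1, 3, 3, 5]
y = ["rock", "rock", "metal", "metal"]
for threshold in numeric_thresholds(x):
    left = [label for value, label in zip(x, y) if value <= threshold]
    right = [label for value, label in zip(x, y) if value > threshold]
    scores = {m.__name__: weighted_impurity(left, right, m)
              for m in (misclassification, gini, entropy)}
    print(threshold, scores)
```

For each criterion, the greedy choice is the candidate with the lowest weighted impurity. The three criteria need not agree on which split is best.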

Misclassification Error
Error(t) = 1 - max_i P(i|t), where P(i|t) is the fraction of records at node t that belong to class i
*Misclassification error is usually our final metric, which we want to minimize on the test set, so there is a logical argument for using it as the split criterion.
*It is simply the fraction of the total cases misclassified.
*1 - misclassification error = "accuracy"
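
As a quick check of these definitions, the sketch below computes the error rate and accuracy from actual and predicted labels. The class counts (20 "metal" and 10 "rock") and the function name error_rate are made up for illustration. Predicting the majority class for every record at a node gives exactly 1 - max_i P(i|t).

```python
def error_rate(actual, predicted):
    # Fraction of the total cases that are misclassified.
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)

# Hypothetical node with 30 records: 20 "metal" and 10 "rock".
actual = ["metal"] * 20 + ["rock"] * 10
# The node predicts its majority class for every record.
predicted = ["metal"] * 30

error = error_rate(actual, predicted)  # 10 of 30 wrong, same as 1 - max(20/30, 10/30)
accuracy = 1 - error
print(f"error = {error:.3f}, accuracy = {accuracy:.3f}")
```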
