Classification
Start out with a "training set", then classify it.
Class: the categories we want to predict.
Model: the equations used to predict the class of each person in the data.
Ex. IRS
Training data set: build the model on 1990-2000 data (know the class)
Validation data set: test the predictions on 2001-2007 data (know...)
Testing data set: apply the model to new data from 2008 (don't know the class)
Training > Model > Validation > Final Model > Testing
99/1 test: look at 99, test on 1.
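The year-based split in the IRS example can be sketched roughly as below. This is only an illustration in plain Python: the record layout, the `income` attribute, and the "audit" / "no audit" labels are made-up assumptions, not something from the lecture.

```python
# Hypothetical records: the field names and values are invented for illustration.
records = [
    {"year": 1995, "income": 52000, "label": "no audit"},
    {"year": 1999, "income": 310000, "label": "audit"},
    {"year": 2003, "income": 48000, "label": "no audit"},
    {"year": 2006, "income": 275000, "label": "audit"},
    {"year": 2008, "income": 61000, "label": None},  # new data: class not known yet
]


def split_by_year(records):
    """Split records into training, validation and testing sets by year."""
    training = [r for r in records if 1990 <= r["year"] <= 2000]
    validation = [r for r in records if 2001 <= r["year"] <= 2007]
    testing = [r for r in records if r["year"] == 2008]
    return training, validation, testing


training, validation, testing = split_by_year(records)
print(len(training), len(validation), len(testing))
```

The model would be built on `training`, checked against the known classes in `validation`, and only then applied to `testing`, where the class is not known.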
Classification Tree
Training set: given a collection of records, each with attributes (x) and one additional attribute, the class (y).
Find a model that predicts the class as a function of the values of the attributes.
Goal: previously unseen records should be assigned a class as accurately as possible.
A test set is used to determine accuracy.
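As a rough sketch of what "determine accuracy" means here: accuracy is the fraction of test records whose predicted class matches the true class. The function name and the sample labels below are assumptions made up for illustration.

```python
def accuracy(predicted, actual):
    """Return the fraction of records whose predicted class equals the true class."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one record")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Made-up test set of five records; the second prediction is wrong.
actual = ["legit", "legit", "fraud", "legit", "fraud"]
predicted = ["legit", "fraud", "fraud", "legit", "fraud"]
print(accuracy(predicted, actual))  # 4 of 5 correct -> 0.8
```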
Ex.
Classifying credit card transactions as legitimate or fraudulent
Classifying the structure of a protein
Categorizing news stories as finance, weather, entertainment...
Predicting tumor cells as benign or malignant
Classification Tree
Splitting attributes.
rpart: the function rpart() in the R package "rpart" generates classification trees.
Whatever variable you're trying to predict, make sure it's a factor.
Slide #16: look at the indentation.
class = % of each class, e.g. out of all the people in our set, 79%
At the root node, it's for the whole data set.
Prediction: at a node, predict whichever class has the higher percentage.
a) age = middle, number = 5, start = 10 (class 2)
b) class 1
c) class
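The prediction rule above (pick the class with the higher percentage at the node) can be sketched as follows. This is plain Python, not rpart output; the helper names are hypothetical, and the 79/21 counts are made up to echo the 79% figure in the notes.

```python
from collections import Counter


def class_percentages(labels):
    """Return the percentage of each class among the records at a node."""
    if not labels:
        raise ValueError("a node must contain at least one record")
    counts = Counter(labels)
    total = len(labels)
    return {cls: 100.0 * n / total for cls, n in counts.items()}


def predict_at_node(labels):
    """Predict the class with the highest percentage (ties go to the class seen first)."""
    percentages = class_percentages(labels)
    return max(percentages, key=percentages.get)


# Hypothetical node holding 100 records: 79 of class 1 and 21 of class 2.
node_labels = ["class 1"] * 79 + ["class 2"] * 21
print(class_percentages(node_labels))
print(predict_at_node(node_labels))
```

At the root node the labels would be those of the whole data set; further down the tree they are only the records that reached that node.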