Detecting outliers:
Method 1
|z| >= 3
ex. 60
use a pivot table in Excel
count the IP addresses
find how many times each address shows up, get the standard deviation, then calculate the z-score
z = (x - mu) / sigma
if |z| is 3 or more, it's an outlier
mu = mean (average) = 11.6
sigma = standard deviation = 141.1525
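The same count-then-score steps can be sketched in R instead of an Excel pivot table. The addresses and hit counts below are made up for illustration (they are not the class data), and the names hits, ips, counts and outliers are hypothetical.

```r
# Made-up data: 20 addresses from the 192.0.2.0/24 documentation range.
# The first 19 show up 4 to 7 times each; the last one shows up 400 times.
hits <- c(5, 6, 4, 5, 7, 5, 6, 4, 5, 6, 5, 4, 6, 5, 7, 5, 4, 6, 5, 400)
ips  <- rep(paste0("192.0.2.", 1:20), times = hits)

# Count how many times each address shows up (the pivot-table step).
tab    <- table(ips)
counts <- setNames(as.vector(tab), names(tab))

# z = (x - mu) / sigma, using the sample mean and sample standard deviation.
mu    <- mean(counts)
sigma <- sd(counts)
z     <- (counts - mu) / sigma

# Flag every address with |z| >= 3.
outliers <- names(z)[abs(z) >= 3]
print(outliers)
```

Only the address with 400 hits is flagged here; the other 19 all have z-scores close to zero.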
Method 2 (2nd popular method)
IQR = 3rd quartile - 1st quartile (Q3 - Q1)
Median = 2nd quartile (Q2)
anything more than 1.5 x IQR below Q1 or above Q3 is an outlier
outliers            Q1   Q2   Q3          outliers
          |---------|====|====|---------|
           1.5 x IQR    IQR    1.5 x IQR
              25%       50%       25%
q1 <- quantile(data[,3], .25, na.rm=TRUE)
q3 <- quantile(data[,3], .75, na.rm=TRUE)
Q1 = 150
Q3 = 176
interquartile range = Q3 - Q1 = 176 - 150 = 26
iqr <- q3 - q1
iqr = 26
26 * 1.5 = 39
lower limit = Q1 - 39 = 150 - 39 = 111 (any score below 111 is an outlier)
upper limit = Q3 + 39 = 176 + 39 = 215 (any score above 215 is an outlier)
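The limit arithmetic can be wrapped in a small helper, sketched here in R. The function name iqr_limits and the scores vector are hypothetical and made up for illustration; only the quartile values 150 and 176 come from the notes. quantile() is called with its default settings, and other quantile types can give slightly different quartiles.

```r
# Hypothetical helper: lower and upper outlier limits from the two quartiles.
iqr_limits <- function(q1, q3) {
  iqr <- q3 - q1
  c(lower = q1 - 1.5 * iqr, upper = q3 + 1.5 * iqr)
}

# Quartiles from the notes: IQR = 26, so the limits are 111 and 215.
limits_notes <- iqr_limits(150, 176)
print(limits_notes)

# Made-up scores (not the class data), with one NA and one extreme value.
scores <- c(150, 152, 155, 160, 163, 165, 170, 172, 176, 180, 300, NA)
q1 <- quantile(scores, .25, na.rm = TRUE)
q3 <- quantile(scores, .75, na.rm = TRUE)
limits <- iqr_limits(unname(q1), unname(q3))

# Keep the non-missing scores that fall outside the limits.
outliers <- scores[!is.na(scores) &
                   (scores < limits["lower"] | scores > limits["upper"])]
print(outliers)
```

With these made-up scores only 300 lands outside the limits.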
Proximity-Based Outlier Detection.
K-nearest neighbor
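The notes only name the technique, so the following is a rough sketch of one common proximity-based score, the distance from each point to its k-th nearest neighbor, and not necessarily the method used in class. The points, the choice k = 2 and the names pts and knn_dist are all assumptions made for illustration.

```r
# Made-up 2-D points: five close together and one far away.
pts <- rbind(c(1.0, 1.0), c(1.2, 0.9), c(0.8, 1.1),
             c(1.1, 1.2), c(0.9, 0.8), c(8.0, 8.0))
k <- 2  # assumed number of neighbors

# Pairwise Euclidean distances as a full square matrix.
d <- as.matrix(dist(pts))

# Distance from each point to its k-th nearest neighbor.
# After sorting a row, position 1 is the point itself (distance 0).
knn_dist <- apply(d, 1, function(row) sort(row)[k + 1])

# The point with the largest k-th neighbor distance is the most isolated.
most_isolated <- unname(which.max(knn_dist))
print(most_isolated)
```

How large a distance has to be before a point counts as an outlier is a separate choice that this sketch leaves open.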
ex. 62
lm - regression
a "linear model"
x_2 = beta_0 + beta_1 * x_1 + epsilon
model <- lm(data[,3] ~ data[,2])
predict column 3 of data using column 2
fitted model: x_2 = -36 + 1.1 * x_1
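A minimal sketch of fitting and using such a model in R. The data frame df is made up so that it lies exactly on the line from the notes (it is not the class data), and the column names x1 and x2 are hypothetical stand-ins for columns 2 and 3. Named columns are used instead of data[,2] and data[,3] so that predict() can match new values to the predictor.

```r
# Made-up data lying exactly on x2 = -36 + 1.1 * x1.
df <- data.frame(x1 = c(150, 155, 160, 165, 170, 175, 180))
df$x2 <- -36 + 1.1 * df$x1

# Fit the linear model: predict x2 using x1.
model <- lm(x2 ~ x1, data = df)

# Estimated intercept (beta_0) and slope (beta_1).
betas <- unname(coef(model))
print(betas)

# Predicted x2 for a new x1 value.
pred <- unname(predict(model, newdata = data.frame(x1 = 200)))
print(pred)
```

On real data the points will not sit exactly on the line, and residuals(model) gives the epsilon part for each row.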