Outlier removal techniques from an array

  numpy, outliers, pandas, python, scipy

I know there are a ton of resources online for outlier removal, but I haven’t yet managed to get exactly what I want, so I’m posting here. I have an array (or DataFrame) with 4 columns, and I want to remove rows based on the outlier values of one column. The following is what I have tried, but the result is not perfect.

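import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def outliers2(data2, m=4.5):
    """Return the rows of data2 whose column-1 value lies within m MADs of the median."""
    data = data2[:, 1]                    # choose the column to test
    d = np.abs(data - np.median(data))    # absolute deviation from the median
    mdev = np.median(d)                   # median absolute deviation (MAD)
    if mdev == 0:                         # no spread at all: nothing to reject
        return data2
    return data2[d < m * mdev]            # keep rows that pass the MAD test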

# b is the original DataFrame with columns t, orig_w, filt_w, smt_w
column = ['t', 'orig_w', 'filt_w', 'smt_w']
x = pd.DataFrame(outliers2(np.array(b)), columns=column)

#Plot
plt.rcParams['figure.figsize'] = [10,8]
plt.plot(b.t,b.orig_w,'o',label='Original',alpha=0.8) # Original
plt.plot(x.t,x.orig_w,'.',c='r',label='Outlier removed',alpha=0.8) # After outlier removal
plt.legend()

The plot shows the result: the red points are what remains after the outlier treatment, drawn over the blue original points. I would really like to get rid of the vertical group of points around the x ~ 0 mark. What should I do?

A link to the data file is provided here: Full data
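
One possible direction, as a rough sketch only: a single global threshold can miss points that are extreme only relative to their neighbours, so it may help to compare each value with a rolling median instead. The function name rolling_median_filter, the window size, the 1.4826 scaling factor, and the synthetic data below are all assumptions made for illustration; they are not taken from the data file, and the window and m would need tuning on the real data.

```python
import numpy as np
import pandas as pd

def rolling_median_filter(df, col, window=11, m=4.5):
    """Drop rows whose `col` value is far from the local (rolling) median.

    Hypothetical helper: `window` and `m` are illustrative defaults.
    """
    values = df[col]
    # Local baseline: centered rolling median, truncated at the edges
    local_median = values.rolling(window, center=True, min_periods=1).median()
    # Absolute distance of every point from its local baseline
    residual = (values - local_median).abs()
    # Robust global scale: median absolute residual, scaled by 1.4826 so it
    # is comparable to a standard deviation for normally distributed noise
    scale = 1.4826 * residual.median()
    return df[residual <= m * scale]

# Synthetic stand-in for the real data (made up for this example)
t = np.arange(40)
w = np.tile([0.0, 1.0, 0.0, -1.0], 10)   # small repeating wiggle
w[20] = 100.0                            # one injected spike
demo = pd.DataFrame({'t': t, 'orig_w': w})

cleaned = rolling_median_filter(demo, 'orig_w', window=5)
print(len(demo), len(cleaned))
```

With the real DataFrame this would be called as rolling_median_filter(b, 'orig_w'), assuming the rows are already sorted by t. Unlike outliers2, it takes and returns a DataFrame, so the column names are preserved.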

Source: Python Questions
