python - Use groups in scipy.stats.kruskal similar to R cran kruskal.test
I'm trying to replace rpy2 code in a Python script with pure Python (SciPy). In this context I need to replace the Kruskal-Wallis test (R: kruskal.test()) with its Python counterpart (Python: scipy.stats.kruskal).
scipy.stats.kruskal returns a similar H-statistic and p-value when comparing integers/floats only. However, I have difficulty applying it to groups represented by strings.
Below is a subsample of my data:

    y = [4.33917022422, 2.96541899883, 6.70475220836, 9.19889096119, 2.14087398016, 5.39520023918, 1.58443224287, 3.59625224078, 4.01998599966, 2.58058624352]
    x = ['high_o2', 'high_o2', 'high_o2', 'high_o2', 'low_o2', 'low_o2', 'low_o2', 'low_o2', 'mid_o2', 'mid_o2']

In R one would type:

    kruskal.test(y, as.factor(x))

Doing the same thing in Python (2.7) using SciPy (0.17):

    from scipy import stats
    stats.kruskal(y, x)

However, I get very low p-values (p < 1e-07) and quite high H-statistics (26) when using SciPy, which is incorrect. I have tried replacing x with a list of {0, 1, 2}, with no improvement.
How can I tell SciPy to treat x as groups during ranking?
Each non-keyword argument passed to scipy.stats.kruskal is treated as a separate group of y-values. By passing x as one of the arguments, kruskal attempts to treat your label strings as though they were a second group of y-values. The strings will be cast to NaNs (which ought to raise a RuntimeWarning).
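To make the calling convention concrete, here is a minimal sketch that splits the sample data from the question by hand and passes each group as its own positional argument. The variable names `high_o2`, `low_o2` and `mid_o2` are made up for illustration; they are not part of the original script.

```python
from scipy import stats

# y-values from the question, split by hand according to their labels in x
# (the three variable names are hypothetical, chosen to mirror the labels)
high_o2 = [4.33917022422, 2.96541899883, 6.70475220836, 9.19889096119]
low_o2 = [2.14087398016, 5.39520023918, 1.58443224287, 3.59625224078]
mid_o2 = [4.01998599966, 2.58058624352]

# each positional argument is one group of observations
h, p = stats.kruskal(high_o2, low_o2, mid_o2)
print(h, p)
```

Splitting by hand only works for a small, fixed set of labels, so for real data the grouping should be done programmatically.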
Instead, you need to group your y-values by label, then pass them as separate input arrays to kruskal. For example:
    import numpy as np
    from scipy import stats

    # convert `y` to a numpy array for more convenient indexing
    y = np.array(y)

    # find the unique group labels and their corresponding indices
    label, idx = np.unique(x, return_inverse=True)

    # make a list of arrays containing the y-values corresponding to each unique label
    groups = [y[idx == i] for i, l in enumerate(label)]

    # use `*` to unpack the list as a sequence of arguments to `stats.kruskal`
    h, p = stats.kruskal(*groups)

    print(h, p)
    # 2.94545454545 0.22929927
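As a sanity check on that result, the H-statistic can also be computed by hand from the pooled ranks. The sketch below uses the textbook formula H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1) without a tie correction, which is only valid here because the sample data contain no tied values. It is an illustration of what the test computes, not a copy of SciPy's or R's implementation.

```python
import numpy as np
from scipy import stats

y = np.array([4.33917022422, 2.96541899883, 6.70475220836, 9.19889096119,
              2.14087398016, 5.39520023918, 1.58443224287, 3.59625224078,
              4.01998599966, 2.58058624352])
x = np.array(['high_o2', 'high_o2', 'high_o2', 'high_o2',
              'low_o2', 'low_o2', 'low_o2', 'low_o2',
              'mid_o2', 'mid_o2'])

# rank all observations together, ignoring group membership
ranks = stats.rankdata(y)
n = len(y)
labels = np.unique(x)

# sum over groups of (rank sum squared) / (group size)
total = 0.0
for label in labels:
    r = ranks[x == label]
    total += r.sum() ** 2 / len(r)

# Kruskal-Wallis H without tie correction (assumes no tied values in y)
h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

# p-value from the chi-squared approximation with (number of groups - 1) d.o.f.
p = stats.chi2.sf(h, len(labels) - 1)
print(h, p)
```

For this sample the hand computation should agree with the grouped `stats.kruskal` call, which supports the conclusion that the large H-statistic in the question came from passing the labels as a group rather than from a difference between SciPy and R.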