python - Numpy Structured Arrays by Name AND Index -

April 15, 2014

I can never seem to get NumPy arrays to work nicely for me. :(

My dataset is simple: 150 rows of 4 floats followed by 1 string. I tried the following:

    import numpy as np

    data = np.genfromtxt("iris.data2", delimiter=",",
                         names=["sl", "sw", "pl", "pw", "class"],
                         dtype=[float, float, float, float, '|S16'])

    print(data.shape)      # ---> (150,)
    print(data["pl"])
    print(data[:, 0:3])    # <--- error

So I changed it to 5 floats by doing a simple file replace, because I couldn't get a non-homogeneous array to work nicely with both column-name and index access. I have now made it homogeneous, but it still gives me a shape of (150,) and the error.

    data = np.genfromtxt("iris.data", delimiter=",",
                         names=["sl", "sw", "pl", "pw", "class"])

    print(data.shape)      # ---> (150,)
    print(data["pl"])
    print(data[:, 0:3])    # <--- error

When I remove the names entirely, it works for index-column access, but not with names anymore.

    data = np.genfromtxt("iris.data", delimiter=",")

    print(data.shape)      # ---> (150, 5)
    # print(data["pl"])
    print(data[:, 0:3])    # ---> works great!!!

Why is that, and how can I fix it? Ideally I'd like both name and index column access without replacing the string with a float code, but I'll do that if I need to in order to get name and index column access.

There's a clear distinction between the fields of a 1d structured array and the columns of a 2d array. They aren't interchangeable. Field names aren't column labels. If that isn't clear, you may need to read the dtype or structured array docs in more detail.
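To make the distinction concrete, here is a minimal sketch that builds one array of each kind by hand. The field names `x`, `y` and `label` and the values are made up for illustration; they are not from the iris file.

```python
import numpy as np

# 1d structured array: two records, each with three named fields.
structured = np.array(
    [(1.0, 2.0, "a"), (3.0, 4.0, "b")],
    dtype=[("x", float), ("y", float), ("label", "U1")],
)

# 2d homogeneous array: two rows and two unnamed columns.
plain = np.array([[1.0, 2.0], [3.0, 4.0]])

print(structured.shape)   # (2,)
print(structured["x"])    # field access by name
print(plain.shape)        # (2, 2)
print(plain[:, 0])        # column access by position

# Column-style slicing fails on the structured array: it has only one dimension.
try:
    structured[:, 0]
    column_error = None
except IndexError as exc:
    column_error = exc
    print("IndexError:", exc)
```

The structured array has one axis (the records), so `[:, 0]` asks for a second axis that does not exist. That is the error in the question.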

Define a pseudo file:

    In [93]: txt=b"""1,2,3,4,txt
       ....: 5,6,7,8,abc"""

    In [94]: np.genfromtxt(txt.splitlines(),delimiter=',',dtype=None)
    Out[94]:
    array([(1, 2, 3, 4, 'txt'), (5, 6, 7, 8, 'abc')],
          dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4'), ('f3', '<i4'), ('f4', 'S3')])

With mixed columns, the default way to load this is as a structured array with 2 rows (shape=(2,)) and 5 fields, indexed as data['f0'] or data[['f0','f2']]. The ability to index several fields at once is limited.
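If you already have such an array with one field per column, one possible way to get a plain 2d block out of the numeric fields is `structured_to_unstructured` from `numpy.lib.recfunctions`. This is an addition to the answer, not part of it, and it assumes NumPy 1.16 or later. The sketch builds the array by hand with float fields and a unicode label instead of parsing the pseudo file.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hand-built stand-in for the loaded data: four float fields and one string field.
data = np.array(
    [(1.0, 2.0, 3.0, 4.0, "txt"), (5.0, 6.0, 7.0, 8.0, "abc")],
    dtype=[("f0", float), ("f1", float), ("f2", float), ("f3", float), ("f4", "U3")],
)

# Select the numeric fields by name, then convert them to an ordinary 2d array.
numeric = rfn.structured_to_unstructured(data[["f0", "f1", "f2", "f3"]])

print(numeric.shape)     # (2, 4)
print(numeric[:, 0:3])   # positional slicing now works
print(data["f2"])        # name access still works on the original array
```

The converted array is a separate 2d array in general, so treat it as a read-only view of the numbers unless you have checked how it shares memory with the original.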

But you can define a compound dtype, such as:

    In [102]: dt=np.dtype([('data',float,(4,)),('lbl','|S5')])

    In [103]: dt
    Out[103]: dtype([('data', '<f8', (4,)), ('lbl', 'S5')])

    In [104]: np.genfromtxt(txt.splitlines(),delimiter=',',dtype=dt)
    Out[104]:
    array([([1.0, 2.0, 3.0, 4.0], 'txt'), ([5.0, 6.0, 7.0, 8.0], 'abc')],
          dtype=[('data', '<f8', (4,)), ('lbl', 'S5')])

    In [105]: data=np.genfromtxt(txt.splitlines(),delimiter=',',dtype=dt)

    In [106]: data['data']
    Out[106]:
    array([[ 1.,  2.,  3.,  4.],
           [ 5.,  6.,  7.,  8.]])

    In [107]: data['lbl']
    Out[107]:
    array(['txt', 'abc'],
          dtype='|S5')

    In [108]: data[0]
    Out[108]: ([1.0, 2.0, 3.0, 4.0], 'txt')

Now data['data'] is a 2d array, containing the numeric values from the original text.
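With that layout you can get something close to what the question asks for. The sketch below builds the same two records by hand (using a unicode label rather than bytes) and adds a small dict, `col`, that maps the question's column names to positions. `col` is a hypothetical helper, not a NumPy feature.

```python
import numpy as np

# Compound dtype: a 4-element float subarray plus a label.
dt = np.dtype([("data", float, (4,)), ("lbl", "U5")])
data = np.array(
    [([1.0, 2.0, 3.0, 4.0], "txt"), ([5.0, 6.0, 7.0, 8.0], "abc")],
    dtype=dt,
)

# Hypothetical name-to-position lookup for the numeric columns.
col = {"sl": 0, "sw": 1, "pl": 2, "pw": 3}

first_three = data["data"][:, 0:3]    # index access on the numeric block
pl = data["data"][:, col["pl"]]       # name access through the lookup dict
labels = data["lbl"]                  # the string field, by name

print(first_three)
print(pl)
print(labels)
```

Field access on a structured array returns a view, so assigning into `data["data"][:, 0]` writes through to `data` itself.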

The field names can be fetched as a tuple:

    In [112]: data.dtype.names
    Out[112]: ('data', 'lbl')

So it is possible to perform the usual list/tuple indexing on them, and to do something as convoluted as viewing the fields in reverse order:

    In [115]: data[list(data.dtype.names[::-1])]
    Out[115]:
    array([('txt', [1.0, 2.0, 3.0, 4.0]), ('abc', [5.0, 6.0, 7.0, 8.0])],
          dtype=[('lbl', 'S5'), ('data', '<f8', (4,))])
