python - Numpy Structured Arrays by Name AND Index
I can never seem to get NumPy arrays to work nicely for me. :(
My dataset is simple: 150 rows of 4 floats followed by 1 string. I tried the following:
    import numpy as np

    data = np.genfromtxt("iris.data2", delimiter=",",
                         names=["sl", "sw", "pl", "pw", "class"],
                         dtype=[float, float, float, float, '|S16'])
    print(data.shape)      # ---> (150, 0)
    print(data["pl"])
    print(data[:, 0:3])    # <--- error
So I changed the file to 5 floats with a simple find-and-replace, because I couldn't get a non-homogeneous array to work nicely with both column-name and index access. Now that I have made it homogeneous, it still gives me a shape of (150, 0) and the same error.
    data = np.genfromtxt("iris.data", delimiter=",",
                         names=["sl", "sw", "pl", "pw", "class"])
    print(data.shape)      # ---> (150, 0)
    print(data["pl"])
    print(data[:, 0:3])    # <--- error
When I remove the names entirely, index-based column access works, but access by name no longer does.
    data = np.genfromtxt("iris.data", delimiter=",")
    print(data.shape)      # ---> (150, 5)
    # print(data["pl"])
    print(data[:, 0:3])    # ---> works great!!!
Why is this, and how do I fix it? Ideally I would like both name and index column access without replacing the string with a float code, but I will do that if I need to in order to get both name and index column access.
There's a clear distinction between the fields of a 1d structured array and the columns of a 2d array. They aren't interchangeable. Field names aren't column labels. If that isn't clear, you may need to read the dtype or structured array docs in more detail.
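To make the distinction concrete, here is a minimal sketch that reproduces both failure modes from the question on two rows of made-up data. The field names follow the question; the values and the `U16` string type are assumptions chosen so the example runs without the iris file.

```python
import numpy as np

# A 1-D structured array: 2 records, each with 4 float fields and 1 string field.
# The values are invented for illustration.
dt = np.dtype([("sl", float), ("sw", float), ("pl", float), ("pw", float),
               ("class", "U16")])
structured = np.array(
    [(5.1, 3.5, 1.4, 0.2, "setosa"), (7.0, 3.2, 4.7, 1.4, "versicolor")],
    dtype=dt,
)

print(structured.shape)    # one axis only: the records
print(structured["pl"])    # field access by name works

try:
    structured[:, 0:3]     # there is no second axis to slice
except IndexError as exc:
    print("IndexError:", exc)

# A plain 2-D float array: columns are positions on axis 1 and have no names.
plain = np.array([[5.1, 3.5, 1.4, 0.2], [7.0, 3.2, 4.7, 1.4]])

print(plain.shape)         # two axes: rows and columns
print(plain[:, 0:3])       # column slicing works

try:
    plain["pl"]            # a plain array has no fields
except IndexError as exc:
    print("IndexError:", exc)
```

Passing `names=` (or a list of per-column dtypes) to `genfromtxt` produces the first kind of array, which is why the slice in the question fails even after the file was made all-numeric.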
Define a pseudo file:
    In [93]: txt = b"""1,2,3,4,txt
       ....: 5,6,7,8,abc"""

    In [94]: np.genfromtxt(txt.splitlines(), delimiter=',', dtype=None)
    Out[94]:
    array([(1, 2, 3, 4, 'txt'), (5, 6, 7, 8, 'abc')],
          dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4'), ('f3', '<i4'), ('f4', 'S3')])
With mixed columns, the default way to load this is as a structured array with 2 rows (shape=(2,)) and 5 fields, indexed as data['f0'] or data[['f0','f2']]. The ability to index several fields at once is limited.
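As a small sketch of what those two indexing forms return, the array below is built directly with `np.array` rather than loaded with `genfromtxt`, and it uses a `U3` string field instead of `S3`; both are assumptions made to keep the example independent of the loader's encoding behaviour.

```python
import numpy as np

# Same layout as the loaded array above: five fields named f0..f4.
data = np.array(
    [(1, 2, 3, 4, "txt"), (5, 6, 7, 8, "abc")],
    dtype=[("f0", "i4"), ("f1", "i4"), ("f2", "i4"), ("f3", "i4"), ("f4", "U3")],
)

one = data["f0"]            # a single name gives an ordinary 1-D array
pair = data[["f0", "f2"]]   # a list of names gives another structured array

print(one)
print(pair.dtype.names)     # only the selected fields remain
print(pair.shape)           # still 1-D: one entry per record
```

Selecting several fields narrows the record but does not add a column axis, so `pair[:, 0]` would fail in the same way as the slice in the question.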
But you can define a compound dtype, such as:
    In [102]: dt = np.dtype([('data', float, (4,)), ('lbl', '|S5')])

    In [103]: dt
    Out[103]: dtype([('data', '<f8', (4,)), ('lbl', 'S5')])

    In [104]: np.genfromtxt(txt.splitlines(), delimiter=',', dtype=dt)
    Out[104]:
    array([([1.0, 2.0, 3.0, 4.0], 'txt'), ([5.0, 6.0, 7.0, 8.0], 'abc')],
          dtype=[('data', '<f8', (4,)), ('lbl', 'S5')])

    In [105]: data = np.genfromtxt(txt.splitlines(), delimiter=',', dtype=dt)

    In [106]: data['data']
    Out[106]:
    array([[ 1.,  2.,  3.,  4.],
           [ 5.,  6.,  7.,  8.]])

    In [107]: data['lbl']
    Out[107]:
    array(['txt', 'abc'],
          dtype='|S5')

    In [108]: data[0]
    Out[108]: ([1.0, 2.0, 3.0, 4.0], 'txt')
Now data['data'] is a 2d array, containing the numeric values from the original text.
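Applied to the question, this layout gives index access on the numeric block while the label stays reachable by name. The sketch below builds the array directly instead of reading a file and uses `U5` rather than `S5`. The `columns` dictionary is a hypothetical helper for mapping the question's column names to positions, not something NumPy provides.

```python
import numpy as np

# One 4-wide float field plus one string field, as in the compound dtype above.
dt = np.dtype([("data", float, (4,)), ("lbl", "U5")])
data = np.array(
    [([1.0, 2.0, 3.0, 4.0], "txt"), ([5.0, 6.0, 7.0, 8.0], "abc")],
    dtype=dt,
)

values = data["data"]      # ordinary 2-D float array
print(values.shape)
print(values[:, 0:3])      # the column slice the question wanted
print(data["lbl"])         # labels by field name

# Hypothetical lookup table from the question's names to column positions.
columns = {"sl": 0, "sw": 1, "pl": 2, "pw": 3}
print(values[:, columns["pl"]])
```

The trade-off is that names such as `pl` are no longer fields of the array; they exist only in whatever lookup you keep alongside it.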
The field names can be fetched as a tuple:
    In [112]: data.dtype.names
    Out[112]: ('data', 'lbl')
So it is possible to perform the usual list/tuple indexing on them, even something as convoluted as viewing the fields in reverse order:
    In [115]: data[list(data.dtype.names[::-1])]
    Out[115]:
    array([('txt', [1.0, 2.0, 3.0, 4.0]), ('abc', [5.0, 6.0, 7.0, 8.0])],
          dtype=[('lbl', 'S5'), ('data', '<f8', (4,))])
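Selecting fields through `dtype.names` can also be combined with `numpy.lib.recfunctions.structured_to_unstructured` (available in NumPy 1.16 and later, as far as I know) to get a 2d array from per-column fields. This is an alternative not used in the answer above, sketched on made-up rows with the question's field names.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Per-column fields, as produced by names=[...] in the question; values invented.
dt = np.dtype([("sl", float), ("sw", float), ("pl", float), ("pw", float),
               ("class", "U16")])
data = np.array(
    [(5.1, 3.5, 1.4, 0.2, "setosa"), (7.0, 3.2, 4.7, 1.4, "versicolor")],
    dtype=dt,
)

# Pick the numeric fields by name, then convert them to a plain 2-D array.
numeric_names = [name for name in data.dtype.names if name != "class"]
values = rfn.structured_to_unstructured(data[numeric_names])

print(values.shape)     # records x numeric fields
print(values[:, 0:3])   # index-based column access
print(data["pl"])       # name-based access on the original array
```

Whether `values` shares memory with `data` depends on the field layout, so treat it as a possibly independent copy when modifying either one.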