python - Why is the order in dictionaries and sets arbitrary? -
i don't understand how looping on dictionary or set in python done 'arbitrary' order.
i mean, it's programming language in language must 100% determined, correct? python must have kind of algorithm decides part of dictionary or set chosen, 1st, second , on.
what missing?
the order not arbitrary, depends on insertion , deletion history of dictionary or set, on specific python implementation. remainder of answer, 'dictionary', can read 'set'; sets implemented dictionaries keys , no values.
keys hashed, , hash values assigned slots in dynamic table (it can grow or shrink based on needs). , mapping process can lead collisions, meaning key have slotted in next slot based on there.
listing contents loops on slots, , keys listed in order currently reside in table.
take keys 'foo'
, 'bar'
, example, , lets assume table size 8 slots. in python 2.7, hash('foo')
-4177197833195190597
, hash('bar')
327024216814240868
. modulo 8, means these 2 keys slotted in slots 3 , 4 then:
>>> hash('foo') -4177197833195190597 >>> hash('foo') % 8 3 >>> hash('bar') 327024216814240868 >>> hash('bar') % 8 4
this informs listing order:
>>> {'bar': none, 'foo': none} {'foo': none, 'bar': none}
all slots except 3 , 4 empty, looping on table first lists slot 3, slot 4, 'foo'
listed before 'bar'
.
bar
, baz
, however, have hash values 8 apart , map exact same slot, 4
:
>>> hash('bar') 327024216814240868 >>> hash('baz') 327024216814240876 >>> hash('bar') % 8 4 >>> hash('baz') % 8 4
their order depends on key slotted first; second key have moved next slot:
>>> {'baz': none, 'bar': none} {'bar': none, 'baz': none} >>> {'bar': none, 'baz': none} {'baz': none, 'bar': none}
the table order differs here, because 1 or other key slotted first.
the technical name underlying structure used cpython (the commonly used python implemenation) hash table, 1 uses open addressing. if curious, , understand c enough, take @ c implementation (well documented) details. watch pycon 2010 presentation brandon rhodes how cpython dict
works, or pick copy of beautiful code, includes chapter on implementation written andrew kuchling.
note of python 3.3, random hash seed used well, making hash collisions unpredictable prevent types of denial of service (where attacker renders python server unresponsive causing mass hash collisions). means order of given dictionary also dependent on random hash seed current python invocation.
other implementations free use different structure dictionaries, long satisfy documented python interface them, believe implementations far use variation of hash table.
cpython 3.6 introduces new dict
implementation maintains insertion order, , faster , more memory efficient boot. rather keep large sparse table each row references stored hash value, , key , value objects, new implementation adds smaller hash array references indices in dense table (one contains many rows there actual key-value pairs), , dense table happens list contained items in order. see proposal python-dev more details. note considered implementation detail, python-the-language not specify other implementations have retain order.
python 2.7 , newer provides ordereddict
class, subclass of dict
adds additional data structure record key order. @ price of speed , memory, class remembers in order inserted keys; listing keys, values or items in order. uses doubly-linked list stored in additional dictionary keep order up-to-date efficiently. see post raymond hettinger outlining idea. note set
type still unordered.
if wanted ordered set, can install oset
package; works on python 2.5 , up.
Comments
Post a Comment