Python - tempfile.TemporaryFile vs. StringIO
I've written a little benchmark to compare different string concatenating methods for ZOCache.
So it looks here like tempfile.TemporaryFile is faster than anything else:

    $ python src/zocache/tmp_benchmark.py
    3.00407409668e-05 temporaryfile
    0.385630846024 spooledtemporaryfile
    0.299962997437 bufferedrandom
    0.0849719047546 io.stringio
    0.113346099854 concat

The benchmark code I've been using:

    #!/usr/bin/python
    from __future__ import print_function
    import io
    import timeit
    import tempfile

    class Error(Exception):
        pass

    def bench_temporaryfile():
        with tempfile.TemporaryFile(bufsize=10*1024*1024) as out:
            for i in range(0, 100):
                out.write(b"value = ")
                out.write(bytes(i))
                out.write(b" ")

            # Get string.
            out.seek(0)
            contents = out.read()
            out.close()

            # Test first letter.
            if contents[0:5] != b"value":
                raise Error

    def bench_spooledtemporaryfile():
        with tempfile.SpooledTemporaryFile(max_size=10*1024*1024) as out:
            for i in range(0, 100):
                out.write(b"value = ")
                out.write(bytes(i))
                out.write(b" ")

            # Get string.
            out.seek(0)
            contents = out.read()
            out.close()

            # Test first letter.
            if contents[0:5] != b"value":
                raise Error

    def bench_bufferedrandom():
        # 1. BufferedRandom
        with io.open('out.bin', mode='w+b') as fp:
            with io.BufferedRandom(fp, buffer_size=10*1024*1024) as out:
                for i in range(0, 100):
                    out.write(b"value = ")
                    out.write(bytes(i))
                    out.write(b" ")

                # Get string.
                out.seek(0)
                contents = out.read()

                # Test first letter.
                if contents[0:5] != b'value':
                    raise Error

    def bench_stringio():
        # 1. Use StringIO.
        out = io.StringIO()
        for i in range(0, 100):
            out.write(u"value = ")
            out.write(unicode(i))
            out.write(u" ")

        # Get string.
        contents = out.getvalue()
        out.close()

        # Test first letter.
        if contents[0] != 'v':
            raise Error

    def bench_concat():
        # 2. Use string appends.
        data = ""
        for i in range(0, 100):
            data += u"value = "
            data += unicode(i)
            data += u" "

        # Test first letter.
        if data[0] != u'v':
            raise Error

    if __name__ == '__main__':
        print(str(timeit.timeit('bench_temporaryfile()', setup="from __main__ import bench_temporaryfile", number=1000)) + " temporaryfile")
        print(str(timeit.timeit('bench_spooledtemporaryfile()', setup="from __main__ import bench_spooledtemporaryfile", number=1000)) + " spooledtemporaryfile")
        print(str(timeit.timeit('bench_bufferedrandom()', setup="from __main__ import bench_bufferedrandom", number=1000)) + " bufferedrandom")
        print(str(timeit.timeit("bench_stringio()", setup="from __main__ import bench_stringio", number=1000)) + " io.stringio")
        print(str(timeit.timeit("bench_concat()", setup="from __main__ import bench_concat", number=1000)) + " concat")

Edit: Python 3.4.3 + io.BytesIO

    $ python3 ./src/zocache/tmp_benchmark.py
    2.689500024644076e-05 temporaryfile
    0.30429405899985795 spooledtemporaryfile
    0.348170792000019 bufferedrandom
    0.0764778530001422 io.bytesio
    0.05162201000030109 concat

New source with io.BytesIO:

    #!/usr/bin/python3
    from __future__ import print_function
    import io
    import timeit
    import tempfile

    class Error(Exception):
        pass

    def bench_temporaryfile():
        with tempfile.TemporaryFile() as out:
            for i in range(0, 100):
                out.write(b"value = ")
                out.write(bytes(str(i), 'utf-8'))
                out.write(b" ")

            # Get string.
            out.seek(0)
            contents = out.read()
            out.close()

            # Test first letter.
            if contents[0:5] != b"value":
                raise Error

    def bench_spooledtemporaryfile():
        with tempfile.SpooledTemporaryFile(max_size=10*1024*1024) as out:
            for i in range(0, 100):
                out.write(b"value = ")
                out.write(bytes(str(i), 'utf-8'))
                out.write(b" ")

            # Get string.
            out.seek(0)
            contents = out.read()
            out.close()

            # Test first letter.
            if contents[0:5] != b"value":
                raise Error

    def bench_bufferedrandom():
        # 1. BufferedRandom
        with io.open('out.bin', mode='w+b') as fp:
            with io.BufferedRandom(fp, buffer_size=10*1024*1024) as out:
                for i in range(0, 100):
                    out.write(b"value = ")
                    out.write(bytes(i))
                    out.write(b" ")

                # Get string.
                out.seek(0)
                contents = out.read()

                # Test first letter.
                if contents[0:5] != b'value':
                    raise Error

    def bench_bytesio():
        # 1. Use BytesIO.
        out = io.BytesIO()
        for i in range(0, 100):
            out.write(b"value = ")
            out.write(bytes(str(i), 'utf-8'))
            out.write(b" ")

        # Get string.
        contents = out.getvalue()
        out.close()

        # Test first letter.
        if contents[0:5] != b'value':
            raise Error

    def bench_concat():
        # 2. Use string appends.
        data = ""
        for i in range(0, 100):
            data += "value = "
            data += str(i)
            data += " "

        # Test first letter.
        if data[0] != 'v':
            raise Error

    if __name__ == '__main__':
        print(str(timeit.timeit('bench_temporaryfile()', setup="from __main__ import bench_temporaryfile", number=1000)) + " temporaryfile")
        print(str(timeit.timeit('bench_spooledtemporaryfile()', setup="from __main__ import bench_spooledtemporaryfile", number=1000)) + " spooledtemporaryfile")
        print(str(timeit.timeit('bench_bufferedrandom()', setup="from __main__ import bench_bufferedrandom", number=1000)) + " bufferedrandom")
        print(str(timeit.timeit("bench_bytesio()", setup="from __main__ import bench_bytesio", number=1000)) + " io.bytesio")
        print(str(timeit.timeit("bench_concat()", setup="from __main__ import bench_concat", number=1000)) + " concat")

Is this true on every platform? And if so, why?
Edit: Results with the fixed benchmark (and fixed code):

    0.2675984420002351 temporaryfile
    0.28104681999866443 spooledtemporaryfile
    0.3555715570000757 bufferedrandom
    0.10379689100045653 io.bytesio
    0.05650951399911719 concat
Your biggest problem: per tdelaney, you never actually ran the TemporaryFile test; you omitted the parens in the timeit snippet (and only for that test; the others actually ran). So you were timing how long it takes to look up the name bench_temporaryfile, not how long it takes to call it. Change:

    print(str(timeit.timeit('bench_temporaryfile', setup="from __main__ import bench_temporaryfile", number=1000)) + " temporaryfile")

to:

    print(str(timeit.timeit('bench_temporaryfile()', setup="from __main__ import bench_temporaryfile", number=1000)) + " temporaryfile")

(adding the parens to make it a call) to fix it.
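To see the difference the parentheses make, here is a minimal sketch. The `CountingBench` class and the `bench` name are hypothetical stand-ins for a benchmark function, added only so the number of real calls can be observed; `globals=` requires Python 3.5 or later.

```python
import timeit


class CountingBench(object):
    """Hypothetical stand-in for a benchmark function; counts its own calls."""

    def __init__(self):
        self.calls = 0

    def __call__(self):
        self.calls += 1


bench = CountingBench()
namespace = {"bench": bench}

# The statement is only the name: timeit evaluates the expression `bench`
# 1000 times (a name lookup each time) but never calls it.
lookup_time = timeit.timeit("bench", globals=namespace, number=1000)
calls_after_lookup = bench.calls

# With parentheses the statement is a call, so the body really runs.
call_time = timeit.timeit("bench()", globals=namespace, number=1000)
calls_after_call = bench.calls

print("name only: %d calls, with parens: %d calls"
      % (calls_after_lookup, calls_after_call))
```

A suspiciously tiny total such as the 3.0e-05 seconds reported for 1000 iterations is a good hint that the statement being timed does no real work.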
Some other issues:
io.StringIO is fundamentally different from your other test cases. Specifically, all the other types you're testing operate in binary mode, reading and writing str, and avoiding line ending conversions. io.StringIO uses Python 3 style strings (unicode in Python 2), which your tests acknowledge by using different literals and converting to unicode instead of bytes. This adds a lot of encoding and decoding overhead, as well as using a lot more memory (unicode uses 2-4x the memory of str for the same data, which means more allocator overhead, more copy overhead, etc.).
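The text/binary split can be checked directly. The following is a small sketch assuming Python 3, where io.StringIO accepts only str and io.BytesIO accepts only bytes-like objects; the variable names are illustrative.

```python
import io

text_buf = io.StringIO()
text_buf.write("value = ")           # text is accepted
try:
    text_buf.write(b"value = ")      # bytes are rejected by the text buffer
    text_rejected_bytes = False
except TypeError:
    text_rejected_bytes = True

byte_buf = io.BytesIO()
byte_buf.write(b"value = ")          # bytes are accepted
try:
    byte_buf.write("value = ")       # text is rejected by the binary buffer
    bytes_rejected_text = False
except TypeError:
    bytes_rejected_text = True

# Getting bytes out of the text buffer takes an explicit encode step,
# which the binary buffer never needs.
encoded = text_buf.getvalue().encode("utf-8")
print(text_rejected_bytes, bytes_rejected_text, encoded == byte_buf.getvalue())
```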
The other major difference is that you're setting a truly huge bufsize for TemporaryFile; few system calls would need to occur, and most writes are just appending to contiguous memory in the buffer. By contrast, io.StringIO is storing the individual values written, and only joining them together when you ask for them with getvalue().
Also, lastly, you think you're being forward compatible by using the bytes constructor, but you're not; in Python 2, bytes is an alias for str, so bytes(10) returns '10', but in Python 3, bytes is a totally different thing, and passing an integer to it returns a zero-initialized bytes object of that size, so bytes(10) returns b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'.
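A short sketch of the Python 3 behavior, with two ways to get the decimal digits as bytes instead. The variable names are illustrative, and the `b"%d"` form assumes Python 3.5 or later (it also works on Python 2).

```python
n = 10

# In Python 3 an integer argument means "this many zero bytes".
zero_filled = bytes(n)

# To get the digits, go through str and encode explicitly ...
digits = str(n).encode("ascii")

# ... or use bytes %-formatting.
also_digits = b"%d" % n

print(repr(zero_filled))
print(repr(digits), repr(also_digits))
```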
If you want a fair test case, at the very least switch to cStringIO.StringIO or io.BytesIO instead of io.StringIO and write bytes uniformly. Typically, you wouldn't explicitly set the buffer size for TemporaryFile and the like yourself, so you might consider dropping that.
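As a rough sketch of what such a fairer comparison might look like under Python 3: every variant writes the same bytes, and TemporaryFile uses its default buffering. The function names and the shared `PIECES` list are assumptions made for illustration, not part of the original benchmark.

```python
import io
import tempfile
import timeit

# The same byte pieces are fed to every variant, so only the sink differs.
PIECES = []
for i in range(100):
    PIECES.extend((b"value = ", str(i).encode("ascii"), b" "))


def build_bytesio():
    out = io.BytesIO()
    for piece in PIECES:
        out.write(piece)
    return out.getvalue()


def build_temporaryfile():
    # No explicit buffer size: the default buffering is used.
    with tempfile.TemporaryFile() as out:
        for piece in PIECES:
            out.write(piece)
        out.seek(0)
        return out.read()


def build_concat():
    data = b""
    for piece in PIECES:
        data += piece
    return data


if __name__ == "__main__":
    for func in (build_bytesio, build_temporaryfile, build_concat):
        # Passing the callable itself avoids the missing-parentheses trap.
        elapsed = timeit.timeit(func, number=1000)
        print("%.6f %s" % (elapsed, func.__name__))
```

Passing the function object to timeit.timeit, rather than a statement string, means there is no string to get wrong. Absolute numbers will still depend on the OS and on where temporary files are stored.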
In my own tests on Linux x64 with Python 2.7.10, using ipython's %timeit magic, the ranking is:

    io.BytesIO          ~48 μs per loop
    io.StringIO         ~54 μs per loop (so unicode overhead didn't add much)
    cStringIO.StringIO  ~83 μs per loop
    TemporaryFile       ~2.8 ms per loop (note the units; ms is 1000x longer than μs)
And that's without going back to the default buffer sizes (I kept the explicit bufsize from your tests). I suspect the behavior of TemporaryFile will vary a lot more (depending on the OS and how temporary files are handled; some systems might just store them in memory, others might store them in /tmp, but of course, /tmp might just be a RAM disk anyway).
Something tells me you may have a setup where the TemporaryFile is basically a plain memory buffer that never goes to the file system, whereas mine may be ending up on persistent storage (if only for short periods). Stuff happening in memory is predictable, but when you involve the file system (which TemporaryFile can, depending on the OS, kernel settings, etc.), the behavior will differ a great deal between systems.
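One quick way to see which directory TemporaryFile would use by default is tempfile.gettempdir(). This sketch only reports the path; whether that path is RAM-backed (for example a tmpfs mount) has to be checked with OS tools and is not determined here.

```python
import os
import tempfile

# Directory used by default for temporary files created by the tempfile
# module (chosen from TMPDIR, TEMP, TMP, then platform-specific fallbacks).
tmp_dir = tempfile.gettempdir()
tmp_dir_exists = os.path.isdir(tmp_dir)

print("temporary files default to:", tmp_dir)
```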