Python - tempfile.TemporaryFile vs. StringIO


I've written a little benchmark to compare different string concatenating methods for ZOCache.

So it looks like tempfile.TemporaryFile is faster than anything else here:

$ python src/zocache/tmp_benchmark.py
3.00407409668e-05 temporaryfile
0.385630846024 spooledtemporaryfile
0.299962997437 bufferedrandom
0.0849719047546 io.stringio
0.113346099854 concat

The benchmark code I've been using:

#!/usr/bin/python
from __future__ import print_function
import io
import timeit
import tempfile


class Error(Exception):
    pass


def bench_temporaryfile():
    with tempfile.TemporaryFile(bufsize=10*1024*1024) as out:
        for i in range(0, 100):
            out.write(b"value = ")
            out.write(bytes(i))
            out.write(b" ")

        # Get string.
        out.seek(0)
        contents = out.read()
        out.close()
        # Test first letter.
        if contents[0:5] != b"value":
            raise Error


def bench_spooledtemporaryfile():
    with tempfile.SpooledTemporaryFile(max_size=10*1024*1024) as out:
        for i in range(0, 100):
            out.write(b"value = ")
            out.write(bytes(i))
            out.write(b" ")

        # Get string.
        out.seek(0)
        contents = out.read()
        out.close()
        # Test first letter.
        if contents[0:5] != b"value":
            raise Error


def bench_bufferedrandom():
    # 1. BufferedRandom
    with io.open('out.bin', mode='w+b') as fp:
        with io.BufferedRandom(fp, buffer_size=10*1024*1024) as out:
            for i in range(0, 100):
                out.write(b"value = ")
                out.write(bytes(i))
                out.write(b" ")

            # Get string.
            out.seek(0)
            contents = out.read()
            # Test first letter.
            if contents[0:5] != b'value':
                raise Error


def bench_stringio():
    # 1. Use StringIO.
    out = io.StringIO()
    for i in range(0, 100):
        out.write(u"value = ")
        out.write(unicode(i))
        out.write(u" ")

    # Get string.
    contents = out.getvalue()
    out.close()
    # Test first letter.
    if contents[0] != 'v':
        raise Error


def bench_concat():
    # 2. Use string appends.
    data = ""
    for i in range(0, 100):
        data += u"value = "
        data += unicode(i)
        data += u" "
    # Test first letter.
    if data[0] != u'v':
        raise Error


if __name__ == '__main__':
    print(str(timeit.timeit('bench_temporaryfile()', setup="from __main__ import bench_temporaryfile", number=1000)) + " temporaryfile")
    print(str(timeit.timeit('bench_spooledtemporaryfile()', setup="from __main__ import bench_spooledtemporaryfile", number=1000)) + " spooledtemporaryfile")
    print(str(timeit.timeit('bench_bufferedrandom()', setup="from __main__ import bench_bufferedrandom", number=1000)) + " bufferedrandom")
    print(str(timeit.timeit("bench_stringio()", setup="from __main__ import bench_stringio", number=1000)) + " io.stringio")
    print(str(timeit.timeit("bench_concat()", setup="from __main__ import bench_concat", number=1000)) + " concat")

Edit: Python 3.4.3 + io.BytesIO

python3 ./src/zocache/tmp_benchmark.py
2.689500024644076e-05 temporaryfile
0.30429405899985795 spooledtemporaryfile
0.348170792000019 bufferedrandom
0.0764778530001422 io.bytesio
0.05162201000030109 concat

New source with io.BytesIO:

#!/usr/bin/python3
from __future__ import print_function
import io
import timeit
import tempfile


class Error(Exception):
    pass


def bench_temporaryfile():
    with tempfile.TemporaryFile() as out:
        for i in range(0, 100):
            out.write(b"value = ")
            out.write(bytes(str(i), 'utf-8'))
            out.write(b" ")

        # Get string.
        out.seek(0)
        contents = out.read()
        out.close()
        # Test first letter.
        if contents[0:5] != b"value":
            raise Error


def bench_spooledtemporaryfile():
    with tempfile.SpooledTemporaryFile(max_size=10*1024*1024) as out:
        for i in range(0, 100):
            out.write(b"value = ")
            out.write(bytes(str(i), 'utf-8'))
            out.write(b" ")

        # Get string.
        out.seek(0)
        contents = out.read()
        out.close()
        # Test first letter.
        if contents[0:5] != b"value":
            raise Error


def bench_bufferedrandom():
    # 1. BufferedRandom
    with io.open('out.bin', mode='w+b') as fp:
        with io.BufferedRandom(fp, buffer_size=10*1024*1024) as out:
            for i in range(0, 100):
                out.write(b"value = ")
                # NOTE: on Python 3, bytes(i) writes i zero bytes,
                # not the digits of i (unlike the other functions here).
                out.write(bytes(i))
                out.write(b" ")

            # Get string.
            out.seek(0)
            contents = out.read()
            # Test first letter.
            if contents[0:5] != b'value':
                raise Error


def bench_bytesio():
    # 1. Use BytesIO.
    out = io.BytesIO()
    for i in range(0, 100):
        out.write(b"value = ")
        out.write(bytes(str(i), 'utf-8'))
        out.write(b" ")

    # Get string.
    contents = out.getvalue()
    out.close()
    # Test first letter.
    if contents[0:5] != b'value':
        raise Error


def bench_concat():
    # 2. Use string appends.
    data = ""
    for i in range(0, 100):
        data += "value = "
        data += str(i)
        data += " "
    # Test first letter.
    if data[0] != 'v':
        raise Error


if __name__ == '__main__':
    print(str(timeit.timeit('bench_temporaryfile()', setup="from __main__ import bench_temporaryfile", number=1000)) + " temporaryfile")
    print(str(timeit.timeit('bench_spooledtemporaryfile()', setup="from __main__ import bench_spooledtemporaryfile", number=1000)) + " spooledtemporaryfile")
    print(str(timeit.timeit('bench_bufferedrandom()', setup="from __main__ import bench_bufferedrandom", number=1000)) + " bufferedrandom")
    print(str(timeit.timeit("bench_bytesio()", setup="from __main__ import bench_bytesio", number=1000)) + " io.bytesio")
    print(str(timeit.timeit("bench_concat()", setup="from __main__ import bench_concat", number=1000)) + " concat")

Is that true on every platform? And if so, why?

Edit: results with the fixed benchmark (and fixed code):

0.2675984420002351 temporaryfile
0.28104681999866443 spooledtemporaryfile
0.3555715570000757 bufferedrandom
0.10379689100045653 io.bytesio
0.05650951399911719 concat

Your biggest problem: per tdelaney, you never actually ran the TemporaryFile test; you omitted the parens in the timeit snippet (and only for that test, the others actually ran). So you're timing the time taken to look up the name bench_temporaryfile, not the time to actually call it. Change:

print(str(timeit.timeit('bench_temporaryfile', setup="from __main__ import bench_temporaryfile", number=1000)) + " temporaryfile") 

to:

print(str(timeit.timeit('bench_temporaryfile()', setup="from __main__ import bench_temporaryfile", number=1000)) + " temporaryfile") 

(adding parens to make it an actual call) to fix it.
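To see the difference for yourself, here is a minimal sketch (not the original benchmark). The names count_calls and work are made up for illustration, and the globals= argument to timeit.timeit assumes Python 3.5 or newer. It counts how many times the function body actually runs for each statement:

```python
import timeit


def count_calls(stmt, number=1000):
    """Time `stmt` and report how often work() actually ran."""
    calls = []

    def work():
        # Record every real invocation, then do a bit of work.
        calls.append(1)
        return sum(range(100))

    # `work` is the only name visible to the timed statement.
    elapsed = timeit.timeit(stmt, globals={"work": work}, number=number)
    return len(calls), elapsed


if __name__ == "__main__":
    for stmt in ("work", "work()"):
        n_calls, elapsed = count_calls(stmt)
        print("{!r}: {} calls, {:.6f} s".format(stmt, n_calls, elapsed))
```

Without the parens the call count stays at zero, which is why the first result in the question is orders of magnitude smaller than the rest.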

Some other issues:

io.StringIO is fundamentally different from your other test cases. Specifically, all the other types you're testing operate in binary mode, reading and writing str, and avoiding line ending conversions. io.StringIO uses Python 3 style strings (unicode in Python 2), which your tests acknowledge by using different literals and converting to unicode instead of bytes. This adds a lot of encoding and decoding overhead, as well as using a lot more memory (unicode uses 2-4x the memory of str for the same data, which means more allocator overhead, more copy overhead, etc.).
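As a small illustration of the text/binary split (written for Python 3, where the distinction is enforced; the variable names are just for this sketch), io.StringIO only accepts text and io.BytesIO only accepts bytes:

```python
import io

# Text buffer: accepts str (unicode text) only.
text_buf = io.StringIO()
text_buf.write("value = 1 ")

# Binary buffer: accepts bytes-like objects only.
byte_buf = io.BytesIO()
byte_buf.write(b"value = 1 ")

# Mixing the two is rejected rather than silently converted.
try:
    text_buf.write(b"value")
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Getting from text to the binary form needs an explicit encode step.
encoded = text_buf.getvalue().encode("ascii")
```

So a benchmark that writes text to one buffer and bytes to the others is not comparing like with like.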

The other major difference is that you're setting a truly huge bufsize for TemporaryFile; few system calls would need to occur, and most writes are just appending to contiguous memory in the buffer. By contrast, io.StringIO is storing the individual values written, and only joining them together when you ask for them with getvalue().
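A rough way to see the effect of the buffer size, without touching the file system at all, is to put a counting raw stream underneath io.BufferedWriter. This is only a sketch for Python 3: CountingRaw and raw_writes are hypothetical helpers, and an in-memory stream stands in for the real file descriptor, so it shows how often the buffer hands data down rather than actual system calls.

```python
import io


class CountingRaw(io.RawIOBase):
    """In-memory raw stream that counts how often write() reaches it."""

    def __init__(self):
        super().__init__()
        self.write_calls = 0
        self.data = bytearray()

    def writable(self):
        return True

    def write(self, b):
        self.write_calls += 1
        self.data += bytes(b)
        return len(b)


def raw_writes(buffer_size):
    """Run the benchmark's write loop and return (raw write count, data)."""
    raw = CountingRaw()
    buffered = io.BufferedWriter(raw, buffer_size=buffer_size)
    for i in range(100):
        buffered.write(b"value = ")
        buffered.write(str(i).encode("ascii"))
        buffered.write(b" ")
    buffered.flush()
    return raw.write_calls, bytes(raw.data)


if __name__ == "__main__":
    for size in (16, 10 * 1024 * 1024):
        calls, _ = raw_writes(size)
        print("buffer_size={}: {} raw writes".format(size, calls))
```

With the 10 MB buffer from the question, everything fits in the buffer and is only passed down on flush, while a tiny buffer forces many more raw writes for the same data.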

Also, lastly, you think you're being forward compatible by using the bytes constructor, but you're not; in Python 2 bytes is an alias for str, so bytes(10) returns '10', but in Python 3, bytes is a totally different thing, and passing an integer to it returns a zero initialized bytes object of that size, so bytes(10) returns b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'.
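A short sketch of the Python 3 behavior and one portable alternative; int_to_ascii is a hypothetical helper name, not something from the question's code:

```python
def int_to_ascii(i):
    """Return the decimal digits of i as a byte string.

    Going through str() and encode() gives the same result on
    Python 2 and Python 3, unlike calling bytes(i) directly.
    """
    return str(i).encode("ascii")


# On Python 3, bytes(int) allocates that many zero bytes.
zeros = bytes(10)

# The digits, which is what the benchmark actually wanted to write.
digits = int_to_ascii(10)
```

This is also what the Python 3 version of the benchmark does with bytes(str(i), 'utf-8') in most of its functions.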

If you want a fair test case, at the very least switch to cStringIO.StringIO or io.BytesIO instead of io.StringIO and write bytes uniformly. Typically, you wouldn't explicitly set the buffer size for TemporaryFile and the like yourself, so you might consider dropping that.
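Something along these lines would be a fairer comparison. It is a sketch for Python 3, not a drop-in replacement for the original script: the function names build_bytesio and build_temporaryfile are made up, both write identical bytes, and TemporaryFile is left at its default buffering.

```python
import io
import tempfile
import timeit


def build_bytesio():
    """Accumulate the test data in an in-memory binary buffer."""
    out = io.BytesIO()
    for i in range(100):
        out.write(b"value = ")
        out.write(str(i).encode("ascii"))
        out.write(b" ")
    return out.getvalue()


def build_temporaryfile():
    """Accumulate the same data in a TemporaryFile with default buffering."""
    with tempfile.TemporaryFile() as out:
        for i in range(100):
            out.write(b"value = ")
            out.write(str(i).encode("ascii"))
            out.write(b" ")
        out.seek(0)
        return out.read()


if __name__ == "__main__":
    # Passing the callable itself avoids the missing-parens trap entirely.
    for func in (build_bytesio, build_temporaryfile):
        elapsed = timeit.timeit(func, number=1000)
        print("{:.6f} {}".format(elapsed, func.__name__))
```

Because both functions return the data they built, it is easy to check that they produce the same bytes before trusting the timings.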

In my own tests on Linux x64 with Python 2.7.10, using ipython's %timeit magic, the ranking is:

  1. io.BytesIO ~48 μs per loop
  2. io.StringIO ~54 μs per loop (so the unicode overhead didn't add much)
  3. cStringIO.StringIO ~83 μs per loop
  4. TemporaryFile ~2.8 ms per loop (note the units; ms is 1000x longer than μs)

And that's without going back to default buffer sizes (I kept the explicit bufsize from your tests). I suspect the behavior of TemporaryFile will vary a lot more (depending on the OS and how temporary files are handled; some systems might just store in memory, others might store in /tmp, but of course, /tmp might just be a RAMdisk anyway).

Something tells me you may have a setup where the TemporaryFile is basically a plain memory buffer that never goes to the file system, whereas mine may be ultimately ending up on persistent storage (if only for short periods); stuff happening in memory is predictable, but when you involve the file system (which TemporaryFile can, depending on OS, kernel settings, etc.), the behavior will differ a great deal between systems.
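If you want to check where temporary files land on a given machine, a tiny sketch like this tells you the directory that tempfile picks; whether that directory is RAM-backed is OS-specific and not something this snippet can tell you:

```python
import os
import tempfile

# Directory tempfile uses by default (honors TMPDIR, TEMP and TMP).
tmp_dir = tempfile.gettempdir()
tmp_dir_exists = os.path.isdir(tmp_dir)

if __name__ == "__main__":
    print("temporary files are created under:", tmp_dir)
```

From there you can inspect that path with your platform's own tools to see what kind of storage backs it.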

