linux - speeding up GNU find on several folders


On a 64-bit CentOS Linux server, I am running the GNU find command on several folders, each of them containing a similar subfolder structure. The structure is:

/my/group/folder/project_123/project_123-12345678/*/*file_pattern_at_this_level*
/my/group/folder/project_234/project_234-23456789/*/*file_pattern_at_this_level*

The folder asterisk /*/ indicates that there are a bunch of subfolders, with varying names, inside each project folder.

I have tried adding a final asterisk and limiting the find command with -mindepth n and -maxdepth n:

find $folder1 $folder2 $folder3 -mindepth 1 -maxdepth 1 -name "*file_pattern*" 
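As a self-contained reproduction of the setup (all directory and file names below are made up for the demo), this sketch shows how -mindepth/-maxdepth scope the match to a single level, so find never tests names at other depths:

```shell
# Throwaway tree mimicking one project folder from the question.
qroot=$(mktemp -d)
mkdir -p "$qroot/project_123-12345678/sub_x"
touch "$qroot/project_123-12345678/sub_x/abc_file_pattern_1"
touch "$qroot/project_123-12345678/stray_file_pattern"  # wrong depth, should be skipped

# Restrict matching to exactly the subfolder level (depth 2 from the start point).
matches=$(find "$qroot/project_123-12345678" -mindepth 2 -maxdepth 2 \
               -name "*file_pattern*")
echo "$matches"
```

Only sub_x/abc_file_pattern_1 is printed; the stray file one level up is never tested against the pattern.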

However, the server node has other jobs running, so it is difficult to make a fair performance comparison: the caching that takes place after the first command makes the first invocation slow and an equivalent second invocation faster.

This is a multicore node; what else can I try to make these kinds of commands faster?

"actually commands find , grep io-bound: disk bottleneck, not cpu. in such cases, if run several instances in parallel, compete i/o bandwidth , cache, , slower." - https://unix.stackexchange.com/a/111409

Don't worry about the "finding" of files; worry about what you need to do with them. That work you can parallelize with "parallel" or "xargs".
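A minimal sketch of that idea with xargs (the files, the pattern, and the grep for "needle" are all invented for the demo): find stays single-threaded and cheap, while the per-file work runs in several processes at once.

```shell
# Toy data for the demo.
xroot=$(mktemp -d)
mkdir -p "$xroot/a" "$xroot/b"
echo needle > "$xroot/a/x_file_pattern_1"
echo hay    > "$xroot/b/y_file_pattern_2"

# find locates files serially; xargs -P4 fans the grep work out over
# up to 4 concurrent processes. -print0/-0 keep odd filenames safe.
find "$xroot" -type f -name "*file_pattern*" -print0 |
  xargs -0 -P4 grep -l needle > "$xroot/hits.txt" || true
cat "$xroot/hits.txt"
```

The `|| true` absorbs grep's non-zero exit status for batches where no file matched.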

If you still want to pursue that, you can try using "parallel" with find, passing it the list of directories. That causes parallel to spawn a bunch of find processes (the -j option sets how many "threads" run simultaneously) to work through the "queue". In that scenario you will need to redirect stdout to a file and review the output later, or not, depending on your use case.
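A sketch of that per-directory fan-out, using xargs -P (which behaves like parallel's -j and is installed everywhere) in case GNU parallel is not available; the directory and file names are made up:

```shell
# Toy project tree for the demo.
proot=$(mktemp -d)
mkdir -p "$proot/project_123/sub" "$proot/project_234/sub"
touch "$proot/project_123/sub/a_file_pattern" "$proot/project_234/sub/b_file_pattern"

# One find per project directory, up to 4 running at once. With GNU parallel
# the equivalent would be:
#   parallel -j4 find {} -name "*file_pattern*" ::: "$proot"/project_* > out.txt
printf '%s\0' "$proot"/project_* |
  xargs -0 -P4 -I{} find {} -name "*file_pattern*" > "$proot/out.txt"
cat "$proot/out.txt"
```

Note the caveat from the quoted answer still applies: if the directories sit on one spinning disk, the concurrent finds compete for I/O and may well be slower than one sequential pass.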

