linux - speeding up GNU find on several folders


On a 64-bit CentOS Linux server, I am running the GNU find command on several folders, each of them containing a similar subfolder structure. The structure is:

/my/group/folder/project_123/project_123-12345678/*/*file_pattern_at_this_level*
/my/group/folder/project_234/project_234-23456789/*/*file_pattern_at_this_level*

The folder asterisk /*/ indicates that there are a bunch of subfolders, with varying names, inside each project folder.

I have tried adding a final asterisk and limiting the find command with -mindepth n and -maxdepth n:

find $folder1 $folder2 $folder3 -mindepth 1 -maxdepth 1 -name "*file_pattern*" 
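As a self-contained reproduction of the setup (all directory and file names below are made up for the demo), this sketch shows how -mindepth/-maxdepth scope the match to a single level, so find never tests names at other depths:

```shell
# Throwaway tree mimicking one project folder from the question.
qroot=$(mktemp -d)
mkdir -p "$qroot/project_123-12345678/sub_x"
touch "$qroot/project_123-12345678/sub_x/abc_file_pattern_1"
touch "$qroot/project_123-12345678/stray_file_pattern"  # wrong depth, should be skipped

# Restrict matching to exactly the subfolder level (depth 2 from the start point).
matches=$(find "$qroot/project_123-12345678" -mindepth 2 -maxdepth 2 \
               -name "*file_pattern*")
echo "$matches"
```

Only sub_x/abc_file_pattern_1 is printed; the stray file one level up is never tested against the pattern.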

However, the server node has other jobs running, so it is difficult to make a fair performance comparison: the caching that takes place after the first command makes the first invocation slow and an equivalent second invocation faster.

This is a multicore node; what else can I try to make these kinds of commands faster?

"actually commands find , grep io-bound: disk bottleneck, not cpu. in such cases, if run several instances in parallel, compete i/o bandwidth , cache, , slower." - https://unix.stackexchange.com/a/111409

Don't worry about the "finding" of files; worry about what you need to do with them. That work you can parallelize with "parallel" or "xargs".
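A minimal sketch of that idea with xargs (the files, the pattern, and the grep for "needle" are all invented for the demo): find stays single-threaded and cheap, while the per-file work runs in several processes at once.

```shell
# Toy data for the demo.
xroot=$(mktemp -d)
mkdir -p "$xroot/a" "$xroot/b"
echo needle > "$xroot/a/x_file_pattern_1"
echo hay    > "$xroot/b/y_file_pattern_2"

# find locates files serially; xargs -P4 fans the grep work out over
# up to 4 concurrent processes. -print0/-0 keep odd filenames safe.
find "$xroot" -type f -name "*file_pattern*" -print0 |
  xargs -0 -P4 grep -l needle > "$xroot/hits.txt" || true
cat "$xroot/hits.txt"
```

The `|| true` absorbs grep's non-zero exit status for batches where no file matched.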

If you still want to pursue that, you can try using "parallel" with find, passing it the list of directories. That causes parallel to spawn a bunch of find processes (the -j option sets how many "threads" run simultaneously) to work through the "queue". In that scenario you will need to redirect stdout to a file and review the output later, or not, depending on your use case.
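A sketch of that per-directory fan-out, using xargs -P (which behaves like parallel's -j and is installed everywhere) in case GNU parallel is not available; the directory and file names are made up:

```shell
# Toy project tree for the demo.
proot=$(mktemp -d)
mkdir -p "$proot/project_123/sub" "$proot/project_234/sub"
touch "$proot/project_123/sub/a_file_pattern" "$proot/project_234/sub/b_file_pattern"

# One find per project directory, up to 4 running at once. With GNU parallel
# the equivalent would be:
#   parallel -j4 find {} -name "*file_pattern*" ::: "$proot"/project_* > out.txt
printf '%s\0' "$proot"/project_* |
  xargs -0 -P4 -I{} find {} -name "*file_pattern*" > "$proot/out.txt"
cat "$proot/out.txt"
```

Note the caveat from the quoted answer still applies: if the directories sit on one spinning disk, the concurrent finds compete for I/O and may well be slower than one sequential pass.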

