java - Efficient way of storing and matching names against large data sets


For a data loss prevention tool, I have a requirement to look up different types of data, such as driver's license numbers, social security numbers, names, etc. While most of these are pattern based, and hence can be looked up using pattern matching with regular expressions, names are a very broad category. Virtually any set of characters can form a name. However, to make the lookup meaningful, I think we should look them up against a defined dictionary of names. Here is what I am thinking.

Provide the dictionary of names as a configuration item. This looks more sensible, since for each use case the names might vary across geographic regions. I am looking for best practices for doing this in Java. These are my questions:

  1. What data structure should I use to store the names? A Set comes to mind as the first option, but are there better options, such as in-memory databases?
  2. How should I go about searching for these names in large data sets? These data sets are large, and I have the facility to read them row by row.
  3. Is there any other option?
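
To make the first two questions concrete, here is a minimal sketch of the Set-based option, using only the JDK. The class name `NameDictionary`, its method names, and the tokenizing rule (split on anything that is not a letter, apostrophe, or hyphen) are assumptions made for illustration, not part of any existing tool. Names are lower-cased on load so lookups are case-insensitive, and the input is consumed row by row so the full data set is never held in memory.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: an in-memory name dictionary backed by a HashSet. */
final class NameDictionary {
    private final Set<String> names = new HashSet<>();

    /** Loads the configured names, normalized for case-insensitive lookup. */
    NameDictionary(Iterable<String> configuredNames) {
        for (String name : configuredNames) {
            names.add(normalize(name));
        }
    }

    private static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }

    /** Returns the tokens of one row that appear in the dictionary, in order. */
    List<String> matchesInRow(String row) {
        List<String> hits = new ArrayList<>();
        // Assumed tokenizer: split on anything that is not a letter, apostrophe, or hyphen.
        for (String token : row.split("[^\\p{L}'-]+")) {
            if (!token.isEmpty() && names.contains(normalize(token))) {
                hits.add(token);
            }
        }
        return hits;
    }

    /** Reads the source row by row; the caller remains responsible for closing it. */
    List<String> scan(Reader source) {
        List<String> hits = new ArrayList<>();
        BufferedReader reader = new BufferedReader(source);
        reader.lines().forEach(row -> hits.addAll(matchesInRow(row)));
        return hits;
    }
}
```

Each token costs one hash lookup on average, so the scan time grows with the size of the data rather than the size of the dictionary. This sketch only matches single-token names; multi-word names, prefix matching, or fuzzy matching would need a different structure.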

Take a look at the concurrent-trees and cqengine projects.

