Lucene.NET: custom analyzer/tokenizer to index JSON key/value pairs
I'm aiming to store and index JSON key/value pairs. Ideally I'd store them under a constant field name (for simplicity's sake, "grades").
An example of an incoming JSON object:
"data": [{ "key": "dp01", "value": "excellent" }, { "key": "dp02", "value": "average" }, { "key": "dp03", "value": "negative" }]
The JSON object is serialized and stored as is. I want to index it in a way that enables me to search within the same field by key and value. The main idea is to search multiple values within the same Lucene field.
Any suggestions on how to structure the indexing? Let's imagine an example search using the following query:
[grades: "key:dp01 uniqueidasdelimiter value:excellent"]
How could a custom analyzer/tokenizer achieve this?
Edit: an attempt to depict my goal more accurately.
Think of a typical relational type of structure (for simplicity's sake).
Each document is a website.
A website can have multiple images (and other important metadata).
Each image has multiple sets of free-form key/value pair properties:
{ "key": "scenery", "value": "nature" }, { "key": "style", "value": "vintage" }
Another set:
{ "key": "scenery", "value": "industrial" }, { "key": "style", "value": "vintage" }
My challenge is to come up with a similar type of structure and index it in a way that enables me to build queries such as:
A website with an image of scenery:industrial and style:vintage.
I'm probably taking the wrong approach, as indicated by Andy Pook. Any ideas on how to efficiently flatten out these properties?
A common "problem" is to think of indexes and documents as having a consistent set of fields. They are not the same as relational database tables with a fixed set of columns.
In a previous life I had an entity with a set of "attributes", which was a key/value collection (much like your grades).
For each document I created fields named after each attribute, i.e. "attr-thing", with the value added as NOT_ANALYZED.
So, in your example I'd create fields like
new Field("grade-" + gradeId, grade, Field.Store.NO, Field.Index.NOT_ANALYZED)
Then you can search with the query "grade-dp01:excellent".
Alternatively, you can have a fixed field name (similar to @cris-almodovar) and set the value to "id=grade", again NOT_ANALYZED. Then search for "grade:dp01=excellent".
Either will work. I've used both approaches with success but typically prefer the first.
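To make the two flattening schemes concrete, here is a minimal, library-independent Python sketch. It does not call Lucene at all, and the helper names `fields_per_key` and `fields_fixed_name` are hypothetical. It only shows which field name and untokenized value each pair would turn into, assuming the JSON shape from the question.

```python
import json

def fields_per_key(pairs, prefix="grade-"):
    """Approach 1: one field per key, e.g. ("grade-dp01", "excellent")."""
    return [(prefix + p["key"], p["value"]) for p in pairs]

def fields_fixed_name(pairs, name="grade"):
    """Approach 2: a fixed field name whose value is "key=value"."""
    return [(name, f'{p["key"]}={p["value"]}') for p in pairs]

# Sample input copied from the question.
doc = json.loads('''{"data": [
    {"key": "dp01", "value": "excellent"},
    {"key": "dp02", "value": "average"},
    {"key": "dp03", "value": "negative"}]}''')

per_key = fields_per_key(doc["data"])   # one (name, value) tuple per pair
fixed = fields_fixed_name(doc["data"])  # same field name, composite value
```

Each tuple would become one NOT_ANALYZED field on the document, which is why queries such as grade-dp01:excellent or grade:dp01=excellent have to match the stored term exactly.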
Additional, in response to the edit...
I think I understand the problem... If you had "scenery=industrial style=vintage" and "scenery=nature style=modern", you wouldn't want a match if you searched for "nature AND vintage", right?
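You could add an "imagetype" field for each set, with a value like "scenery=industrial style=vintage abc=xyz", indexed with a WhitespaceAnalyzer (which just splits on whitespace; a KeywordAnalyzer would keep the whole value as a single token).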
Then search with imagetype:"scenery=industrial style=vintage"~2.
Using a sloppy phrase guarantees that the values are in the same field, and the slop allows the order to be different or other values to sit in between. The number is something you'd have to figure out based on the number of properties you expect in each field. Simplistically, if you expect there to be a max of n values, then the slop should be n too.
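As a rough illustration of the "imagetype" idea, the following Python sketch builds the space-separated field value for each property set and the matching sloppy-phrase query string. It does not run a search. The helper names `imagetype_value` and `sloppy_phrase_query` are hypothetical, and the slop is simply set to the maximum number of properties expected per set, following the rule of thumb above.

```python
def imagetype_value(props):
    """Join one property set into a single space-separated field value."""
    return " ".join(f"{key}={value}" for key, value in props)

def sloppy_phrase_query(wanted, max_props, field="imagetype"):
    """Build query-parser syntax of the form field:"k1=v1 k2=v2"~slop."""
    phrase = " ".join(f"{key}={value}" for key, value in wanted)
    return f'{field}:"{phrase}"~{max_props}'

# The two property sets from the question, one per image.
image_sets = [
    [("scenery", "nature"), ("style", "vintage")],
    [("scenery", "industrial"), ("style", "vintage")],
]

# One field value per set; each would be added as its own "imagetype" field.
values = [imagetype_value(s) for s in image_sets]

query = sloppy_phrase_query(
    [("scenery", "industrial"), ("style", "vintage")], max_props=2)
```

Here `query` comes out as imagetype:"scenery=industrial style=vintage"~2, the same form as the query shown above. Whether a slop of n is enough for every ordering of the properties is something to verify against a real index.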