python - How to improve retrieval query performance in ArangoDB 2.7 as the number of documents in a single collection increases
I have stored data in ArangoDB 2.7 in the following format:
{"content": "book.xml", "type": "string", "name": "name", "key": 102} {"content": "d:/xmlexample/book.xml", "type": "string", "name": "location", "key": 102} {"content": "xml", "type": "string", "name": "mime-type", "key": 102} {"content": 4130, "type": "string", "name": "size", "key": 102} {"content": "sun aug 25 07:53:32 2013", "type": "string", "name": "created_date", "key": 102} {"content": "wed jan 23 09:14:07 2013", "type": "string", "name": "modified_date", "key": 102} {"content": "catalog", "type": "tag", "name": "root", "key": 102} {"content": "book", "type": "string", "name": "tag", "key": 103} {"content": "bk101", "type": {"py/type": "__builtin__.str"}, "name": "id", "key": 103} {"content": "gambardella, matthew", "type": {"py/type": "__builtin__.str"}, "name": "author", "key": 1031} {"content": "xml developer's guide", "type": {"py/type": "__builtin__.str"}, "name": "title", "key": 1031} {"content": "computer", "type": {"py/type": "__builtin__.str"}, "name": "genre", "key": 1031} {"content": "44.95", "type": {"py/type": "__builtin__.str"}, "name": "price", "key": 1031} {"content": "2000-10-01", "type": {"py/type": "__builtin__.str"}, "name": "publish_date", "key": 1031} {"content": "an in-depth @ creating applications xml.", "type": {"py/type": "__builtin__.str"}, "name": "description", "key": 1031}
As the number of documents increases (1000, 10000, 100000, 1000000, 10000000, and so on), the average query response time grows with each increment, varying from 0.2 seconds to 3.0 seconds. I have created a hash index on the collection. My question is whether this growth with the number of documents can be reduced.
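As a point of reference, a hash index only supports equality lookups on exactly the indexed attribute; a minimal arangosh sketch, with the attribute name chosen purely for illustration:

// hash index on a single top-level attribute (attribute chosen for illustration)
db.dsp.ensureHashIndex("name");

// an equality filter on exactly that attribute can use the hash index, so the
// lookup itself should not degrade with collection size the way a full scan does
db._query('for d in dsp filter d.name == "modified_date" return d').toArray();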
On the other hand, I have created a fulltext index on the content attribute, and the same thing happens with fulltext search: the response time varies from 0.05 seconds to 0.3 seconds.
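For the fulltext part, a minimal sketch of how the index and query might look in arangosh, assuming the index is on the content attribute as described and "book" is just an example search term:

// fulltext index on the content attribute
db.dsp.ensureFulltextIndex("content");

// fulltext search via the FULLTEXT() AQL function
db._query('for d in FULLTEXT(dsp, "content", "book") return d').toArray();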
So please tell me, is there a way to further reduce the response time?
One cannot utilize indices in the first level of nested FOR statements. However, starting with ArangoDB 2.8 you can utilize array indices:
The values we query are data[*].name and data[*].type, so let's create indices for them:
db.dsp.ensureIndex({ type: "hash", fields: [ 'data[*].type' ] });
db.dsp.ensureIndex({ type: "hash", fields: [ 'data[*].name' ] });
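To double-check that both indexes actually exist on the collection, they can be listed; this verification step is just a suggestion, not part of the original answer:

// list all indexes of the collection, including the two array indexes created above
db.dsp.getIndexes();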
And let's re-formulate the query so it can utilize the index. We start with a simple version to experiment with, and use explain to verify that it uses the index:
db._explain('for k in dsp filter "modified_date" in k.data[*].name return k')

Query string:
 for k in dsp filter "modified_date" in k.data[*].name return k

Execution plan:
 Id   NodeType         Est.   Comment
  1   SingletonNode       1   * ROOT
  6   IndexNode           1     - FOR k IN dsp   /* hash index scan */
  5   ReturnNode          1     - RETURN k

Indexes used:
 By   Type   Collection   Unique   Sparse   Selectivity   Fields               Ranges
  6   hash   dsp          false    false        100.00 %   [ `data[*].name` ]   ("modified_date" in k.`data`[*].`name`)
So we see that we can filter on the array conditions to narrow down the documents we want to inspect in the inner loop:
for k in dsp
  filter "modified_date" in k.data[*].name || "string" in k.data[*].type
  for p in k.data
    filter p.name == "modified_date" || p.type == "string"
    return p
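For completeness, this is one way the reformulated query could be run from arangosh; the result handling is only illustrative:

// run the combined query and materialize the matching sub-documents
var result = db._query(
  'for k in dsp ' +
  'filter "modified_date" in k.data[*].name || "string" in k.data[*].type ' +
  'for p in k.data ' +
  'filter p.name == "modified_date" || p.type == "string" ' +
  'return p'
).toArray();

// number of matching attribute entries
result.length;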