python - How to improve retrieval query performance in ArangoDB 2.7 as the number of documents in a single collection increases
I have stored data in ArangoDB 2.7 in the following format:
{"content": "book.xml", "type": "string", "name": "name", "key": 102} {"content": "d:/xmlexample/book.xml", "type": "string", "name": "location", "key": 102} {"content": "xml", "type": "string", "name": "mime-type", "key": 102} {"content": 4130, "type": "string", "name": "size", "key": 102} {"content": "sun aug 25 07:53:32 2013", "type": "string", "name": "created_date", "key": 102} {"content": "wed jan 23 09:14:07 2013", "type": "string", "name": "modified_date", "key": 102} {"content": "catalog", "type": "tag", "name": "root", "key": 102} {"content": "book", "type": "string", "name": "tag", "key": 103} {"content": "bk101", "type": {"py/type": "__builtin__.str"}, "name": "id", "key": 103} {"content": "gambardella, matthew", "type": {"py/type": "__builtin__.str"}, "name": "author", "key": 1031} {"content": "xml developer's guide", "type": {"py/type": "__builtin__.str"}, "name": "title", "key": 1031} {"content": "computer", "type": {"py/type": "__builtin__.str"}, "name": "genre", "key": 1031} {"content": "44.95", "type": {"py/type": "__builtin__.str"}, "name": "price", "key": 1031} {"content": "2000-10-01", "type": {"py/type": "__builtin__.str"}, "name": "publish_date", "key": 1031} {"content": "an in-depth @ creating applications xml.", "type": {"py/type": "__builtin__.str"}, "name": "description", "key": 1031}
As the number of documents increases (1000, 10000, 100000, 1000000, 10000000, and so on), the average query response time grows with each increment, varying from 0.2 seconds to 3.0 seconds. I have created a hash index on the collection. My question is whether this growth with the number of documents can be reduced.
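As a point of reference, a hash index only supports equality lookups on exactly the indexed attribute; a minimal arangosh sketch, with the attribute name chosen purely for illustration:

// hash index on a single top-level attribute (attribute chosen for illustration)
db.dsp.ensureHashIndex("name");

// an equality filter on exactly that attribute can use the hash index, so the
// lookup itself should not degrade with collection size the way a full scan does
db._query('for d in dsp filter d.name == "modified_date" return d').toArray();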
On the other hand, I have created a fulltext index on the content attribute, and the same thing happens with fulltext search: the response time varies from 0.05 seconds to 0.3 seconds.
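For the fulltext part, a minimal sketch of how the index and query might look in arangosh, assuming the index is on the content attribute as described and "book" is just an example search term:

// fulltext index on the content attribute
db.dsp.ensureFulltextIndex("content");

// fulltext search via the FULLTEXT() AQL function
db._query('for d in FULLTEXT(dsp, "content", "book") return d').toArray();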
So please tell me, is there a way to further reduce the response time?
One cannot utilize indices in the first level of nested FOR statements. However, starting with ArangoDB 2.8 you can utilize array indices:
The values we query are data[*].name and data[*].type, so let's create indices for them:
db.dsp.ensureIndex({ type: "hash", fields: [ 'data[*].type' ] });
db.dsp.ensureIndex({ type: "hash", fields: [ 'data[*].name' ] });
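To double-check that both indexes actually exist on the collection, they can be listed; this verification step is just a suggestion, not part of the original answer:

// list all indexes of the collection, including the two array indexes created above
db.dsp.getIndexes();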
And let's re-formulate the query so it can utilize the index. We start with a simple version to experiment with, and use explain to verify that it uses the index:
db._explain('for k in dsp filter "modified_date" in k.data[*].name return k')

Query string:
 for k in dsp filter "modified_date" in k.data[*].name return k

Execution plan:
 Id   NodeType         Est.   Comment
  1   SingletonNode       1   * ROOT
  6   IndexNode           1     - FOR k IN dsp   /* hash index scan */
  5   ReturnNode          1     - RETURN k

Indexes used:
 By   Type   Collection   Unique   Sparse   Selectivity   Fields               Ranges
  6   hash   dsp          false    false        100.00 %   [ `data[*].name` ]   ("modified_date" in k.`data`[*].`name`)
So we see that we can filter on the array conditions to narrow down the documents we want to inspect in the inner loop:
for k in dsp
  filter "modified_date" in k.data[*].name || "string" in k.data[*].type
  for p in k.data
    filter p.name == "modified_date" || p.type == "string"
    return p
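For completeness, this is one way the reformulated query could be run from arangosh; the result handling is only illustrative:

// run the combined query and materialize the matching sub-documents
var result = db._query(
  'for k in dsp ' +
  'filter "modified_date" in k.data[*].name || "string" in k.data[*].type ' +
  'for p in k.data ' +
  'filter p.name == "modified_date" || p.type == "string" ' +
  'return p'
).toArray();

// number of matching attribute entries
result.length;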