Performance: MongoDB's sharding does not improve application performance in a lab setup
I'm developing a mobile application powered by a MongoDB database. It is working fine right now, but I want to add sharding to be prepared for the future.
In order to test this, we've created a lab environment (running in Hyper-V) to test various scenarios.
The following servers have been created:
- Ubuntu Server 14.04.3 non-sharding (database server) (256 MB RAM / 10% CPU limit)
- Ubuntu Server 14.04.3 sharding (configuration server) (256 MB RAM / 10% CPU limit)
- Ubuntu Server 14.04.3 sharding (query router server) (256 MB RAM / 10% CPU limit)
- Ubuntu Server 14.04.3 sharding (database server 01) (256 MB RAM / 10% CPU limit)
- Ubuntu Server 14.04.3 sharding (database server 02) (256 MB RAM / 10% CPU limit)
A small console application has been created in C# that is able to measure the time needed to perform the inserts.
This console application imports 10.000 persons with the following properties:
- Name
- First name
- Full name
- Date of birth
- ID
All 10.000 records differ only in '_id'; the other fields are the same for all records.
It's important to note that every test was run exactly 3 times. After every test, the database was removed so the system was clean again.
Find the results of the tests below:
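The console application itself isn't included in the post; the sketch below shows what such a measurement could look like, assuming the official MongoDB .NET driver. The connection string and field names are illustrative assumptions, not taken from the actual application:

    using System;
    using System.Diagnostics;
    using MongoDB.Bson;
    using MongoDB.Driver;

    class InsertBenchmark
    {
        static void Main()
        {
            // Point this at the mongos (sharded tests) or mongod (non-sharded test).
            var client = new MongoClient("mongodb://192.168.137.11:27017");
            var persons = client.GetDatabase("demodatabase")
                                .GetCollection<BsonDocument>("persons");

            var stopwatch = Stopwatch.StartNew();
            for (var i = 0; i < 10000; i++)
            {
                // The driver generates a fresh ObjectId for "_id",
                // so only "_id" differs between the records.
                persons.InsertOne(new BsonDocument
                {
                    { "name", "Doe" },
                    { "firstName", "John" },
                    { "fullName", "John Doe" },
                    { "dateOfBirth", new DateTime(1990, 1, 1) }
                });
            }
            stopwatch.Stop();

            Console.WriteLine($"Writing 10.000 records took {stopwatch.Elapsed}");
        }
    }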
Insert 10.000 records without sharding
Writing 10.000 records | non-sharding environment - full disk IO #1: 14 seconds.
Writing 10.000 records | non-sharding environment - full disk IO #2: 14 seconds.
Writing 10.000 records | non-sharding environment - full disk IO #3: 12 seconds.
Insert 10.000 records with a single database shard
Note: the sharding key has been set to the hashed _id field. See the (partial) JSON below for the sharding information:
    shards:
        { "_id" : "shard0000", "host" : "192.168.137.12:27017" }
    databases:
        { "_id" : "demodatabase", "primary" : "shard0000", "partitioned" : true }
            demodatabase.persons
                shard key: { "_id" : "hashed" }
                unique: false
                balancing: true
                chunks:
                    shard0000    2
                { "_id" : { "$minKey" : 1 } } -->> { "_id" : NumberLong(0) } on : shard0000 Timestamp(1, 1)
                { "_id" : NumberLong(0) } -->> { "_id" : { "$maxKey" : 1 } } on : shard0000 Timestamp(1, 2)
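For reference, sharding a collection on a hashed _id can be done by sending the standard enableSharding and shardCollection admin commands through the same C# driver; a minimal sketch (the mongos address is an assumption carried over from the sketch above):

    using MongoDB.Bson;
    using MongoDB.Driver;

    // Connect to the mongos query router, not to a shard directly.
    var client = new MongoClient("mongodb://192.168.137.11:27017");
    var admin = client.GetDatabase("admin");

    // Enable sharding on the database, then shard the collection
    // on a hashed _id key.
    admin.RunCommand<BsonDocument>(new BsonDocument("enableSharding", "demodatabase"));
    admin.RunCommand<BsonDocument>(new BsonDocument
    {
        { "shardCollection", "demodatabase.persons" },
        { "key", new BsonDocument("_id", "hashed") }
    });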
Results:
Writing 10.000 records | single sharding environment - full disk IO #1: 1 minute, 59 seconds.
Writing 10.000 records | single sharding environment - full disk IO #2: 1 minute, 51 seconds.
Writing 10.000 records | single sharding environment - full disk IO #3: 1 minute, 52 seconds.
Insert 10.000 records with two database shards
Note: the sharding key has been set to the hashed _id field. See the (partial) JSON below for the sharding information:
    shards:
        { "_id" : "shard0000", "host" : "192.168.137.12:27017" }
        { "_id" : "shard0001", "host" : "192.168.137.13:27017" }
    databases:
        { "_id" : "demodatabase", "primary" : "shard0000", "partitioned" : true }
            demodatabase.persons
                shard key: { "_id" : "hashed" }
                unique: false
                balancing: true
                chunks:
                    shard0000    2
                    shard0001    2
                { "_id" : { "$minKey" : 1 } } -->> { "_id" : NumberLong("-4611686018427387902") } on : shard0000 Timestamp(2, 2)
                { "_id" : NumberLong("-4611686018427387902") } -->> { "_id" : NumberLong(0) } on : shard0000 Timestamp(2, 3)
                { "_id" : NumberLong(0) } -->> { "_id" : NumberLong("4611686018427387902") } on : shard0001 Timestamp(2, 4)
                { "_id" : NumberLong("4611686018427387902") } -->> { "_id" : { "$maxKey" : 1 } } on : shard0001 Timestamp(2, 5)
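The second shard seen above is registered with the cluster through the addShard admin command; a one-line sketch against the same admin database handle as before (the host is taken from the sh.status output above):

    // Registers the second data node (database server 02) as a shard.
    admin.RunCommand<BsonDocument>(new BsonDocument("addShard", "192.168.137.13:27017"));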
Results:
Writing 10.000 records | double sharding environment - full disk IO #1: 49 seconds.
Writing 10.000 records | double sharding environment - full disk IO #2: 53 seconds.
Writing 10.000 records | double sharding environment - full disk IO #3: 54 seconds.
According to the tests executed above, sharding does work: the more shards I add, the better the performance gets. However, I don't understand why I'm facing such a huge performance drop when working with shards compared to using a single server.
I need blazing fast reading and writing, so I thought sharding would be the solution, but it seems I'm missing something here.
Can anyone point me in the right direction?
Kind regards
Answer:

The layers between the routing server and the config server, and between the routing server and the data nodes, add latency. If you have a 1 ms ping and 10k inserts, you have 10 seconds of latency that does not appear in the unsharded setup.
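To illustrate that round-trip arithmetic, here is a hedged sketch continuing the benchmark code from earlier (it reuses the `persons` collection handle; `makePerson` is a hypothetical helper, and batching via InsertMany is a general driver feature, not something from the original post):

    using System;
    using System.Linq;
    using MongoDB.Bson;

    // Hypothetical document factory; only "_id" (driver-generated) will differ.
    Func<int, BsonDocument> makePerson = i => new BsonDocument
    {
        { "name", "Doe" },
        { "firstName", "John" },
        { "fullName", "John Doe" }
    };

    // 10.000 InsertOne calls: roughly 10.000 round trips through the mongos.
    // At ~1 ms per round trip, that alone is ~10 seconds of pure latency.
    for (var i = 0; i < 10000; i++)
    {
        persons.InsertOne(makePerson(i));
    }

    // A single InsertMany call: the driver groups the documents into a few
    // large batches, so the round-trip latency is paid far fewer times.
    var documents = Enumerable.Range(0, 10000).Select(makePerson).ToList();
    persons.InsertMany(documents, new InsertManyOptions { IsOrdered = false });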
Depending on the configured level of write concern (if you configured some level of write acknowledgement), you could have an additional 10 seconds in your benchmarks on the sharded environment, due to blocking until the acknowledgement is received from the data node.
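In the .NET driver, that level is chosen per collection handle; a minimal sketch of the two extremes, again using the hypothetical `persons` handle from above:

    // w:1 (the default): each insert blocks until the data node acknowledges it.
    var acknowledged = persons.WithWriteConcern(WriteConcern.W1);

    // w:0: fire-and-forget, no blocking on acknowledgement, but also no
    // guarantee that the write actually reached the server.
    var unacknowledged = persons.WithWriteConcern(WriteConcern.Unacknowledged);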
If the write concern is set to acknowledge and you have replica nodes, you also have to wait for the write to propagate to the replica nodes, adding additional network latency. (You don't appear to have replica nodes, though.) Depending on the network topology, the write concern can add multiple layers of network latency if you use the default setting that allows chained replication (secondaries syncing from other secondaries): https://docs.mongodb.org/manual/tutorial/manage-chained-replication/. If you have additional indexes and a write concern, each replica node will have to write the index before returning a write acknowledgement. (It is possible to disable indexes on the replica nodes, though.)
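If replica nodes were added later, the amount of propagation waited for is also expressed through the write concern; a hedged sketch (the five-second timeout is an arbitrary example value):

    // Block until the write has reached two nodes (primary + one secondary),
    // giving up after five seconds.
    var twoNodes = persons.WithWriteConcern(
        new WriteConcern(2, wTimeout: TimeSpan.FromSeconds(5)));

    // Block until a majority of the replica set has the write.
    var majority = persons.WithWriteConcern(WriteConcern.WMajority);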
Without sharding and without replication (but with write acknowledgement), while inserts still block on each insert, there is no additional latency due to the network layer.
Hashing the _id field also has a cost that accumulates to maybe a few seconds in total for 10k records. You can use an _id field with a high degree of randomness to avoid hashing, but I don't think this affects performance much.
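A minimal sketch of that idea, assuming a GUID string as the client-generated _id so the collection could be sharded on a plain (ranged) { _id: 1 } key instead of the hashed one:

    // A GUID spreads values uniformly across the key space, so a ranged
    // shard key on _id distributes writes without server-side hashing.
    persons.InsertOne(new BsonDocument
    {
        { "_id", Guid.NewGuid().ToString("N") },
        { "name", "Doe" },
        { "firstName", "John" },
        { "fullName", "John Doe" },
        { "dateOfBirth", new DateTime(1990, 1, 1) }
    });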