Performance: MongoDB's sharding does not improve application performance in a lab setup
I'm developing a mobile application powered by a MongoDB database. It is working fine right now, but I want to add sharding to be prepared for the future.
In order to test this, we've created a lab environment (running in Hyper-V) to test various scenarios.
The following servers have been created:
- Ubuntu Server 14.04.3 non-sharding (database server) (256 MB RAM / 10% CPU limit)
- Ubuntu Server 14.04.3 sharding (configuration server) (256 MB RAM / 10% CPU limit)
- Ubuntu Server 14.04.3 sharding (query router server) (256 MB RAM / 10% CPU limit)
- Ubuntu Server 14.04.3 sharding (database server 01) (256 MB RAM / 10% CPU limit)
- Ubuntu Server 14.04.3 sharding (database server 02) (256 MB RAM / 10% CPU limit)
A small console application has been created in C# that is able to measure the time needed to perform the inserts.
This console application imports 10.000 persons with the following properties:
- Name
- First name
- Full name
- Date of birth
- ID
All 10.000 records differ only in '_id'; the other fields are the same for all records.
It's important to note that every test was run exactly 3 times. After every test, the database was removed so the system was clean again.
Find the results of the tests below:
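The console application itself isn't included in the post; the sketch below shows what such a measurement could look like, assuming the official MongoDB .NET driver. The connection string and field names are illustrative assumptions, not taken from the actual application:

    using System;
    using System.Diagnostics;
    using MongoDB.Bson;
    using MongoDB.Driver;

    class InsertBenchmark
    {
        static void Main()
        {
            // Point this at the mongos (sharded tests) or mongod (non-sharded test).
            var client = new MongoClient("mongodb://192.168.137.11:27017");
            var persons = client.GetDatabase("demodatabase")
                                .GetCollection<BsonDocument>("persons");

            var stopwatch = Stopwatch.StartNew();
            for (var i = 0; i < 10000; i++)
            {
                // The driver generates a fresh ObjectId for "_id",
                // so only "_id" differs between the records.
                persons.InsertOne(new BsonDocument
                {
                    { "name", "Doe" },
                    { "firstName", "John" },
                    { "fullName", "John Doe" },
                    { "dateOfBirth", new DateTime(1990, 1, 1) }
                });
            }
            stopwatch.Stop();

            Console.WriteLine($"Writing 10.000 records took {stopwatch.Elapsed}");
        }
    }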
Insert 10.000 records without sharding
Writing 10.000 records | non-sharding environment - full disk IO #1: 14 seconds.
Writing 10.000 records | non-sharding environment - full disk IO #2: 14 seconds.
Writing 10.000 records | non-sharding environment - full disk IO #3: 12 seconds.
Insert 10.000 records with a single database shard
Note: the sharding key has been set to the hashed _id field. See the (partial) JSON below for the sharding information:
    shards:
        { "_id" : "shard0000", "host" : "192.168.137.12:27017" }
    databases:
        { "_id" : "demodatabase", "primary" : "shard0000", "partitioned" : true }
            demodatabase.persons
                shard key: { "_id" : "hashed" }
                unique: false
                balancing: true
                chunks:
                    shard0000    2
                { "_id" : { "$minKey" : 1 } } -->> { "_id" : NumberLong(0) } on : shard0000 Timestamp(1, 1)
                { "_id" : NumberLong(0) } -->> { "_id" : { "$maxKey" : 1 } } on : shard0000 Timestamp(1, 2)
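For reference, sharding a collection on a hashed _id can be done by sending the standard enableSharding and shardCollection admin commands through the same C# driver; a minimal sketch (the mongos address is an assumption carried over from the sketch above):

    using MongoDB.Bson;
    using MongoDB.Driver;

    // Connect to the mongos query router, not to a shard directly.
    var client = new MongoClient("mongodb://192.168.137.11:27017");
    var admin = client.GetDatabase("admin");

    // Enable sharding on the database, then shard the collection
    // on a hashed _id key.
    admin.RunCommand<BsonDocument>(new BsonDocument("enableSharding", "demodatabase"));
    admin.RunCommand<BsonDocument>(new BsonDocument
    {
        { "shardCollection", "demodatabase.persons" },
        { "key", new BsonDocument("_id", "hashed") }
    });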
Results:
Writing 10.000 records | single sharding environment - full disk IO #1: 1 minute, 59 seconds.
Writing 10.000 records | single sharding environment - full disk IO #2: 1 minute, 51 seconds.
Writing 10.000 records | single sharding environment - full disk IO #3: 1 minute, 52 seconds.
Insert 10.000 records with two database shards
Note: the sharding key has been set to the hashed _id field. See the (partial) JSON below for the sharding information:
    shards:
        { "_id" : "shard0000", "host" : "192.168.137.12:27017" }
        { "_id" : "shard0001", "host" : "192.168.137.13:27017" }
    databases:
        { "_id" : "demodatabase", "primary" : "shard0000", "partitioned" : true }
            demodatabase.persons
                shard key: { "_id" : "hashed" }
                unique: false
                balancing: true
                chunks:
                    shard0000    2
                    shard0001    2
                { "_id" : { "$minKey" : 1 } } -->> { "_id" : NumberLong("-4611686018427387902") } on : shard0000 Timestamp(2, 2)
                { "_id" : NumberLong("-4611686018427387902") } -->> { "_id" : NumberLong(0) } on : shard0000 Timestamp(2, 3)
                { "_id" : NumberLong(0) } -->> { "_id" : NumberLong("4611686018427387902") } on : shard0001 Timestamp(2, 4)
                { "_id" : NumberLong("4611686018427387902") } -->> { "_id" : { "$maxKey" : 1 } } on : shard0001 Timestamp(2, 5)
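The second shard seen above is registered with the cluster through the addShard admin command; a one-line sketch against the same admin database handle as before (the host is taken from the sh.status output above):

    // Registers the second data node (database server 02) as a shard.
    admin.RunCommand<BsonDocument>(new BsonDocument("addShard", "192.168.137.13:27017"));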
Results:
Writing 10.000 records | double sharding environment - full disk IO #1: 49 seconds.
Writing 10.000 records | double sharding environment - full disk IO #2: 53 seconds.
Writing 10.000 records | double sharding environment - full disk IO #3: 54 seconds.
According to the tests executed above, sharding does work: the more shards I add, the better the performance gets. However, I don't understand why I'm facing such a huge performance drop when working with shards compared to using a single server.
I need blazing fast reading and writing, so I thought sharding would be the solution, but it seems I'm missing something here.
Can anyone point me in the right direction?
Kind regards
Answer:

The layers between the routing server and the config server, and between the routing server and the data nodes, add latency. If you have a 1 ms ping and 10k inserts, you have 10 seconds of latency that does not appear in the unsharded setup.
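To illustrate that round-trip arithmetic, here is a hedged sketch continuing the benchmark code from earlier (it reuses the `persons` collection handle; `makePerson` is a hypothetical helper, and batching via InsertMany is a general driver feature, not something from the original post):

    using System;
    using System.Linq;
    using MongoDB.Bson;

    // Hypothetical document factory; only "_id" (driver-generated) will differ.
    Func<int, BsonDocument> makePerson = i => new BsonDocument
    {
        { "name", "Doe" },
        { "firstName", "John" },
        { "fullName", "John Doe" }
    };

    // 10.000 InsertOne calls: roughly 10.000 round trips through the mongos.
    // At ~1 ms per round trip, that alone is ~10 seconds of pure latency.
    for (var i = 0; i < 10000; i++)
    {
        persons.InsertOne(makePerson(i));
    }

    // A single InsertMany call: the driver groups the documents into a few
    // large batches, so the round-trip latency is paid far fewer times.
    var documents = Enumerable.Range(0, 10000).Select(makePerson).ToList();
    persons.InsertMany(documents, new InsertManyOptions { IsOrdered = false });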
Depending on the configured level of write concern (if you configured some level of write acknowledgement), you could have an additional 10 seconds in your benchmarks on the sharded environment, due to blocking until the acknowledgement is received from the data node.
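In the .NET driver, that level is chosen per collection handle; a minimal sketch of the two extremes, again using the hypothetical `persons` handle from above:

    // w:1 (the default): each insert blocks until the data node acknowledges it.
    var acknowledged = persons.WithWriteConcern(WriteConcern.W1);

    // w:0: fire-and-forget, no blocking on acknowledgement, but also no
    // guarantee that the write actually reached the server.
    var unacknowledged = persons.WithWriteConcern(WriteConcern.Unacknowledged);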
If the write concern is set to acknowledge and you have replica nodes, you also have to wait for the write to propagate to the replica nodes, adding additional network latency. (You don't appear to have replica nodes, though.) Depending on the network topology, the write concern can add multiple layers of network latency if you use the default setting that allows chained replication (secondaries syncing from other secondaries): https://docs.mongodb.org/manual/tutorial/manage-chained-replication/. If you have additional indexes and a write concern, each replica node will have to write the index before returning a write acknowledgement. (It is possible to disable indexes on the replica nodes, though.)
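If replica nodes were added later, the amount of propagation waited for is also expressed through the write concern; a hedged sketch (the five-second timeout is an arbitrary example value):

    // Block until the write has reached two nodes (primary + one secondary),
    // giving up after five seconds.
    var twoNodes = persons.WithWriteConcern(
        new WriteConcern(2, wTimeout: TimeSpan.FromSeconds(5)));

    // Block until a majority of the replica set has the write.
    var majority = persons.WithWriteConcern(WriteConcern.WMajority);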
Without sharding and without replication (but with write acknowledgement), while inserts still block on each insert, there is no additional latency due to the network layer.
Hashing the _id field also has a cost that accumulates to maybe a few seconds in total for 10k records. You can use an _id field with a high degree of randomness to avoid hashing, but I don't think this affects performance much.
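A minimal sketch of that idea, assuming a GUID string as the client-generated _id so the collection could be sharded on a plain (ranged) { _id: 1 } key instead of the hashed one:

    // A GUID spreads values uniformly across the key space, so a ranged
    // shard key on _id distributes writes without server-side hashing.
    persons.InsertOne(new BsonDocument
    {
        { "_id", Guid.NewGuid().ToString("N") },
        { "name", "Doe" },
        { "firstName", "John" },
        { "fullName", "John Doe" },
        { "dateOfBirth", new DateTime(1990, 1, 1) }
    });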