indexing - MySql - Index optimization -
we having analytics product. each of our customer give 1 javascript code, put in web sites. if user visit our customer site java script code hit our server store page visit on behalf of our customer. each of our customer contains unique domain name means customer determined domain nam
database server : mysql 5.6 table rows : 400 million
following our table schema.
+---------------+------------------+------+-----+---------+----------------+ | field | type | null | key | default | | +---------------+------------------+------+-----+---------+----------------+ | id | int(10) unsigned | no | pri | null | auto_increment | | domain | varchar(50) | no | mul | null | | | guid | binary(16) | yes | | null | | | sid | binary(16) | yes | | null | | | url | varchar(2500) | yes | | null | | | ip | varbinary(16) | yes | | null | | | is_new | tinyint(1) | yes | | null | | | ref | varchar(2500) | yes | | null | | | user_agent | varchar(255) | yes | | null | | | stats_time | datetime | yes | | null | | | country | char(2) | yes | | null | | | region | char(3) | yes | | null | | | city | varchar(80) | yes | | null | | | city_lat_long | varchar(50) | yes | | null | | | email | varchar(100) | yes | | null | | +---------------+------------------+------+-----+---------+----------------+ in above table guid represents visitor of our customer site , sid represents visitor session of our customer site. means every sid there should associated guid.
we need queries following
query 1 : find unique,total visitors
select count(distinct guid) count,count(guid) total page_views domain = 'abc' , stats_time between '2015-10-05 00:00:00' , '2015-10-04 23:59:59' composite index planning : domain,stats_time,sid
query 2 : find unique,total sessions
select count(distinct sid) count,count(sid) total page_views domain = 'abc' , stats_time between '2015-10-05 00:00:00' , '2015-10-04 23:59:59' composite index planning : domain,stats_time,guid
query 3: find visitors,sessions country ,by region, city
composite index planning : domain,country
composite index planning : domain,region
each combination requiring new composite index. means huge index file, can't keep in memory performance of queries low.
is there way optimize index combinations reduce index size , improve performance.
just grins, run see type of spread have...
select country, region, city, date_format(colname, '%y-%m-%d') dateonly, count(*) yourtable group country, region, city, date_format(colname, '%y-%m-%d') order count(*) desc and see how many rows returns. also, sort of range count column generate. instead of index, make sense create separate aggregation table on key elements trying provide data mining.
if so, recommend looking @ similar post also on stack here. shows sample on how, first @ counts before suggesting further. if have broken down on daily basis, might reduced to.
additionally, might want create pre-aggregate tables once started, have nightly procedure builds new records based on day completed. way never running through 400m records.
if pre-aggregate tables store based on date (y,m,d only), queries rolled-up per day shorten querying requirements. count(*) example basis, add count( distinct whatevercolumn ) needed. then, query sum( aggregatecolumn ) based on domain, date range, etc. if 400m records gets reduced down 7m records, have minimum index on (domain, dateonlyfield, , maybe country) optimize domain, date-range queries. once narrowed down @ whatever level make sense, drill raw data granular level.
Comments
Post a Comment