indexing - MySql - Index optimization -


we having analytics product. each of our customer give 1 javascript code, put in web sites. if user visit our customer site java script code hit our server store page visit on behalf of our customer. each of our customer contains unique domain name means customer determined domain nam

database server : mysql 5.6 table rows : 400 million

following our table schema.

+---------------+------------------+------+-----+---------+----------------+ | field         | type             | null | key | default |          | +---------------+------------------+------+-----+---------+----------------+ | id            | int(10) unsigned | no   | pri | null    | auto_increment | | domain        | varchar(50)      | no   | mul | null    |                | | guid          | binary(16)       | yes  |     | null    |                | | sid           | binary(16)       | yes  |     | null    |                | | url           | varchar(2500)    | yes  |     | null    |                | | ip            | varbinary(16)    | yes  |     | null    |                | | is_new        | tinyint(1)       | yes  |     | null    |                | | ref           | varchar(2500)    | yes  |     | null    |                | | user_agent    | varchar(255)     | yes  |     | null    |                | | stats_time    | datetime         | yes  |     | null    |                | | country       | char(2)          | yes  |     | null    |                | | region        | char(3)          | yes  |     | null    |                | | city          | varchar(80)      | yes  |     | null    |                | | city_lat_long | varchar(50)      | yes  |     | null    |                | | email         | varchar(100)     | yes  |     | null    |                | +---------------+------------------+------+-----+---------+----------------+ 

in above table guid represents visitor of our customer site , sid represents visitor session of our customer site. means every sid there should associated guid.

we need queries following

query 1 : find unique,total visitors

select count(distinct guid) count,count(guid) total page_views domain = 'abc' , stats_time between '2015-10-05 00:00:00' , '2015-10-04 23:59:59' 

composite index planning : domain,stats_time,sid

query 2 : find unique,total sessions

select count(distinct sid) count,count(sid) total page_views domain = 'abc' , stats_time between '2015-10-05 00:00:00' , '2015-10-04 23:59:59' 

composite index planning : domain,stats_time,guid

query 3: find visitors,sessions country ,by region, city

composite index planning : domain,country

composite index planning : domain,region

each combination requiring new composite index. means huge index file, can't keep in memory performance of queries low.

is there way optimize index combinations reduce index size , improve performance.

just grins, run see type of spread have...

select        country, region, city,        date_format(colname, '%y-%m-%d') dateonly, count(*)           yourtable     group       country, region, city,        date_format(colname, '%y-%m-%d')    order       count(*) desc 

and see how many rows returns. also, sort of range count column generate. instead of index, make sense create separate aggregation table on key elements trying provide data mining.

if so, recommend looking @ similar post also on stack here. shows sample on how, first @ counts before suggesting further. if have broken down on daily basis, might reduced to.

additionally, might want create pre-aggregate tables once started, have nightly procedure builds new records based on day completed. way never running through 400m records.

if pre-aggregate tables store based on date (y,m,d only), queries rolled-up per day shorten querying requirements. count(*) example basis, add count( distinct whatevercolumn ) needed. then, query sum( aggregatecolumn ) based on domain, date range, etc. if 400m records gets reduced down 7m records, have minimum index on (domain, dateonlyfield, , maybe country) optimize domain, date-range queries. once narrowed down @ whatever level make sense, drill raw data granular level.


Comments

Popular posts from this blog

java - pagination of xlsx file to XSSFworkbook using apache POI -

Unlimited choices in BASH case statement -

apache - How do I stop my index.php being run twice for every user -