regex - Mysql query performance issues while using several REGEXP? -
in query need have several regular expression in order filter user name or email because not interesting on them. so, have written query in mysql , after running , took many time give me result. got problem wit performance. after running query, don't filtering information in correct way. not sure how can improve query in order :
- speed query response time
- applying correct filtering regular expression
i appreciated help.
select distinct t.user, vg_product_id,t.`platform`, pd.`mail`, substring_index(group_concat(p.ts order p.ts desc separator ','), ',', 1) sub_start_ts, substring_index(group_concat(t.`expires_at`order t.`expires_at` desc separator ','), ',', 1) expired_time users u inner join tariff_subs_info t on (t.`user` = u.`user_xmpp_login` , t.`user` not regexp ('^([a-za-z]{2,3}(produsero|usero)+[0-9]{1,3})$' or '(\w+|\d+)?test(\w+|\d+)?' ) , t.vg_product_id regexp "^(europe?|usa?|unlimited?|basic?)([a-za-z0-9]+|\_)+(and?|ios?)+$" ) left join plus_data pd on (u.`user_xmpp_login` = pd.`user`) inner join purchase_log p on (p.purchase_id = t.purchase_id) (pd.mail not '%guerrillamail.com' or pd.mail null) group 1,2 order date(p.ts);
and result:
noadstestuser basic_xxx_ios ios null 2015-10-26 14:00:32 2015-10-26 14:05:24 brusero2 unlimited_xx_ios ios brusero2@yhx.yg 2015-11-03 15:41:57 2015-11-03 15:46:45 brusero3 bxx_uscios ios brusero3@tb.fff 2015-11-03 15:43:53 2015-11-03 15:48:42 esusero1 unlixx_usc ios esusero1@es.userr 2015-11-03 13:51:54 2015-11-03 13:56:41 esusero3 basic_x_i os esusero3@yn.yyf 2015-11-03 13:55:08 2015-11-03 14:00:02 esusero4 basic_x ios esusero4@yn.ttx 2015-11-03 14:01:50 2015-11-03 14:06:38 esusero5 unxxxed_us ios esusero5@uh.hhb 2015-11-03 14:45:38 2015-11-03 14:50:24 esusero6 basic_xx ios esusero6@yh.hvv 2015-11-03 14:51:22 2015-11-03 14:56:09 esusero7 unlimxx_ ios esusero7@yh.yyh 2015-11-03 15:20:35 2015-11-03 15:25:24 esusero8 basxx_usc ios esusero8@ij.iih 2015-11-03 15:22:29 2015-11-03 15:27:14 flusero2 unlxxxe ios flusero2@yh.yog 2015-11-03 16:57:58 2015-11-03 17:02:45 nlprodusero1 baxicxx_x ios nlprodusero1@yh.rof 2015-11-03 14:06:52 2015-11-03 14:11:44 nlprodusero2 unlixxxeds ios nlprodusero2@uoh.df 2015-11-03 14:08:28 2015-11-03 14:13:16 prodpurchasetest baxxc_usxc ios null 2015-11-03 09:20:51 2015-11-03 09:25:41 ukusero1 basicxxsca ios ukusero1@uj.uoh 2015-11-03 15:45:59 2015-11-03 15:48:42 ukusero2 baxxsca ios gbuser@yb.jov 2015-11-03 17:00:14 2015-11-03 17:05:07 ukusero4 unlxxd_usc ios ukusero4@uoh.jv 2015-11-03 17:02:10 2015-11-03 17:02:45 usprodusero1 uxxited_us ios usprodusero1@ook.ok 2015-11-03 13:30:25 2015-11-03 13:35:14 usprodusero2 bxxxs ios usprodusero2@ok.iob 2015-11-03 13:33:39 2015-11-03 13:38:31 usprodusero5 unlxxsc ios usprodusero5@rou.tf 2015-11-03 15:34:35 2015-11-03 15:39:26
this result unexpected me , don't want have them. despite of using not regexp
, these lines came results. how can solve these situation?
after edit:
select t.user, vg_product_id,t.`platform`, pd.`mail`, substring_index(group_concat(p.ts order p.ts desc separator ','), ',', 1) sub_start_ts, substring_index(group_concat(t.`expires_at`order t.`expires_at` desc separator ','), ',', 1) expired_time users u inner join tariff_subs_info t on (t.`user` = u.`user_xmpp_login` , t.`user` not regexp ('^([a-za-z]{2,3}(produsero|usero)+[0-9]{1,3})$') , t.`user` not regexp ('test') , t.vg_product_id regexp ("^(europe?|usa?|unlimited?|basic?)([a-za-z0-9_]+)+(and?|ios?)+$" )) left join plus_data pd on (u.`user_xmpp_login` = pd.`user`) inner join purchase_log p on (p.purchase_id = t.purchase_id) (pd.mail not '%guerrillamail.com' , pd.mail not '%test%' or pd.mail null) group 1,2 order date(p.ts);
i still have following result , , 'test' in user.
noadstestuser basixxxf_ios ios null 2015-10-26 14:00:32 2015-10-26 14:05:24 prodpurchasetest basic_uscaxxs ios null 2015-11-03 09:20:51 2015-11-03 09:25:41 esusertest basic_uscxxxs ios esusertest@ixn.ib 2015-11-04 13:53:48 2015-11-04 13:58:44 esusertest2 basic_uxxxx ios esusertedt2@iu.ycx 2015-11-04 14:11:12 2015-11-04 14:13:44
mysql regexp not support \w
, \d
shorthand character classes. so, (\w+|\d+)?
subpattern invalid in mysql. since ?
quantifier makes subpatterns optional (repeat 1 or 0 times), can remove them altogether.
thus, or '(\w+|\d+)?test(\w+|\d+)?'
turn and t.`user` not regexp ('test')
, equal in meaning and t.`user` not '%test%'
.
next, ([a-za-z0-9]+|\_)+
problematic since there nested quantifiers (a +
inside alternation group has +
quantifier applied). classical scenario when catastrophical backtracking may occur. suggest replacing subpattern [a-za-z0-9_]+
match letters, digits or underscore. or equivalent [[:alnum:]_]+
.
Comments
Post a Comment