hadoop - How to use the results of one Hive query as input for another in a Hive script?

August 15, 2012

I want to use the count(*) result of one Hive query as input for a second Hive query. The query, simplified, is:

set lim = select count(*) from default.mytable * 0.8;
select * from default.mytable limit ${hiveconf:lim};

The above code leads to an error: the first query is not executed, so the lim variable is not substituted with a numeric value.

Is there a way to force Hive to substitute the variable lim so that it has a numeric value in the second query?

## Warning - verbose explanation follows; the short answer is "no way" ##

In terms of architecture, this kind of trick is not done in the database tier but in the application tier.

Since I don't know nuthin' about the Teradata stack (fondly nicknamed "Taratata" by French-speaking colleagues), I'll take the Oracle stack as an example.

A. Inside a PL/SQL block, you can retrieve the (scalar) result of a query into a variable and use it later, either as an input bind variable in a prepared statement or as a way to dynamically build a string that will be parsed as a dynamic SQL query. The PL/SQL block is the "application", with application logic of arbitrary complexity; it just happens to run inside the Oracle session, on the same host that runs the database tier.

B. Inside the SQL*Plus client (and maybe compatible tools, e.g. SQL Developer), you can use a weird syntax to retrieve a value into a kind of macro-variable, which can then be used to stuff that value as-is into further SQL queries. This trick allows some crude "application" logic to be applied to an otherwise static SQL script, on the client side. It is a non-portable trick.

Bottom line - since Hive has no procedural language, and (hopefully) will never have one, the best way to do what you want is to develop your own custom Hive client, with whatever business logic you want. After all, there must be thousands of people around the world developing Java code that accesses Hive via JDBC, so you are not alone...
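As a rough illustration of that "custom client" idea, here is a minimal Java sketch that runs the count first, computes the limit in application code, and only then builds the second query. The class and method names (HiveSampleClient, limitFor, sampleQuery, countRows, printSample) are made up for this example, and it assumes you already have a java.sql.Connection opened through whatever Hive JDBC driver your cluster provides.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper, not part of Hive: the "application tier" logic lives here.
final class HiveSampleClient {

    // Turn a row count into a LIMIT value, e.g. 80% of the rows, rounded down.
    static long limitFor(long rowCount, double fraction) {
        return (long) Math.floor(rowCount * fraction);
    }

    // Build the second query with the numeric limit already substituted.
    // The table name is concatenated as-is, so pass trusted names only.
    static String sampleQuery(String table, long limit) {
        return "SELECT * FROM " + table + " LIMIT " + limit;
    }

    // Step 1: fetch the scalar result of the first query into a Java variable.
    static long countRows(Connection conn, String table) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM " + table)) {
            rs.next();
            return rs.getLong(1);
        }
    }

    // Step 2: use that value to run the second query and print the first column.
    static void printSample(Connection conn, String table, double fraction)
            throws SQLException {
        long limit = limitFor(countRows(conn, table), fraction);
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(sampleQuery(table, limit))) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```

With a connection in hand, a call such as HiveSampleClient.printSample(conn, "default.mytable", 0.8) would cover the case from the question. The two queries stay separate statements; the substitution that the hiveconf variable could not do happens in plain Java between them.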
