*** 349,727 **** *: Analytical functions -> UnsupportedSqlAnalyticalFunctions - (What is?) - SQL is a very capable language and there are very few questions that it cannot - answer. I find that I can - come up with some convoluted SQL query to answer virtually any question you - could ask from the data. - However, the performance of some of these queries is not what it should be - - nor is the query itself - easy to write in the first place. Some of the things that are hard to do in - straight SQL are actually very - commonly requested operations, including: - - Calculate a running total - Show the cumulative salary within a department row - by row, with each row - including a summation of the prior rows' salary. - - Find percentages within a group - Show the percentage of the total salary paid - to an individual in a - certain department. Take their salary and divide it by the sum of the salary in - the department. - - Top-N queries - Find the top N highest-paid people or the top N sales by - region. - - Compute a moving average - Average the current row's value and the previous N - rows values together. - - Perform ranking queries - Show the relative rank of an individual's salary - within their department. - - Analytic functions, are designed to address these issues. They add extensions - to the SQL language that - not only make these operations easier to code; they make them faster than could - be achieved with the - pure SQL approach. These extensions are currently under review by the ANSI SQL - committee for - inclusion in the SQL specification. - - The syntax of the analytic function is rather straightforward in appearance, - but looks can be deceiving. - It starts with: - - FUNCTION_NAME(<argument>,<argument>,) - OVER - (<Partition-Clause> <Order-by-Clause> <Windowing Clause>) - - The PARTITION BY clause logically breaks a single result set into N groups, - according to the criteria set - by the partition expressions. The words 'partition' and 'group' are used - synonymously. - - The ORDER BY clause specifies how the data is sorted within each group - (partition). - - The windowing clause gives us a way to define a sliding or anchored window of - data, on which the - analytic function will operate, within a group. This clause can be used to have - the analytic function - compute its value based on any arbitrary sliding or anchored window within a - group. - - - Ex: - - This example shows how to use the analytical function SUM to perform a - cumulative sum. First, we fill - some values in a table. The table is very simple and consists of the field dt - and xy only. Note, that for a - given date it is possible to insert multiple rows which is exactly what I do - here. What I am interested is - to extract the cumulative sum for each day in the table. That is, if I have - three entries for the same - date, for example 3, 4 and 5, I don't want the sum to only be 3+4+5 for each - row, but 3 for the first - row, 3+4 for the second row and 3+4+5 for the third row. - create table sum_example ( - dt date, - xy number - ); - - insert into sum_example values (to_date('27.08.1970','DD.MM.YYYY'),4); - {linebreak} - insert into sum_example values (to_date('02.09.1970','DD.MM.YYYY'),1); - {linebreak} - insert into sum_example values (to_date('09.09.1970','DD.MM.YYYY'),5); - {linebreak} - insert into sum_example values (to_date('26.08.1970','DD.MM.YYYY'),3); - {linebreak} - insert into sum_example values (to_date('28.08.1970','DD.MM.YYYY'),4); - {linebreak} - insert into sum_example values (to_date('26.08.1970','DD.MM.YYYY'),6); - {linebreak} - insert into sum_example values (to_date('29.08.1970','DD.MM.YYYY'),9); - {linebreak} - insert into sum_example values (to_date('30.08.1970','DD.MM.YYYY'),2); - {linebreak} - insert into sum_example values (to_date('12.09.1970','DD.MM.YYYY'),7); - {linebreak} - insert into sum_example values (to_date('23.08.1970','DD.MM.YYYY'),2); - {linebreak} - insert into sum_example values (to_date('27.08.1970','DD.MM.YYYY'),5); - {linebreak} - insert into sum_example values (to_date('09.09.1970','DD.MM.YYYY'),9); - {linebreak} - insert into sum_example values (to_date('01.09.1970','DD.MM.YYYY'),3); - {linebreak} - insert into sum_example values (to_date('07.09.1970','DD.MM.YYYY'),1); - {linebreak} - insert into sum_example values (to_date('12.09.1970','DD.MM.YYYY'),4); - {linebreak} - insert into sum_example values (to_date('03.09.1970','DD.MM.YYYY'),5); - {linebreak} - insert into sum_example values (to_date('03.09.1970','DD.MM.YYYY'),8); - {linebreak} - insert into sum_example values (to_date('07.09.1970','DD.MM.YYYY'),7); - {linebreak} - insert into sum_example values (to_date('04.09.1970','DD.MM.YYYY'),8); - {linebreak} - insert into sum_example values (to_date('09.09.1970','DD.MM.YYYY'),1); - {linebreak} - insert into sum_example values (to_date('29.08.1970','DD.MM.YYYY'),3); - {linebreak} - insert into sum_example values (to_date('30.08.1970','DD.MM.YYYY'),7); - {linebreak} - insert into sum_example values (to_date('24.08.1970','DD.MM.YYYY'),7); - {linebreak} - insert into sum_example values (to_date('07.09.1970','DD.MM.YYYY'),9); - {linebreak} - insert into sum_example values (to_date('26.08.1970','DD.MM.YYYY'),2); - {linebreak} - insert into sum_example values (to_date('09.09.1970','DD.MM.YYYY'),8); - {linebreak} - - select dt, sum(xy) over (partition by trunc(dt) order by dt rows between - unbounded preceding and current row) s, xy from sum_example; - - drop table sum_example; - - The select statement will return: - - 23.08.70 2 2{linebreak} - 24.08.70 7 7{linebreak} - 26.08.70 3 3{linebreak} - 26.08.70 5 2{linebreak} - 26.08.70 11 6{linebreak} - 27.08.70 4 4{linebreak} - 27.08.70 9 5{linebreak} - 28.08.70 4 4{linebreak} - 29.08.70 9 9{linebreak} - 29.08.70 12 3{linebreak} - 30.08.70 2 2{linebreak} - 30.08.70 9 7{linebreak} - 01.09.70 3 3{linebreak} - 02.09.70 1 1{linebreak} - 03.09.70 5 5{linebreak} - 03.09.70 13 8{linebreak} - 04.09.70 8 8{linebreak} - 07.09.70 1 1{linebreak} - 07.09.70 8 7{linebreak} - 07.09.70 17 9{linebreak} - 09.09.70 5 5{linebreak} - 09.09.70 14 9{linebreak} - 09.09.70 15 1{linebreak} - 09.09.70 23 8{linebreak} - 12.09.70 7 7{linebreak} - 12.09.70 11 4{linebreak} - - The third column correspondents to xy (the values inserted with the insert - into ... above). The - interesting column is the second. For example on the 26th of August in 1970, - the first row for that date - is 3 (equals xy), the second is 5 (equals xy+3) and the third is 11 (equals - xy+3+5). - - List of analitic functions: - - AVG (<distinct|all> expression ) - Used to compute an average of an expression within a group and window. - Distinct may be used to find - the average of the values in a group after duplicates have been removed. - - CORR (expression, expression) - Returns the coefficient of correlation of a pair of expressions that return - numbers. It is shorthand for: - - COVAR_POP(expr1, expr2) / - - STDDEV_POP(expr1) * STDDEV_POP(expr2)). - - Statistically speaking, a correlation is the strength of an association between - variables. An association - between variables means that the value of one variable can be predicted, to - some extent, by the value - of the other. The correlation coefficient gives the strength of the association - by returning a number - between -1 (strong inverse correlation) and 1 (strong correlation). A value of - 0 would indicate no - correlation. - - COUNT (<distinct> <*> <expression>) - This will count occurrences within a group. If you specify * or some non-null - constant, count will count - all rows. If you specify an expression, count returns the count of non-null - evaluations of expression. - You may use the DISTINCT modifier to count occurrences of rows in a group after - duplicates have been - removed. - - COVAR_POP (expression, expression) - This returns the population covariance of a pair of expressions that return - numbers. - - COVAR_SAMP (expression, expression) - This returns the sample covariance of a pair of expressions that return - numbers. - - CUME_DIST - This computes the relative position of a row in a group. CUME_DIST will always - return a number - greater then 0 and less then or equal to 1. This number represents the - 'position' of the row in the - group of N rows. In a group of three rows, the cumulate distribution values - returned would be 1/3, 2/3, - and 3/3 for example. - - DENSE_RANK - This function computes the relative rank of each row returned from a query - with respect to the other - rows, based on the values of the expressions in the ORDER BY clause. The data - within a group is sorted - by the ORDER BY clause and then a numeric ranking is assigned to each row in - turn starting with 1 and - continuing on up. The rank is incremented every time the values of the ORDER BY - expressions change. - Rows with equal values receive the same rank (nulls are considered equal in - this comparison). A dense - rank returns a ranking number without any gaps. This is in comparison to RANK - below. - - FIRST_VALUE - This simply returns the first value from a group. - - LAG (expression, <offset>, <default>) - LAG gives you access to other rows in a resultset without doing a self-join. - It allows you to treat the - cursor as if it were an array in effect. You can reference rows that come - before the current row in a - given group. This would allow you to select 'the previous rows' from a group - along with the current - row. See LEAD for how to get 'the next rows'. - - Offset is a positive integer that defaults to 1 (the previous row). Default is - the value to be returned if - the index is out of range of the window (for the first row in a group, the - default will be returned) - - LAST_VALUE - This simply returns the last value from a group. - - LEAD (expression, <offset>, <default>) - LEAD is the opposite of LAG. Whereas LAG gives you access to the a row - preceding yours in a group - - LEAD gives you access to the a row that comes after your row. - - Offset is a positive integer that defaults to 1 (the next row). Default is the - value to be returned if the - index is out of range of the window (for the last row in a group, the default - will be returned). - - MAX(expression) - Finds the maximum value of expression within a window of a group. - - MIN(expression) - Finds the minimum value of expression within a window of a group. - - NTILE (expression) - Divides a group into 'value of expression' buckets. - - For example; if expression = 4, then each row in the group would be assigned a - number from 1 to 4 - putting it into a percentile. If the group had 20 rows in it, the first 5 would - be assigned 1, the next 5 - would be assigned 2 and so on. In the event the cardinality of the group is not - evenly divisible by the - expression, the rows are distributed such that no percentile has more than 1 - row more then any other - percentile in that group and the lowest percentiles are the ones that will have - 'extra' rows. For example, - using expression = 4 again and the number of rows = 21, percentile = 1 will - have 6 rows, percentile = - 2 will have 5, and so on. - - PERCENT_RANK - This is similar to the CUME_DIST (cumulative distribution) function. For a - given row in a group, it - calculates the rank of that row minus 1, divided by 1 less than the number of - rows being evaluated in - the group. This function will always return values from 0 to 1 inclusive. - - RANK - This function computes the relative rank of each row returned from a query - with respect to the other - rows, based on the values of the expressions in the ORDER BY clause. The data - within a group is sorted - by the ORDER BY clause and then a numeric ranking is assigned to each row in - turn starting with 1 and - continuing on up. Rows with the same values of the ORDER BY expressions receive - the same rank; - however, if two rows do receive the same rank the rank numbers will - subsequently 'skip'. If two rows - are number 1, there will be no number 2 - rank will assign the value of 3 to - the next row in the group. - This is in contrast to DENSE_RANK, which does not skip values. - - RATIO_TO_REPORT (expression) - This function computes the value of expression / (sum(expression)) over the - group. - - This gives you the percentage of the total the current row contributes to the - sum(expression). - - REGR_ xxxxxxx (expression, expression) - These linear regression functions fit an ordinary-least-squares regression - line to a pair of expressions. - There are 9 different regression functions available for use. - - ROW_NUMBER - Returns the offset of a row in an ordered group. Can be used to sequentially - number rows, ordered by - certain criteria. - - STDDEV (expression) - Computes the standard deviation of the current row with respect to the group. - - STDDEV_POP (expression) - This function computes the population standard deviation and returns the - square root of the - population variance. Its return value is same as the square root of the VAR_POP - function. - - STDDEV_SAMP (expression) - This function computes the cumulative sample standard deviation and returns - the square root of the - sample variance. This function returns the same value as the square root of the - VAR_SAMP function - would. - - SUM(expression) - This function computes the cumulative sum of expression in a group. - - VAR_POP (expression) - This function returns the population variance of a non-null set of numbers - (nulls are ignored). - VAR_POP function makes the following calculation for us: - - (SUM(expr*expr) - SUM(expr)*SUM(expr) / COUNT(expr)) / COUNT(expr) - - VAR_SAMP (expression) - This function returns the sample variance of a non-null set of numbers (nulls - in the set are ignored). - This function makes the following calculation for us: - - (SUM(expr*expr) - SUM(expr)*SUM(expr) / COUNT(expr)) / (COUNT(expr) - 1) - - VARIANCE (expression) - This function returns the variance of expression. Oracle will calculate the - variance as follows: - - 0 if the number of rows in expression = 1 - - VAR_SAMP if the number of rows in expression > 1 - - More details on the web ... ask tom !? - ==== FEATURES ADDED IN RECENT VERSIONS ==== --- 349,354 ----