SQLite CVSTrac

*** 349,727 ****
  
  *: Analytical functions -> UnsupportedSqlAnalyticalFunctions
  
- (What is?)
- SQL is a very capable language and there are very few questions that it cannot
- answer. I find that I can
- come up with some convoluted SQL query to answer virtually any question you
- could ask from the data.
- However, the performance of some of these queries is not what it should be -
- nor is the query itself
- easy to write in the first place. Some of the things that are hard to do in
- straight SQL are actually very
- commonly requested operations, including:
- 
- Calculate a running total - Show the cumulative salary within a department row
- by row, with each row
- including a summation of the prior rows' salary.
- 
- Find percentages within a group - Show the percentage of the total salary paid
- to an individual in a
- certain department. Take their salary and divide it by the sum of the salary in
- the department.
- 
- Top-N queries - Find the top N highest-paid people or the top N sales by
- region.
- 
- Compute a moving average - Average the current row's value and the previous N
- rows values together.
- 
- Perform ranking queries - Show the relative rank of an individual's salary
- within their department.
- 
- Analytic functions, are designed to address these issues. They add extensions
- to the SQL language that
- not only make these operations easier to code; they make them faster than could
- be achieved with the
- pure SQL approach. These extensions are currently under review by the ANSI SQL
- committee for
- inclusion in the SQL specification.
- 
- The syntax of the analytic function is rather straightforward in appearance,
- but looks can be deceiving.
- It starts with:
- 
- FUNCTION_NAME(<argument>,<argument>,)
- OVER
- (<Partition-Clause> <Order-by-Clause> <Windowing Clause>)
- 
- The PARTITION BY clause logically breaks a single result set into N groups,
- according to the criteria set
- by the partition expressions. The words 'partition' and 'group' are used
- synonymously.
- 
- The ORDER BY clause specifies how the data is sorted within each group
- (partition).
- 
- The windowing clause gives us a way to define a sliding or anchored window of
- data, on which the
- analytic function will operate, within a group. This clause can be used to have
- the analytic function
- compute its value based on any arbitrary sliding or anchored window within a
- group.
- 
- 
- Ex:
- 
- This example shows how to use the analytical function SUM to perform a
- cumulative sum. First, we fill
- some values in a table. The table is very simple and consists of the field dt
- and xy only. Note, that for a
- given date it is possible to insert multiple rows which is exactly what I do
- here. What I am interested is
- to extract the cumulative sum for each day in the table. That is, if I have
- three entries for the same
- date, for example 3, 4 and 5, I don't want the sum to only be 3+4+5 for each
- row, but 3 for the first
- row, 3+4 for the second row and 3+4+5 for the third row.
- create table sum_example (
-   dt date,
-   xy number
- );
- 
- insert into sum_example values (to_date('27.08.1970','DD.MM.YYYY'),4);
- {linebreak}
- insert into sum_example values (to_date('02.09.1970','DD.MM.YYYY'),1);
- {linebreak}
- insert into sum_example values (to_date('09.09.1970','DD.MM.YYYY'),5);
- {linebreak}
- insert into sum_example values (to_date('26.08.1970','DD.MM.YYYY'),3);
- {linebreak}
- insert into sum_example values (to_date('28.08.1970','DD.MM.YYYY'),4);
- {linebreak}
- insert into sum_example values (to_date('26.08.1970','DD.MM.YYYY'),6);
- {linebreak}
- insert into sum_example values (to_date('29.08.1970','DD.MM.YYYY'),9);
- {linebreak}
- insert into sum_example values (to_date('30.08.1970','DD.MM.YYYY'),2);
- {linebreak}
- insert into sum_example values (to_date('12.09.1970','DD.MM.YYYY'),7);
- {linebreak}
- insert into sum_example values (to_date('23.08.1970','DD.MM.YYYY'),2);
- {linebreak}
- insert into sum_example values (to_date('27.08.1970','DD.MM.YYYY'),5);
- {linebreak}
- insert into sum_example values (to_date('09.09.1970','DD.MM.YYYY'),9);
- {linebreak}
- insert into sum_example values (to_date('01.09.1970','DD.MM.YYYY'),3);
- {linebreak}
- insert into sum_example values (to_date('07.09.1970','DD.MM.YYYY'),1);
- {linebreak}
- insert into sum_example values (to_date('12.09.1970','DD.MM.YYYY'),4);
- {linebreak}
- insert into sum_example values (to_date('03.09.1970','DD.MM.YYYY'),5);
- {linebreak}
- insert into sum_example values (to_date('03.09.1970','DD.MM.YYYY'),8);
- {linebreak}
- insert into sum_example values (to_date('07.09.1970','DD.MM.YYYY'),7);
- {linebreak}
- insert into sum_example values (to_date('04.09.1970','DD.MM.YYYY'),8);
- {linebreak}
- insert into sum_example values (to_date('09.09.1970','DD.MM.YYYY'),1);
- {linebreak}
- insert into sum_example values (to_date('29.08.1970','DD.MM.YYYY'),3);
- {linebreak}
- insert into sum_example values (to_date('30.08.1970','DD.MM.YYYY'),7);
- {linebreak}
- insert into sum_example values (to_date('24.08.1970','DD.MM.YYYY'),7);
- {linebreak}
- insert into sum_example values (to_date('07.09.1970','DD.MM.YYYY'),9);
- {linebreak}
- insert into sum_example values (to_date('26.08.1970','DD.MM.YYYY'),2);
- {linebreak}
- insert into sum_example values (to_date('09.09.1970','DD.MM.YYYY'),8);
- {linebreak}
- 
- select dt, sum(xy) over (partition by trunc(dt) order by dt rows between
- unbounded preceding and current row) s, xy from sum_example;
- 
- drop table sum_example;
- 
- The select statement will return:
- 
- 23.08.70          2          2{linebreak}
- 24.08.70          7          7{linebreak}
- 26.08.70          3          3{linebreak}
- 26.08.70          5          2{linebreak}
- 26.08.70         11          6{linebreak}
- 27.08.70          4          4{linebreak}
- 27.08.70          9          5{linebreak}
- 28.08.70          4          4{linebreak}
- 29.08.70          9          9{linebreak}
- 29.08.70         12          3{linebreak}
- 30.08.70          2          2{linebreak}
- 30.08.70          9          7{linebreak}
- 01.09.70          3          3{linebreak}
- 02.09.70          1          1{linebreak}
- 03.09.70          5          5{linebreak}
- 03.09.70         13          8{linebreak}
- 04.09.70          8          8{linebreak}
- 07.09.70          1          1{linebreak}
- 07.09.70          8          7{linebreak}
- 07.09.70         17          9{linebreak}
- 09.09.70          5          5{linebreak}
- 09.09.70         14          9{linebreak}
- 09.09.70         15          1{linebreak}
- 09.09.70         23          8{linebreak}
- 12.09.70          7          7{linebreak}
- 12.09.70         11          4{linebreak}
- 
- The third column correspondents to xy (the values inserted with the insert
- into ... above). The
- interesting column is the second. For example on the 26th of August in 1970,
- the first row for that date
- is 3 (equals xy), the second is 5 (equals xy+3) and the third is 11 (equals
- xy+3+5).
- 
- List of analitic functions:
- 
- AVG (<distinct|all> expression )
-  Used to compute an average of an expression within a group and window.
- Distinct may be used to find
- the average of the values in a group after duplicates have been removed.
- 
- CORR (expression, expression)
-  Returns the coefficient of correlation of a pair of expressions that return
- numbers. It is shorthand for:
- 
- COVAR_POP(expr1, expr2) /
- 
- STDDEV_POP(expr1) * STDDEV_POP(expr2)).
- 
- Statistically speaking, a correlation is the strength of an association between
- variables. An association
- between variables means that the value of one variable can be predicted, to
- some extent, by the value
- of the other. The correlation coefficient gives the strength of the association
- by returning a number
- between -1 (strong inverse correlation) and 1 (strong correlation). A value of
- 0 would indicate no
- correlation.
- 
- COUNT (<distinct> <*> <expression>)
-  This will count occurrences within a group. If you specify * or some non-null
- constant, count will count
- all rows. If you specify an expression, count returns the count of non-null
- evaluations of expression.
- You may use the DISTINCT modifier to count occurrences of rows in a group after
- duplicates have been
- removed.
- 
- COVAR_POP (expression, expression)
-  This returns the population covariance of a pair of expressions that return
- numbers.
- 
- COVAR_SAMP (expression, expression)
-  This returns the sample covariance of a pair of expressions that return
- numbers.
- 
- CUME_DIST
-  This computes the relative position of a row in a group. CUME_DIST will always
- return a number
- greater then 0 and less then or equal to 1. This number represents the
- 'position' of the row in the
- group of N rows. In a group of three rows, the cumulate distribution values
- returned would be 1/3, 2/3,
- and 3/3 for example.
- 
- DENSE_RANK
-  This function computes the relative rank of each row returned from a query
- with respect to the other
- rows, based on the values of the expressions in the ORDER BY clause. The data
- within a group is sorted
- by the ORDER BY clause and then a numeric ranking is assigned to each row in
- turn starting with 1 and
- continuing on up. The rank is incremented every time the values of the ORDER BY
- expressions change.
- Rows with equal values receive the same rank (nulls are considered equal in
- this comparison). A dense
- rank returns a ranking number without any gaps. This is in comparison to RANK
- below.
- 
- FIRST_VALUE
-  This simply returns the first value from a group.
- 
- LAG (expression, <offset>, <default>)
-  LAG gives you access to other rows in a resultset without doing a self-join.
- It allows you to treat the
- cursor as if it were an array in effect. You can reference rows that come
- before the current row in a
- given group. This would allow you to select 'the previous rows' from a group
- along with the current
- row. See LEAD for how to get 'the next rows'.
- 
- Offset is a positive integer that defaults to 1 (the previous row). Default is
- the value to be returned if
- the index is out of range of the window (for the first row in a group, the
- default will be returned)
- 
- LAST_VALUE
-  This simply returns the last value from a group.
- 
- LEAD (expression, <offset>, <default>)
-  LEAD is the opposite of LAG. Whereas LAG gives you access to the a row
- preceding yours in a group -
- LEAD gives you access to the a row that comes after your row.
- 
- Offset is a positive integer that defaults to 1 (the next row). Default is the
- value to be returned if the
- index is out of range of the window (for the last row in a group, the default
- will be returned).
- 
- MAX(expression)
-  Finds the maximum value of expression within a window of a group.
- 
- MIN(expression)
-  Finds the minimum value of expression within a window of a group.
- 
- NTILE (expression)
-  Divides a group into 'value of expression' buckets.
- 
- For example; if expression = 4, then each row in the group would be assigned a
- number from 1 to 4
- putting it into a percentile. If the group had 20 rows in it, the first 5 would
- be assigned 1, the next 5
- would be assigned 2 and so on. In the event the cardinality of the group is not
- evenly divisible by the
- expression, the rows are distributed such that no percentile has more than 1
- row more then any other
- percentile in that group and the lowest percentiles are the ones that will have
- 'extra' rows. For example,
- using expression = 4 again and the number of rows = 21, percentile = 1 will
- have 6 rows, percentile =
- 2 will have 5, and so on.
- 
- PERCENT_RANK
-  This is similar to the CUME_DIST (cumulative distribution) function. For a
- given row in a group, it
- calculates the rank of that row minus 1, divided by 1 less than the number of
- rows being evaluated in
- the group. This function will always return values from 0 to 1 inclusive.
- 
- RANK
-  This function computes the relative rank of each row returned from a query
- with respect to the other
- rows, based on the values of the expressions in the ORDER BY clause. The data
- within a group is sorted
- by the ORDER BY clause and then a numeric ranking is assigned to each row in
- turn starting with 1 and
- continuing on up. Rows with the same values of the ORDER BY expressions receive
- the same rank;
- however, if two rows do receive the same rank the rank numbers will
- subsequently 'skip'. If two rows
- are number 1, there will be no number 2 - rank will assign the value of 3 to
- the next row in the group.
- This is in contrast to DENSE_RANK, which does not skip values.
- 
- RATIO_TO_REPORT (expression)
-  This function computes the value of expression / (sum(expression)) over the
- group.
- 
- This gives you the percentage of the total the current row contributes to the
- sum(expression).
- 
- REGR_ xxxxxxx (expression, expression)
-  These linear regression functions fit an ordinary-least-squares regression
- line to a pair of expressions.
- There are 9 different regression functions available for use.
- 
- ROW_NUMBER
-  Returns the offset of a row in an ordered group. Can be used to sequentially
- number rows, ordered by
- certain criteria.
- 
- STDDEV (expression)
-  Computes the standard deviation of the current row with respect to the group.
- 
- STDDEV_POP (expression)
-  This function computes the population standard deviation and returns the
- square root of the
- population variance. Its return value is same as the square root of the VAR_POP
- function.
- 
- STDDEV_SAMP (expression)
-  This function computes the cumulative sample standard deviation and returns
- the square root of the
- sample variance. This function returns the same value as the square root of the
- VAR_SAMP function
- would.
- 
- SUM(expression)
-  This function computes the cumulative sum of expression in a group.
- 
- VAR_POP (expression)
-  This function returns the population variance of a non-null set of numbers
- (nulls are ignored).
- VAR_POP function makes the following calculation for us:
- 
- (SUM(expr*expr) - SUM(expr)*SUM(expr) / COUNT(expr)) / COUNT(expr)
- 
- VAR_SAMP (expression)
-  This function returns the sample variance of a non-null set of numbers (nulls
- in the set are ignored).
- This function makes the following calculation for us:
- 
- (SUM(expr*expr) - SUM(expr)*SUM(expr) / COUNT(expr)) / (COUNT(expr) - 1)
- 
- VARIANCE (expression)
-  This function returns the variance of expression. Oracle will calculate the
- variance as follows:
- 
- 0 if the number of rows in expression = 1
- 
- VAR_SAMP if the number of rows in expression > 1
- 
- More details on the web ... ask tom !?
- 
  ====
  FEATURES ADDED IN RECENT VERSIONS
  ====
--- 349,354 ----

sqlite - Unsupported Sql