SQLite CVSTrac

*** 95,101 ****
  a '.' in them. this would fake schemas good enough for me. right now
  it doesnt seem to allow it.
  
! _:::::: You can fake this syntax if you split the "schemas" off into seperate files, then do an ATTACH DATABASE blorg.db AS blorg; SELECT zipcode FROM blorg.address;
  
  *: TRUNCATE (MySQL, Postgresql and Oracle have it... but I dont know if this is a standard command) -
  _SQLite does this automatically when you do a DELETE
--- 95,102 ----
  a '.' in them. this would fake schemas good enough for me. right now
  it doesnt seem to allow it.
  
! _:::::: You can fake this syntax if you split the "schemas" off into seperate files, then do an ATTACH
! DATABASE blorg.db AS blorg; SELECT zipcode FROM blorg.address;
  
  *: TRUNCATE (MySQL, Postgresql and Oracle have it... but I dont know if this is a standard command) -
  _SQLite does this automatically when you do a DELETE
***************
*** 279,289 ****
  *: Analytical functions
  
  (What is?)
! SQL is a very capable language and there are very few questions that it cannot answer. I find that I can come up with some convoluted SQL query to answer virtually any question you could ask from the data. However, the performance of some of these queries is not what it should be - nor is the query itself easy to write in the first place. Some of the things that are hard to do in straight SQL are actually very commonly requested operations, including:
  
! Calculate a running total - Show the cumulative salary within a department row by row, with each row including a summation of the prior rows' salary.
  
! Find percentages within a group - Show the percentage of the total salary paid to an individual in a certain department. Take their salary and divide it by the sum of the salary in the department.
  
  Top-N queries - Find the top N highest-paid people or the top N sales by region.
  
--- 280,296 ----
  *: Analytical functions
  
  (What is?)
! SQL is a very capable language and there are very few questions that it cannot answer. I find that I can
! come up with some convoluted SQL query to answer virtually any question you could ask from the data.
! However, the performance of some of these queries is not what it should be - nor is the query itself
! easy to write in the first place. Some of the things that are hard to do in straight SQL are actually very
! commonly requested operations, including:
  
! Calculate a running total - Show the cumulative salary within a department row by row, with each row
! including a summation of the prior rows' salary.
  
! Find percentages within a group - Show the percentage of the total salary paid to an individual in a
! certain department. Take their salary and divide it by the sum of the salary in the department.
  
  Top-N queries - Find the top N highest-paid people or the top N sales by region.
  
***************
*** 291,314 ****
  
  Perform ranking queries - Show the relative rank of an individual's salary within their department.
  
! Analytic functions, are designed to address these issues. They add extensions to the SQL language that not only make these operations easier to code; they make them faster than could be achieved with the pure SQL approach. These extensions are currently under review by the ANSI SQL committee for inclusion in the SQL specification.
  
! The syntax of the analytic function is rather straightforward in appearance, but looks can be deceiving. It starts with:
  
  FUNCTION_NAME(<argument>,<argument>,)
  OVER
  (<Partition-Clause> <Order-by-Clause> <Windowing Clause>)
  
! The PARTITION BY clause logically breaks a single result set into N groups, according to the criteria set by the partition expressions. The words 'partition' and 'group' are used synonymously.
  
  The ORDER BY clause specifies how the data is sorted within each group (partition).
  
! The windowing clause gives us a way to define a sliding or anchored window of data, on which the analytic function will operate, within a group. This clause can be used to have the analytic function compute its value based on any arbitrary sliding or anchored window within a group.
  
  
  Ex:
  
! This example shows how to use the analytical function SUM to perform a cumulative sum. First, we fill some values in a table. The table is very simple and consists of the field dt and xy only. Note, that for a given date it is possible to insert multiple rows which is exactly what I do here. What I am interested is to extract the cumulative sum for each day in the table. That is, if I have three entries for the same date, for example 3, 4 and 5, I don't want the sum to only be 3+4+5 for each row, but 3 for the first row, 3+4 for the second row and 3+4+5 for the third row.
  create table sum_example (
    dt date,
    xy number
--- 298,333 ----
  
  Perform ranking queries - Show the relative rank of an individual's salary within their department.
  
! Analytic functions, are designed to address these issues. They add extensions to the SQL language that
! not only make these operations easier to code; they make them faster than could be achieved with the
! pure SQL approach. These extensions are currently under review by the ANSI SQL committee for
! inclusion in the SQL specification.
  
! The syntax of the analytic function is rather straightforward in appearance, but looks can be deceiving.
! It starts with:
  
  FUNCTION_NAME(<argument>,<argument>,)
  OVER
  (<Partition-Clause> <Order-by-Clause> <Windowing Clause>)
  
! The PARTITION BY clause logically breaks a single result set into N groups, according to the criteria set
! by the partition expressions. The words 'partition' and 'group' are used synonymously.
  
  The ORDER BY clause specifies how the data is sorted within each group (partition).
  
! The windowing clause gives us a way to define a sliding or anchored window of data, on which the
! analytic function will operate, within a group. This clause can be used to have the analytic function
! compute its value based on any arbitrary sliding or anchored window within a group.
  
  
  Ex:
  
! This example shows how to use the analytical function SUM to perform a cumulative sum. First, we fill
! some values in a table. The table is very simple and consists of the field dt and xy only. Note, that for a
! given date it is possible to insert multiple rows which is exactly what I do here. What I am interested is
! to extract the cumulative sum for each day in the table. That is, if I have three entries for the same
! date, for example 3, 4 and 5, I don't want the sum to only be 3+4+5 for each row, but 3 for the first
! row, 3+4 for the second row and 3+4+5 for the third row.
  create table sum_example (
    dt date,
    xy number
***************
*** 375,386 ****
  12.09.70          7          7{linebreak}
  12.09.70         11          4{linebreak}
  
! The third column correspondents to xy (the values inserted with the insert into ... above). The interesting column is the second. For example on the 26th of August in 1970, the first row for that date is 3 (equals xy), the second is 5 (equals xy+3) and the third is 11 (equals xy+3+5).
  
  List of analitic functions:
  
  AVG (<distinct|all> expression )
!  Used to compute an average of an expression within a group and window. Distinct may be used to find the average of the values in a group after duplicates have been removed.
  
  CORR (expression, expression)
   Returns the coefficient of correlation of a pair of expressions that return numbers. It is shorthand for:
--- 394,408 ----
  12.09.70          7          7{linebreak}
  12.09.70         11          4{linebreak}
  
! The third column correspondents to xy (the values inserted with the insert into ... above). The
! interesting column is the second. For example on the 26th of August in 1970, the first row for that date
! is 3 (equals xy), the second is 5 (equals xy+3) and the third is 11 (equals xy+3+5).
  
  List of analitic functions:
  
  AVG (<distinct|all> expression )
!  Used to compute an average of an expression within a group and window. Distinct may be used to find
! the average of the values in a group after duplicates have been removed.
  
  CORR (expression, expression)
   Returns the coefficient of correlation of a pair of expressions that return numbers. It is shorthand for:
***************
*** 389,398 ****
  
  STDDEV_POP(expr1) * STDDEV_POP(expr2)).
  
! Statistically speaking, a correlation is the strength of an association between variables. An association between variables means that the value of one variable can be predicted, to some extent, by the value of the other. The correlation coefficient gives the strength of the association by returning a number between -1 (strong inverse correlation) and 1 (strong correlation). A value of 0 would indicate no correlation.
  
  COUNT (<distinct> <*> <expression>)
!  This will count occurrences within a group. If you specify * or some non-null constant, count will count all rows. If you specify an expression, count returns the count of non-null evaluations of expression. You may use the DISTINCT modifier to count occurrences of rows in a group after duplicates have been removed.
  
  COVAR_POP (expression, expression)
   This returns the population covariance of a pair of expressions that return numbers.
--- 411,427 ----
  
  STDDEV_POP(expr1) * STDDEV_POP(expr2)).
  
! Statistically speaking, a correlation is the strength of an association between variables. An association
! between variables means that the value of one variable can be predicted, to some extent, by the value
! of the other. The correlation coefficient gives the strength of the association by returning a number
! between -1 (strong inverse correlation) and 1 (strong correlation). A value of 0 would indicate no
! correlation.
  
  COUNT (<distinct> <*> <expression>)
!  This will count occurrences within a group. If you specify * or some non-null constant, count will count
! all rows. If you specify an expression, count returns the count of non-null evaluations of expression.
! You may use the DISTINCT modifier to count occurrences of rows in a group after duplicates have been
! removed.
  
  COVAR_POP (expression, expression)
   This returns the population covariance of a pair of expressions that return numbers.
***************
*** 401,426 ****
   This returns the sample covariance of a pair of expressions that return numbers.
  
  CUME_DIST
!  This computes the relative position of a row in a group. CUME_DIST will always return a number greater then 0 and less then or equal to 1. This number represents the 'position' of the row in the group of N rows. In a group of three rows, the cumulate distribution values returned would be 1/3, 2/3, and 3/3 for example.
  
  DENSE_RANK
!  This function computes the relative rank of each row returned from a query with respect to the other rows, based on the values of the expressions in the ORDER BY clause. The data within a group is sorted by the ORDER BY clause and then a numeric ranking is assigned to each row in turn starting with 1 and continuing on up. The rank is incremented every time the values of the ORDER BY expressions change. Rows with equal values receive the same rank (nulls are considered equal in this comparison). A dense rank returns a ranking number without any gaps. This is in comparison to RANK below.
  
  FIRST_VALUE
   This simply returns the first value from a group.
  
  LAG (expression, <offset>, <default>)
!  LAG gives you access to other rows in a resultset without doing a self-join. It allows you to treat the cursor as if it were an array in effect. You can reference rows that come before the current row in a given group. This would allow you to select 'the previous rows' from a group along with the current row. See LEAD for how to get 'the next rows'.
  
! Offset is a positive integer that defaults to 1 (the previous row). Default is the value to be returned if the index is out of range of the window (for the first row in a group, the default will be returned)
  
  LAST_VALUE
   This simply returns the last value from a group.
  
  LEAD (expression, <offset>, <default>)
!  LEAD is the opposite of LAG. Whereas LAG gives you access to the a row preceding yours in a group - LEAD gives you access to the a row that comes after your row.
  
! Offset is a positive integer that defaults to 1 (the next row). Default is the value to be returned if the index is out of range of the window (for the last row in a group, the default will be returned).
  
  MAX(expression)
   Finds the maximum value of expression within a window of a group.
--- 430,469 ----
   This returns the sample covariance of a pair of expressions that return numbers.
  
  CUME_DIST
!  This computes the relative position of a row in a group. CUME_DIST will always return a number
! greater then 0 and less then or equal to 1. This number represents the 'position' of the row in the
! group of N rows. In a group of three rows, the cumulate distribution values returned would be 1/3, 2/3,
! and 3/3 for example.
  
  DENSE_RANK
!  This function computes the relative rank of each row returned from a query with respect to the other
! rows, based on the values of the expressions in the ORDER BY clause. The data within a group is sorted
! by the ORDER BY clause and then a numeric ranking is assigned to each row in turn starting with 1 and
! continuing on up. The rank is incremented every time the values of the ORDER BY expressions change.
! Rows with equal values receive the same rank (nulls are considered equal in this comparison). A dense
! rank returns a ranking number without any gaps. This is in comparison to RANK below.
  
  FIRST_VALUE
   This simply returns the first value from a group.
  
  LAG (expression, <offset>, <default>)
!  LAG gives you access to other rows in a resultset without doing a self-join. It allows you to treat the
! cursor as if it were an array in effect. You can reference rows that come before the current row in a
! given group. This would allow you to select 'the previous rows' from a group along with the current
! row. See LEAD for how to get 'the next rows'.
  
! Offset is a positive integer that defaults to 1 (the previous row). Default is the value to be returned if
! the index is out of range of the window (for the first row in a group, the default will be returned)
  
  LAST_VALUE
   This simply returns the last value from a group.
  
  LEAD (expression, <offset>, <default>)
!  LEAD is the opposite of LAG. Whereas LAG gives you access to the a row preceding yours in a group -
! LEAD gives you access to the a row that comes after your row.
  
! Offset is a positive integer that defaults to 1 (the next row). Default is the value to be returned if the
! index is out of range of the window (for the last row in a group, the default will be returned).
  
  MAX(expression)
   Finds the maximum value of expression within a window of a group.
***************
*** 431,443 ****
  NTILE (expression)
   Divides a group into 'value of expression' buckets.
  
! For example; if expression = 4, then each row in the group would be assigned a number from 1 to 4 putting it into a percentile. If the group had 20 rows in it, the first 5 would be assigned 1, the next 5 would be assigned 2 and so on. In the event the cardinality of the group is not evenly divisible by the expression, the rows are distributed such that no percentile has more than 1 row more then any other percentile in that group and the lowest percentiles are the ones that will have 'extra' rows. For example, using expression = 4 again and the number of rows = 21, percentile = 1 will have 6 rows, percentile = 2 will have 5, and so on.
  
  PERCENT_RANK
!  This is similar to the CUME_DIST (cumulative distribution) function. For a given row in a group, it calculates the rank of that row minus 1, divided by 1 less than the number of rows being evaluated in the group. This function will always return values from 0 to 1 inclusive.
  
  RANK
!  This function computes the relative rank of each row returned from a query with respect to the other rows, based on the values of the expressions in the ORDER BY clause. The data within a group is sorted by the ORDER BY clause and then a numeric ranking is assigned to each row in turn starting with 1 and continuing on up. Rows with the same values of the ORDER BY expressions receive the same rank; however, if two rows do receive the same rank the rank numbers will subsequently 'skip'. If two rows are number 1, there will be no number 2 - rank will assign the value of 3 to the next row in the group. This is in contrast to DENSE_RANK, which does not skip values.
  
  RATIO_TO_REPORT (expression)
   This function computes the value of expression / (sum(expression)) over the group.
--- 474,500 ----
  NTILE (expression)
   Divides a group into 'value of expression' buckets.
  
! For example; if expression = 4, then each row in the group would be assigned a number from 1 to 4
! putting it into a percentile. If the group had 20 rows in it, the first 5 would be assigned 1, the next 5
! would be assigned 2 and so on. In the event the cardinality of the group is not evenly divisible by the
! expression, the rows are distributed such that no percentile has more than 1 row more then any other
! percentile in that group and the lowest percentiles are the ones that will have 'extra' rows. For example,
! using expression = 4 again and the number of rows = 21, percentile = 1 will have 6 rows, percentile =
! 2 will have 5, and so on.
  
  PERCENT_RANK
!  This is similar to the CUME_DIST (cumulative distribution) function. For a given row in a group, it
! calculates the rank of that row minus 1, divided by 1 less than the number of rows being evaluated in
! the group. This function will always return values from 0 to 1 inclusive.
  
  RANK
!  This function computes the relative rank of each row returned from a query with respect to the other
! rows, based on the values of the expressions in the ORDER BY clause. The data within a group is sorted
! by the ORDER BY clause and then a numeric ranking is assigned to each row in turn starting with 1 and
! continuing on up. Rows with the same values of the ORDER BY expressions receive the same rank;
! however, if two rows do receive the same rank the rank numbers will subsequently 'skip'. If two rows
! are number 1, there will be no number 2 - rank will assign the value of 3 to the next row in the group.
! This is in contrast to DENSE_RANK, which does not skip values.
  
  RATIO_TO_REPORT (expression)
   This function computes the value of expression / (sum(expression)) over the group.
***************
*** 445,474 ****
  This gives you the percentage of the total the current row contributes to the sum(expression).
  
  REGR_ xxxxxxx (expression, expression)
!  These linear regression functions fit an ordinary-least-squares regression line to a pair of expressions. There are 9 different regression functions available for use.
  
  ROW_NUMBER
!  Returns the offset of a row in an ordered group. Can be used to sequentially number rows, ordered by certain criteria.
  
  STDDEV (expression)
   Computes the standard deviation of the current row with respect to the group.
  
  STDDEV_POP (expression)
!  This function computes the population standard deviation and returns the square root of the population variance. Its return value is same as the square root of the VAR_POP function.
  
  STDDEV_SAMP (expression)
!  This function computes the cumulative sample standard deviation and returns the square root of the sample variance. This function returns the same value as the square root of the VAR_SAMP function would.
  
  SUM(expression)
   This function computes the cumulative sum of expression in a group.
  
  VAR_POP (expression)
!  This function returns the population variance of a non-null set of numbers (nulls are ignored). VAR_POP function makes the following calculation for us:
  
  (SUM(expr*expr) - SUM(expr)*SUM(expr) / COUNT(expr)) / COUNT(expr)
  
  VAR_SAMP (expression)
!  This function returns the sample variance of a non-null set of numbers (nulls in the set are ignored). This function makes the following calculation for us:
  
  (SUM(expr*expr) - SUM(expr)*SUM(expr) / COUNT(expr)) / (COUNT(expr) - 1)
  
--- 502,538 ----
  This gives you the percentage of the total the current row contributes to the sum(expression).
  
  REGR_ xxxxxxx (expression, expression)
!  These linear regression functions fit an ordinary-least-squares regression line to a pair of expressions.
! There are 9 different regression functions available for use.
  
  ROW_NUMBER
!  Returns the offset of a row in an ordered group. Can be used to sequentially number rows, ordered by
! certain criteria.
  
  STDDEV (expression)
   Computes the standard deviation of the current row with respect to the group.
  
  STDDEV_POP (expression)
!  This function computes the population standard deviation and returns the square root of the
! population variance. Its return value is same as the square root of the VAR_POP function.
  
  STDDEV_SAMP (expression)
!  This function computes the cumulative sample standard deviation and returns the square root of the
! sample variance. This function returns the same value as the square root of the VAR_SAMP function
! would.
  
  SUM(expression)
   This function computes the cumulative sum of expression in a group.
  
  VAR_POP (expression)
!  This function returns the population variance of a non-null set of numbers (nulls are ignored).
! VAR_POP function makes the following calculation for us:
  
  (SUM(expr*expr) - SUM(expr)*SUM(expr) / COUNT(expr)) / COUNT(expr)
  
  VAR_SAMP (expression)
!  This function returns the sample variance of a non-null set of numbers (nulls in the set are ignored).
! This function makes the following calculation for us:
  
  (SUM(expr*expr) - SUM(expr)*SUM(expr) / COUNT(expr)) / (COUNT(expr) - 1)
  
***************
*** 596,606 ****
  *: To above paragraph. Sometimes it is better pay few hundreds of bucks (just few hours of
  my work rate) and get much more powerful commercial solution which is royalty free. For example I
  very love Valentina database - (http://www.paradigmasoft.com).  Valentina beats anything in 10-100+
! times,
! especially on big dbs . It is not expensive, royalty free. Really full SQL92, yet they have cool Object-
! Relational features.
  
! *: To above paragraph: 'Valentina beats anything in 10-100+ times'. Ok, 10-100+ times of WHAT?. RAM Usage?, CPU Usage?, CPU Cycles?, CPU Count?, Consistent Gets?, Concurrent Users?, Concurrent Transactions?, Parses?, Executes?, Fetches?, Recursive Calls?, Physical Reads?, Documentation? ... If you don't have arguments based on real data in a production environment with real data load (users & transactions) your comment is useless. Benchmark it, prove it and then show us your results, not your 'thoughts'.
  
  I wonder how useful these "remarks" are...
  
--- 660,684 ----
  *: To above paragraph. Sometimes it is better pay few hundreds of bucks (just few hours of
  my work rate) and get much more powerful commercial solution which is royalty free. For example I
  very love Valentina database - (http://www.paradigmasoft.com).  Valentina beats anything in 10-100+
! times, especially on big dbs . It is not expensive, royalty free. Really full SQL92, yet they have cool
! Object-Relational features.
  
! *: To above paragraph: 'Valentina beats anything in 10-100+ times'. Ok, 10-100+ times of WHAT?.
! RAM Usage?, CPU Usage?, CPU Cycles?, CPU Count?, Consistent Gets?, Concurrent Users?, Concurrent
! Transactions?, Parses?, Executes?, Fetches?, Recursive Calls?, Physical Reads?, Documentation? ... If you
! don't have arguments based on real data in a production environment with real data load (users &
! transactions) your comment is useless. Benchmark it, prove it and then show us your results, not your
! 'thoughts'.
! 
! *: Let me answer. I have talk about speed :-) About time of queries execution in seconds. Example. I
! have bench Table with million records. 10 fields of all types. 100MB size. Next query "SELECT DISTINCT
! * From T1" Valentina do in 1.7 seconds, SqlLite in 280 seconds. Difference 180 times. Or query:
! "SELECT * FROM T1 ORDER BY fld_byte, fld_varchar Desc"  Valentina do in 1.99 seconds, SqlLite in 115
! seconds. Difference 55 times. And this is just sigle table queries. I do not mention joins on 3-5-7
! tables with, with GROUP BY, HAVING, ...  And more. If to make database in 10 million records, so db
! grow to 1Gb the you will see even better Valentina win. If you want get information about real
! production power then just go to their users testimonials page and read quotes starting from 1998 year
! it seems.
  
  I wonder how useful these "remarks" are...

sqlite - Unsupported Sql