

There is a list of features that SQLite does not support at http://www.sqlite.org/omitted.html. If you find additional features that SQLite does not support, you may want to list them below.


   update T1 set (theUpdatedValue, theOtherValue) =
      (select theTop, theValue from T2 where T2.theKey = T1.theID)

  create table db1.table1 as select * from db2.table1;

  START WITH <conditions> CONNECT BY [PRIOR]<conditions> (ORACLE)

		--
		-- SQLite does not allow "UPDATE ... FROM"
		-- but this is what it might look like
		--
		UPDATE
			t1
		SET
			measure = t2.measure
		FROM
			t2, t1
		WHERE
			t2.key = t1.key
		;

		--
		-- emulating "UPDATE ... FROM" in SQLite
		--
		-- n.b.:  it assumes a PRIMARY KEY !
		--
		-- the INSERT never succeeds because
		-- the JOIN restricts the SELECT to
		-- existing rows, forcing the REPLACE
		--
		INSERT OR REPLACE INTO
			t1( key, measure )
		SELECT
			t2.key, t2.measure
		FROM
			t2, t1
		WHERE
			t2.key = t1.key
		;

		--
		-- emulating "UPDATE ... FROM" in SQLite
		--
		--
		UPDATE
			t1
		SET
			measure = ( SELECT measure FROM t2 WHERE t2.key = t1.key )
		;
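
Both emulations above can be exercised end-to-end. A sketch in Python's sqlite3 demonstrating the INSERT OR REPLACE form, with invented sample data; note that the simpler UPDATE form sets measure to NULL for t1 rows that have no match in t2, unless a guard such as WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.key = t1.key) is added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (key INTEGER PRIMARY KEY, measure INTEGER);
    CREATE TABLE t2 (key INTEGER PRIMARY KEY, measure INTEGER);
    INSERT INTO t1 VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO t2 VALUES (1, 100), (2, 200);   -- no row for key 3
""")

# Only keys already in t1 are selected, so every insert collides
# with the PRIMARY KEY and becomes a REPLACE; key 3 is untouched.
conn.execute("""
    INSERT OR REPLACE INTO t1(key, measure)
    SELECT t2.key, t2.measure FROM t2, t1 WHERE t2.key = t1.key
""")
rows = conn.execute("SELECT key, measure FROM t1 ORDER BY key").fetchall()
```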

      SELECT x.Hours median
      FROM BulbLife x, BulbLife y
      GROUP BY x.Hours
      HAVING
         SUM(CASE WHEN y.Hours <= x.Hours
            THEN 1 ELSE 0 END)>=(COUNT(*)+1)/2 AND
         SUM(CASE WHEN y.Hours >= x.Hours
            THEN 1 ELSE 0 END)>=(COUNT(*)/2)+1
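
The median query above runs unchanged in SQLite. A quick check in Python's sqlite3, with invented bulb-life values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BulbLife (Hours INTEGER)")
conn.executemany("INSERT INTO BulbLife VALUES (?)",
                 [(4,), (1,), (3,), (2,), (5,)])

# Self-join median: x.Hours is the median when at least half the
# values are <= it and at least half are >= it.
median = conn.execute("""
    SELECT x.Hours
    FROM BulbLife x, BulbLife y
    GROUP BY x.Hours
    HAVING SUM(CASE WHEN y.Hours <= x.Hours THEN 1 ELSE 0 END) >= (COUNT(*)+1)/2
       AND SUM(CASE WHEN y.Hours >= x.Hours THEN 1 ELSE 0 END) >= COUNT(*)/2 + 1
""").fetchone()[0]
```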

      SELECT a1.a, a1.b, a2.a, a2.b
      FROM a1 LEFT JOIN a2 ON a2.b = a1.a

      SELECT a1.a, a1.b, a2.a, a2.b
      FROM a1, a2
      WHERE a1.a = a2.b(+);

		CREATE TABLE strings (
			string_id INTEGER NOT NULL,
			language_id INTEGER NOT NULL,
			string TEXT,
			PRIMARY KEY (string_id, language_id)
		);

Can someone tell me how to fake describe until something like this is implemented? Sorry, I'm too dependent on Oracle apparently :(
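
SQLite has no DESCRIBE, but PRAGMA table_info gives much the same column details (and the command-line shell's .schema command prints the original DDL). A sketch in Python's sqlite3 against the strings table above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE strings (
        string_id INTEGER NOT NULL,
        language_id INTEGER NOT NULL,
        string TEXT,
        PRIMARY KEY (string_id, language_id)
    )
""")

# Each returned row is (cid, name, type, notnull, dflt_value, pk) --
# roughly the information Oracle's DESCRIBE would print.
cols = conn.execute("PRAGMA table_info(strings)").fetchall()
for cid, name, coltype, notnull, dflt, pk in cols:
    print(name, coltype, "NOT NULL" if notnull else "", "PK" if pk else "")
```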

That's not a legal FOREIGN KEY clause; you have to specify what the foreign key references. SQLite parses, but does not enforce, syntactically-legal FOREIGN KEY specifications; there's a PRAGMA that will retrieve foreign-key information from table definitions, allowing you to enforce such constraints with application code.
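
For reference, the PRAGMA in question is foreign_key_list. A sketch in Python's sqlite3 with invented parent/child tables (newer SQLite versions can also enforce the constraints natively via PRAGMA foreign_keys = ON):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY);
    CREATE TABLE child  (id INTEGER PRIMARY KEY,
                         parent_id INTEGER REFERENCES parent(id));
""")

# Each row describes one column of one foreign key:
# (id, seq, table, from, to, on_update, on_delete, match)
fks = conn.execute("PRAGMA foreign_key_list(child)").fetchall()
```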

(What is?) SQL is a very capable language and there are very few questions that it cannot answer. I find that I can come up with some convoluted SQL query to answer virtually any question you could ask from the data. However, the performance of some of these queries is not what it should be - nor is the query itself easy to write in the first place. Some of the things that are hard to do in straight SQL are actually very commonly requested operations, including:

Calculate a running total - Show the cumulative salary within a department row by row, with each row including a summation of the prior rows' salaries.

Find percentages within a group - Show the percentage of the total salary paid to an individual in a certain department. Take their salary and divide it by the sum of the salary in the department.

Top-N queries - Find the top N highest-paid people or the top N sales by region.

Compute a moving average - Average the current row's value and the previous N rows' values together.

Perform ranking queries - Show the relative rank of an individual's salary within their department.

Analytic functions are designed to address these issues. They add extensions to the SQL language that not only make these operations easier to code, but also make them faster than a pure SQL approach could achieve. These extensions are currently under review by the ANSI SQL committee for inclusion in the SQL specification.

The syntax of the analytic function is rather straightforward in appearance, but looks can be deceiving. It starts with:

FUNCTION_NAME(<argument>, <argument>, ...) OVER (<Partition-Clause> <Order-by-Clause> <Windowing-Clause>)

The PARTITION BY clause logically breaks a single result set into N groups, according to the criteria set by the partition expressions. The words 'partition' and 'group' are used synonymously.

The ORDER BY clause specifies how the data is sorted within each group (partition).

The windowing clause gives us a way to define a sliding or anchored window of data, on which the analytic function will operate, within a group. This clause can be used to have the analytic function compute its value based on any arbitrary sliding or anchored window within a group.

Ex:

This example shows how to use the analytic function SUM to perform a cumulative sum. First, we fill some values into a table. The table is very simple and consists of only the fields dt and xy. Note that for a given date it is possible to insert multiple rows, which is exactly what I do here. What I am interested in is extracting the cumulative sum for each day in the table. That is, if I have three entries for the same date, for example 3, 4 and 5, I don't want the sum to be 3+4+5 for each row, but 3 for the first row, 3+4 for the second row and 3+4+5 for the third row.

create table sum_example ( dt date, xy number );

insert into sum_example values (to_date('27.08.1970','DD.MM.YYYY'),4);
insert into sum_example values (to_date('02.09.1970','DD.MM.YYYY'),1);
insert into sum_example values (to_date('09.09.1970','DD.MM.YYYY'),5);
insert into sum_example values (to_date('26.08.1970','DD.MM.YYYY'),3);
insert into sum_example values (to_date('28.08.1970','DD.MM.YYYY'),4);
insert into sum_example values (to_date('26.08.1970','DD.MM.YYYY'),6);
insert into sum_example values (to_date('29.08.1970','DD.MM.YYYY'),9);
insert into sum_example values (to_date('30.08.1970','DD.MM.YYYY'),2);
insert into sum_example values (to_date('12.09.1970','DD.MM.YYYY'),7);
insert into sum_example values (to_date('23.08.1970','DD.MM.YYYY'),2);
insert into sum_example values (to_date('27.08.1970','DD.MM.YYYY'),5);
insert into sum_example values (to_date('09.09.1970','DD.MM.YYYY'),9);
insert into sum_example values (to_date('01.09.1970','DD.MM.YYYY'),3);
insert into sum_example values (to_date('07.09.1970','DD.MM.YYYY'),1);
insert into sum_example values (to_date('12.09.1970','DD.MM.YYYY'),4);
insert into sum_example values (to_date('03.09.1970','DD.MM.YYYY'),5);
insert into sum_example values (to_date('03.09.1970','DD.MM.YYYY'),8);
insert into sum_example values (to_date('07.09.1970','DD.MM.YYYY'),7);
insert into sum_example values (to_date('04.09.1970','DD.MM.YYYY'),8);
insert into sum_example values (to_date('09.09.1970','DD.MM.YYYY'),1);
insert into sum_example values (to_date('29.08.1970','DD.MM.YYYY'),3);
insert into sum_example values (to_date('30.08.1970','DD.MM.YYYY'),7);
insert into sum_example values (to_date('24.08.1970','DD.MM.YYYY'),7);
insert into sum_example values (to_date('07.09.1970','DD.MM.YYYY'),9);
insert into sum_example values (to_date('26.08.1970','DD.MM.YYYY'),2);
insert into sum_example values (to_date('09.09.1970','DD.MM.YYYY'),8);

select dt, sum(xy) over (partition by trunc(dt) order by dt rows between unbounded preceding and current row) s, xy from sum_example;

drop table sum_example;

The select statement will return:

23.08.70 2 2
24.08.70 7 7
26.08.70 3 3
26.08.70 5 2
26.08.70 11 6
27.08.70 4 4
27.08.70 9 5
28.08.70 4 4
29.08.70 9 9
29.08.70 12 3
30.08.70 2 2
30.08.70 9 7
01.09.70 3 3
02.09.70 1 1
03.09.70 5 5
03.09.70 13 8
04.09.70 8 8
07.09.70 1 1
07.09.70 8 7
07.09.70 17 9
09.09.70 5 5
09.09.70 14 9
09.09.70 15 1
09.09.70 23 8
12.09.70 7 7
12.09.70 11 4

The third column corresponds to xy (the values inserted with the insert into ... above). The interesting column is the second. For example, on the 26th of August 1970, the first row for that date is 3 (equal to xy), the second is 5 (equal to xy+3) and the third is 11 (equal to xy+3+2).
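
SQLite itself gained window functions only in version 3.25; on older versions the same running total can be emulated with a correlated subquery. A sketch in Python's sqlite3 reproducing the 26.08.1970 rows (ISO-8601 text dates are used, since SQLite has no native DATE type):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sum_example (dt TEXT, xy INTEGER)")
conn.executemany("INSERT INTO sum_example VALUES (?, ?)",
                 [("1970-08-26", 3), ("1970-08-26", 2), ("1970-08-26", 6)])

# Correlated-subquery running total: for each row, sum every row of the
# same date at or before it in rowid order.  O(n^2) rather than the
# analytic SUM ... OVER, but it needs no window-function support.
rows = conn.execute("""
    SELECT dt, xy,
           (SELECT SUM(b.xy) FROM sum_example b
            WHERE b.dt = a.dt AND b.rowid <= a.rowid) AS running
    FROM sum_example a ORDER BY a.rowid
""").fetchall()
```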

List of analytic functions:

AVG (<distinct|all> expression ) Used to compute an average of an expression within a group and window. Distinct may be used to find the average of the values in a group after duplicates have been removed.

CORR (expression, expression) Returns the coefficient of correlation of a pair of expressions that return numbers. It is shorthand for:

COVAR_POP(expr1, expr2) / (STDDEV_POP(expr1) * STDDEV_POP(expr2))

Statistically speaking, a correlation is the strength of an association between variables. An association between variables means that the value of one variable can be predicted, to some extent, by the value of the other. The correlation coefficient gives the strength of the association by returning a number between -1 (strong inverse correlation) and 1 (strong correlation). A value of 0 would indicate no correlation.

COUNT (<distinct> <*> <expression>) This will count occurrences within a group. If you specify * or some non-null constant, count will count all rows. If you specify an expression, count returns the count of non-null evaluations of expression. You may use the DISTINCT modifier to count occurrences of rows in a group after duplicates have been removed.

COVAR_POP (expression, expression) This returns the population covariance of a pair of expressions that return numbers.

COVAR_SAMP (expression, expression) This returns the sample covariance of a pair of expressions that return numbers.

CUME_DIST This computes the relative position of a row in a group. CUME_DIST will always return a number greater than 0 and less than or equal to 1. This number represents the 'position' of the row in the group of N rows. In a group of three rows, for example, the cumulative distribution values returned would be 1/3, 2/3, and 3/3.

DENSE_RANK This function computes the relative rank of each row returned from a query with respect to the other rows, based on the values of the expressions in the ORDER BY clause. The data within a group is sorted by the ORDER BY clause and then a numeric ranking is assigned to each row in turn starting with 1 and continuing on up. The rank is incremented every time the values of the ORDER BY expressions change. Rows with equal values receive the same rank (nulls are considered equal in this comparison). A dense rank returns a ranking number without any gaps. This is in comparison to RANK below.

FIRST_VALUE This simply returns the first value from a group.

LAG (expression, <offset>, <default>) LAG gives you access to other rows in a result set without doing a self-join. In effect, it allows you to treat the cursor as if it were an array. You can reference rows that come before the current row in a given group, allowing you to select 'the previous rows' from a group along with the current row. See LEAD for how to get 'the next rows'.

Offset is a positive integer that defaults to 1 (the previous row). Default is the value to be returned if the index is out of range of the window (for the first row in a group, the default will be returned).

LAST_VALUE This simply returns the last value from a group.

LEAD (expression, <offset>, <default>) LEAD is the opposite of LAG. Whereas LAG gives you access to a row preceding yours in a group, LEAD gives you access to a row that comes after your row.

Offset is a positive integer that defaults to 1 (the next row). Default is the value to be returned if the index is out of range of the window (for the last row in a group, the default will be returned).

MAX(expression) Finds the maximum value of expression within a window of a group.

MIN(expression) Finds the minimum value of expression within a window of a group.

NTILE (expression) Divides a group into 'value of expression' buckets.

For example, if expression = 4, then each row in the group would be assigned a number from 1 to 4, putting it into a bucket. If the group had 20 rows in it, the first 5 would be assigned 1, the next 5 would be assigned 2, and so on. In the event the cardinality of the group is not evenly divisible by the expression, the rows are distributed such that no bucket has more than 1 row more than any other bucket in that group, and the lowest buckets are the ones that get the 'extra' rows. For example, using expression = 4 again and 21 rows, bucket 1 will have 6 rows, bucket 2 will have 5, and so on.

PERCENT_RANK This is similar to the CUME_DIST (cumulative distribution) function. For a given row in a group, it calculates the rank of that row minus 1, divided by 1 less than the number of rows being evaluated in the group. This function will always return values from 0 to 1 inclusive.

RANK This function computes the relative rank of each row returned from a query with respect to the other rows, based on the values of the expressions in the ORDER BY clause. The data within a group is sorted by the ORDER BY clause and then a numeric ranking is assigned to each row in turn starting with 1 and continuing on up. Rows with the same values of the ORDER BY expressions receive the same rank; however, if two rows do receive the same rank the rank numbers will subsequently 'skip'. If two rows are number 1, there will be no number 2 - rank will assign the value of 3 to the next row in the group. This is in contrast to DENSE_RANK, which does not skip values.

RATIO_TO_REPORT (expression) This function computes the value of expression / (sum(expression)) over the group.

This gives you the percentage of the total the current row contributes to the sum(expression).

REGR_ xxxxxxx (expression, expression) These linear regression functions fit an ordinary-least-squares regression line to a pair of expressions. There are 9 different regression functions available for use.

ROW_NUMBER Returns the offset of a row in an ordered group. Can be used to sequentially number rows, ordered by certain criteria.

STDDEV (expression) Computes the standard deviation of the current row with respect to the group.

STDDEV_POP (expression) This function computes the population standard deviation and returns the square root of the population variance. Its return value is the same as the square root of the VAR_POP function.

STDDEV_SAMP (expression) This function computes the cumulative sample standard deviation and returns the square root of the sample variance. This function returns the same value as the square root of the VAR_SAMP function would.

SUM(expression) This function computes the cumulative sum of expression in a group.

VAR_POP (expression) This function returns the population variance of a non-null set of numbers (nulls are ignored). VAR_POP function makes the following calculation for us:

(SUM(expr*expr) - SUM(expr)*SUM(expr) / COUNT(expr)) / COUNT(expr)

VAR_SAMP (expression) This function returns the sample variance of a non-null set of numbers (nulls in the set are ignored). This function makes the following calculation for us:

(SUM(expr*expr) - SUM(expr)*SUM(expr) / COUNT(expr)) / (COUNT(expr) - 1)

VARIANCE (expression) This function returns the variance of expression. Oracle will calculate the variance as follows:

0 if the number of rows in expression = 1

VAR_SAMP if the number of rows in expression > 1

More details on the web ... ask tom !?


FEATURES ADDED IN RECENT VERSIONS

The infrastructure for the REGEXP operator now exists, but you have to supply the user-defined regexp() matching function yourself.
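
A sketch of wiring up such a function in Python's sqlite3: SQLite evaluates X REGEXP Y by calling regexp(Y, X), so registering a two-argument regexp function makes the operator work. The words table is invented:

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite parses "x REGEXP y" but delegates it to a user-defined
# function named regexp(pattern, value); without one it raises an error.
conn.create_function("regexp", 2,
                     lambda pattern, value: 1 if re.search(pattern, value) else 0)

conn.execute("CREATE TABLE words (w TEXT)")
conn.executemany("INSERT INTO words VALUES (?)",
                 [("apple",), ("banana",), ("cherry",)])
hits = [w for (w,) in conn.execute("SELECT w FROM words WHERE w REGEXP 'an+a'")]
```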

        create table mysql_sequences (
            sequence_name char(32) not null primary key,
            sequence_start bigint not null default 1,
            sequence_increment bigint not null default 1,
            sequence_value bigint not null default 1
        )
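
One way this table might be used, sketched in Python's sqlite3; the nextval helper is hypothetical, and bumps the counter inside a transaction so two callers cannot observe the same value:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE mysql_sequences (
        sequence_name TEXT NOT NULL PRIMARY KEY,
        sequence_start INTEGER NOT NULL DEFAULT 1,
        sequence_increment INTEGER NOT NULL DEFAULT 1,
        sequence_value INTEGER NOT NULL DEFAULT 1
    )
""")
conn.execute("INSERT INTO mysql_sequences (sequence_name) VALUES ('order_id')")

def nextval(name):
    # Bump then read inside one transaction (the "with conn" block)
    # so concurrent callers cannot get the same value.
    with conn:
        conn.execute("""UPDATE mysql_sequences
                        SET sequence_value = sequence_value + sequence_increment
                        WHERE sequence_name = ?""", (name,))
        return conn.execute("""SELECT sequence_value FROM mysql_sequences
                               WHERE sequence_name = ?""", (name,)).fetchone()[0]
```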


REMARK
SQLite is finally a database product that values performance and minimal footprint (disk and memory) above a trashcan strategy of adding whatever feature makes the result so-called 'feature-rich', that is, a bloated piece of software. Therefore, I would vehemently reject all the additions listed above, except for one. It's quite difficult to obtain the result of a correlated 'NOT EXISTS' subquery in any alternative way, and that is the natural way to determine the subset of data that doesn't satisfy criteria contained in another table.

In my experience I have found 'NOT EXISTS' (or is it 'NOT IN'?) to be extraordinarily slow. Given that SQLite provides 'EXCEPT', that much faster construct can be used to the same end (at least it was faster with Oracle's equivalent, 'MINUS'). To wit:

	select name,addr from employee where id not in (select id from sales)

becomes

	select name,addr from employee where id in (
		select id from employee
		except
		select id from sales
	)
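
A quick check in Python's sqlite3 that the two phrasings agree, using invented employee/sales data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, addr TEXT);
    CREATE TABLE sales (id INTEGER);
    INSERT INTO employee VALUES (1, 'ann', 'x'), (2, 'bob', 'y'), (3, 'cay', 'z');
    INSERT INTO sales VALUES (1), (3);
""")

# Employees with no sales, phrased with EXCEPT instead of NOT IN.
rows = conn.execute("""
    SELECT name, addr FROM employee WHERE id IN (
        SELECT id FROM employee
        EXCEPT
        SELECT id FROM sales
    )
""").fetchall()
```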

-- Are you calling Oracle 'a bloated piece of software'? LOL. I would love to see a comparison of Oracle and SQLite (latest stable or bleeding-edge SQLite version vs. Oracle 10g). I would love it. [This comparison idea is as valid as comparing a novel to a short story.] Anyway, SQLite seems a lil' database engine for lil' works. Sorry, not enough for me :). -- Why would anyone compare Oracle to SQLite other than to say "can you add support for this Oracle syntax to make migration between them easier"? -- Someone might mistakenly compare Oracle to SQLite because they fail to comprehend that the two products solve very different problems.

* Ha-ha-ha. Rest assured I am not a dummy user, so I know the answers to all the questions you ask. I have more than 15 years of database experience. All your accusations that I failed to take something into account are simply foolish! Let me repeat: I made the same table with the same fields and the same data in the records, and ran the same queries on the same hardware in the same clean environment (no other apps running to eat CPU or disk). And I did not use any tricks like "FIRST_ROWS". Both databases were on default parameters. You still claim that this is not a fair benchmark??!! Tell that to somebody else.

I wonder how useful these "remarks" are...

They don't really pertain to the question of how well SQLite supports either standard or "extended" SQL features; I'd suggest that if the participants in this debate want to continue it, they create a new wiki page specifically for it; copy everything not related to feature support over to it, and delete it from this page.

What about Apache Derby? It uses the Apache 2.0 license and is easy to embed in Java applications (http://db.apache.org/derby/). -- See SqliteVersusDerby


Tcl related

  set values [list a b c]
  db eval { SELECT * FROM table WHERE x IN ($values) }

SQLite does its own variable interpolation which avoids the (messy) need to do value quoting/escaping (to protect against SQL injection attacks, etc.) but in the case where it's an "IN ($variable)" clause, it treats $variable as a single value instead of a Tcl list of values. Or, maybe I'm doing something wrong. If I am, please let me know: dossy@panoptic.com.