1.关联子查询

1.1.关联子查询和自连接在很多时候都是等价的

1.2.使用SQL进行行间比较时,发挥主要作用的技术是关联子查询,特别是与自连接相结合的“自关联子查询”

1.3.缺点

  • 1.3.1.代码的可读性不好

    • 1.3.1.1.特别是在计算累计值和移动平均值的例题里,与聚合一起使用后,其内部处理过程非常难理解
  • 1.3.2.性能不好

    • 1.3.2.1.特别是在SELECT子句里使用标量子查询时,性能可能会变差

2.增长、减少、维持现状

2.1.使用基于时间序列的表进行时间序列分析

2.2.示例

  • 2.2.1.–求与上一年营业额一样的年份(1):使用关联子查询
SELECT year, saleFROM Sales S1WHERE sale = (SELECT saleFROM Sales S2WHERE S2.year = S1.year -1)ORDER BY year;
  • 2.2.2.S2.year = S1.year -1这个条件起到了将要比较的数据偏移一行的作用

  • 2.2.3.–求与上一年营业额一样的年份(2):使用自连接

SELECT S1.year, S1.saleFROM Sales S1,Sales S2WHERE S2.sale = S1.saleAND S2.year = S1.year -1ORDER BY year;

3.用列表展示与上一年的比较结果

3.1.示例

  • 3.1.1.–求出是增长了还是减少了,抑或是维持现状(1):使用关联子查询
SELECT S1.year, S1.sale,CASE WHEN sale =(SELECT saleFROM Sales S2WHERE S2.year = S1.year -1) THEN'→'--持平WHEN sale >(SELECT saleFROM Sales S2WHERE S2.year = S1.year -1) THEN'↑'--增长WHEN sale <(SELECT saleFROM Sales S2WHERE S2.year = S1.year -1) THEN'↓'--减少ELSE'—'END AS varFROM Sales S1ORDER BY year;
  • 3.1.2.–求出是增长了还是减少了,抑或是维持现状(2):使用自连接查询(最早的年份不会出现在结果里)
SELECT S1.year, S1.sale,CASE WHEN S1.sale = S2.sale THEN'→'WHEN S1.sale > S2.sale THEN'↑'WHEN S1.sale < S2.sale THEN'↓'ELSE'—'END AS varFROM Sales S1, Sales S2WHERE S2.year = S1.year -1ORDER BY year;

4.时间轴有间断时

4.1.和过去最临近的时间进行比较

4.2.示例

  • 4.2.1.–查询与过去最临近的年份营业额相同的年份
SELECT year, saleFROM Sales2 S1WHERE sale =(SELECT saleFROM Sales2 S2WHERE S2.year =(SELECT MAX(year) --条件2:在满足条件1的年份中,年份最早的一个FROM Sales2 S3WHERE S1.year > S3.year)) --条件1:与该年份相比是过去的年份ORDER BY year;
  • 4.2.2. 自连接版本
SELECT S1.year AS year,S1.year AS yearFROM Sales2 S1, Sales2 S2WHERE S1.sale = S2.saleAND S2.year = (SELECT MAX(year)FROM Sales2 S3WHERE S1.year > S3.year)ORDER BY year;
  • 4.2.3.–求每一年与过去最临近的年份之间的营业额之差(1):结果里不包含最早的年份
SELECT S2.year AS pre_year,S1.year AS now_year,S2.sale AS pre_sale,S1.sale AS now_sale,S1.sale - S2.sale AS diffFROM Sales2 S1, Sales2 S2WHERE S2.year = (SELECT MAX(year)FROM Sales2 S3WHERE S1.year > S3.year)ORDER BY now_year;
  • 4.2.4.–求每一年与过去最临近的年份之间的营业额之差(1):结果里不包含最早的年份
SELECT S2.year AS pre_year,S1.year AS now_year,S2.sale AS pre_sale,S1.sale AS now_sale,S1.sale - S2.sale AS diffFROM Sales2 S1, Sales2 S2WHERE S2.year = (SELECT MAX(year)FROM Sales2 S3WHERE S1.year > S3.year)ORDER BY now_year;
  • 4.2.5.使用极值函数时会发生排序

5.移动累计值和移动平均值

5.1.示例

  • 5.1.1.–求累计值:使用窗口函数
SELECT prc_date, prc_amt,SUM(prc_amt) OVER (ORDER BY prc_date) AS onhand_amtFROM Accounts;
  • 5.1.2.引入窗口函数的目的原本就是解决这类问题,因此这里的代码非常简洁

    • 5.1.2.1.如果选用的数据库支持窗口函数,也可以考虑使用窗口函数
  • 5.1.3.从性能方面来看,表的扫描和数据排序也都只进行了一次

    • 5.1.3.1.依赖于具体的数据库的
  • 5.1.4.–求累计值:使用冯·诺依曼型递归集合

SELECT prc_date, A1.prc_amt,(SELECT SUM(prc_amt)FROM Accounts A2WHERE A1.prc_date >= A2.prc_date ) AS onhand_amtFROM Accounts A1ORDER BY prc_date;
  • 5.1.5.–求移动累计值(1):使用窗口函数
SELECT prc_date, prc_amt,SUM(prc_amt) OVER (ORDER BY prc_dateROWS 2 PRECEDING) AS onhand_amtFROM Accounts;
  • 5.1.6.–求移动累计值(2):不满3行的时间区间也输出
SELECT prc_date, A1.prc_amt,(SELECT SUM(prc_amt)FROM Accounts A2WHERE A1.prc_date >= A2.prc_dateAND (SELECT COUNT(*)FROM Accounts A3WHERE A3.prc_dateBETWEEN A2.prc_date AND A1.prc_date ) <= 3 )AS mvg_sumFROM Accounts A1ORDER BY prc_date;
  • 5.1.7.A3.prc_date在以A2.prc_date为起点,以A1.prc_date为终点的区间内移动

  • 5.1.8.–移动累计值(3):不满3行的区间按无效处理

SELECT prc_date, A1.prc_amt,(SELECT SUM(prc_amt)FROM Accounts A2WHERE A1.prc_date >= A2.prc_dateAND (SELECT COUNT(*)FROM Accounts A3WHERE A3.prc_dateBETWEEN A2.prc_date AND A1.prc_date ) <= 3HAVING COUNT(*) =3) AS mvg_sum --不满3行数据的不显示FROM Accounts A1ORDER BY prc_date;

5.2.基本思路是使用冯·诺依曼型递归集合

6.查询重叠的时间区间

6.1.示例

  • 6.1.1.–求重叠的住宿期间
SELECT reserver, start_date, end_dateFROM Reservations R1WHERE EXISTS(SELECT *FROM Reservations R2WHERE R1.reserver <> R2.reserver --与自己以外的客人进行比较AND ( R1.start_date BETWEEN R2.start_date AND R2.end_date--条件(1):自己的入住日期在他人的住宿期间内OR R1.end_date BETWEEN R2.start_date AND R2.end_date));--条件(2):自己的离店日期在他人的住宿期间内
  • 6.1.2.–升级版:把完全包含别人的住宿期间的情况也输出
SELECT reserver, start_date, end_dateFROM Reservations R1WHERE EXISTS(SELECT *FROM Reservations R2WHERE R1.reserver <> R2.reserverAND ( ( R1.start_date BETWEEN R2.start_dateAND R2.end_dateOR R1.end_date BETWEEN R2.start_dateAND R2.end_date)OR ( R2.start_date BETWEEN R1.start_dateAND R1.end_dateAND R2.end_date BETWEEN R1.start_dateAND R1.end_date)));