参考文献

  • MySQL技术内幕-SQL编程 姜承尧

第四章 子查询

独立子查询

  • 子查询可以按两种方式分类.若按照期望值的数量,可以将子查询分为标量子查询和多值子查询;若按查询对外部查询的依赖可分为独立子查询(self-contained subquery)和相关子查询(correlated subquery)
  • 独立子查询是不依赖外部查询而运行的子查询.与相关子查询相比,独立子查询更便于SQL语句的调试.
  • 标量子查询可以出现在查询中希望产生标量值的任何地方,而多值子查询可以出现在查询中希望产生多值集合的任何地方.只要标量子查询返回的是单个值或NULL值,就说明该子查询是有效的.如果标量子查询返回多个值,则,MySQL数据库将抛出错误.

EXISTS谓词

  • EXISTS是一个非常强大的谓词,它允许数据库高效地检查查询是否产生某些行.通常EXISTS的输入是一个子查询,并关联到外部查询,但这不是必须的.根据子查询是否返回行,该谓词返回TRUEFALSE.

  • 与其他谓词和逻辑表达式不同的是,无论输入子查询是否返回行,EXISTS都不会返回UNKNOWN.如果子查询的过滤器为某行返回UNKNOWN,则表示该行不改返回,因此,UNKNOWN被认为是FALSE

NOT EXISTS

  • EXISTSIN的一个小区别体现在对三值逻辑的判断上.EXISTS总是返回TRUEFALSE,而对于IN,除了TRUE,FALSE值外,还有可能对NULL值返回UNKNOWN.但在过滤器中,UNKNOWN的处理方式与FALSE相同,因此使用IN与使用EXISTS一样,SQL优化器会选择相同的执行计划.

  • 但输入列表中包括NULL值时,NOT EXISTSNOT IN之间的差异就表现的非常明显了.输入列表中包含NULL值时,IN总是返回TRUEUNKNOWN,因此NOT IN总是返回NOT TRUENOT UNKNOWN,即FALSEUNKNOWN.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    mysql> SELECT NULL IN ('a','b',NULL)\G
    *************************** 1. row ***************************
    NULL IN ('a','b',NULL): NULL
    1 row in set (0.00 sec)

    mysql> SELECT NULL NOT IN ('a','b',NULL)\G
    *************************** 1. row ***************************
    NULL NOT IN ('a','b',NULL): NULL
    1 row in set (0.00 sec)

    mysql> SELECT 'a' NOT IN ('a','b',NULL)\G
    *************************** 1. row ***************************
    'a' NOT IN ('a','b',NULL): 0
    1 row in set (0.00 sec)

    mysql> SELECT 'c' NOT IN ('a','b',NULL)\G
    *************************** 1. row ***************************
    'c' NOT IN ('a','b',NULL): NULL
    1 row in set (0.00 sec)

派生表

  • 派生表又被称为表子查询,与其他表一样出现在FROM的子句中,但是是从子查询派生出的虚拟表中产生的.

  • 派生表的使用形式一般如下:

    1
    FROM (subquery expression) AS derived_table_alias
    • 目前派生表在使用上有以下使用规则:

      • 列的名称必须是唯一的

        1
        2
        mysql> SELECT a,b FROM (SELECT 'A' AS a ,'B' as b, 'C' AS a ) AS t\G
        ERROR 1060 (42S21): Duplicate column name 'a'
      • 在某些情况下不支持LIMIT

子查询可以解决的经典问题

行号
1
2
3
4
5
6
7
8
9
10
11
12
13
CREATE TABLE T (a CHAR ( 1 ));

INSERT INTO T SELECT 'X';
INSERT INTO T SELECT 'X';
INSERT INTO T SELECT 'X';
INSERT INTO T SELECT 'X';
INSERT INTO T SELECT 'X';
INSERT INTO T SELECT 'Y';
INSERT INTO T SELECT 'Z';

SELECT nums.a + smaller AS rownum,C.a
FROM ( SELECT a, count(*) AS count,( SELECT count(*) FROM T AS B WHERE B.a < A.a ) AS smaller FROM T AS A GROUP BY a ) AS C,nums
WHERE nums.a <= count;
1
2
3
4
5
6
7
8
9
10
11
+--------+------+
| rownum | a |
+--------+------+
| 1 | X |
| 2 | X |
| 3 | X |
| 4 | X |
| 5 | X |
| 6 | Y |
| 7 | Z |
+--------+------+
分区
  • 子查询
1
2
3
4
5
SELECT dept_no, emp_no,( SELECT count(*) FROM dept_manager AS S2 WHERE S1.dept_no = S2.dept_no AND S2.emp_no <= S1.emp_no ) AS rownum 
FROM
dept_manager AS S1
ORDER BY
dept_no,emp_no;
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
+---------+--------+--------+
| dept_no | emp_no | rownum |
+---------+--------+--------+
| d001 | 110022 | 1 |
| d001 | 110039 | 2 |
| d002 | 110085 | 1 |
| d002 | 110114 | 2 |
| d003 | 110183 | 1 |
| d003 | 110228 | 2 |
| d004 | 110303 | 1 |
| d004 | 110344 | 2 |
| d004 | 110386 | 3 |
| d004 | 110420 | 4 |
| d005 | 110511 | 1 |
| d005 | 110567 | 2 |
| d006 | 110725 | 1 |
| d006 | 110765 | 2 |
| d006 | 110800 | 3 |
| d006 | 110854 | 4 |
| d007 | 111035 | 1 |
| d007 | 111133 | 2 |
| d008 | 111400 | 1 |
| d008 | 111534 | 2 |
| d009 | 111692 | 1 |
| d009 | 111784 | 2 |
| d009 | 111877 | 3 |
| d009 | 111939 | 4 |
+---------+--------+--------+
  • 对于分区问题,假设表中有mm个分区,每个分区有nn行数据,则扫描的总行数将是mn(n1)/2m*n*(n-1)/2,而mn=Nm*n=N为表中的所有数据.因此当nn的值较小,即每个分区中含有的数据较少时,扫描成本为O(N)O(N),而当nn的值较大时,该子查询解决方案的扫描成本为O(N2)O(N^2)

  • 基于游标的解决方案

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    CREATE PROCEDURE pGetPartitionNUmber () 
    BEGIN
    DECLARE done INT DEFAULT 0;
    DECLARE deptno CHAR ( 4 ) DEFAULT NULL;
    DECLARE deptno_prev CHAR ( 4 ) DEFAULT NULL;
    DECLARE empno INT;
    DECLARE rn INT DEFAULT 0;
    DECLARE cur1 CURSOR FOR SELECT dept_no,emp_no FROM dept_manager ORDER BY dept_no,emp_no;
    DECLARE CONTINUE HANDLER FOR NOT found SET done = 1;
    CREATE TEMPORARY TABLE $ret ( dept_no CHAR ( 4 ), emp_no INT, rownum INT ) ENGINE = INNODB;
    OPEN cur1;
    FETCH cur1 INTO deptno,empno;
    START TRANSACTION;
    WHILE done = 0 DO
    IF deptno_prev IS NULL THEN
    SET rn = 1;
    ELSEIF deptno = deptno_prev THEN
    SET rn = rn + 1;
    ELSE
    SET rn = 1;
    END IF;
    SET deptno_prev = deptno;
    INSERT INTO $ret SELECT deptno,empno,rn;
    FETCH cur1 INTO deptno,empno;
    END WHILE;
    COMMIT;
    CLOSE cur1;
    SELECT * FROM $ret;
    DROP TABLE $ret;
    END;

    call pGetPartitionNUmber();
  • 基于临时表的解决方案

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    -- 创建带有自增长列的临时表
    CREATE TABLE $temp (
    id INT UNSIGNED auto_increment,
    dept_no CHAR ( 4 ),
    emp_no INT,
    PRIMARY KEY ( id )
    );

    -- 根据分区的要求插入数据
    INSERT INTO $temp SELECT NULL,dept_no,emp_no FROM dept_manager ORDER BY dept_no,emp_no;

    -- JOIN运算,通过嵌套子查询得到结果
    SELECT m.dept_no,m.emp_no,id-minid+1 AS ROW_NUM
    FROM $temp AS m INNER JOIN (select dept_no,min(id) AS minid FROM $temp GROUP BY dept_no) AS n on m.dept_no=n.dept_no;

    -- 删除临时表
    DROP TABLE $temp;
最小缺失值问题
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
CREATE TABLE x ( a INT UNSIGNED PRIMARY KEY, b CHAR ( 1 ) NOT NULL ) ENGINE = innodb;

INSERT INTO x SELECT 3,'a';
INSERT INTO x SELECT 4,'b';
INSERT INTO x SELECT 6,'c';
INSERT INTO x SELECT 7,'d';

SELECT
CASE
WHEN NOT EXISTS ( SELECT a FROM x WHERE a = 1 ) THEN 1
ELSE
(SELECT min( a )+ 1 AS missing FROM x AS A WHERE NOT EXISTS ( SELECT * FROM x AS B WHERE A.a + 1 = B.a ))
END AS missing;

-- OUTER JOIN写法
SELECT MIN(a.a+1) FROM x a LEFT JOIN x b ON a.a+1 = b.a WHERE b.a IS NULL;
  • 补充缺失的值
1
2
3
4
5
6
7
INSERT INTO x
SELECT
CASE
WHEN NOT EXISTS ( SELECT a FROM x WHERE a = 1 ) THEN 1
ELSE
(SELECT min( a )+ 1 AS missing FROM x AS A WHERE NOT EXISTS ( SELECT * FROM x AS B WHERE A.a + 1 = B.a ))
END AS missing,'p';
缺失范围和连续范围
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
CREATE TABLE g (a int );

INSERT INTO g SELECT 1;
INSERT INTO g SELECT 2;
INSERT INTO g SELECT 3;
INSERT INTO g SELECT 100;
INSERT INTO g SELECT 101;
INSERT INTO g SELECT 102;
INSERT INTO g SELECT 104;
INSERT INTO g SELECT 105;
INSERT INTO g SELECT 106;
-- 缺失范围
-- 方法一
SELECT a + 1 AS start_range,( SELECT MIN( a )- 1 FROM g AS C WHERE C.a > A.a ) AS end_range
FROM g AS A
WHERE NOT EXISTS (SELECT * FROM g AS B WHERE A.a + 1 = B.a ) AND a < ( SELECT MAX( a ) FROM g );

-- 方法二
SELECT cur+1 AS start_range,next-1 AS end_range
FROM (SELECT a AS cur,(SELECT MIN(a) FROM g AS B WHERE B.a > A.a) AS next FROM g AS A) AS C WHERE next - cur >1;
1
2
3
4
5
6
+-------------+-----------+
| start_range | end_range |
+-------------+-----------+
| 4 | 99 |
| 103 | 103 |
+-------------+-----------+
  • 连续问题,采用行号来进行分组是最快的一种做法
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
-- better
SELECT
min( a ) AS start_range,
max( a ) AS end_range
FROM
(SELECT a,rn,a - rn AS diff FROM ( SELECT a, @a := @a + 1 rn FROM t,( SELECT @a := 0 ) AS a ) AS b ) AS c
GROUP BY diff;

-- not good
SELECT
min( a ) AS start_range,
max( a ) AS end_range
FROM
(SELECT a,(SELECT MIN( a ) FROM g AS A WHERE NOT EXISTS ( SELECT * FROM g AS B WHERE A.a + 1 = B.a) AND A.a >= C.a ) AS max FROM g AS C ) AS D
GROUP BY max

第五章 联接与集合操作

联接查询

  • CROSS JOIN
  • INNER JOIN
  • OUTER JOIN

CROSS JOIN

  • CROSS JOIN对两个表执行笛卡尔积,返回两个表中所有列的组合,若左表有mm行数据,右表有nn行数据,则CROSS JOIN将返回mnm*n行的表

    1
    2
    3
    4
    5
    6
    7
    8
    -- SQL 92
    SELECT *
    FROM tab_a AS A
    CROSS JOIN tab_b AS B

    -- SQL 89
    SELECT *
    FROM tab_a AS A,tab_b AS B

INNER JOIN

  • 通过INNER JOIN用户可以根据一些过滤条件来匹配表之间的数据.在逻辑查询的前三个处理阶段中INNER JOIN应用前两个阶段,即首先产生笛卡尔积的虚拟表,再按照ON过滤条件来进行数据的匹配操作.

  • INNER JOIN没有第三步操作,即不添加外部行,这是和OUTER JOIN最大的区别之一.也正因为不会添加外部行,指定过滤条件在ON子句和WHERE子句中没有任何区别.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    SELECT a.emp_no,first_name,last_name
    FROM employees a
    [INNER] JOIN dept_manager b
    ON a.emp_no = b.emp_no
    WHERE dept_no = 'd001';

    -- <==>
    SELECT a.emp_no,first_name,last_name
    FROM employees a
    [INNER] JOIN dept_manager b
    ON a.emp_no = b.emp_no
    AND dept_no = 'd001';

OUTER JOIN

  • 通过OUTER JOIN用户可以按照一些过滤条件来匹配表之间的数据.与INNER JOIN不同的是,在通过OUTER JOIN添加的保留表中存在未找到的匹配数据.

  • OUTER JOIN应用逻辑查询的前三个步骤,即产生笛卡尔积,应用ON过滤器和添加外部行.对表留表中的行数据,如果是未找到匹配数据而添加的记录,其值用NULL进行填充.

  • 需要注意的事,INNER JOIN中的过滤条件都可以写在ON子句中,而OUTER JOIN的过滤条件不可以这样处理,因为会得到不正确的结果.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    mysql> SELECT e.*,dm.dept_no FROM employees e  LEFT JOIN dept_manager dm on e.emp_no = dm.emp_no AND dm.dept_no='d005' LIMIT 10;
    +--------+------------+------------+-----------+--------+------------+---------+
    | emp_no | birth_date | first_name | last_name | gender | hire_date | dept_no |
    +--------+------------+------------+-----------+--------+------------+---------+
    | 10001 | 1953-09-02 | Georgi | Facello | M | 1986-06-26 | NULL |
    | 10002 | 1964-06-02 | Bezalel | Simmel | F | 1985-11-21 | NULL |
    | 10003 | 1959-12-03 | Parto | Bamford | M | 1986-08-28 | NULL |
    | 10004 | 1954-05-01 | Chirstian | Koblick | M | 1986-12-01 | NULL |
    | 10005 | 1955-01-21 | Kyoichi | Maliniak | M | 1989-09-12 | NULL |
    | 10006 | 1953-04-20 | Anneke | Preusig | F | 1989-06-02 | NULL |
    | 10007 | 1957-05-23 | Tzvetan | Zielinski | F | 1989-02-10 | NULL |
    | 10008 | 1958-02-19 | Saniya | Kalloufi | M | 1994-09-15 | NULL |
    | 10009 | 1952-04-19 | Sumant | Peac | F | 1985-02-18 | NULL |
    | 10010 | 1963-06-01 | Duangkaew | Piveteau | F | 1989-08-24 | NULL |
    +--------+------------+------------+-----------+--------+------------+---------+
    10 rows in set (0.00 sec)
    1
    2
    3
    4
    5
    6
    7
    8
    mysql> SELECT e.*,dm.dept_no FROM employees e  LEFT JOIN dept_manager dm on e.emp_no = dm.emp_no WHERE dm.dept_no='d005' LIMIT 10;
    +--------+------------+------------+-----------+--------+------------+---------+
    | emp_no | birth_date | first_name | last_name | gender | hire_date | dept_no |
    +--------+------------+------------+-----------+--------+------------+---------+
    | 110511 | 1957-07-08 | DeForest | Hagimont | M | 1985-01-01 | d005 |
    | 110567 | 1964-04-25 | Leon | DasSarma | F | 1986-10-21 | d005 |
    +--------+------------+------------+-----------+--------+------------+---------+
    2 rows in set (0.00 sec)

NATURAL JOIN

  • ANSI SQL还支持NATURAL JOIN,即自然联接.NATURAL JOIN等同于INNER JOINUSING的组合,它隐含的作用是将两个表中具有相同名称的列进行匹配.同样的NATURAL LEFT(RIGHT) JOIN等同于LEFT(RIGHT) OUTER JOINUSING的组合.

    1
    2
    3
    4
    5
    SELECT a.emp_no,first_name,last_name 
    FROM employees a INNER JOIN dept_manager b ON a.emp_no = b.emp_no;

    SELECT a.emp_no,first_name,last_name
    FROM employees a NATURAL JOIN dept_manager b;

STRAIGHT_JOIN

  • STRAIGHT_JOIN其实不是新的联接类型,而是用户对SQL优化器的控制,其等同于JOIN.通过STRAIGHT_JOIN,MySQL数据库会强制先读取左边的表.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
mysql> explain SELECT a.emp_no,first_name,last_name FROM employees a INNER JOIN dept_manager b ON a.emp_no = b.emp_no;
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+-------------+
| 1 | SIMPLE | b | NULL | index | PRIMARY | dept_no | 16 | NULL | 24 | 100.00 | Using index |
| 1 | SIMPLE | a | NULL | eq_ref | PRIMARY | PRIMARY | 4 | employees.b.emp_no | 1 | 100.00 | NULL |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)

mysql> explain SELECT a.emp_no,first_name,last_name FROM employees a STRAIGHT_JOIN dept_manager b ON a.emp_no = b.emp_no;
+----+-------------+-------+------------+------+---------------+---------+---------+--------------------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+---------+---------+--------------------+--------+----------+-------------+
| 1 | SIMPLE | a | NULL | ALL | PRIMARY | NULL | NULL | NULL | 299423 | 100.00 | NULL |
| 1 | SIMPLE | b | NULL | ref | PRIMARY | PRIMARY | 4 | employees.a.emp_no | 1 | 100.00 | Using index |
+----+-------------+-------+------------+------+---------------+---------+---------+--------------------+--------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)
  • 对两张表的INNER JOIN,通常MySQL数据库的优化器都能工作得很好.但对于有多张表参与联接的语句,MySQL数据库的优化器选择可能不总是正确的.

SELF JOIN

  • SELF JOIN是同一个表的两个实例之间的JOIN操作.

NONEQUI JON

  • 前面介绍的都是EQUAL JOIN(等值联接),即联接条件是基于"等于"运算符的联接操作.NONEQUI JON的联接条件包含"等于"运算法之外的运算符.

    1
    2
    3
    SELECT a.dept_no,a.emp_no,count(1) as rownum
    FROM dept_manager a INNER JOIN dept_manager b on a.emp_no >= b.emp_no
    GROUP BY a.dept_no, a.emp_no;

SEMI JOINANTI SEMI JOIN

  • SEMI JOIN是根据一个表中存在的相关记录找到另外一个表中相关数据的联接.如果从左表返回记录,该联接被称为左半联接;如果从右表返回记录,该联接被称为右半联接.

  • 实现SEMI JOIN的方法有多种,如内部联接,子查询,集合操作等.在使用内部联接方式时,只从一个表中选择记录,然后应用DISTINCT.

    1
    2
    3
    4
    5
    -- 查询来自杭州且发生过订单的客户信息
    SELECT DISTINCT c.customer_id,c.city
    FROM customers AS c
    JOIN orders AS o ON c.customer_id=o.customer_id
    WHERE c.city = 'HangZhou'
  • SEMI JOIN相反的是ANTI SEMI JOIN,它根据一个表中不存在的记录而从另外一个表中返回记录.使用OUTER JOIN并过滤外部行,可以实现ANTI SEMI JOIN

    1
    2
    3
    4
    5
    -- 来自杭州但没有订单的客户信息
    SELECT c.customer_id,c.city
    FROM customers AS c
    LEFT JOIN orders AS o ON c.customer_id = o.customer_id
    WHERE c.city = 'HangZhou' AND o.customer_id IS NULL

多表联接查询

  • 多表联接是查询涉及或者更多张表之间的联接查询操作.

  • 对于INNER JOIN的多表联接查询,可以随意安排表的顺序,而不会影响查询的结果.这是因为优化器会自动根据成本评估出访问表的顺序.在该查询的执行计划中,可能会发现优化器的顺序不同于在查询中指定的顺序.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    mysql> EXPLAIN SELECT a.emp_no, c.first_name, b.title, d.dept_name  FROM dept_manager a JOIN titles b ON a.emp_no = b.emp_no JOIN employees c ON c.emp_no = a.emp_no JOIN departments d ON d.dept_no = a.dept_no;
    +----+-------------+-------+-----------------------------------------------------------------------------+--------+-----------------+-----------+---------+---------------------+------+----------+-------------+
    | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
    +----+-------------+-------+-----------------------------------------------------------------------------+--------+-----------------+-----------+---------+---------------------+------+----------+-------------+
    | 1 | SIMPLE | d | NULL | index | PRIMARY | dept_name | 162 | NULL | 9 | 100.00 | Using index |
    | 1 | SIMPLE | a | NULL | ref | PRIMARY,dept_no | dept_no | 16 | employees.d.dept_no | 2 | 100.00 | Using index |
    | 1 | SIMPLE | c | NULL | eq_ref | PRIMARY | PRIMARY | 4 | employees.a.emp_no | 1 | 100.00 | NULL |
    | 1 | SIMPLE | b | p01,p02,p03,p04,p05,p06,p07,p08,p09,p10,p11,p12,p13,p14,p15,p16,p17,p18,p19 | ref | PRIMARY | PRIMARY | 4 | employees.a.emp_no | 1 | 100.00 | Using index |
    +----+-------------+-------+-----------------------------------------------------------------------------+--------+-----------------+-----------+---------+---------------------+------+----------+-------------+
    4 rows in set, 1 warning (0.00 sec)
    • 不按优化器所选择的顺序联接表,可使用STRAIGHT_JOIN

      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      mysql> EXPLAIN SELECT a.emp_no, c.first_name, b.title, d.dept_name  FROM dept_manager a STRAIGHT_JOIN titles b ON a.emp_no = b.emp_no STRAIGHT_JOIN employees c ON c.emp_no = a.emp_no STRAIGHT_JOIN departments d ON d.dept_no = a.dept_no;
      +----+-------------+-------+-----------------------------------------------------------------------------+--------+-----------------+---------+---------+---------------------+------+----------+-------------+
      | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
      +----+-------------+-------+-----------------------------------------------------------------------------+--------+-----------------+---------+---------+---------------------+------+----------+-------------+
      | 1 | SIMPLE | a | NULL | index | PRIMARY,dept_no | dept_no | 16 | NULL | 24 | 100.00 | Using index |
      | 1 | SIMPLE | b | p01,p02,p03,p04,p05,p06,p07,p08,p09,p10,p11,p12,p13,p14,p15,p16,p17,p18,p19 | ref | PRIMARY | PRIMARY | 4 | employees.a.emp_no | 1 | 100.00 | Using index |
      | 1 | SIMPLE | c | NULL | eq_ref | PRIMARY | PRIMARY | 4 | employees.a.emp_no | 1 | 100.00 | NULL |
      | 1 | SIMPLE | d | NULL | eq_ref | PRIMARY | PRIMARY | 16 | employees.a.dept_no | 1 | 100.00 | NULL |
      +----+-------------+-------+-----------------------------------------------------------------------------+--------+-----------------+---------+---------+---------------------+------+----------+-------------+
      4 rows in set, 1 warning (0.00 sec)
  • 对于多表之间的INNER JOIN语句,也可以将INNER JOIN的表和ON过滤条件放在一起

    1
    2
    3
    4
    5
    6
    7
    8
    9
    EXPLAIN SELECT
    a.emp_no,
    c.first_name,
    b.title,
    d.dept_name
    FROM
    dept_manager a
    JOIN ( titles b, employees c, departments d )
    ON (a.emp_no = b.emp_no AND c.emp_no = a.emp_no AND d.dept_no = a.dept_no)

滑动订单问题

  • 滑动订单问题是指每个月返回上一年度(季度或月度等)的滑动订单数,即为每个月份N,返回从月份N-11到月份N的订单总数.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    SELECT DATE_FORMAT(a.order_month,'%Y%m') AS from_month,
    DATE_FORMAT(b.order_month,'%Y%m') AS to_month,
    SUM(c.order_num) AS orders
    -- a表作为下边界(from_month)
    FROM monthly_orders a
    -- b表作为上边界(to_month)
    INNER JOIN monthly_orders b on DATE_ADD(a.order_month,INTERVAL 11 MONTH) = b.order_month
    INNER JOIN monthly_orders c ON c.order_month BETWEEN a.order_month AND b.order_month
    GROUP BY a.order_month,b.order_month;
    1
    2
    3
    4
    5
    6
    7
    8
    -- 每个季度订单的情况
    SELECT DATE_FORMAT(a.order_month,'%Y%m') AS from_month,
    DATE_FORMAT(b.order_month,'%Y%m') AS to_month,
    SUM(c.order_num) AS orders
    FROM monthly_orders a
    INNER JOIN monthly_orders b on DATE_ADD(a.order_month,INTERVAL 2 MONTH) = b.order_month AND MONTH(a.order_month)%3=1
    INNER JOIN monthly_orders c ON c.order_month BETWEEN a.order_month AND b.order_month
    GROUP BY a.order_month,b.order_month;

联接算法

集合操作

  • 通常来说,将联接操作看成是表之间的水平操作,因为该操作生成的虚拟表包含两个表中的列.两表之间的集合操作,一般将这些操作视为垂直操作.MySQL数据库支持两种集合操作UNION ALLUNION DISTINCT

  • 与联接操作一样,集合操作也是对两个输入进行操作,并生成一个虚拟表.在联接操作中,一般把输入表称为左输入和右输入,或者第一个输入和第二个输入.集合操作的两个输入必须拥有相同的列数,若数据类型不同,MySQL数据库会自动将进行隐式转化.同时,结果列的名称由第一个输入决定.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    DROP TABLE IF EXISTS  x ;
    CREATE TABLE x (a char(1)) ENGINE=innodb;

    DROP TABLE IF EXISTS y;
    CREATE TABLE y (a char(1)) ENGINE=innodb;
    INSERT INTO x SELECT 'a';
    INSERT INTO x SELECT 'b';
    INSERT INTO x SELECT 'c';
    INSERT INTO y SELECT 'a';
    INSERT INTO y SELECT 'b';

    SELECT a AS m FROM x
    UNION
    SELECT 1 AS n FROM DUAL
    UNION
    SELECT 'abc' AS o FROM DUAL
    UNION
    SELECT now() AS p FROM DUAL;

    *************************** 1. row ***************************
    m: a
    *************************** 2. row ***************************
    m: b
    *************************** 3. row ***************************
    m: c
    *************************** 4. row ***************************
    m: 1
    *************************** 5. row ***************************
    m: abc
    *************************** 6. row ***************************
    m: 2024-05-31 16:19:12
    6 rows in set (0.01 sec)
    • 这里对各种不同类型进行联接,先从x表中取出类型CHAR(1)的字符,第二个进行UNION操作的是整型数1,第三个是字符串abc第四个是日期类型.虽然类型各不相同,但是MySQL数据库会自动对其判断,选出一种类型进行隐式转换.
    • 另外一方面.在这个例子中对每个选取操作都进行了别名定义,从最后的结果来看,MySQL数据库选择了m这个别名,也就是集合操作中第一个SELECT输入的别名.
    • 除了一下两点,集合操作中的SELECT语句和一般的SELECT查询并无不同:
      • 只有最后一个SELECT可以应用INTO OUTFILE,但是整个集合的操作将被输出到文件中.
      • 不能再SELECT语句中使用HIGH_PRIORITY关键字

UNION DISTINCTUNION ALL

  • MySQL数据库目前对UNION DISTINCT的实现方式如下:
    • 创建一张临时表,即虚拟表
    • 对这种临时表的列添加唯一索引(Unique Index)
    • 将输入的数据插入临时表
    • 返回虚拟表