SQL for Data Scientists in 100 Queries

☆ what this is

☆ scope

☆ setup

☆ background concepts

concept map: overview

☆ connect to database

src/connect_penguins.sh

sqlite3 data/penguins.db

1: select constant

src/select_1.sql

select 1;

out/select_1.out

1

2: select all values from table

src/select_star.sql

select * from little_penguins;

out/select_star.out

Adelie|Dream|37.2|18.1|178|3900|MALE
Adelie|Dream|37.6|19.3|181|3300|FEMALE
Gentoo|Biscoe|50|15.3|220|5550|MALE
Adelie|Torgersen|37.3|20.5|199|3775|MALE
Adelie|Biscoe|39.6|17.7|186|3500|FEMALE
Gentoo|Biscoe|47.7|15|216|4750|FEMALE
Adelie|Dream|36.5|18|182|3150|FEMALE
Gentoo|Biscoe|42|13.5|210|4150|FEMALE
Adelie|Torgersen|42.1|19.1|195|4000|MALE
Gentoo|Biscoe|54.3|15.7|231|5650|MALE

☆ administrative commands

src/admin_commands.sql

.headers on
.mode markdown
select * from little_penguins;

out/admin_commands.out

| species |  island   | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g |  sex   |
|---------|-----------|----------------|---------------|-------------------|-------------|--------|
| Adelie  | Dream     | 37.2           | 18.1          | 178               | 3900        | MALE   |
| Adelie  | Dream     | 37.6           | 19.3          | 181               | 3300        | FEMALE |
| Gentoo  | Biscoe    | 50             | 15.3          | 220               | 5550        | MALE   |
| Adelie  | Torgersen | 37.3           | 20.5          | 199               | 3775        | MALE   |
| Adelie  | Biscoe    | 39.6           | 17.7          | 186               | 3500        | FEMALE |
| Gentoo  | Biscoe    | 47.7           | 15            | 216               | 4750        | FEMALE |
| Adelie  | Dream     | 36.5           | 18            | 182               | 3150        | FEMALE |
| Gentoo  | Biscoe    | 42             | 13.5          | 210               | 4150        | FEMALE |
| Adelie  | Torgersen | 42.1           | 19.1          | 195               | 4000        | MALE   |
| Gentoo  | Biscoe    | 54.3           | 15.7          | 231               | 5650        | MALE   |

3: specify columns

src/specify_columns.sql

select species, island, sex
from little_penguins;

out/specify_columns.out

| species |  island   |  sex   |
|---------|-----------|--------|
| Adelie  | Dream     | MALE   |
| Adelie  | Dream     | FEMALE |
| Gentoo  | Biscoe    | MALE   |
| Adelie  | Torgersen | MALE   |
| Adelie  | Biscoe    | FEMALE |
| Gentoo  | Biscoe    | FEMALE |
| Adelie  | Dream     | FEMALE |
| Gentoo  | Biscoe    | FEMALE |
| Adelie  | Torgersen | MALE   |
| Gentoo  | Biscoe    | MALE   |

4: sort

src/sort.sql

select species, sex, island
from little_penguins
order by island asc, sex desc;

out/sort.out

| species |  sex   |  island   |
|---------|--------|-----------|
| Gentoo  | MALE   | Biscoe    |
| Gentoo  | MALE   | Biscoe    |
| Adelie  | FEMALE | Biscoe    |
| Gentoo  | FEMALE | Biscoe    |
| Gentoo  | FEMALE | Biscoe    |
| Adelie  | MALE   | Dream     |
| Adelie  | FEMALE | Dream     |
| Adelie  | FEMALE | Dream     |
| Adelie  | MALE   | Torgersen |
| Adelie  | MALE   | Torgersen |

5: limit output

src/limit.sql

select species, sex, island
from penguins
order by species, sex, island
limit 10;

out/limit.out

| species |  sex   |  island   |
|---------|--------|-----------|
| Adelie  |        | Dream     |
| Adelie  |        | Torgersen |
| Adelie  |        | Torgersen |
| Adelie  |        | Torgersen |
| Adelie  |        | Torgersen |
| Adelie  |        | Torgersen |
| Adelie  | FEMALE | Biscoe    |
| Adelie  | FEMALE | Biscoe    |
| Adelie  | FEMALE | Biscoe    |
| Adelie  | FEMALE | Biscoe    |

6: page output

src/page.sql

select species, sex, island
from penguins
order by species, sex, island
limit 10 offset 3;

out/page.out

| species |  sex   |  island   |
|---------|--------|-----------|
| Adelie  |        | Torgersen |
| Adelie  |        | Torgersen |
| Adelie  |        | Torgersen |
| Adelie  | FEMALE | Biscoe    |
| Adelie  | FEMALE | Biscoe    |
| Adelie  | FEMALE | Biscoe    |
| Adelie  | FEMALE | Biscoe    |
| Adelie  | FEMALE | Biscoe    |
| Adelie  | FEMALE | Biscoe    |
| Adelie  | FEMALE | Biscoe    |

7: remove duplicates

src/distinct.sql

select distinct species, sex, island
from penguins;

out/distinct.out

|  species  |  sex   |  island   |
|-----------|--------|-----------|
| Adelie    | MALE   | Torgersen |
| Adelie    | FEMALE | Torgersen |
| Adelie    |        | Torgersen |
| Adelie    | FEMALE | Biscoe    |
| Adelie    | MALE   | Biscoe    |
| Adelie    | FEMALE | Dream     |
| Adelie    | MALE   | Dream     |
| Adelie    |        | Dream     |
| Chinstrap | FEMALE | Dream     |
| Chinstrap | MALE   | Dream     |
| Gentoo    | FEMALE | Biscoe    |
| Gentoo    | MALE   | Biscoe    |
| Gentoo    |        | Biscoe    |

8: filter results

src/filter.sql

select distinct species, sex, island
from penguins
where island = 'Biscoe';

out/filter.out

| species |  sex   | island |
|---------|--------|--------|
| Adelie  | FEMALE | Biscoe |
| Adelie  | MALE   | Biscoe |
| Gentoo  | FEMALE | Biscoe |
| Gentoo  | MALE   | Biscoe |
| Gentoo  |        | Biscoe |

9: filter with more complex conditions

src/filter_and.sql

select distinct species, sex, island
from penguins
where island = 'Biscoe' and sex != 'MALE';

out/filter_and.out

| species |  sex   | island |
|---------|--------|--------|
| Adelie  | FEMALE | Biscoe |
| Gentoo  | FEMALE | Biscoe |

10: do calculations

src/calculations.sql

select
    flipper_length_mm / 10.0,
    body_mass_g / 1000.0
from penguins
limit 3;

out/calculations.out

| flipper_length_mm / 10.0 | body_mass_g / 1000.0 |
|--------------------------|----------------------|
| 18.1                     | 3.75                 |
| 18.6                     | 3.8                  |
| 19.5                     | 3.25                 |

11: rename columns

src/rename_columns.sql

select
    flipper_length_mm / 10.0 as flipper_cm,
    body_mass_g / 1000.0 as weight_kg,
    island as where_found
from penguins
limit 3;

out/rename_columns.out

| flipper_cm | weight_kg | where_found |
|------------|-----------|-------------|
| 18.1       | 3.75      | Torgersen   |
| 18.6       | 3.8       | Torgersen   |
| 19.5       | 3.25      | Torgersen   |

☆ check your understanding

concept map: selection

12: calculate with missing values

src/show_missing_values.sql

select
    flipper_length_mm / 10.0 as flipper_cm,
    body_mass_g / 1000.0 as weight_kg,
    island as where_found
from penguins
limit 5;

out/show_missing_values.out

| flipper_cm | weight_kg | where_found |
|------------|-----------|-------------|
| 18.1       | 3.75      | Torgersen   |
| 18.6       | 3.8       | Torgersen   |
| 19.5       | 3.25      | Torgersen   |
|            |           | Torgersen   |
| 19.3       | 3.45      | Torgersen   |

13: null equality

src/filter.sql

select distinct species, sex, island
from penguins
where island = 'Biscoe';

out/filter.out

| species |  sex   | island |
|---------|--------|--------|
| Adelie  | FEMALE | Biscoe |
| Adelie  | MALE   | Biscoe |
| Gentoo  | FEMALE | Biscoe |
| Gentoo  | MALE   | Biscoe |
| Gentoo  |        | Biscoe |

src/null_equality.sql

select distinct species, sex, island
from penguins
where island = 'Biscoe' and sex = 'FEMALE';

out/null_equality.out

| species |  sex   | island |
|---------|--------|--------|
| Adelie  | FEMALE | Biscoe |
| Gentoo  | FEMALE | Biscoe |

14: null inequality

src/null_inequality.sql

select distinct species, sex, island
from penguins
where island = 'Biscoe' and sex != 'FEMALE';

out/null_inequality.out

| species | sex  | island |
|---------|------|--------|
| Adelie  | MALE | Biscoe |
| Gentoo  | MALE | Biscoe |

15: ternary logic

src/ternary_logic.sql

select null = null;

out/ternary_logic.out

| null = null |
|-------------|
|             |
| equality | X     | Y     | null |
|----------|-------|-------|------|
| X        | true  | false | null |
| Y        | false | true  | null |
| null     | null  | null  | null |
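
The same three-valued logic applies to and and or, not just to equality. A small sketch (not one of the numbered queries) that can be run in the same session:

select
    (null and 1) as and_true,
    (null and 0) as and_false,
    (null or 1) as or_true,
    (null or 0) as or_false;

null and 0 is 0 and null or 1 is 1, because the answer is the same no matter what the null "really" is; the other two expressions stay null.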

16: handle null safely

src/safe_null_equality.sql

select species, sex, island
from penguins
where sex is null;

out/safe_null_equality.out

| species | sex |  island   |
|---------|-----|-----------|
| Adelie  |     | Torgersen |
| Adelie  |     | Torgersen |
| Adelie  |     | Torgersen |
| Adelie  |     | Torgersen |
| Adelie  |     | Torgersen |
| Adelie  |     | Dream     |
| Gentoo  |     | Biscoe    |
| Gentoo  |     | Biscoe    |
| Gentoo  |     | Biscoe    |
| Gentoo  |     | Biscoe    |
| Gentoo  |     | Biscoe    |

☆ check your understanding

concept map: null

17: aggregate

src/simple_sum.sql

select sum(body_mass_g) as total_mass
from penguins;

out/simple_sum.out

| total_mass |
|------------|
| 1437000    |

18: common aggregation functions

src/common_aggregations.sql

select
    max(bill_length_mm) as longest_bill,
    min(flipper_length_mm) as shortest_flipper,
    avg(bill_length_mm) / avg(bill_depth_mm) as weird_ratio
from penguins;

out/common_aggregations.out

| longest_bill | shortest_flipper |   weird_ratio    |
|--------------|------------------|------------------|
| 59.6         | 172              | 2.56087082530644 |

19: counting

src/count_behavior.sql

select
    count(*) as count_star,
    count(sex) as count_specific,
    count(distinct sex) as count_distinct
from penguins;

out/count_behavior.out

| count_star | count_specific | count_distinct |
|------------|----------------|----------------|
| 344        | 333            | 2              |

20: group

src/simple_group.sql

select
    avg(body_mass_g) as average_mass_g
from penguins
group by sex;

out/simple_group.out

|  average_mass_g  |
|------------------|
| 4005.55555555556 |
| 3862.27272727273 |
| 4545.68452380952 |

21: behavior of unaggregated columns

src/unaggregated_columns.sql

select
    sex,
    avg(body_mass_g) as average_mass_g
from penguins
group by sex;

out/unaggregated_columns.out

|  sex   |  average_mass_g  |
|--------|------------------|
|        | 4005.55555555556 |
| FEMALE | 3862.27272727273 |
| MALE   | 4545.68452380952 |

22: arbitrary choice in aggregation

src/arbitrary_in_aggregation.sql

select
    sex,
    body_mass_g
from penguins
group by sex;

out/arbitrary_in_aggregation.out

|  sex   | body_mass_g |
|--------|-------------|
|        |             |
| FEMALE | 3800        |
| MALE   | 3750        |

23: filter aggregated values

src/filter_aggregation.sql

select
    sex,
    avg(body_mass_g) as average_mass_g
from penguins
group by sex
having average_mass_g > 4000.0;

out/filter_aggregation.out

| sex  |  average_mass_g  |
|------|------------------|
|      | 4005.55555555556 |
| MALE | 4545.68452380952 |

24: readable output

src/readable_aggregation.sql

select
    sex,
    round(avg(body_mass_g), 1) as average_mass_g
from penguins
group by sex
having average_mass_g > 4000.0;

out/readable_aggregation.out

| sex  | average_mass_g |
|------|----------------|
|      | 4005.6         |
| MALE | 4545.7         |

25: filter aggregate inputs

src/filter_aggregate_inputs.sql

select
    sex,
    round(
        avg(body_mass_g) filter (where body_mass_g < 4000.0),
        1
    ) as average_mass_g
from penguins
group by sex;

out/filter_aggregate_inputs.out

|  sex   | average_mass_g |
|--------|----------------|
|        | 3362.5         |
| FEMALE | 3417.3         |
| MALE   | 3752.5         |

☆ check your understanding

concept map: aggregation

☆ create in-memory database

src/in_memory_db.sh

sqlite3 :memory:

26: create tables

src/create_work_job.sql

create table job (
    name text not null,
    billable real not null
);
create table work (
    person text not null,
    job text not null
);

27: insert data

src/populate_work_job.sql

insert into job values
    ('calibrate', 1.5),
    ('clean', 0.5);
insert into work values
    ('mik', 'calibrate'),
    ('mik', 'clean'),
    ('mik', 'complain'),
    ('po', 'clean'),
    ('po', 'complain'),
    ('tay', 'complain');

out/insert_values.out

|   name    | billable |
|-----------|----------|
| calibrate | 1.5      |
| clean     | 0.5      |

| person |    job    |
|--------|-----------|
| mik    | calibrate |
| mik    | clean     |
| mik    | complain  |
| po     | clean     |
| po     | complain  |
| tay    | complain  |

28: update rows

src/update_work_job.sql

update work
set person = "tae"
where person = "tay";

out/update_rows.out

| person |    job    |
|--------|-----------|
| mik    | calibrate |
| mik    | clean     |
| mik    | complain  |
| po     | clean     |
| po     | complain  |
| tae    | complain  |

29: delete rows

src/delete_rows.sql

delete from work
where person = "tae";

select * from work;

out/delete_rows.out

| person |    job    |
|--------|-----------|
| mik    | calibrate |
| mik    | clean     |
| mik    | complain  |
| po     | clean     |
| po     | complain  |

30: backing up

src/backing_up.sql

create table backup (
    person text not null,
    job text not null
);

insert into backup
select person, job
from work
where person = 'tae';

delete from work
where person = 'tae';

select * from backup;

out/backing_up.out

| person |   job    |
|--------|----------|
| tae    | complain |

☆ check your understanding

concept map: data definition and modification

31: join tables

src/cross_join.sql

select *
from work cross join job;

out/cross_join.out

| person |    job    |   name    | billable |
|--------|-----------|-----------|----------|
| mik    | calibrate | calibrate | 1.5      |
| mik    | calibrate | clean     | 0.5      |
| mik    | clean     | calibrate | 1.5      |
| mik    | clean     | clean     | 0.5      |
| mik    | complain  | calibrate | 1.5      |
| mik    | complain  | clean     | 0.5      |
| po     | clean     | calibrate | 1.5      |
| po     | clean     | clean     | 0.5      |
| po     | complain  | calibrate | 1.5      |
| po     | complain  | clean     | 0.5      |
| tay    | complain  | calibrate | 1.5      |
| tay    | complain  | clean     | 0.5      |

32: inner join

src/inner_join.sql

select *
from work inner join job
on work.job = job.name;

out/inner_join.out

| person |    job    |   name    | billable |
|--------|-----------|-----------|----------|
| mik    | calibrate | calibrate | 1.5      |
| mik    | clean     | clean     | 0.5      |
| po     | clean     | clean     | 0.5      |

33: aggregate joined data

src/aggregate_join.sql

select
    work.person,
    sum(job.billable) as pay
from work inner join job
on work.job = job.name
group by work.person;

out/aggregate_join.out

| person | pay |
|--------|-----|
| mik    | 2.0 |
| po     | 0.5 |

34: left join

src/left_join.sql

select *
from work left join job
on work.job = job.name;

out/left_join.out

| person |    job    |   name    | billable |
|--------|-----------|-----------|----------|
| mik    | calibrate | calibrate | 1.5      |
| mik    | clean     | clean     | 0.5      |
| mik    | complain  |           |          |
| po     | clean     | clean     | 0.5      |
| po     | complain  |           |          |
| tay    | complain  |           |          |

35: aggregate left joins

src/aggregate_left_join.sql

select
    work.person,
    sum(job.billable) as pay
from work left join job
on work.job = job.name
group by work.person;

out/aggregate_left_join.out

| person | pay |
|--------|-----|
| mik    | 2.0 |
| po     | 0.5 |
| tay    |     |

☆ check your understanding

concept map: join

36: coalesce values

src/coalesce.sql

select
    work.person,
    coalesce(sum(job.billable), 0.0) as pay
from work left join job
on work.job = job.name
group by work.person;

out/coalesce.out

| person | pay |
|--------|-----|
| mik    | 2.0 |
| po     | 0.5 |
| tay    | 0.0 |

37: negate incorrectly

src/negate_incorrectly.sql

select distinct person
from work
where job != 'calibrate';

out/negate_incorrectly.out

| person |
|--------|
| mik    |
| po     |
| tay    |

38: set membership

src/set_membership.sql

select *
from work
where person not in ('mik', 'tay');

out/set_membership.out

| person |   job    |
|--------|----------|
| po     | clean    |
| po     | complain |

39: subqueries

src/subquery_set.sql

select distinct person
from work
where person not in (
    select distinct person
    from work
    where job = 'calibrate'
);

out/subquery_set.out

| person |
|--------|
| po     |
| tay    |

☆ M to N relationships

40: autoincrement and primary key

src/autoincrement.sql

create table person (
    ident integer primary key autoincrement,
    name text not null
);
insert into person values
    (null, 'mik'),
    (null, 'po'),
    (null, 'tay');
select * from person;
insert into person values (1, "prevented");

out/autoincrement.out

| ident | name |
|-------|------|
| 1     | mik  |
| 2     | po   |
| 3     | tay  |
Runtime error near line 12: UNIQUE constraint failed: person.ident (19)

☆ internal tables

src/sequence_table.sql

select * from sqlite_sequence;

out/sequence_table.out

|  name  | seq |
|--------|-----|
| person | 3   |

41: alter tables

src/alter_tables.sql

alter table job
add ident integer not null default -1;

update job
set ident = 1
where name = 'calibrate';

update job
set ident = 2
where name = 'clean';

select * from job;

out/alter_tables.out

|   name    | billable | ident |
|-----------|----------|-------|
| calibrate | 1.5      | 1     |
| clean     | 0.5      | 2     |

42: create new tables from old

src/insert_select.sql

create table new_work (
    person_id integer not null,
    job_id integer not null,
    foreign key(person_id) references person(ident),
    foreign key(job_id) references job(ident)
);

insert into new_work
select
    person.ident as person_id,
    job.ident as job_id
from
    (person join work on person.name = work.person)
    join job on job.name = work.job;
select * from new_work;

out/insert_select.out

| person_id | job_id |
|-----------|--------|
| 1         | 1      |
| 1         | 2      |
| 2         | 2      |

43: remove tables

src/drop_table.sql

drop table work;
alter table new_work rename to work;
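
The listing below is the schema after these statements run; presumably it was produced with SQLite's .schema command (used later in this tutorial), which is not part of the source file above:

.schema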

out/drop_table.out

CREATE TABLE job (
    ident integer primary key autoincrement,
    name text not null,
    billable real not null
);
CREATE TABLE sqlite_sequence(name,seq);
CREATE TABLE person (
    ident integer primary key autoincrement,
    name text not null
);
CREATE TABLE IF NOT EXISTS "work" (
    person_id integer not null,
    job_id integer not null,
    foreign key(person_id) references person(ident),
    foreign key(job_id) references job(ident)
);

44: compare individual values to aggregates

src/compare_individual_aggregate.sql

select body_mass_g
from penguins
where body_mass_g > (
    select avg(body_mass_g)
    from penguins
)
limit 5;

out/compare_individual_aggregate.out

| body_mass_g |
|-------------|
| 4675        |
| 4250        |
| 4400        |
| 4500        |
| 4650        |

45: compare individual values to aggregates within groups

src/compare_within_groups.sql

select
    penguins.species,
    penguins.body_mass_g,
    round(averaged.avg_mass_g, 1) as avg_mass_g
from penguins join (
    select species, avg(body_mass_g) as avg_mass_g
    from penguins
    group by species
) as averaged
on penguins.species = averaged.species
where penguins.body_mass_g > averaged.avg_mass_g
limit 5;

out/compare_within_groups.out

| species | body_mass_g | avg_mass_g |
|---------|-------------|------------|
| Adelie  | 3750        | 3700.7     |
| Adelie  | 3800        | 3700.7     |
| Adelie  | 4675        | 3700.7     |
| Adelie  | 4250        | 3700.7     |
| Adelie  | 3800        | 3700.7     |

46: common table expressions

src/common_table_expressions.sql

with grouped as (
    select species, avg(body_mass_g) as avg_mass_g
    from penguins
    group by species
)
select
    penguins.species,
    penguins.body_mass_g,
    round(grouped.avg_mass_g, 1) as avg_mass_g
from penguins join grouped
on penguins.species = grouped.species
where penguins.body_mass_g > grouped.avg_mass_g
limit 5;

out/common_table_expressions.out

| species | body_mass_g | avg_mass_g |
|---------|-------------|------------|
| Adelie  | 3750        | 3700.7     |
| Adelie  | 3800        | 3700.7     |
| Adelie  | 4675        | 3700.7     |
| Adelie  | 4250        | 3700.7     |
| Adelie  | 3800        | 3700.7     |

☆ explain query plan

src/explain_query_plan.sql

explain query plan
select
    species,
    avg(body_mass_g)
from penguins
group by species;

out/explain_query_plan.out

QUERY PLAN
|--SCAN penguins
`--USE TEMP B-TREE FOR GROUP BY
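
For comparison, a minimal sketch (not from the original examples) of a query whose plan typically reports a SEARCH using the table's integer primary key rather than a full SCAN:

explain query plan
select species
from penguins
where rowid = 10;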

47: enumerate rows

src/rowid.sql

select rowid, species, island
from penguins
limit 5;

out/rowid.out

| rowid | species |  island   |
|-------|---------|-----------|
| 1     | Adelie  | Torgersen |
| 2     | Adelie  | Torgersen |
| 3     | Adelie  | Torgersen |
| 4     | Adelie  | Torgersen |
| 5     | Adelie  | Torgersen |

48: if-else function

src/if_else.sql

with sized_penguins as (
    select
        species,
        iif(
            body_mass_g < 3500,
            'small',
            'large'
        ) as size
    from penguins
)
select species, size, count(*) as num
from sized_penguins
group by species, size
order by species, num;

out/if_else.out

|  species  | size  | num |
|-----------|-------|-----|
| Adelie    | small | 54  |
| Adelie    | large | 98  |
| Chinstrap | small | 17  |
| Chinstrap | large | 51  |
| Gentoo    | large | 124 |

49: select a case

src/case_when.sql

with sized_penguins as (
    select
        species,
        case
            when body_mass_g < 3500 then 'small'
            when body_mass_g < 5000 then 'medium'
            else 'large'
        end as size
    from penguins
)
select species, size, count(*) as num
from sized_penguins
group by species, size
order by species, num;

out/case_when.out

|  species  |  size  | num |
|-----------|--------|-----|
| Adelie    | large  | 1   |
| Adelie    | small  | 54  |
| Adelie    | medium | 97  |
| Chinstrap | small  | 17  |
| Chinstrap | medium | 51  |
| Gentoo    | medium | 56  |
| Gentoo    | large  | 68  |

50: check range

src/check_range.sql

with sized_penguins as (
    select
        species,
        case
            when body_mass_g between 3500 and 5000 then 'normal'
            else 'abnormal'
        end as size
    from penguins
)
select species, size, count(*) as num
from sized_penguins
group by species, size
order by species, num;

out/check_range.out

|  species  |   size   | num |
|-----------|----------|-----|
| Adelie    | abnormal | 55  |
| Adelie    | normal   | 97  |
| Chinstrap | abnormal | 17  |
| Chinstrap | normal   | 51  |
| Gentoo    | abnormal | 62  |
| Gentoo    | normal   | 62  |

☆ yet another database

assay database table diagram

assay ER diagram

src/assay_staff.sql

select * from staff;

out/assay_staff.out

| ident | personal |  family   | dept | age |
|-------|----------|-----------|------|-----|
| 1     | Kartik   | Gupta     |      | 46  |
| 2     | Divit    | Dhaliwal  | hist | 34  |
| 3     | Indrans  | Sridhar   | mb   | 47  |
| 4     | Pranay   | Khanna    | mb   | 51  |
| 5     | Riaan    | Dua       |      | 23  |
| 6     | Vedika   | Rout      | hist | 45  |
| 7     | Abram    | Chokshi   | gen  | 23  |
| 8     | Romil    | Kapoor    | hist | 38  |
| 9     | Ishaan   | Ramaswamy | mb   | 35  |
| 10    | Nitya    | Lal       | gen  | 52  |

51: pattern matching

src/like_glob.sql

select personal, family from staff
where personal like '%ya%' or family glob '*De*';

out/like_glob.out

| personal | family |
|----------|--------|
| Nitya    | Lal    |

| name    | purpose                                                                  |
|---------|--------------------------------------------------------------------------|
| substr  | Get substring given starting point and length                            |
| trim    | Remove characters from beginning and end of string                       |
| ltrim   | Remove characters from beginning of string                               |
| rtrim   | Remove characters from end of string                                     |
| length  | Length of string                                                         |
| replace | Replace occurrences of substring with another string                     |
| upper   | Return upper-case version of string                                      |
| lower   | Return lower-case version of string                                      |
| instr   | Find location of first occurrence of substring (returns 0 if not found)  |
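
A short sketch (not one of the numbered queries) showing a few of these functions applied to the staff table used above:

select
    family,
    upper(family) as shouted,
    length(family) as name_length,
    substr(family, 1, 3) as prefix
from staff
limit 3;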

52: select first and last rows

src/union_all.sql

select * from (
    select * from (select * from experiment order by started asc limit 5)
    union all
    select * from (select * from experiment order by started desc limit 5)
)
order by started asc
;

out/union_all.out

| ident |    kind     |  started   |   ended    |
|-------|-------------|------------|------------|
| 17    | trial       | 2023-01-29 | 2023-01-30 |
| 35    | calibration | 2023-01-30 | 2023-01-30 |
| 36    | trial       | 2023-02-02 | 2023-02-03 |
| 25    | trial       | 2023-02-12 | 2023-02-14 |
| 2     | calibration | 2023-02-14 | 2023-02-14 |
| 40    | calibration | 2024-01-21 | 2024-01-21 |
| 12    | trial       | 2024-01-26 | 2024-01-28 |
| 44    | trial       | 2024-01-27 | 2024-01-29 |
| 34    | trial       | 2024-02-01 | 2024-02-02 |
| 14    | calibration | 2024-02-03 | 2024-02-03 |

53: intersection

src/intersect.sql

select personal, family, dept, age
from staff
where dept = 'mb'
intersect
    select personal, family, dept, age from staff
    where age < 50
;

out/intersect.out

| personal |  family   | dept | age |
|----------|-----------|------|-----|
| Indrans  | Sridhar   | mb   | 47  |
| Ishaan   | Ramaswamy | mb   | 35  |

54: exclusion

src/except.sql

select personal, family, dept, age
from staff
where dept = 'mb'
except
    select personal, family, dept, age from staff
    where age < 50
;

out/except.out

| personal | family | dept | age |
|----------|--------|------|-----|
| Pranay   | Khanna | mb   | 51  |

55: random numbers and why not

src/random_numbers.sql

with decorated as (
    select random() as rand,
    personal || ' ' || family as name
    from staff
)
select rand, abs(rand) % 10 as selector, name
from decorated
where selector < 5;
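
-- Caution: SQLite may flatten this CTE, so each reference to rand can call
-- random() again; the value tested in the where clause and the values shown
-- can then disagree, which is why selectors of 5 or more appear in the output.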

out/random_numbers.out

|         rand         | selector |      name       |
|----------------------|----------|-----------------|
| 7176652035743196310  | 0        | Divit Dhaliwal  |
| -2243654635505630380 | 2        | Indrans Sridhar |
| -6940074802089166303 | 5        | Pranay Khanna   |
| 8882650891091088193  | 9        | Riaan Dua       |
| -45079732302991538   | 5        | Vedika Rout     |
| -8973877087806386134 | 2        | Abram Chokshi   |
| 3360598450426870356  | 9        | Romil Kapoor    |

56: creating index

src/create_use_index.sql

explain query plan
select filename
from plate
where filename like '%07%';

create index plate_file on plate(filename);

explain query plan
select filename
from plate
where filename like '%07%';

out/create_use_index.out

QUERY PLAN
`--SCAN plate USING COVERING INDEX sqlite_autoindex_plate_1
QUERY PLAN
`--SCAN plate USING COVERING INDEX plate_file

57: generate sequence

src/generate_sequence.sql

select value from generate_series(1, 5);

out/generate_sequence.out

| value |
|-------|
| 1     |
| 2     |
| 3     |
| 4     |
| 5     |

58: generate sequence based on data

src/data_range_sequence.sql

create table temp (
    num integer not null
);
insert into temp values (1), (5);
select value from generate_series (
    (select min(num) from temp),
    (select max(num) from temp)
);

out/data_range_sequence.out

| value |
|-------|
| 1     |
| 2     |
| 3     |
| 4     |
| 5     |

59: generate sequence of dates

src/date_sequence.sql

select
    date((select julianday(min(started)) from experiment) + value) as some_day
from (
    select value from generate_series(
        (select 0),
        (select count(*) - 1 from experiment)
    )
)
limit 5;

out/date_sequence.out

|  some_day  |
|------------|
| 2023-01-29 |
| 2023-01-30 |
| 2023-01-31 |
| 2023-02-01 |
| 2023-02-02 |

60: count experiments started per day without gaps

src/experiments_per_day.sql

with
-- complete sequence of days with 0 as placeholder for number of experiments
all_days as (
    select
        date((select julianday(min(started)) from experiment) + value) as some_day,
        0 as zeroes
    from (
        select value from generate_series(
            (select 0),
            (select count(*) - 1 from experiment)
        )
    )
),
-- sequence of actual days with actual number of experiments started
actual_days as (
    select started, count(started) as num_exp
    from experiment
    group by started
)
-- combined by joining on day and taking actual number (if available) or zero
select
    all_days.some_day as day,
    coalesce(actual_days.num_exp, all_days.zeroes) as num_exp
from
    all_days left join actual_days on all_days.some_day = actual_days.started
limit 5
;

out/experiments_per_day.out

|    day     | num_exp |
|------------|---------|
| 2023-01-29 | 1       |
| 2023-01-30 | 1       |
| 2023-01-31 | 0       |
| 2023-02-01 | 0       |
| 2023-02-02 | 1       |

61: self join

src/self_join.sql

with person as (
    select
        ident,
        personal || ' ' || family as name
    from staff
)
select left.name, right.name
from person as left join person as right
limit 10;

out/self_join.out

|     name     |       name       |
|--------------|------------------|
| Kartik Gupta | Kartik Gupta     |
| Kartik Gupta | Divit Dhaliwal   |
| Kartik Gupta | Indrans Sridhar  |
| Kartik Gupta | Pranay Khanna    |
| Kartik Gupta | Riaan Dua        |
| Kartik Gupta | Vedika Rout      |
| Kartik Gupta | Abram Chokshi    |
| Kartik Gupta | Romil Kapoor     |
| Kartik Gupta | Ishaan Ramaswamy |
| Kartik Gupta | Nitya Lal        |

62: generate unique pairs

src/unique_pairs.sql

with person as (
    select
        ident,
        personal || ' ' || family as name
    from staff
)
select left.name, right.name
from person as left join person as right
on left.ident < right.ident
where left.ident <= 4 and right.ident <= 4;

out/unique_pairs.out

|      name       |      name       |
|-----------------|-----------------|
| Kartik Gupta    | Divit Dhaliwal  |
| Kartik Gupta    | Indrans Sridhar |
| Kartik Gupta    | Pranay Khanna   |
| Divit Dhaliwal  | Indrans Sridhar |
| Divit Dhaliwal  | Pranay Khanna   |
| Indrans Sridhar | Pranay Khanna   |

63: filter pairs

src/filter_pairs.sql

with
person as (
    select
        ident,
        personal || ' ' || family as name
    from staff
),
together as (
    select
        left.staff as left_staff,
        right.staff as right_staff
    from performed as left join performed as right
    on left.experiment = right.experiment
    where left_staff < right_staff
)
select
    left.name as person_1,
    right.name as person_2
from person as left join person as right join together
on left.ident = left_staff and right.ident = right_staff;

out/filter_pairs.out

|    person_1     |     person_2     |
|-----------------|------------------|
| Kartik Gupta    | Vedika Rout      |
| Pranay Khanna   | Vedika Rout      |
| Indrans Sridhar | Romil Kapoor     |
| Abram Chokshi   | Ishaan Ramaswamy |
| Pranay Khanna   | Vedika Rout      |
| Kartik Gupta    | Abram Chokshi    |
| Abram Chokshi   | Romil Kapoor     |
| Kartik Gupta    | Divit Dhaliwal   |
| Divit Dhaliwal  | Abram Chokshi    |
| Pranay Khanna   | Ishaan Ramaswamy |
| Indrans Sridhar | Romil Kapoor     |
| Kartik Gupta    | Ishaan Ramaswamy |
| Kartik Gupta    | Nitya Lal        |
| Kartik Gupta    | Abram Chokshi    |
| Pranay Khanna   | Romil Kapoor     |

64: existence and correlated subqueries

src/correlated_subquery.sql

select name, building
from department
where exists (
    select 1
    from staff
    where dept = department.ident
)
order by name;

out/correlated_subquery.out

|       name        |     building     |
|-------------------|------------------|
| Genetics          | Chesson          |
| Histology         | Fashet Extension |
| Molecular Biology | Chesson          |

65: nonexistence

src/nonexistence.sql

select name, building
from department
where not exists (
    select 1
    from staff
    where dept = department.ident
)
order by name;

out/nonexistence.out

|     name      | building |
|---------------|----------|
| Endocrinology | TGVH     |

☆ avoiding correlated subqueries

src/avoid_correlated_subqueries.sql

select distinct
    department.name as name,
    department.building as building
from department join staff
on department.ident = staff.dept
order by name;

out/avoid_correlated_subqueries.out

|       name        |     building     |
|-------------------|------------------|
| Genetics          | Chesson          |
| Histology         | Fashet Extension |
| Molecular Biology | Chesson          |

66: lead and lag

src/lead_lag.sql

with ym_num as (
    select
        strftime('%Y-%m', started) as ym,
        count(*) as num
    from experiment
    group by ym
)
select
    ym,
    lag(num) over (order by ym) as prev_num,
    num,
    lead(num) over (order by ym) as next_num
from ym_num
order by ym;

out/lead_lag.out

|   ym    | prev_num | num | next_num |
|---------|----------|-----|----------|
| 2023-01 |          | 2   | 5        |
| 2023-02 | 2        | 5   | 5        |
| 2023-03 | 5        | 5   | 1        |
| 2023-04 | 5        | 1   | 6        |
| 2023-05 | 1        | 6   | 5        |
| 2023-06 | 6        | 5   | 3        |
| 2023-07 | 5        | 3   | 2        |
| 2023-08 | 3        | 2   | 4        |
| 2023-09 | 2        | 4   | 6        |
| 2023-10 | 4        | 6   | 4        |
| 2023-12 | 6        | 4   | 5        |
| 2024-01 | 4        | 5   | 2        |
| 2024-02 | 5        | 2   |          |
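
A minimal sketch (not one of the numbered queries): lag and lead also accept an offset and a default value, which fills in the blanks at the ends of the series:

with ym_num as (
    select
        strftime('%Y-%m', started) as ym,
        count(*) as num
    from experiment
    group by ym
)
select
    ym,
    lag(num, 1, 0) over (order by ym) as prev_num,
    num,
    lead(num, 1, 0) over (order by ym) as next_num
from ym_num
order by ym;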

67: window functions

src/window_functions.sql

with ym_num as (
    select
        strftime('%Y-%m', started) as ym,
        count(*) as num
    from experiment
    group by ym
)
select
    ym,
    num,
    sum(num) over (order by ym) as num_done,
    cume_dist() over (order by ym) as progress
from ym_num
order by ym;

out/window_functions.out

|   ym    | num | num_done |      progress      |
|---------|-----|----------|--------------------|
| 2023-01 | 2   | 2        | 0.0769230769230769 |
| 2023-02 | 5   | 7        | 0.153846153846154  |
| 2023-03 | 5   | 12       | 0.230769230769231  |
| 2023-04 | 1   | 13       | 0.307692307692308  |
| 2023-05 | 6   | 19       | 0.384615384615385  |
| 2023-06 | 5   | 24       | 0.461538461538462  |
| 2023-07 | 3   | 27       | 0.538461538461538  |
| 2023-08 | 2   | 29       | 0.615384615384615  |
| 2023-09 | 4   | 33       | 0.692307692307692  |
| 2023-10 | 6   | 39       | 0.769230769230769  |
| 2023-12 | 4   | 43       | 0.846153846153846  |
| 2024-01 | 5   | 48       | 0.923076923076923  |
| 2024-02 | 2   | 50       | 1.0                |

☆ explain another query plan

src/explain_window_function.sql

explain query plan
with ym_num as (
    select
        strftime('%Y-%m', started) as ym,
        count(*) as num
    from experiment
    group by ym
)
select
    ym,
    num,
    sum(num) over (order by ym) as num_done,
    cume_dist() over (order by ym) as progress
from ym_num
order by ym;

out/explain_window_function.out

QUERY PLAN
|--CO-ROUTINE (subquery-3)
|  |--CO-ROUTINE (subquery-4)
|  |  |--CO-ROUTINE ym_num
|  |  |  |--SCAN experiment
|  |  |  `--USE TEMP B-TREE FOR GROUP BY
|  |  |--SCAN ym_num
|  |  `--USE TEMP B-TREE FOR ORDER BY
|  `--SCAN (subquery-4)
`--SCAN (subquery-3)

68: partitioned windows

src/partition_window.sql

with y_m_num as (
    select
        strftime('%Y', started) as year,
        strftime('%m', started) as month,
        count(*) as num
    from experiment
    group by year, month
)
select
    year,
    month,
    num,
    sum(num) over (partition by year order by month) as num_done
from y_m_num
order by year, month;

out/partition_window.out

| year | month | num | num_done |
|------|-------|-----|----------|
| 2023 | 01    | 2   | 2        |
| 2023 | 02    | 5   | 7        |
| 2023 | 03    | 5   | 12       |
| 2023 | 04    | 1   | 13       |
| 2023 | 05    | 6   | 19       |
| 2023 | 06    | 5   | 24       |
| 2023 | 07    | 3   | 27       |
| 2023 | 08    | 2   | 29       |
| 2023 | 09    | 4   | 33       |
| 2023 | 10    | 6   | 39       |
| 2023 | 12    | 4   | 43       |
| 2024 | 01    | 5   | 5        |
| 2024 | 02    | 2   | 7        |

69: blobs

src/blob.sql

create table images (
    name text not null,
    content blob
);

insert into images(name, content) values
    ("biohazard", readfile("img/biohazard.png")),
    ("crush", readfile("img/crush.png")),
    ("fire", readfile("img/fire.png")),
    ("radioactive", readfile("img/radioactive.png")),
    ("tripping", readfile("img/tripping.png"));

select name, length(content) from images;

out/blob.out

|    name     | length(content) |
|-------------|-----------------|
| biohazard   | 19629           |
| crush       | 15967           |
| fire        | 18699           |
| radioactive | 16661           |
| tripping    | 17208           |

☆ yet another database

src/lab_log_db.sh

sqlite3 data/lab_log.db

src/lab_log_schema.sql

.schema

out/lab_log_schema.out

CREATE TABLE sqlite_sequence(name,seq);
CREATE TABLE person(
       ident            integer primary key autoincrement,
       details          text not null
);
CREATE TABLE machine(
       ident            integer primary key autoincrement,
       name             text not null,
       details          text not null
);
CREATE TABLE usage(
       ident            integer primary key autoincrement,
       log              text not null
);

70: store JSON

src/json_in_table.sql

select * from machine;

out/json_in_table.out

| ident |      name      |                         details                         |
|-------|----------------|---------------------------------------------------------|
| 1     | WY401          | {"acquired": "2023-05-01"}                              |
| 2     | Inphormex      | {"acquired": "2021-07-15", "refurbished": "2023-10-22"} |
| 3     | AutoPlate 9000 | {"note": "needs software update"}                       |

71: select field from JSON

src/json_field.sql

select
    details->'$.acquired' as single_arrow,
    details->>'$.acquired' as double_arrow
from machine;

out/json_field.out

| single_arrow | double_arrow |
|--------------|--------------|
| "2023-05-01" | 2023-05-01   |
| "2021-07-15" | 2021-07-15   |
|              |              |

72: JSON array access

src/json_array.sql

select
    ident,
    json_array_length(log->'$') as length,
    log->'$[0]' as first
from usage;

out/json_array.out

| ident | length |                            first                             |
|-------|--------|--------------------------------------------------------------|
| 1     | 4      | {"machine":"Inphormex","person":["Gabrielle","Dub\u00e9"]}   |
| 2     | 5      | {"machine":"Inphormex","person":["Marianne","Richer"]}       |
| 3     | 2      | {"machine":"sterilizer","person":["Josette","Villeneuve"]}   |
| 4     | 1      | {"machine":"sterilizer","person":["Maude","Goulet"]}         |
| 5     | 2      | {"machine":"AutoPlate 9000","person":["Brigitte","Michaud"]} |
| 6     | 1      | {"machine":"sterilizer","person":["Marianne","Richer"]}      |
| 7     | 3      | {"machine":"WY401","person":["Maude","Goulet"]}              |
| 8     | 1      | {"machine":"AutoPlate 9000"}                                 |

73: unpack JSON array

src/json_unpack.sql

select
    ident,
    json_each.key as key,
    json_each.value as value
from usage, json_each(usage.log)
limit 10;

out/json_unpack.out

| ident | key |                            value                             |
|-------|-----|--------------------------------------------------------------|
| 1     | 0   | {"machine":"Inphormex","person":["Gabrielle","Dub\u00e9"]}   |
| 1     | 1   | {"machine":"Inphormex","person":["Gabrielle","Dub\u00e9"]}   |
| 1     | 2   | {"machine":"WY401","person":["Gabrielle","Dub\u00e9"]}       |
| 1     | 3   | {"machine":"Inphormex","person":["Gabrielle","Dub\u00e9"]}   |
| 2     | 0   | {"machine":"Inphormex","person":["Marianne","Richer"]}       |
| 2     | 1   | {"machine":"AutoPlate 9000","person":["Marianne","Richer"]}  |
| 2     | 2   | {"machine":"sterilizer","person":["Marianne","Richer"]}      |
| 2     | 3   | {"machine":"AutoPlate 9000","person":["Monique","Marcotte"]} |
| 2     | 4   | {"machine":"sterilizer","person":["Marianne","Richer"]}      |
| 3     | 0   | {"machine":"sterilizer","person":["Josette","Villeneuve"]}   |

74: last element of array

src/json_array_last.sql

select
    ident,
    log->'$[#-1].machine' as final
from usage
limit 5;

out/json_array_last.out

| ident |    final     |
|-------|--------------|
| 1     | "Inphormex"  |
| 2     | "sterilizer" |
| 3     | "Inphormex"  |
| 4     | "sterilizer" |
| 5     | "sterilizer" |

75: modify JSON

src/json_modify.sql

select
    ident,
    name,
    json_set(details, '$.sold', json_quote('2024-01-25')) as updated
from machine;

out/json_modify.out

| ident |      name      |                                  updated                                  |
|-------|----------------|---------------------------------------------------------------------------|
| 1     | WY401          | {"acquired":"2023-05-01","sold":"2024-01-25"}                             |
| 2     | Inphormex      | {"acquired":"2021-07-15","refurbished":"2023-10-22","sold":"2024-01-25"}  |
| 3     | AutoPlate 9000 | {"note":"needs software update","sold":"2024-01-25"}                      |

☆ refresh penguins

src/count_penguins.sql

select species, count(*) as num
from penguins
group by species;

out/count_penguins.out

|  species  | num |
|-----------|-----|
| Adelie    | 152 |
| Chinstrap | 68  |
| Gentoo    | 124 |

76: tombstones

src/make_active.sql

alter table penguins
add active integer not null default 1;

update penguins
set active = iif(species = 'Adelie', 0, 1);

src/active_penguins.sql

select species, count(*) as num
from penguins
where active
group by species;

out/active_penguins.out

|  species  | num |
|-----------|-----|
| Chinstrap | 68  |
| Gentoo    | 124 |

77: views

src/views.sql

create view if not exists
active_penguins (
    species,
    island,
    bill_length_mm,
    bill_depth_mm,
    flipper_length_mm,
    body_mass_g,
    sex
) as
select
    species,
    island,
    bill_length_mm,
    bill_depth_mm,
    flipper_length_mm,
    body_mass_g,
    sex
from penguins
where active;

select species, count(*) as num
from active_penguins
group by species;

out/views.out

|  species  | num |
|-----------|-----|
| Chinstrap | 68  |
| Gentoo    | 124 |

☆ hours reminder

src/all_jobs.sql

create table job (
    name text not null,
    billable real not null
);
insert into job values
    ('calibrate', 1.5),
    ('clean', 0.5);
select * from job;

out/all_jobs.out

|   name    | billable |
|-----------|----------|
| calibrate | 1.5      |
| clean     | 0.5      |

78: add check

src/all_jobs_check.sql

create table job (
    name text not null,
    billable real not null,
    check (billable > 0.0)
);
insert into job values ('calibrate', 1.5);
insert into job values ('reset', -0.5);
select * from job;

out/all_jobs_check.out

Runtime error near line 9: CHECK constraint failed: billable > 0.0 (19)
|   name    | billable |
|-----------|----------|
| calibrate | 1.5      |

☆ ACID

79: transactions

src/transaction.sql

create table job (
    name text not null,
    billable real not null,
    check (billable > 0.0)
);

insert into job values ('calibrate', 1.5);

begin transaction;
insert into job values ('clean', 0.5);
rollback;

select * from job;

out/transaction.out

|   name    | billable |
|-----------|----------|
| calibrate | 1.5      |

80: rollback in constraint

src/rollback_constraint.sql

create table job (
    name text not null,
    billable real not null,
    check (billable > 0.0) on conflict rollback
);

insert into job values
    ('calibrate', 1.5);
insert into job values
    ('clean', 0.5),
    ('reset', -0.5);

select * from job;

out/rollback_constraint.out

Runtime error near line 11: CHECK constraint failed: billable > 0.0 (19)
|   name    | billable |
|-----------|----------|
| calibrate | 1.5      |

81: rollback in statement

src/rollback_statement.sql

create table job (
    name text not null,
    billable real not null,
    check (billable > 0.0)
);

insert or rollback into job values
    ('calibrate', 1.5);
insert or rollback into job values
    ('clean', 0.5),
    ('reset', -0.5);

select * from job;

out/rollback_statement.out

Runtime error near line 11: CHECK constraint failed: billable > 0.0 (19)
|   name    | billable |
|-----------|----------|
| calibrate | 1.5      |

82: upsert

src/upsert.sql

create table jobs_done (
    person text unique,
    num integer default 0
);

insert into jobs_done values("zia", 1);
.print "after first"
select * from jobs_done;
.print


insert into jobs_done values("zia", 1);
.print "after failed"
select * from jobs_done;

insert into jobs_done values("zia", 1)
    on conflict(person) do update set num = num + 1;
.print "\nafter upsert"
select * from jobs_done;

out/upsert.out

after first
| person | num |
|--------|-----|
| zia    | 1   |

Runtime error near line 14: UNIQUE constraint failed: jobs_done.person (19)
after failed
| person | num |
|--------|-----|
| zia    | 1   |

after upsert
| person | num |
|--------|-----|
| zia    | 2   |

☆ normalization

83: create trigger

src/trigger_setup.sql

-- Track hours of lab work.
create table job (
    person text not null,
    reported real not null check (reported >= 0.0)
);

-- Explicitly store per-person total rather than using sum().
create table total (
    person text unique not null,
    hours real
);

-- Initialize totals.
insert into total values
    ("gene", 0.0),
    ("august", 0.0);

-- Define a trigger.
create trigger total_trigger
before insert on job
begin
    -- Check that the person exists.
    select case
        when not exists (select 1 from total where person = new.person)
        then raise(rollback, 'Unknown person ')
    end;
    -- Update their total hours (or fail if non-negative constraint violated).
    update total
    set hours = hours + new.reported
    where total.person = new.person;
end;

src/trigger_successful.sql

insert into job values
    ('gene', 1.5),
    ('august', 0.5),
    ('gene', 1.0)
;

out/trigger_successful.out

| person | reported |
|--------|----------|
| gene   | 1.5      |
| august | 0.5      |
| gene   | 1.0      |

| person | hours |
|--------|-------|
| gene   | 2.5   |
| august | 0.5   |

trigger firing

src/trigger_firing.sql

insert into job values
    ('gene', 1.0),
    ('august', -1.0)
;

out/trigger_firing.out

Runtime error near line 6: CHECK constraint failed: reported >= 0.0 (19)

| person | hours |
|--------|-------|
| gene   | 0.0   |
| august | 0.0   |

☆ represent graphs

src/lineage_setup.sql

create table lineage (
    parent text not null,
    child text not null
);
insert into lineage values
    ('Arturo', 'Clemente'),
    ('Darío', 'Clemente'),
    ('Clemente', 'Homero'),
    ('Clemente', 'Ivonne'),
    ('Ivonne', 'Lourdes'),
    ('Soledad', 'Lourdes'),
    ('Lourdes', 'Santiago');

src/represent_graph.sql

select * from lineage;

out/represent_graph.out

|  parent  |  child   |
|----------|----------|
| Arturo   | Clemente |
| Darío    | Clemente |
| Clemente | Homero   |
| Clemente | Ivonne   |
| Ivonne   | Lourdes  |
| Soledad  | Lourdes  |
| Lourdes  | Santiago |

lineage diagram

84: recursive query

src/recursive_lineage.sql

with recursive descendent as (
    select
        'Clemente' as person,
        0 as generations
    union all
    select
        lineage.child as person,
        descendent.generations + 1 as generations
    from descendent join lineage
    on descendent.person = lineage.parent
)
select person, generations from descendent;

out/recursive_lineage.out

|  person  | generations |
|----------|-------------|
| Clemente | 0           |
| Homero   | 1           |
| Ivonne   | 1           |
| Lourdes  | 2           |
| Santiago | 3           |

☆ contact tracing database

src/contact_person.sql

select * from person;

out/contact_person.out

| ident |         name          |
|-------|-----------------------|
| 1     | Juana Baeza           |
| 2     | Agustín Rodríquez     |
| 3     | Ariadna Caraballo     |
| 4     | Micaela Laboy         |
| 5     | Verónica Altamirano   |
| 6     | Reina Rivero          |
| 7     | Elias Merino          |
| 8     | Minerva Guerrero      |
| 9     | Mauro Balderas        |
| 10    | Pilar Alarcón         |
| 11    | Daniela Menéndez      |
| 12    | Marco Antonio Barrera |
| 13    | Cristal Soliz         |
| 14    | Bernardo Narváez      |
| 15    | Óscar Barrios         |

src/contact_contacts.sql

select * from contact;

out/contact_contacts.out

|       left        |         right         |
|-------------------|-----------------------|
| Agustín Rodríquez | Ariadna Caraballo     |
| Agustín Rodríquez | Verónica Altamirano   |
| Juana Baeza       | Verónica Altamirano   |
| Juana Baeza       | Micaela Laboy         |
| Pilar Alarcón     | Reina Rivero          |
| Cristal Soliz     | Marco Antonio Barrera |
| Cristal Soliz     | Daniela Menéndez      |
| Daniela Menéndez  | Marco Antonio Barrera |

contact diagram

85: bidirectional contacts

src/bidirectional.sql

create temporary table bi_contact (
    left text,
    right text
);

insert into bi_contact
select left, right from contact
union all
select right, left from contact;
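
The counts shown below are presumably produced by queries along these lines, which are not part of the source file above:

select count(*) as original_count from contact;
select count(*) as num_contact from bi_contact;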

out/bidirectional.out

| original_count |
|----------------|
| 8              |

| num_contact |
|-------------|
| 16          |

86: update group identifiers

src/update_group_ids.sql

select
    left.name as left_name,
    left.ident as left_ident,
    right.name as right_name,
    right.ident as right_ident,
    min(left.ident, right.ident) as new_ident
from
    (person as left join bi_contact on left.name = bi_contact.left)
    join person as right on bi_contact.right = right.name;

out/update_group_ids.out

|       left_name       | left_ident |      right_name       | right_ident | new_ident |
|-----------------------|------------|-----------------------|-------------|-----------|
| Juana Baeza           | 1          | Micaela Laboy         | 4           | 1         |
| Juana Baeza           | 1          | Verónica Altamirano   | 5           | 1         |
| Agustín Rodríquez     | 2          | Ariadna Caraballo     | 3           | 2         |
| Agustín Rodríquez     | 2          | Verónica Altamirano   | 5           | 2         |
| Ariadna Caraballo     | 3          | Agustín Rodríquez     | 2           | 2         |
| Micaela Laboy         | 4          | Juana Baeza           | 1           | 1         |
| Verónica Altamirano   | 5          | Agustín Rodríquez     | 2           | 2         |
| Verónica Altamirano   | 5          | Juana Baeza           | 1           | 1         |
| Reina Rivero          | 6          | Pilar Alarcón         | 10          | 6         |
| Pilar Alarcón         | 10         | Reina Rivero          | 6           | 6         |
| Daniela Menéndez      | 11         | Cristal Soliz         | 13          | 11        |
| Daniela Menéndez      | 11         | Marco Antonio Barrera | 12          | 11        |
| Marco Antonio Barrera | 12         | Cristal Soliz         | 13          | 12        |
| Marco Antonio Barrera | 12         | Daniela Menéndez      | 11          | 11        |
| Cristal Soliz         | 13         | Daniela Menéndez      | 11          | 11        |
| Cristal Soliz         | 13         | Marco Antonio Barrera | 12          | 12        |

87: recursive labeling

src/recursive_labeling.sql

with recursive labeled as (
    select
        person.name as name,
        person.ident as label
    from
        person
    union -- not 'union all'
    select
        person.name as name,
        labeled.label as label
    from
        (person join bi_contact on person.name = bi_contact.left)
        join labeled on bi_contact.right = labeled.name
    where labeled.label < person.ident
)
select name, min(label) as group_id
from labeled
group by name
order by label, name;

out/recursive_labeling.out

|         name          | group_id |
|-----------------------|----------|
| Agustín Rodríquez     | 1        |
| Ariadna Caraballo     | 1        |
| Juana Baeza           | 1        |
| Micaela Laboy         | 1        |
| Verónica Altamirano   | 1        |
| Pilar Alarcón         | 6        |
| Reina Rivero          | 6        |
| Elias Merino          | 7        |
| Minerva Guerrero      | 8        |
| Mauro Balderas        | 9        |
| Cristal Soliz         | 11       |
| Daniela Menéndez      | 11       |
| Marco Antonio Barrera | 11       |
| Bernardo Narváez      | 14       |
| Óscar Barrios         | 15       |

88: query from Python

src/basic_python_query.py

import sqlite3

connection = sqlite3.connect("db/penguins.db")
cursor = connection.execute("select count(*) from penguins;")
rows = cursor.fetchall()
print(rows)

out/basic_python_query.out

[(344,)]

89: incremental fetch

src/incremental_fetch.py

import sqlite3

connection = sqlite3.connect("db/penguins.db")
cursor = connection.cursor()
cursor = cursor.execute("select species, island from penguins limit 5;")
while row := cursor.fetchone():
    print(row)

out/incremental_fetch.out

('Adelie', 'Torgersen')
('Adelie', 'Torgersen')
('Adelie', 'Torgersen')
('Adelie', 'Torgersen')
('Adelie', 'Torgersen')

90: insert, delete, and all that

src/insert_delete.py

import sqlite3

connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("create table example(num integer);")

cursor.execute("insert into example values (10), (20);")
print("after insertion", cursor.execute("select * from example;").fetchall())

cursor.execute("delete from example where num < 15;")
print("after deletion", cursor.execute("select * from example;").fetchall())

out/insert_delete.out

after insertion [(10,), (20,)]
after deletion [(20,)]

91: interpolate values

src/interpolate.py

import sqlite3

connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("create table example(num integer);")

cursor.executemany("insert into example values (?);", [(10,), (20,)])
print("after insertion", cursor.execute("select * from example;").fetchall())

out/interpolate.out

after insertion [(10,), (20,)]

XKCD Exploits of a Mom
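
The cartoon above is about SQL injection. A small sketch (not one of the numbered queries) of the difference between splicing values into SQL text and using placeholders:

import sqlite3

connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("create table example(num integer);")

# Never build SQL by pasting untrusted text into the statement, e.g.
# f"insert into example values ({user_supplied});" where user_supplied
# is a stand-in for input you do not control; that is how the injection
# in the cartoon happens.

# Instead, let the driver handle the values via placeholders.
cursor.execute("insert into example values (?);", (30,))
print(cursor.execute("select * from example;").fetchall())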

92: script execution

src/script_execution.py

import sqlite3

SETUP = """\
drop table if exists example;
create table example(num integer);
insert into example values (10), (20);
"""

connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.executescript(SETUP)
print("after insertion", cursor.execute("select * from example;").fetchall())

out/script_execution.out

after insertion [(10,), (20,)]

93: SQLite exceptions in Python

src/exceptions.py

import sqlite3

SETUP = """\
create table example(num integer check(num > 0));
insert into example values (10);
insert into example values (-1);
insert into example values (20);
"""

connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
try:
    cursor.executescript(SETUP)
except sqlite3.Error as exc:
    print(f"SQLite exception: {exc}")
print("after execution", cursor.execute("select * from example;").fetchall())

out/exceptions.out

SQLite exception: CHECK constraint failed: num > 0
after execution [(10,)]

94: Python in SQLite

src/embedded_python.py

import sqlite3

SETUP = """\
create table example(num integer);
insert into example values (-10), (10), (20), (30);
"""


def clip(value):
    if value < 0:
        return 0
    if value > 20:
        return 20
    return value


connection = sqlite3.connect(":memory:")
connection.create_function("clip", 1, clip)
cursor = connection.cursor()
cursor.executescript(SETUP)
for row in cursor.execute("select num, clip(num) from example;").fetchall():
    print(row)

out/embedded_python.out

(-10, 0)
(10, 10)
(20, 20)
(30, 20)

95: handle dates and times

src/dates_times.py

from datetime import date
import sqlite3


# Convert date to ISO-formatted string when writing to database
def _adapt_date_iso(val):
    return val.isoformat()


sqlite3.register_adapter(date, _adapt_date_iso)


# Convert ISO-formatted string to date when reading from database
def _convert_date(val):
    return date.fromisoformat(val.decode())


sqlite3.register_converter("date", _convert_date)

SETUP = """\
create table events(
    happened date not null,
    description text not null
);
"""

connection = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
cursor = connection.cursor()
cursor.execute(SETUP)

cursor.executemany(
    "insert into events values (?, ?);",
    [(date(2024, 1, 10), "started tutorial"), (date(2024, 1, 29), "finished tutorial")],
)

for row in cursor.execute("select * from events;").fetchall():
    print(row)

out/dates_times.out

(datetime.date(2024, 1, 10), 'started tutorial')
(datetime.date(2024, 1, 29), 'finished tutorial')

96: SQL in Jupyter notebooks

src/install_jupysql.sh

pip install jupysql

src/load_ext.text

%load_ext sql

src/jupyter_connect.text

%sql sqlite:///data/penguins.db

out/jupyter_connect.out

Connecting to 'sqlite:///data/penguins.db'

src/jupyter_select.text

%%sql
select species, count(*) as num
from penguins
group by species;

out/jupyter_select.out

Running query in 'sqlite:///data/penguins.db'
species    num
Adelie     152
Chinstrap  68
Gentoo     124

97: Pandas and SQL

src/install_pandas.sh

pip install pandas

src/select_pandas.py

import pandas as pd
import sqlite3

connection = sqlite3.connect("db/penguins.db")
query = "select species, count(*) as num from penguins group by species;"
df = pd.read_sql(query, connection)
print(df)

out/select_pandas.out

     species  num
0     Adelie  152
1  Chinstrap   68
2     Gentoo  124
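
read_sql pulls query results into a DataFrame; DataFrame.to_sql goes the other way and writes a frame out as a table. The sketch below is illustrative only: the table name species_counts is invented, and running it adds (or replaces) that table in the database file.

import pandas as pd
import sqlite3

connection = sqlite3.connect("data/penguins.db")
query = "select species, count(*) as num from penguins group by species;"
df = pd.read_sql(query, connection)

# Write the result back as a new table; if_exists="replace" makes re-running safe.
df.to_sql("species_counts", connection, if_exists="replace", index=False)
print(pd.read_sql("select * from species_counts;", connection))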

98: Polars and SQL

src/install_polars.sh

pip install polars pyarrow adbc-driver-sqlite

src/select_polars.py

import polars as pl

query = "select species, count(*) as num from penguins group by species;"
uri = "sqlite:///db/penguins.db"
df = pl.read_database_uri(query, uri, engine="adbc")
print(df)

out/select_polars.out

shape: (3, 2)
┌───────────┬─────┐
│ species   ┆ num │
│ ---       ┆ --- │
│ str       ┆ i64 │
╞═══════════╪═════╡
│ Adelie    ┆ 152 │
│ Chinstrap ┆ 68  │
│ Gentoo    ┆ 124 │
└───────────┴─────┘
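
Polars can also read through an existing DB-API connection with read_database, which avoids the connection URI and the extra driver; the sketch below assumes that approach with the standard sqlite3 module.

import sqlite3

import polars as pl

connection = sqlite3.connect("data/penguins.db")
query = "select species, count(*) as num from penguins group by species;"
df = pl.read_database(query, connection)
print(df)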

99: object-relational mapper

src/orm.py

from sqlmodel import Field, Session, SQLModel, create_engine, select


class Department(SQLModel, table=True):
    ident: str = Field(default=None, primary_key=True)
    name: str
    building: str


engine = create_engine("sqlite:///db/assays.db")
with Session(engine) as session:
    statement = select(Department)
    for result in session.exec(statement).all():
        print(result)

out/orm.out

building='Chesson' name='Genetics' ident='gen'
building='Fashet Extension' name='Histology' ident='hist'
building='Chesson' name='Molecular Biology' ident='mb'
building='TGVH' name='Endocrinology' ident='end'
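
The ORM works in the other direction as well: constructing a model object and adding it to a session stages an insert, and committing the session writes the row. The sketch below reuses the Department model; the ident, name, and building values are invented, so running it really would add a row to the database.

from sqlmodel import Field, Session, SQLModel, create_engine


class Department(SQLModel, table=True):
    ident: str = Field(default=None, primary_key=True)
    name: str
    building: str


engine = create_engine("sqlite:///db/assays.db")
with Session(engine) as session:
    # add() stages the insert; commit() writes it to the database.
    session.add(Department(ident="teach", name="Teaching", building="Chesson"))
    session.commit()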

100: relations with ORM

src/orm_relation.py

from typing import Optional

from sqlmodel import Field, Session, SQLModel, create_engine, select


# Department is the same model as in the previous section.
class Department(SQLModel, table=True):
    ident: str = Field(default=None, primary_key=True)
    name: str
    building: str


class Staff(SQLModel, table=True):
    ident: str = Field(default=None, primary_key=True)
    personal: str
    family: str
    dept: Optional[str] = Field(default=None, foreign_key="department.ident")
    age: int


engine = create_engine("sqlite:///db/assays.db")
SQLModel.metadata.create_all(engine)
with Session(engine) as session:
    statement = select(Department, Staff).where(Staff.dept == Department.ident)
    for dept, staff in session.exec(statement):
        print(f"{dept.name}: {staff.personal} {staff.family}")

out/orm_relation.out

Histology: Divit Dhaliwal
Molecular Biology: Indrans Sridhar
Molecular Biology: Pranay Khanna
Histology: Vedika Rout
Genetics: Abram Chokshi
Histology: Romil Kapoor
Molecular Biology: Ishaan Ramaswamy
Genetics: Nitya Lal

☆ Appendices

☆ Terms

1-to-1 relation
A relationship between two tables in which each record from the first table matches exactly one record from the second and vice versa.
1-to-many relation
A relationship between two tables in which each record from the first table matches zero or more records from the second, but each record from the second table matches exactly one record from the first.
aggregation
Combining several values to produce one.
aggregation function
A function used to produce one value from many, such as sum or maximum.
alias
An alternate name used temporarily for a table or column.
atomic
An operation that cannot be broken into smaller operations.
autoincrement
Automatically add one to a value.
base case
A starting point for recursion that does not depend on previous recursive calculations.
Binary Large Object (blob)
Bytes that are handled as-is rather than being interpreted as numbers, text, or other data types.
cross join
A join that creates the cross-product of rows from two tables.
common table expression (CTE)
A temporary table created at the start of a query, usually to simplify writing the query.
consistent
A state in which all constraints are satisfied, e.g., all columns contain allowed values and all foreign keys refer to primary keys.
correlated subquery
A subquery that depends on a value or values from the enclosing query, and which must therefore be executed once for each of those values.
cursor
A reference to the current location in the results of an ongoing query.
data migration
To move data from one form to another, e.g., from one set of tables to a new set or from one DBMS to another.
database
A collection of data that can be searched and retrieved.
database management system (DBMS)
A program that manages a particular kind of database.
denormalization
To deliberately introduce duplication or otherwise violate normal forms, typically to improve query performance.
durable
Guaranteed to survive shutdown and restart.
entity-relationship diagram
A graphical depiction of the relationships between tables in a database.
filter
To select records based on whether they pass some Boolean test.
foreign key
A value in one table that identifies a primary key in another table.
full outer join
A join that keeps all rows from both tables. Columns are filled with actual values where rows match and with null otherwise.
group
A set of records that share a common property, such as having the same value in a particular column.
in-memory database
A database that is stored in memory rather than on disk.
index
An auxiliary data structure that enables faster access to records.
infinite recursion
Recursion that never reaches its base case and therefore never terminates (see also: infinite recursion).
isolated
The appearance of having executed in an otherwise-idle system.
join
To combine records from two tables.
join condition
The criteria used to decide which rows from each table in a join are combined.
join table
A table that exists solely to enable information from two tables to be connected.
left outer join
A join that is guaranteed to keep all rows from the first (left) table. Columns from the right table are filled with actual values if available or with null otherwise.
many-to-many relation
A relationship between two tables in which each record from the first table may match zero or more records from the second and vice versa.
materialized view
A view that is stored on disk and updated on demand.
normal form
One of several (loosely defined) rules for organizing data in tables.
NoSQL database
Any database that doesn’t use the relational model.
null
A special value representing “not known”.
object-relational mapper (ORM)
A library that translates objects in a program into database queries and the results of those queries back into objects.
path expression
An expression identifying an element or a set of elements in a JSON structure.
primary key
A value or values in a database table that uniquely identifies each record in that table.
query
A command to perform some operation in a database (typically data retrieval).
recursive CTE
A common table expression that refers to itself. Every recursive CTE must have a base case and a recursive case.
recursive case
The second or subsequent step in self-referential accumulation of data.
relational database management system (RDBMS)
A database management system that stores data in tables with columns and rows.
subquery
A query used within another query.
table-valued function
A function that returns rows (i.e., a table) rather than a single value.
temporary table
A table that is explicitly constructed in memory outside any particular query.
ternary logic
A logic based on three values: true, false, and “don’t know” (represented as null).
tombstone
A marker value added to a record to show that it is no longer active. Tombstones are used as an alternative to deleting data.
trigger
An action that runs automatically when something happens in a database, typically insertion or deletion.
upsert
To update a record if it exists or insert (create) a new record if it doesn’t.
Uniform Resource Identifier (URI)
A string that identifies a resource (such as a web page or database) and the protocol used to access it.
view
A rearrangement of data in a database that is regenerated on demand.
window function
A function that calculates a value for each row in a query’s result using data from other rows in that row’s window, without collapsing those rows into one.

☆ Acknowledgments

This tutorial would not have been possible without:

I would also like to thank the following for spotting issues, making suggestions, or submitting changes: