Contributing
Contributions are very welcome. Please file issues or submit pull requests in our GitHub repository. All contributors will be acknowledged.
In Brief
-
The tutorial lives in
index.md
, which is translated into a static GitHub Pages website using Jekyll. -
The source files for examples are in
src/
, while the output they generate is inout/
. -
Use
pip install -r requirements.txt
to install the packages required by the helper tools and Python examples. You may wish to create a new virtual environment before doing so. All code has been tested with Python 3.12.1. -
Makefile
contains the commands used to re-run each example. If you add a new example, please add a corresponding rule inMakefile
. -
Use
{% include h2_numbered.md title="words" %}
to create a numbered section that includes a runnable code example. -
Use
{% include h2_unnumbered.md title="words" %}
to create an unnumbered section that doesn’t include a runnable code example. -
Use
{% include double.md stem="file" suffix="sql out" %}
inindex.md
to includesrc/file.sql
andout/file.out
. (Any two suffixes can be provided, such as"py out"
.) -
Use
{% include single.md file="dir/file.ext" %}
inindex.md
to include an arbitrary text file without automatically including output. (Note thatsingle
requires a directory name such assrc
orout
butdouble
does not.) -
Add important terms to
_info/glossary.yml
, which is in Glosario format. -
Use
<a href="#g:key">text</span>
to link to glossary entries. The key must identify an entry in_data/glossary.yml
; the#
makes it an in-page reference, while theg:
prefix triggers CSS styling. -
SVG images used in the tutorial are in
img/
and can be edited using draw.io. Please use 12-point Helvetica for text, solid 1-point black lines, and unfilled objects. -
All external links are written using
[box][notation]
inline and defined in_data/tutorial.yml
.
Logical Structure
- Introduction
- A learner persona that characterizes the intended audience in concrete terms.
- Prerequisites (which should be interpreted with reference to the learner persona).
- Learning objectives that define the tutorial’s scope.
- Setup instructions that instructors and learners must go through in order to code along
- Episodes
- Runnable episodes are numbered. Each contains one code sample, its output, and notes for the instructor. Learners are not expected to be able to understand episodes without instructor elaboration.
- Asides are not numbered, and contain code-less explanatory material, additional setup instructions, concept maps summarizing recently-introduced ideas, etc.
- Episodes of both kinds may contain glossary references and/or explanatory diagrams.
- Appendices
- A glossary that defines terms called out in the episodes.
- Acknowledgments that point at inspirations and thank contributors.
Physical Structure
CODE_OF_CONDUCT.md
: source for Code of Conductconduct_.md
: auxiliary file to translate CoC into HTML
CONTRIBUTING.md
: this guidecontributing_.md
: auxiliary file to translate this guide into HTML
LICENSE.md
: licenses for code and proselicense_.md
: auxiliary file to translate licenses into HTML
Makefile
: commands for rebuilding examples- Run
make
with no arguments to see available targets
- Run
README.md
: home page_config.yml
: Jekyll configuration file_data/
: auxiliary data files used to build website_data/thanks.yml
: names of people to include in acknowledgments
_includes/
: Jekyll inclusions (discussed above)_layouts/
: Jekyll layouts_site/
: Jekyll output (not stored in version control)bin/
: helper programs (e.g., for generating databases)db/
: databases used in examplesfavicon.ico
: favicon used in generated siteimg/
: images used in tutorialindex.md
: home page for generated websitemisc/
: miscellaneous filesjupysql.ipynb
: Jupyter notebook used in examplespenguins.csv
: Palmer penguins datasql_keywords.txt
: all SQLite keywords
out/
: generated output files for examplesrequirements.txt
:pip
requirements file to build Python environmentres/
: CSS and JavaScript for styling sitesrc/
: source files for examples
Tags for Issues and Pull Requests
contribute-addition
: a pull request that contains new materialcontribute-change
: a pull request that changes or fixes existing materialgovernance
: discussion of direction, use, etc.help-wanted
: requires knowledge or skills the core maintainer lacksin-content
: issue or PR is related to lesson contentin-infrastructure
: issue or PR is related to build tools, styling, etc.report-bug
: issue reporting an errorrequest-addition
: issue asking for new contentrequest-change
: issue asking for a change to existing content
FAQ
- Why SQL?
- Because if you dig down far enough, almost every data science project sits on top of a relational database. (Jon Udell once called PostgreSQL “an operating system for data science”.) SQL’s relational model has also been a powerful influence on dataframe libraries like the tidyverse, Pandas, and Polars; understanding the former therefore helps people understand the latter.
- Why Jekyll?
- It’s the default for GitHub Pages, and if we used something more comfortable people would spend time fiddling with the tools instead of writing content.
- Why Make?
- It runs everywhere, no other build tool is a clear successor, and, like Jekyll, it’s uncomfortable enough to use that people won’t be tempted to fiddle with it when they could be writing.
- Why hand-drawn figures rather than Graphviz or Mermaid?
- Because it’s faster to Just Effing Draw than it is to try to tweak layout parameters for text-to-diagram systems. If you really want to make developers’ lives better, build a diff-and-merge tool for SVG: programmers shouldn’t have to use punchard-compatible data formats in the 21st Century just to get the benefits of version control.
- Why make this tutorial freely available?
- Because if we all give a little, we all get a lot.