Contributing
Contributions are very welcome. Please file issues or submit pull requests in our GitHub repository. All contributors will be acknowledged.
In Brief
-
The tutorial lives in
index.md, which is translated into a static GitHub Pages website using Jekyll. -
The source files for examples are in
src/, while the output they generate is inout/. -
Use
pip install -r requirements.txtto install the packages required by the helper tools and Python examples. You may wish to create a new virtual environment before doing so. All code has been tested with Python 3.12.1. -
Makefilecontains the commands used to re-run each example. If you add a new example, please add a corresponding rule inMakefile. -
Use
{% include h2_numbered.md title="words" %}to create a numbered section that includes a runnable code example. -
Use
{% include h2_unnumbered.md title="words" %}to create an unnumbered section that doesn’t include a runnable code example. -
Use
{% include double.md stem="file" suffix="sql out" %}inindex.mdto includesrc/file.sqlandout/file.out. (Any two suffixes can be provided, such as"py out".) -
Use
{% include single.md file="dir/file.ext" %}inindex.mdto include an arbitrary text file without automatically including output. (Note thatsinglerequires a directory name such assrcoroutbutdoubledoes not.) -
Add important terms to
_info/glossary.yml, which is in Glosario format. -
Use
<a href="#g:key">text</span>to link to glossary entries. The key must identify an entry in_data/glossary.yml; the#makes it an in-page reference, while theg:prefix triggers CSS styling. -
SVG images used in the tutorial are in
img/and can be edited using draw.io. Please use 12-point Helvetica for text, solid 1-point black lines, and unfilled objects. -
All external links are written using
[box][notation]inline and defined in_data/tutorial.yml.
Logical Structure
- Introduction
- A learner persona that characterizes the intended audience in concrete terms.
- Prerequisites (which should be interpreted with reference to the learner persona).
- Learning objectives that define the tutorial’s scope.
- Setup instructions that instructors and learners must go through in order to code along
- Episodes
- Runnable episodes are numbered. Each contains one code sample, its output, and notes for the instructor. Learners are not expected to be able to understand episodes without instructor elaboration.
- Asides are not numbered, and contain code-less explanatory material, additional setup instructions, concept maps summarizing recently-introduced ideas, etc.
- Episodes of both kinds may contain glossary references and/or explanatory diagrams.
- Appendices
- A glossary that defines terms called out in the episodes.
- Acknowledgments that point at inspirations and thank contributors.
Physical Structure
CODE_OF_CONDUCT.md: source for Code of Conductconduct_.md: auxiliary file to translate CoC into HTML
CONTRIBUTING.md: this guidecontributing_.md: auxiliary file to translate this guide into HTML
LICENSE.md: licenses for code and proselicense_.md: auxiliary file to translate licenses into HTML
Makefile: commands for rebuilding examples- Run
makewith no arguments to see available targets
- Run
README.md: home page_config.yml: Jekyll configuration file_data/: auxiliary data files used to build website_data/thanks.yml: names of people to include in acknowledgments
_includes/: Jekyll inclusions (discussed above)_layouts/: Jekyll layouts_site/: Jekyll output (not stored in version control)bin/: helper programs (e.g., for generating databases)db/: databases used in examplesfavicon.ico: favicon used in generated siteimg/: images used in tutorialindex.md: home page for generated websitemisc/: miscellaneous filesjupysql.ipynb: Jupyter notebook used in examplespenguins.csv: Palmer penguins datasql_keywords.txt: all SQLite keywords
out/: generated output files for examplesrequirements.txt:piprequirements file to build Python environmentres/: CSS and JavaScript for styling sitesrc/: source files for examples
Tags for Issues and Pull Requests
contribute-addition: a pull request that contains new materialcontribute-change: a pull request that changes or fixes existing materialgovernance: discussion of direction, use, etc.help-wanted: requires knowledge or skills the core maintainer lacksin-content: issue or PR is related to lesson contentin-infrastructure: issue or PR is related to build tools, styling, etc.report-bug: issue reporting an errorrequest-addition: issue asking for new contentrequest-change: issue asking for a change to existing content
FAQ
- Why SQL?
- Because if you dig down far enough, almost every data science project sits on top of a relational database. (Jon Udell once called PostgreSQL “an operating system for data science”.) SQL’s relational model has also been a powerful influence on dataframe libraries like the tidyverse, Pandas, and Polars; understanding the former therefore helps people understand the latter.
- Why Jekyll?
- It’s the default for GitHub Pages, and if we used something more comfortable people would spend time fiddling with the tools instead of writing content.
- Why Make?
- It runs everywhere, no other build tool is a clear successor, and, like Jekyll, it’s uncomfortable enough to use that people won’t be tempted to fiddle with it when they could be writing.
- Why hand-drawn figures rather than Graphviz or Mermaid?
- Because it’s faster to Just Effing Draw than it is to try to tweak layout parameters for text-to-diagram systems. If you really want to make developers’ lives better, build a diff-and-merge tool for SVG: programmers shouldn’t have to use punchard-compatible data formats in the 21st Century just to get the benefits of version control.
- Why make this tutorial freely available?
- Because if we all give a little, we all get a lot.