Create projects as quickly as news breaks

With Cookiecutter templates, you can spin up a new project quickly whenever you need one.

terminal

  • datakit project create
  • Creating project from template: /Users/lfenn/.cookiecutters/cookiecutter-r-project
    full_name [Larry Fenn]:
    email [lfenn@ap.org]:
    project_name [New Project]: Hudson Helicopter Crash
    project_slug [hudson-helicopter-crash]:
    project_short_description [TK: short project description]: Pull FAA data on helicopter crashes
  • cd hudson-helicopter-crash
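
In the session above, the default project_slug looks like a lowercased, hyphenated form of project_name. The template's actual rule is not shown here, so the following is only a minimal sketch of that assumed transformation; the variable names are illustrative.

```shell
# Assumed rule (not confirmed by the template): lowercase the name,
# then turn each space into a hyphen.
PROJECT_NAME="Hudson Helicopter Crash"
PROJECT_SLUG="$(printf '%s' "$PROJECT_NAME" | tr '[:upper:]' '[:lower:]' | tr ' ' '-')"

echo "$PROJECT_SLUG"
```

For the name used above, this prints hudson-helicopter-crash, matching the default offered at the prompt.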

Simple, adaptable project structure

Keep all parts of a project cleanly separated: data, code, configuration, documentation.

terminal

    ├── README.md
    ├── analysis
    │   ├── analysis.R
    │   ├── graphs.Rmd
    │   └── story_lines.R
    ├── config
    │   └── datakit-data.json
    ├── data
    │   ├── manual
    │   ├── processed
    │   └── source
    ├── docs
    ├── output
    │   └── totals_over_time.csv
    └── trump-counties-employment.Rproj
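
The template normally generates this layout for you. Purely to illustrate how the parts are kept apart, a minimal shell sketch that builds the same top-level folders by hand might look like the following. The name example-project is hypothetical, and this is not how DataKit itself creates projects.

```shell
# Hypothetical project name, used only for this illustration.
PROJECT_DIR="example-project"

# One directory per concern: code, configuration, data, docs, output.
mkdir -p "$PROJECT_DIR/analysis" \
         "$PROJECT_DIR/config" \
         "$PROJECT_DIR/data/manual" \
         "$PROJECT_DIR/data/processed" \
         "$PROJECT_DIR/data/source" \
         "$PROJECT_DIR/docs" \
         "$PROJECT_DIR/output"

# Placeholder README at the project root.
printf '# %s\n' "$PROJECT_DIR" > "$PROJECT_DIR/README.md"
```

Keeping data/source, data/manual and data/processed as separate folders makes it clear which files are original inputs and which were edited by hand or produced by code.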

Integrate with cloud storage and hosting

Automate data and code syncing to take the guesswork out of storage and backup.

terminal

  • datakit data push
  • EXECUTING: aws s3 sync --profile default data/ s3://data.ap.org/projects/2019/trump-counties-employment/data/
    upload: data/manual/trumpjobs.sqlite to s3://data.ap.org/projects/2019/trump-counties-employment/data/manual/trumpjobs.sqlite
    upload: data/reports/graphs.html to s3://data.ap.org/projects/2019/trump-counties-employment/data/reports/graphs.html
    upload: data/source/LAUS/laucnty15.xlsx to s3://data.ap.org/projects/2019/trump-counties-employment/data/source/LAUS/laucnty15.xlsx
    upload: data/source/LAUS/laucnty16.xlsx to s3://data.ap.org/projects/2019/trump-counties-employment/data/source/LAUS/laucnty16.xlsx
    upload: data/source/LAUS/laucnty17.xlsx to s3://data.ap.org/projects/2019/trump-counties-employment/data/source/LAUS/laucnty17.xlsx
    upload: data/source/LAUS/laucnty18.xlsx to s3://data.ap.org/projects/2019/trump-counties-employment/data/source/LAUS/laucnty18.xlsx
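
The EXECUTING line shows that the push is an aws s3 sync call assembled from a profile, a bucket and a project path. As a rough sketch of how those pieces fit together, the snippet below rebuilds the same command string. The variable names are assumptions made for illustration, not DataKit's own configuration keys, and the command is only printed, never run.

```shell
# Illustrative values copied from the output above; the variable names are hypothetical.
PROFILE="default"
BUCKET="data.ap.org"
PROJECT_PATH="projects/2019/trump-counties-employment"

# Assemble the sync command: local data/ folder to the project's data/ prefix on S3.
SYNC_CMD="aws s3 sync --profile $PROFILE data/ s3://$BUCKET/$PROJECT_PATH/data/"

# Print the command instead of running it.
echo "$SYNC_CMD"
```

If you run a command like this by hand, the AWS CLI's --dryrun flag lists what would be transferred without uploading anything.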

Quickstart installation guide

DataKit requires Python 3. If you do not have Python 3 installed on your machine, get the latest version here.

More detailed installation documents are available here.
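
Before installing, you may want to confirm that a Python 3 interpreter is on your PATH. The check below is a small sketch that assumes the interpreter is exposed as python3, which is common but not universal; on some systems the command is python or py instead.

```shell
# Assumes the interpreter is named python3; adjust if your system differs.
if command -v python3 >/dev/null 2>&1; then
  PY_STATUS="found: $(python3 --version 2>&1)"
else
  PY_STATUS="missing"
fi

echo "$PY_STATUS"
```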



1. Install datakit-project

This is our most popular plugin, and installing it sets you up to add other plugins later.

terminal

  • pip install datakit-project

2. Grab a template

DataKit uses Cookiecutter templates for project structure and initial configuration.

The following templates are available:

  • Python project template: This is a Jupyter-specific cookiecutter which uses pipenv.
  • R project template: This is an R project cookiecutter which uses an RStudio .Rproj file.
  • Generic template: This is a basic cookiecutter which defines only the folder structure, a project README, and a .gitignore file. If your project workflow isn't covered by the Python or R project templates, or you want to develop your own project template, this is the place to start.

terminal

  • datakit project create -t https://github.com/associatedpress/cookiecutter-r-project.git

3. Get to work

On the command line, datakit project create will create a project with a standardized file structure.

terminal

  • datakit project create

4. Adapt to your workflow

Additional plugins can help you manage the storage of flat data files, sync your code to GitLab or GitHub, and push your output to data.world for sharing. Grab other plugins or develop your own!

Community Contributions

Useful DataKit plugins from around the community:

Better collaborations, less mess

Questions? Comments? Drop us a line at datateam@ap.org
More information on the AP's data journalism program