What we’ll cover

Basic shell navigation
Cookiecutter data analysis
Managing virtual environments
Creating a test data pipeline
Doing a groupby aggregation
Doing unit tests

What we won’t cover

This seminar is just a taster of what we can cover in a full-day workshop, so we won’t be covering:

hardware and GPUs
working on clusters
dynamic data linking
file formats
using the key Python libraries in more detail
unit testing in more depth
version control with git
working with larger datasets
interactive visualisations
reproducible containers like Docker or Singularity
working on your own data

Activity

What do you feel are the major bottlenecks in your analysis process?
