What we’ll cover

  • Basic shell navigation
  • Cookiecutter data analysis
  • Managing virtual environments
  • Creating a test data pipeline
  • Doing a groupby aggregation
  • Doing unit tests

What we won’t cover

This seminar is just a taster of what we can cover in a full-day workshop, so we won’t be covering:

  • hardware and GPUs
  • working on clusters
  • dynamic data linking
  • file formats
  • using the key Python libraries in more detail
  • unit testing in more detail
  • version control with git
  • working with larger datasets
  • interactive visualisations
  • reproducible containers like Docker or Singularity
  • working on your own data


What do you feel are the major bottlenecks in your analysis process?
