Setting the Scene
|
When developing software that other people will use, it is not enough to write code that produces seemingly correct output in a few cases. You have to check that the software performs well under different conditions and with different input data, and that the user is notified if something goes wrong.
This lesson focuses on intermediate skills and tools for making sure that your software is correct, reliable and fast.
The lesson follows on from the novice Software Carpentry lesson, but this is not a prerequisite for attending as long as you have some basic Python, command line and Git skills and you have been using them for a while to write code to help with your work.
|
Section 1: Obtaining the Software Project and Preparing Virtual Environment
|
To develop (write, test, debug, back up) code efficiently, you need to use a number of different tools.
When there is a choice of tools for a task you will have to decide which tool is right for you, which may be a matter of personal preference or what the team or community you belong to is using.
A popular tool for organizing collaborative software development is Git, which allows you to share your code with other people and keep track of its changes.
|
Introduction to Our Software Project
|
Using Git and GitHub, we can share our code with others and obtain our own copies of others’ projects.
The structure of the software project is defined by its purposes and requirements.
Separation of concerns is one of the most basic principles when deciding on software architecture.
|
Virtual Environments For Software Development
|
Virtual environments keep Python versions and dependencies required by different projects separate.
A virtual environment is itself a directory structure.
Use venv to create and manage Python virtual environments.
Use pip to install and manage Python external (third-party) libraries.
pip allows you to declare all dependencies for a project in a separate file (by convention called requirements.txt ) which can be shared with collaborators/users and used to replicate a virtual environment.
Use pip3 freeze > requirements.txt to take a snapshot of your project’s dependencies.
Use pip3 install -r requirements.txt to replicate someone else’s virtual environment on your machine from the requirements.txt file.
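To make the point that a virtual environment is itself a directory structure more concrete, here is a minimal sketch that uses Python's standard `venv` module from a script rather than from the command line. The helper name `create_environment` and the folder name `venv` are illustrative assumptions, and `with_pip=False` is chosen only to keep the sketch quick and offline.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(parent_dir):
    """Create a virtual environment called 'venv' inside parent_dir (hypothetical helper)."""
    env_dir = Path(parent_dir) / "venv"
    # with_pip=False keeps this sketch fast; a real project would usually
    # want pip available inside the environment.
    venv.create(env_dir, with_pip=False)
    return env_dir


with tempfile.TemporaryDirectory() as tmp:
    env_dir = create_environment(tmp)
    # The environment is nothing more than files and folders on disk.
    entries = sorted(p.name for p in env_dir.iterdir())
    has_config = (env_dir / "pyvenv.cfg").is_file()

print(entries)
print(has_config)
```

The listing typically shows a small set of folders plus a `pyvenv.cfg` file; the exact folder names differ between operating systems.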
|
Section 2: Ensuring Correctness of Software at Scale
|
Using testing requires us to change our practice of code development, but saves time in the long run by allowing us to more comprehensively and rapidly find faults in code, as well as giving us greater confidence in the correctness of our code.
Writing parametrized tests makes sure that you are testing your software in different scenarios.
Writing tests before the features forces you to think of the requirements and best possible implementations in advance.
|
Automatically Testing Software
|
The three main types of automated tests are unit tests, functional tests and regression tests.
We can write unit tests to verify that functions generate expected output given a set of specific inputs.
It should be easy to add or change tests, understand and run them, and understand their results.
We can use a unit testing framework like Pytest to structure and simplify the writing of tests in Python.
We should test for expected errors in our code.
Testing program behaviour against both valid and invalid inputs is important; checking that inputs are valid before they are used is known as data validation.
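The points above can be sketched with a pair of small Pytest tests: one checks expected output for a specific input, the other checks that an expected error is raised. The function `daily_mean` and its behaviour on empty input are assumptions made for this illustration, not code taken from the lesson's project.

```python
import pytest


def daily_mean(values):
    """Return the arithmetic mean of a non-empty list of numbers (hypothetical function)."""
    if len(values) == 0:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)


def test_daily_mean_integers():
    # Expected output for a specific, valid input.
    assert daily_mean([1, 2, 3, 6]) == 3.0


def test_daily_mean_empty_input():
    # The expected error is part of the function's behaviour, so it is tested too.
    with pytest.raises(ValueError):
        daily_mean([])
```

Pytest collects functions whose names start with `test_`, so saving this in a file such as `test_stats.py` (a hypothetical name) and running `pytest` would execute both tests.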
|
Scaling Up Unit Testing
|
We can assign multiple inputs to tests using parametrisation.
It’s important to understand the coverage of our tests across our code.
Writing unit tests takes time, so apply them where it makes the most sense.
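As a rough sketch of parametrisation, the test below runs once for each row of inputs and expected results. The function `daily_max` and the chosen cases are illustrative assumptions.

```python
import pytest


def daily_max(values):
    """Return the largest value in a list of numbers (hypothetical function)."""
    return max(values)


@pytest.mark.parametrize(
    "values, expected",
    [
        ([0, 0, 0], 0),       # all zeros
        ([4, 2, 1], 4),       # positive values
        ([-4, -2, -1], -1),   # negative values
    ],
)
def test_daily_max(values, expected):
    # Pytest calls this function once per parameter row and reports each case separately.
    assert daily_max(values) == expected
```

To see how much of the code such tests exercise, one option is the `pytest-cov` plugin, which adds a `--cov` option to the `pytest` command.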
|
Robust Software with Testing Approaches
|
Ensure that unit tests check for edge and corner cases too.
Use preconditions to ensure correct behaviour of code.
Write tests before the code itself to think of the functionality and desired behaviour in advance.
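A minimal sketch of a precondition combined with edge-case handling might look like the following. The function `normalise`, and the decision to reject negative values, are assumptions made for the example.

```python
def normalise(values):
    """Scale non-negative values so that the largest becomes 1.0 (hypothetical function)."""
    # Precondition: in this sketch, negative values are treated as invalid input.
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    largest = max(values, default=0)
    # Edge case: empty or all-zero input would otherwise cause a division by zero.
    if largest == 0:
        return [0.0 for _ in values]
    return [v / largest for v in values]


print(normalise([1, 2, 4]))
print(normalise([0, 0]))
```

Writing tests for the empty list, the all-zero list and a list containing a negative number before implementing the function makes these decisions explicit up front.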
|
Section 3: Automating Code Quality Checks
|
Running tests every time the code is updated is tiresome, and we can be tempted to skip this step. We should use automatic Continuous Integration tools, such as GitHub Actions, to make sure that the tests are executed regularly.
Continuous Integration also allows us to set up automatic code style checks.
To check how efficient our code is and find bottlenecks, we can use JupyterLab's built-in magic commands and Python libraries such as cProfile and SnakeViz.
|
Continuous Integration for Automated Testing
|
Continuous Integration can run tests automatically to verify changes as code develops in our repository.
CI builds are typically triggered by commits pushed to a repository.
We need to write a configuration file to inform a CI service what to do for a build.
We can use a build matrix to specify multiple platforms and programming language versions to test against.
Builds can be enabled and configured separately for each branch.
We can run - and get reports from - different CI infrastructure builds simultaneously.
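A build matrix can be thought of as the set of all combinations of the listed options, with one build per combination. The sketch below illustrates that idea in plain Python; it is not a CI configuration file, and the operating system labels, Python versions and the helper `expand_matrix` are illustrative assumptions.

```python
from itertools import product

# Hypothetical matrix axes, similar in spirit to what a CI configuration declares.
operating_systems = ["ubuntu-latest", "macos-latest", "windows-latest"]
python_versions = ["3.10", "3.11", "3.12"]


def expand_matrix(systems, versions):
    """Return one job description per (system, version) combination."""
    return [
        {"os": system, "python-version": version}
        for system, version in product(systems, versions)
    ]


jobs = expand_matrix(operating_systems, python_versions)
for job in jobs:
    print(job)
print(len(jobs))  # 3 systems x 3 versions = 9 jobs
```

Each extra value on an axis multiplies the number of builds, which is worth keeping in mind when deciding how many platforms and versions to cover.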
|
Verifying Code Style Using Linters
|
|
Measuring time and computational resources required by the software
|
For our software to be usable, we need to take care not only of its correctness, but also of its performance.
Profiling is necessary for large, computationally expensive projects and for software that will process large datasets.
Finding bottlenecks and inefficient subroutines is an important part of refactoring, but it is also useful when planning the architecture of the software.
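As a small sketch of profiling with the standard library, the code below runs a deliberately simple function under `cProfile` and prints the entries with the highest cumulative time. The function `slow_sum_of_squares` and the input size are assumptions chosen only to give the profiler something to measure.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Sum the squares of 0..n-1 with an explicit loop (hypothetical workload)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


# Collect timing data only while the code of interest is running.
profiler = cProfile.Profile()
profiler.enable()
result = slow_sum_of_squares(100_000)
profiler.disable()

# Write the report to a string buffer, sorted by cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The timings in the report vary from machine to machine; what matters is which functions dominate, since those are the candidates for optimisation.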
|
Wrap-up
|
|