.. _contribution-guide:
Development Guidelines
======================
Setup
-----
Python environment
~~~~~~~~~~~~~~~~~~
There is an ``environment.yml`` provided in the repository root, which installs all
required development dependencies in the ``facet-develop`` environment.
.. code-block:: sh
conda env create -f environment.yml
conda activate facet-develop
Pre-commit hooks
~~~~~~~~~~~~~~~~
This project uses a number of pre-commit hooks such as black and flake8 to enforce
uniform coding standards in all commits. Before committing code, please run
.. code-block:: sh
pre-commit install
You can use ``pre-commit run`` to manually run the pre-commit hooks from the command
line.
Pytest
~~~~~~
Run ``pytest test/`` from the facet root folder or use the PyCharm test runner. To
measure coverage, use ``pytest --cov=src/facet test/``. Note that the code coverage
reports are also generated in the Azure Pipelines (see CI/CD section).
Note that you will need to set the PYTHONPATH to the ``src/`` directory by
running ``export PYTHONPATH=./src/`` from the repository root.
Git Guidelines
--------------
For commits to GitHub, phrase commit comments as the completion of the sentence *This
commit will …*, e.g.
.. code-block:: RST
add method foo to class Bar
but not
.. code-block:: RST
added method foo to class Bar
Documentation
-------------
This section provides a general guide to the documentation of FACET, including
docstrings, Sphinx, the README and tutorial notebooks.
Docstrings
~~~~~~~~~~
The API documentation is generated from docstrings in the source code. Before writing
your own, take some time to study the existing code documentation and emulate the same
style. Describe not only what the code does, but also why, including the rationale for
any design choices that may not be obvious. Provide examples wherever this helps
explain usage patterns.
- A docstring is mandatory for all of the following entities in the source code,
except when they are protected/private (i.e. the name starts with a leading `_`
character):
- modules
- classes
- functions/methods
- properties
- attributes
- Docstrings are not necessary for non-public methods, but you should have a comment
that describes what the method does.
- Docstrings must use *reStructuredText* syntax, the default syntax for Sphinx.
- Write docstrings for functions and methods in the imperative style, e.g.,
.. code-block:: python
def fit():
"""Fit the model."""
but not
.. code-block:: python
def fit():
"""This is a function that fits the model."""
which is too wordy and not imperative.
- Write docstrings for modules, classes, modules, and attributes starting with a
descriptive phrase (as you would expect in a dictionary entry). Be concise and avoid
unnecessary or redundant phrases.
For example:
.. code-block:: python
class Inspector:
"""
Explains the inner workings of a predictive model using the SHAP approach.
The inspector offers the following analyses:
- ...
- ...
but not
.. code-block:: python
class Inspector:
"""
This is a class that provides the functionality to inspect models
...
as this is too verbose, and explains the class in terms of its name which does not add
any information.
- Properties should be documented as if they were attributes, not as methods, e.g.,
.. code-block:: python
@property
def children(self) -> Foo:
"""The child nodes of the tree."""
pass
but not
.. code-block:: python
@property
def foo(self) -> Foo:
""":return: the foo object"""
pass
- Start full sentences and phrases with a capitalised word and end each sentence with
punctuation, e.g.,
.. code-block:: python
"""Fit the model."""
but not
.. code-block:: python
"""fit the model"""
- For multi-line docstrings, insert a line break after the leading triple quote and before
the trailing triple quote, e.g.,
.. code-block:: python
def fit():
"""
Fit the model.
Use the underlying estimator's ``fit`` method
to fit the model using the given training sample.
:param sample: training sample
"""
but not
.. code-block:: python
def fit():
"""Fit the model.
Use the underlying estimator's ``fit`` method
to fit the model using the given training sample.
:param sample: training sample"""
- For method arguments, return value, and class parameters, one must hint the type using
the typing module. Do not specify the parameter types in the docstrings, e.g.,
.. code-block:: python
def f(x: int) -> float:
"""
Do something.
:param x: input value
:return: output value
"""
but not
.. code-block:: python
def f(x: int) -> float:
"""
Do something.
:param int x: input value
:return float: output value
"""
Sphinx Build
~~~~~~~~~~~~
Documentation for FACET is built using `sphinx `_.
Before building the documentation ensure the ``facet-develop`` environment is active as
the documentation build has a number of key dependencies specified in the
``environment.yml`` file, specifically:
- ``sphinx``
- ``pydata-sphinx-theme``
- ``nbsphinx``
- ``sphinx-autodoc-typehints``
To generate the Sphinx documentation locally navigate to ``/sphinx`` and run
.. code-block:: sh
python make.py html
By default this will clean any previous build. The generated Sphinx
documentation for FACET can then be found at ``sphinx/build/html``.
Documentation versioning is managed via the release process - see the section on
building and releasing FACET.
The ``sphinx`` folder in the root directory contains the following:
- a ``make.py`` script for executing the documentation build via python
- a ``source`` directory containing predefined .rst files for the documentation build
and other required elements (see below for more details)
- a ``base`` folder which contains
* the ``make_base.py`` and ``conf_base.py`` scripts with nearly all configuration for
``make.py`` and ``conf.py``
* ``_static`` directory, containing logos, icons, javascript and css used for
*pytools* and other packages documentation builds
* ``_templates`` directory, containing *autodoc* templates used in generating and
formatting the modules and classes for the API documentation
The ``sphinx/source`` folder contains:
- a ``conf.py`` script that is the
`build configuration file `_
needed to customize the input and output behavior of the Sphinx documentation build
(see below for further details)
- a ``tutorials`` directory that contains all the notebooks (and supporting data) used in
the documentation build. Note that as some notebooks take a little while to generate, the
notebooks are currently committed with cell output. This may change in the future where
notebooks are run as part of the sphinx build
- the essential ``.rst`` files used for the documentation build, which are:
* ``index.rst``: definition of the high-level documentation structure which mainly
references the other ``.rst`` files in this directory
* ``contribution_guide.rst``: detailed information on building and releasing
FACET.
* ``faqs.rst``: contains guidance on bug reports/feature requests, how to contribute
and answers to frequently asked questions including small code snippets
* ``api_landing.rst``: for placing any API landing page preamble for documentation
as needed. This information will appear on the API landing page in the
documentation build after the short description in ``src/__init__.py``. This file
is included in the documentation build via the ``custom-module-template.rst``
- ``_static`` contains additional material used in the documentation build, in this
case, logos and icons
The two key scripts are ``make.py`` and ``conf.py``. The base configuration for the
these scripts can be found in `pytools/sphinx `_.
The reason for this is to minimise code given the standardization of the documentation
build across multiple packages.
**make.py**: All base configuration comes from ``pytools/sphinx/base/make_base.py`` and
this script includes defined commands for key steps in the documentation build. Briefly,
the key steps for the documentation build are:
- **Clean**: remove the existing documentation build
- **FetchPkgVersions**: fetch the available package versions with documentation
- **ApiDoc**: generate API documentation from sources
- **Html**: run Sphinx build to generate HTMl documentation
The two other commands are **Help** and **PrepareDocsDeployment**, the latter of which
is covered below under *Building and releasing FACET*.
**conf.py**: All base configuration comes from ``pytools/sphinx/base/conf_base.py``. This
`build configuration file `_
is a requirement of Sphinx and is needed to customize the input and output behavior of
the documentation build. In particular, this file highlights key extensions needed in
the build process, of which some key ones are as follows:
- `intersphinx `_
(external links to other documentations built with Sphinx: matplotlib, numpy, ...)
- `viewcode `_
to include source code in the documentation, and links to the source code from the objects documentation
- `imgmath `_
to render math expressions in doc strings. Note that a local latex installation is
required (e.g., `MiKTeX `_ for Windows)
README
~~~~~~
The README file for the repo is .rst format instead of the perhaps more traditional
markdown format. The reason for this is the ``README.rst`` is included as the quick start
guide in the documentation build. This helped minimize code duplication. However,
there are a few key points to be aware of:
- The README has links to figures, logos and icons located in the ``sphinx/source/_static`` folder.
To ensure these links are correct when the documentation is built, they are altered and then the
contents of the ``README.rst`` is incorporated into the ``getting_started.rst`` which is generated
during the build and can be found in ``sphinx/source/getting_started``.
- The quick start guide based on the ``Diabetes_getting_started_example.ipynb`` notebook in
the ``sphinx/auxiliary`` folder is not automatically included (unlike all the other
tutorials). For this reason any updates to this example in the README need to be
reflected in the source notebook and vice-versa.
Tutorial Notebooks
~~~~~~~~~~~~~~~~~~~
Notebooks are used as the basis for detailed tutorials in the documentation. Tutorials
created for documentation need to be placed in ``sphinx/source/tutorial`` folder.
If you intend to create a notebook for inclusion in the documentation please note the
following:
- The notebook should conform to the standard format employed for all notebooks included in
the documentation. This template (``Facet_sphinx_tutorial_template.ipynb``) can be found
in ``sphinx/auxiliary``.
- When creating/revising a tutorial notebook with the development environment the following
code should be added to a cell at the start of the notebook. This will ensure your local
clones (and any changes) are used when running the notebook. The jupyter notebook should
also be instigated from within the ``facet-develop`` environment.
.. code-block:: python
def _set_paths() -> None:
# set the correct path when launched from within PyCharm
module_paths = ["pytools", "facet", "sklearndf"]
import sys
import os
if "cwd" not in globals():
# noinspection PyGlobalUndefined
global cwd
cwd = os.path.join(os.getcwd(), os.pardir, os.pardir, os.pardir)
os.chdir(cwd)
print(f"working dir is '{os.getcwd()}'")
for module_path in module_paths:
if module_path not in sys.path:
sys.path.insert(0, os.path.abspath(f"{cwd}/{os.pardir}/{module_path}/src"))
print(f"added `{sys.path[0]}` to python paths")
_set_paths()
del _set_paths
- If you have a notebook cell you wish to be excluded from the generated documentation, add
``"nbsphinx": "hidden"`` to the metadata of the cell. To change the metadata of a cell,
in the main menu of the jupyter notebook server, click on *View -> CellToolbar -> edit
metadata*, then click on edit Metadata in the top right part of the cell. The modified
metadata would then look something like:
.. code-block:: json
{
"nbsphinx": "hidden"
}
- To interpret a notebook cell as reStructuredText by nbsphinx, make a Raw NBConvert cell,
then click on the jupyter notebook main menu to *View -> CellToolbar -> Raw Cell Format*,
then choose ReST in the dropdown in the top right part of the cell.
- The notebook should be referenced in the ``tutorials.rst`` file with a section structure
as follows:
.. code-block:: RST
NAME OF NEW TUTORIAL
*****************************************************************************
Provide a brief description of the notebook context, such as; regression or
classification, application (e.g., disease prediction), etc.
- Use bullet points to indicate what key things the reader will learn (i.e., key takeaways).
Add a short comment here and direct the reader to download the notebook:
:download:`here `.
.. toctree::
:maxdepth: 1
tutorial/name_of_new_tutorial_nb
- The source data used for the notebook should also be added to the tutorial folder unless
the file is extremely large and/or can be accessed reliably another way.
- For notebooks involving simulation studies, or very long run times consider saving
intermediary outputs to make the notebook more user-friendly. Code the produces the
output should be included as a markdown cell with code designated as python to ensure
appropriate formatting, while preventing the cell from executing should the user run
all cells.
Package builds
--------------
The build process for the PyPI and conda distributions uses the following key
files:
- ``make.py``: generic Python script for package builds. Most configuration is imported
from pytools `make.py `__
which is a build script that wraps the package build, as well as exposing the matrix
dependency definitions specified in the ``pyproject.toml`` as environment variables
- ``pyproject.toml``: metadata for PyPI, build settings and package dependencies
- ``tox.ini``: contains configurations for tox, testenv, flake8, isort, coverage report,
and pytest
- ``condabuild/meta.yml``: metadata for conda, build settings and package dependencies
Versioning
~~~~~~~~~~
FACET version numbering follows the `semantic versioning `_
approach, with the pattern ``MAJOR.MINOR.PATCH``.
The version can be bumped in the ``src/__init__.py`` by updating the
``__version__`` string accordingly.
PyPI
~~~~
PyPI project metadata, build settings and package dependencies
are obtained from ``pyproject.toml``. To build and then publish the package to PyPI,
use the following commands:
.. code-block:: sh
python make.py facet tox default
flit publish
Please note the following:
* Because the PyPI package index is immutable, it is recommended to do a test
upload to `PyPI test `__ first. Ensure all metadata presents
correctly before proceeding to proper publishing. The command to publish to test is
.. code-block:: sh
flit publish --repository testpypi
which requires the specification of testpypi in a special ``.pypirc`` file
with specifications as demonstrated `here
`__.
* The ``pyproject.toml`` does not provide specification for a short description
(displayed in the top gray band on the PyPI page for the package). This description
comes from the ``src/__init__.py`` script.
* `flit `__ which is used here to publish to
PyPI, also has the flexibility to support package building (wheel/sdist) via
``flit build`` and installing the package by copy or symlink via ``flit install``.
* Build output will be stored in the ``dist/`` directory.
Conda
~~~~~
conda build metadata, build settings and package dependencies
are obtained from ``meta.yml``. To build and then publish the package to conda,
use the following commands:
.. code-block:: sh
python make.py facet conda default
anaconda upload --user BCG_Gamma dist/conda/noarch/<*package.tar.gz*>
Please note the following:
- Build output will be stored in the ``dist/`` directory.
- Some useful references for conda builds:
- `Conda build tutorial
`_
- `Conda build metadata reference
`_
Azure DevOps CI/CD
------------------
This project uses `Azure DevOps `_ for CI/CD pipelines.
The pipelines are defined in the ``azure-pipelines.yml`` file and are divided into
the following stages:
* **code_quality_checks**: perform code quality checks for isort, black and flake8.
* **detect_build_config_changes**: detect whether the build configuration as specified
in the ``pyproject.yml`` has been modified. If it has, then a build test is run.
* **Unit tests**: runs all unit tests and then publishes test results and coverage.
* **conda_tox_build**: build the PyPI and conda distribution artifacts.
* **Release**: see release process below for more detail.
* **Docs**: build and publish documentation to GitHub Pages.
Release process
~~~~~~~~~~~~~~~
Before initiating the release process, please ensure the version number
in ``src/__init__.py`` is correct and the format conforms to semantic
versioning. If the version needs to be corrected/bumped then open a PR for the
change and merge into develop before going any further.
The release process has the following key steps:
- Create a new release branch from the tag of the latest release named
``release/`` where ```` is the version number of the new release
- Create a new branch from the baseline branch (e.g., ``2.0.x``) named
``dev/`` where ```` is the version number of the new release
- Opening a PR to merge ``dev/`` onto ``release/``.
This will automatically run all conda/pip build tests via
Azure Pipelines prior to allowing to merge the PR.
This will trigger automatic upload of artifacts (conda and pip
packages) from Azure DevOps. At this stage, it is recommended that the pip package
build is checked using `PyPI test `__ to ensure all
metadata presents correctly. This is important as package versions in
PyPI proper are immutable.
- If everything passes and looks okay, merge the PR using a *merge commit*
(not squashing).
This will trigger the release pipeline which will:
* Tag the release commit with version number as specified in ``src/__init__.py``
* Create a release on GitHub for the new version, please check the `documentation
`__
for details
* Pre-fill the GitHub release title and description, including the changelog based on
commits since the last release. Please note this can be manually edited to be more
succinct afterwards
* Attach build artifacts (conda and pip packages) to GitHub release
* Upload build artifacts to conda/PyPI using ``anaconda upload`` and
``flit publish``, respectively
- Remove any test versions for pip from PyPI test
- Merge ``release/`` back onto the baseline branch from which
``dev/`` was branched
- Bump up version in ``src/__init__.py`` on the baseline branch to start work towards
the next release