Some High-level Tips for Python Projects
Below are tips on a few very high-level and commonly-encountered topics in Python software development.
Directory structure
Suppose the the probject is called ‘becool’. A largely standard directory structure looks like this:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
becool/
|-- archive/
|-- config/
|-- doc/
|-- becool/
| |-- __init__.py
| |-- module_a.py
| |-- module_b.py
| |-- becooler/
| | |-- __init__.py
| | |-- module_c.py
| | |-- module_d.py
|-- tests/
| |-- __init__.py
| |-- test_module_a.py
| |-- test_module_b.py
| |-- becooler/
| | |-- __init__.py
| | |-- test_module_c.py
| | |-- test_module_d.py
|-- scripts/
|-- README.md
|-- setup.py
Some highlights:
-
The project name, ‘becool’, is also the name of the
git
repo. Inside this repo, the main content a Python package called ‘becool’. Hence the repo and the package are both named after the project. This is a common situation.Although nothing forbids you from having multiple packages in one project (e.g. having a
behot
parallel tobecool
), it’s usually a bad idea. The reason is that ifbecool
andbehot
closely interact, then maybe they should be one single package. On the other hand, if they do not closely interact, then they are independently deployable, therefore it’s better to have them in separate projects (i.e. repos). -
Maintain a clean project file structure, and do not be afraid to adjust or refactor. Some reference; some more.
-
The directory
scripts
should not contain__init__.py
, but may contain files named like*_test.py
ortest_*.py
for testing functions in the scripts.Ideally, content of
scripts
is mainly short launchers of functions in packages. For this reason, the scripts often do not need tests.This directory may also be called
bin
. -
archive
is for any code segments that you have deleted from the “main line” but for some reason still want to keep a visible copy for reference. Do not use a branch for this purpose.
The center of gravity of the repo is the Python library (or “package”) becool
.
Scripts are applications of this library, and should strive to be short.
In the scripts, the package becool
should be treated just like any third-party library.
In particular, file-path hacks should be avoided.
If you execute the code in a Docker container like I do,
all dependencies are taken care of by the definition of the Docker image.
In that case, setup.py
does not need to worry about installing dependencies.
Where to put tests?
There are mainly two approaches. The first is to have tests
subdirectories within the package becool
. Specifically, every directory (includeing the top level becool
) contains a subdirectory named tests
, which hosts tests for the modules (i.e. *.py
files) in its parent directory. The second approach uses a tests
directory separate from the package being tested. This is the approach shown in the digram above. Both approaches are used by major open source projects.
After some time using the first approach, I switched to the second. The switch is prompted by some confusing situations related to running tests against the package that has been “installed” into the system site-packages
. I can think of two other reasons. First, conceptually, “tests” are not the package’s intrinsic functionalities. They are supporting the development of the package, just like “doc” is another supporting component. Second, in substantial projects, one may build some utilities to be used by the test code. In the first approach, the location of this testing infrastructure may become awkward.
Alternative structure with a src
sub-directory
There are discussions about having a src
directory.
This post advocates top-level
directories like src
, docs
, tests
, and a main point for this structure is
related to testing.
Without carefully reading the arguments, I have switched to use this structure. I can think of two advantages of this structure:
-
The top-level directories, like
src
,docs
,tests
, are more logical and extensible because they are on the same level of abstraction. It feels better. -
If there are non-Python code, such as Python extensions in C++, this structure can easily grow with the complexity of the codebase. An experimental package of mine of mine is an example.
Naming and style
For the project and packages, single-word names (or two words without hyphen) are more desirable.
For functions and variables, multi-word, verboses names are fine.
Use ‘snake-case’ names; see examples.
Do study the PEP 8
style guide; pick and stick to a decent style.
I use yapf
to format Python code. This frees the developer from thinking about good style,
and help avoids arguments about style between developers. Usually I use the following command:
1
$ yapf -ir -vv --no-local-style ./
Documentation
Do write documentation to help yourself, if not others.
- In-source documentation: make use of Python’s docstring mechanism; follow the Google style; example 1, example 2.
- Stand-alone documentation files in source tree (but outside of source code files):
README.md
in the top-level (project root) or second-level (package root) directories.- Doc files in
doc
directory under the project root directory. - Plain text files are perferred. Preferred formats are ‘plain text’ (
txt
), ‘mark-down’ (md
), or ‘reStructuredText’ (rst
) (if you intend to do extensive documentation and generate friendly HTML versions usingSphinx
). - If you use Mac, a decent Markdown editor is
MacDown
. - To edit
rst
files with live preview, use theAtom
editor (by Github) with a couple plug-ins. - If you use
Sphinx
,reStructuredText
markup syntax can be used in in-source docstrings as well.
- Outside of source tree: for documentation aimed at non-developers or users from other groups that require different levels of access control.
How to generate documentation using Sphinx
If you go to the length of writing Sphinx documentation, below is my work flow.
-
Basically, let Sphinx take over the
doc
directory in the repo. Indoc/
, run1
sphinx-quickstart
Make sure you make these particular choices:
1 2 3 4 5 6
Separate source and build directories (y/n) [n]: y autodoc: automatically insert docstrings from modules (y/n) [n]: y imgmath: include math, rendered as PNG or SVG images (y/n) [n]: n mathjax: include math, rendered in the browser by MathJax (y/n) [n]: y viewcode: include links to the source code of documented Python objects (y/n) [n]: y Create Makefile? (y/n) [y]: y
In
doc/source/conf.py
, make sure theextensions
list contains at least these items:sphinx.ext.autodoc
sphinx.ext.inheritance_diagram
sphinx.ext.mathjax
sphinx.ext.napoleon
sphinx.ext.todo
sphinx.ext.viewcode
Also make sure
html_theme
is set tonature
(unless you have another preference). After that, search foralabaster
inconf.py
, and comment out that block, because that block is needed for the defaultalabaster
theme but could get in the way of other thems. -
Create
*.rst
files as needed indoc/source/
. These stand-alone doc files along with doc in source code will be used to generate documentation. -
In
doc
, runmake html
to generate HTML documentation, which will be located indoc/build/html/
. View the documentation in a web browser, starting with the filedoc/build/html/index.html
. -
git commit
the filesdoc/Makefile
,doc/source/conf.py
, as well as the*.rst
and other files you’ve created indoc/source
. DO NOTgit commit
the generated material indoc/build
.
Logging
Prefer logging to print
in most cases.
-
The twelve factor app advocates treating log events as an event stream, always sending the stream to standard output, and leaving capture of the stream into files to the execution environment.
For capturing logging into files, see my other post. A more complete example appears in another post.
-
In all modules that need to do logging, always have this, and only this, at the top of the module:
1 2
import logging logger = logging.getLogger(__name__)
then use
logger.info()
etc to create log messages.Do not create custom names for the logger. The
__name__
mechanism will create a hierarchy of loggers following your package structure, e.g. with loggers named1 2 3
package1 package1.module1 package1.module1.submodule2
Log messages will pass to higher levels in this hierarchy. Customizing the logger names will disrupt this useful message forwarding.
-
In the
__init__.py
file in the top-level directory of your Python package, include this:1 2 3
# Set default logging handler to avoid "No handler found" warnings. import logging logging.getLogger(__name__).addHandler(logging.NullHandler())
Do not add any other handler in your package code.
-
Do not do any format, handler (e.g. using a file handler), log level, or other configuration in your package code. These belong in the launch scripts and should happen at exactly one place in a running program.
I basically call something like the following function in the launch script. Usually
level
is the only argument I need to specify.1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
import logging import os import time def config_logger(level=None, use_utc=True, datefmt=None, format=None, **kwargs): # 'level' is a string form of the logging levels: 'debug', 'info', 'warning', 'error', 'critical'. if level is None: level = os.environ.get('LOGLEVEL', 'info') if level not in [logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR, logging.CRITICAL]: level = getattr(logging, level.upper()) if use_utc: logging.Formatter.converter = time.gmtime datefmt = datefmt or '%Y-%m-%d %H:%M:%SZ' else: logging.Formatter.converter = time.localtime datefmt = datefmt or '%Y-%m-%d %H:%M:%S' format = format or '[%(asctime)s; %(name)s, %(funcName)s, %(lineno)d; %(levelname)s] %(message)s' logging.basicConfig(format=format, datefmt=datefmt, level=level, **kwargs)
-
Use the old-style string formatting (
%
), not the new-style string formatting (str.format
). The following example should serve most of your fomatting needs:1
logger.info('Line %s (%s) has spent %.2f by hour %d', 'asd9123las', 'Huge Sale!', 28.97, 23)
See here for more about formatting.
Testing
Write tests, and adopt a testing framework. My recommendation is py.test
.
You do not need to import pytest
unless you explicitly use pytest
in the code.
See Directory Structure above for where to put the test files.
(Lightly revised on December 21, 2018.)