Style guide
===========
This is a basic style guide for the Dissect projects. The goal of this guide is to increase the understandability and maintainability
of both code and documentation.
Applicability
-------------
This guide is applicable to both new and existing code and the Dissect build pipeline enforces the most important rules.
Certain exceptions are made for older parts of the code as they were written before the creation of this guide.
New code
^^^^^^^^
When submitting new code for inclusion in one of the Dissect projects, your code should adhere
to this guide unless there is a valid reason not to do so. This motivation should be added to your eventual
Pull Request.
Older code
^^^^^^^^^^
When submitting changes to existing code that does not yet adhere to this style guide, a choice should be made
whether to make the change conformant to the guidelines. You can use the rules below to help you decide
what to do in these cases.
If the change in existing code is
- large, it is best to refactor the function, method or class according to these guidelines.
- small and rewriting the code for guideline conformance would not be proportional to the change itself, you may submit the code using the original styling.
.. note::
Regardless of conformance to this style guide, any change you make should be understandable and clear in its functioning.
Code style and formatting
-------------------------
This section lists how to format code in a readable and consistent manner and which specifications and tools are used to
enforce them.
PEP 8 and Black
^^^^^^^^^^^^^^^
The code should adhere to the `PEP 8 `_ Python code style. The adherence to PEP 8
is checked using `Flake8 `_. Flake ``E203`` errors can be ignored due to the ambiguous nature
of these errors (see ``_).
The formatting of the code layout is further refined by using `Black `_.
Black provides functionality to automatically format code and enforces consistent coding style between files and projects regardless
of the author. It also relieves authors of the burden of having to actively think about the formatting.
PEP 8 and Black styles are mandatory. This is configured in the project's ``tox.ini`` files and tested for by our build pipeline.
Maximum line length
^^^^^^^^^^^^^^^^^^^
Lines should be limited to 120 characters. For modern console sizes this gives a bit more room compared to the
standard 80-character limit without sacrificing readability, probably even increasing it.
Type hinting
^^^^^^^^^^^^
New functions and classes should be fully type hinted. The combination of type hinting and docstrings helps in understanding
what the function or class does and how it should be used.
Import order
^^^^^^^^^^^^
Import statements for files and modules are divided into three groups and should be ordered as indicated below:
1. builtin modules
2. modules from external projects *including* other Dissect projects, e.g. PyYAML
3. modules from the project itself.
The imports within each group should be in alphabetical order, as in the example below:
.. code-block:: python
import builtins_a
from builtins_a import foo
import builtins_b
import externals_a
from externals_a import bar
import externals_b
import other_dissect_project
import this_dissect_project
from this_dissect_project import bla
Formatting tuples
^^^^^^^^^^^^^^^^^
Care should be taken when formatting tuples as Black attempts to reformat all elements into a single line.
To prevent this, add a comma (``,``) after the last item of the tuple, like this:
.. code-block:: python
function(
param1,
param2,
)
Coincidentally, this also gives cleaner code diffs when adding or removing items from the tuple later.
Naming variables
^^^^^^^^^^^^^^^^
Naming variables can be challenging. When deciding on a variable name, take the following rules into account:
* Avoid single-character variable names.
* Don't name variables after their type (list, dict etc.).
Incorporating dissect.cstruct definitions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Writing structure definitions is an essential part of writing a new parser. The following rules show how to
format them properly.
Split definition and loading
""""""""""""""""""""""""""""
When using ``dissect.cstruct`` to define and load C structures, split the definition of the structure and the loading of the
structure:
.. code-block:: python
c_def = """
#define SOME_C_DEF = 1
"""
c_obj = cstruct.load(c_def)
This increases readability and allows you to add a ``# noqa: E501`` after the string defining the C structure. This is useful
if the definition comes from an external source which has lines that are too long, but you want to keep the original layout.
Styling structure definitions
"""""""""""""""""""""""""""""
The main rule for styling structure definitions is to keep the style similar to the original structures when this is possible.
Below follows more specific rules depending on the availability of the structures:
1. If open-source or openly documented structures are available, use them as much as possible. Changing field types or slightly
altering structures for performance or compatibility reasons is encouraged. For example, ``char[n]`` is faster than ``int8[n]``,
or changing a ``GUID field_name`` to ``char field_name[16]``.
2. If no original structures are available, make an educated guess on what they could look like in the original source.
For example, during reverse engineering you see a debug log message that uses ``lowerCamelCase`` field names, use that
style for your field names.
If no discernible style is visible, you can use the following general rules:
* For a Microsoft file format, use ``UPPERCASE_NAME`` structure names and ``CamelCase`` field names.
* One exception is that field prefixes like ``dw`` and ``cb`` should be removed, even when copy-pasting structures.
* For other file formats, use ``lowercase_name`` structure and field names.
Documentation style and formatting
----------------------------------
New code needs to be documented properly using docstrings. To understand how documentation
is organised and generated, check out the :doc:`developing for Dissect ` page.
Use of docstrings
^^^^^^^^^^^^^^^^^
Functions and classes should have docstrings detailing what that function or class does and/or how it should be used. They
should be formatted as described in the `Google docstring format `_.
The first line of a docstring should contain a short sentence describing the nature of the function/class, followed by an
empty line and optionally a more verbose explanation detailing how the function/class goes about doing its thing and/or how
it should be used. Finally, add an indented list of arguments, return value(s) and exceptions which can be raised according
to the Google docstring format.
Typing of parameters should be done through type hinting.
Use the ``References:`` clause when referencing external resources such as URLs to websites.
Example docstrings
^^^^^^^^^^^^^^^^^^
An example of how to use the docstring to comment a function/method:
.. literalinclude:: codestyle.py
The examples above look like this:
.. automodule:: codestyle
:members:
The most important takeaways are:
* Use ``typehints`` so type information gets automatically added to the documentation
* ``Args:`` To document parameters
* ``Returns:`` To document what it specifically returns
* ``Raises:`` To document if it raises a specific exception and why
Commit message style and formatting
-----------------------------------
Commit messages should adhere to the following points:
* Separate subject from body with a blank line
* Limit the subject line to 50 characters as much as possible
* Capitalize the subject line
* Do not end the subject line with a period
* Use the imperative mood in the subject line
* The verb should represent what was accomplished (Create, Add, Fix etc)
* Wrap the body at 72 characters
* Use the body to explain the what and why vs. the how
Example commit message
^^^^^^^^^^^^^^^^^^^^^^
An example of a properly formatted commit message:
.. code-block:: text
Fix parsing extra NULL bytes in the NTFS header
Sometimes extra null bytes can be present at the end of the NTFS allocator
table, this patch makes sure they are not included in the next header
structure.