
Learning Pieces

Docker

Test deployment service

During development, I want an easy way to start the Docker services I've created, based on the most recent changes in the code. The solution I came up with involved creating a production and a test environment.

My first approach was to create a test pypi-server and a test environment for the journal-manager. The idea was that the journal-manager services would install the journal-manager packages from the test pypi-server.

Info

A neat thing that docker-compose does is to create a network among the services described in the docker-compose.yml file.

I put both the test pypi-server and the test journal-manager services in the same docker-compose file to take advantage of the network docker creates for the services contained in it. That didn't work. I needed to call pip install pointing at the test pypi-server during the image build, but the network between the services is only created after the build step.

Note

A solution would be to set up the test pypi-server in a different docker-compose file and have it started before building the image.

Upload the wheel and extract it during image build

I decided to take a simpler approach.

  1. Build the journal package in my dev machine, producing a wheel.
  2. Copy the wheel to a deploy-packages folder in the deployment server. This folder is part of the image context.
  3. Install the journal-manager from the wheel available in deploy-packages during the build.

The deploy-packages folder has one subfolder for production packages and one for test packages. The deploy can be configured to use one or the other. Also, depending on the deploy mode, we build either the test services or the production services.

Bash

Executing commands via ssh in a remote machine

I had to manually set the PATH variable to contain the path /usr/local/bin.

ssh -p "${SSH_PORT}" "${DEPLOY_USER}@${DEPLOY_HOST}" "export PATH=\$PATH:/usr/local/bin; echo \$PATH; ${DEPLOY_SCRIPT} ${BUILD_MODE} ${DEPLOY_ACTION}"
Info

I've set up ssh and scp to use a public/private key pair as the identity file. More info here

  • Fermat's little theorem: For any integer \(a\) and any prime number \(p\): \(a^p - a\) is a multiple of \(p\).
  • Totient function: \(\phi(n)\) denotes the number of integers in \(\{1,\dots,n\}\) that are co-prime with \(n\).
  • Totient of a prime product: Given prime numbers \(p,q\): \(\phi(pq) = \phi(p)\phi(q) = (p-1)(q-1)\).
  • Euler's theorem: For any integer \(n\) and any \(a\) co-prime with \(n\): \(a^{\phi(n)} \equiv 1 \mod n\).
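A quick numerical check of Euler's theorem, taking \(n = 15\) and \(a = 2\), which is co-prime with 15:

\(\phi(15) = \phi(3)\phi(5) = 2 \cdot 4 = 8\), and \(2^{8} = 256 = 17 \cdot 15 + 1\), so \(2^{\phi(15)} \equiv 1 \mod 15\).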

Let keyword

The let keyword allows you to evaluate arithmetic expressions.

let "result = (a+b) * (c-d)"

Error handling

Bash uses the error code returned by programs to handle errors. A zero code means success and a non-zero code means error.

One can individually handle errors in bash by using the logical OR operator ||.

command1 || { echo "Command failed"; exit 101; }

You can also trigger cleanup functions or special treatment for some types of errors using trap.

trap cleanup EXIT 
trap handle_termination_signal TERM 

Bash has a setting that instructs it to exit if a command fails.

set -exuo pipefail
  • -e: immediately exit if a command fails.
  • -x: print every command that is executed.
  • -u: any reference to an unset variable is an error.
  • -o pipefail: the pipeline's exit status is that of the last failing command, instead of the last command in the pipe.

More in Bash error handling manual.

Danger

Be aware that set -e has several pitfalls. Your program may exit unexpectedly due to some of them: some commands return a non-zero code even when no error occurred (e.g. conditionals, test).

Vim

Pasting to vim command line

  1. yank a text
  2. go to command mode and press <CTRL> + R and then press "

Vim registers

  • Type :registers to display the registers' contents.
  • Type :help registers to get more information.

MyPy

Static type checking in test files

To locate a module, mypy relies on __init__.py files.

  1. --no-namespace-packages: to find pkg/a/b/mod.py, you need an __init__.py in each folder along the path to mod.py.
  2. --namespace-packages on and --explicit-package-bases off: in this case, only the top-level folder of the module path needs an __init__.py.
  3. --namespace-packages on and --explicit-package-bases on: you don't need any __init__.py, but mypy will only recognize modules inside the folders listed in mypy_path.

mypy_path = src:test
namespace_packages = true
explicit_package_bases = true
mypy documentation: mypy_path

Pytest

A note about naming convention

  • Not only the functions but also the files need to follow the naming convention. I had t_setup.py and nothing was collected until I renamed it to test_setup.py.

Setting PYTHONPATH in the configuration

The pythonpath configuration option lists directories that pytest adds to sys.path, the places Python looks at during module importing.
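As a sketch, the option can be set in pytest.ini (the directory names are illustrative, mirroring the src:test layout used in the MyPy section):

```ini
[pytest]
pythonpath = src test
```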

General python

Logging

The Python logging library is a very complete solution for your logging problems. You can find most of the information in this link:

Some features I discovered recently are:

And also some bad patterns:

importlib_resources

The importlib_resources package is built on top of the Python import system to facilitate the use of resources bundled with your packages.

from importlib_resources import files, as_file

class BuildIndexPage:
    assets = files("danoan.journal_manager.templates").joinpath("material-index", "assets")

    def build(self):
        build_result = super().build()
        if isinstance(build_result, FailedStep):
            return build_result

        try:
            if not self.build_instructions.build_index:
                return self

            env = Environment(
                loader=PackageLoader("danoan.journal_manager", package_path="templates")
            )

            with as_file(BuildIndexPage.assets) as assets_path:
                shutil.copytree(assets_path, self.journals_site_folder.joinpath("assets"))
        except Exception:
            # a bare "except: pass" silently swallows every error;
            # prefer catching the specific exceptions copytree can raise
            pass

More on the importlib_resources package can be found here
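A minimal, runnable sketch with the stdlib version of the package (importlib.resources, Python 3.9+). The stdlib "json" package stands in here for your own package shipping data files:

```python
from importlib.resources import as_file, files

# files() returns a Traversable pointing at the package's contents
resource = files("json").joinpath("__init__.py")
print(len(resource.read_text()) > 0)  # True

# as_file materializes the resource as a concrete filesystem path,
# which is what path-based APIs such as shutil.copytree expect
with as_file(files("json")) as path:
    print(path.is_dir())  # True
```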

Type hinting of classmethods

I want the return type hint of a classmethod to be the very class on which it is called.

To do that, we can use TypeVar:

from typing import Optional, TypeVar, Type


class TomlDataClassIO:
    """
    Base class for a simple dataclass (i.e. with no mapping types)
    """

    T = TypeVar("T", bound="TomlDataClassIO")

    @classmethod
    def read(cls: Type[T], filepath: str) -> Optional[T]:
        pass

Type variables are useful for generic programming.
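As a sketch, the same pattern applied to a hypothetical Config hierarchy, showing that the annotated classmethod follows the class it is called on:

```python
from typing import Type, TypeVar

T = TypeVar("T", bound="Config")


class Config:
    @classmethod
    def create(cls: Type[T]) -> T:
        # cls is the class the method was called on, so the returned
        # instance has the caller's type, not necessarily Config
        return cls()


class AppConfig(Config):
    pass


print(type(Config.create()).__name__)     # Config
print(type(AppConfig.create()).__name__)  # AppConfig
```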

Info

Notice that once a type variable is bound to a concrete type, that type does not change for the rest of the scope. You can also constrain the variable with an upper bound, as was done in the example above with the bound parameter.

isinstance, type, mro and getattr

from dataclasses import dataclass


@dataclass
class Shape:
    name: str


@dataclass
class Square(Shape):
    length: int


sh = Shape("my-shape")
sq = Square("my-square", 10)

t_sh = type(sh)
t_sq = type(sq)

# The second argument of isinstance must be a class, not an instance
isinstance(sq, Shape)  # True

type(sh) == t_sh  # True (<class 'Shape'>)
type(sq) == t_sq  # True (<class 'Square'>)

# Given a type, how to check if an instance of this type is an
# instance of a base class?

# 1. Instantiate the type

isinstance(t_sq("my-other-square", 5), Shape)  # True

# 2. Use mro() (equivalently: issubclass(t_sq, Shape))

Shape in t_sq.mro()  # True
Note

MRO stands for Method Resolution Order.

Some types do not have the "mro" attribute. Make sure to check for it with the getattr function.

from typing import Optional

ts = str
to = Optional[str]

getattr(ts, "mro", None)  # returns the mro builtin method
getattr(to, "mro", None)  # None

Origins of Python Functional Features

Blog post by Guido van Rossum

Takeaways:

  • Python has had functions as first-class objects since the beginning, but it was never intended to be a functional programming language.
  • Lambda is more of a syntactic convenience. It is less powerful than lambdas in other programming languages: it is restricted to a single expression, although it can use variables from the scope in which it is defined.
  • The bike shed and the atomic bomb plant

Inheritance x Composition

Takeaways

  • super() returns a proxy object.
  • You can interfere in the hierarchy lookup by passing arguments to super(). But doing so might be an indication of a design issue.
  • Inheritance models is-a relationships (base and derived); composition models has-a relationships (component and composite).
  • Duck typing makes the declaration of interfaces unnecessary. But you can still create abstract classes for that (what about Protocols?).
  • When doing multiple inheritance, think about inheriting from one base class and possibly implementing several interfaces.
  • Multiple inheritance can lead to unexpected resolution of __init__ due to the MRO. This is particularly problematic if you run into the class hierarchy design problem called the diamond problem.
  • Composition is a loosely coupled alternative to class inheritance. This design is usually more flexible than class inheritance, which is vulnerable to class explosion and the diamond problem.
  • Mixins are a kind of component, but implemented via inheritance. They are more strongly coupled, but can be useful to inherit an interface implementation.
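A minimal sketch of a mixin adding a serialization "skill" (the class names are illustrative):

```python
import json


class JsonSerializableMixin:
    # A skill acquired via inheritance: any class mixing this in
    # gains a to_json method built from the instance's __dict__
    def to_json(self) -> str:
        return json.dumps(self.__dict__)


class User(JsonSerializableMixin):
    def __init__(self, name: str):
        self.name = name


print(User("ana").to_json())  # {"name": "ana"}
```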
Danger

Mixins should not be viewed as base classes. A mixin does not model an is-a relationship, and it is not a component either. It is more like a skill. Indeed, it is good practice to make clear in the class name that the class is a Mixin.

Note

The TomlDataClassIO fits better in the concept of a Mixin.

Superclass and subclass terminology

A derived class D inherits all the methods of its base class B and possibly implements some others. The set of methods of D is therefore a superset of the set of methods of B. This could lead to confusion with respect to the superclass and subclass terminology.

However, the terminology is actually correct. The super and sub prefixes refer to the sets of instances, not to the sets of methods or attributes, in the sense that every instance of D is also an instance of (or can be thought of as an instance of) B.

In a concrete example: every Mammal is an Animal, but not every Animal is a Mammal. Therefore, the set of Animals is a superset of the set of Mammals, which makes the Mammals a subset of the Animals.

This was once asked on Stack Overflow

Danger

The confusion is not out of place, though. In OOP there is a technique called mixin which works the same way as regular inheritance, but is conceptually different. When a class D inherits a mixin B, we are not modelling an is-a relation but an includes-a relation. Under this concept, it makes much more sense to say that D is a superclass of B.

Method Resolution Order

Every Python class has a __mro__ attribute that tells you the order in which methods are going to be resolved. It is an ordered list of the classes in which Python will look, in order, while searching for a method after a call.

One of the things that determines the MRO is the order of the classes in multiple-inheritance declarations. The search proceeds bottom-to-top (derived classes before their bases) and left-to-right (in the order the base classes are listed).
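A small diamond hierarchy makes the __mro__ order visible (class names are illustrative):

```python
class A:
    def who(self) -> str:
        return "A"


class B(A):
    pass


class C(A):
    def who(self) -> str:
        return "C"


class D(B, C):
    pass


# Bottom-to-top, left-to-right: D, then B, then C, then A
print([cls.__name__ for cls in D.__mro__])  # ['D', 'B', 'C', 'A', 'object']

# who() is found in C before A, even though B is listed first,
# because B does not define it
print(D().who())  # C
```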

Special type hinting in Python

Takeaways

  • Protocols are the formalization of duck typing. You can think of a Protocol P as a type specified in the form of a class with attributes and methods. Every class that has the attributes and methods specified in the Protocol P is said to be of type P.
  • Particularly useful in static type checking.
  • Lightweight interfaces (also more flexible than tightly coupled regular class interfaces).
  • Use properties for:
    1. read-only attributes;
    2. lazy computation;
    3. API compatibility.
  • Properties are descriptors. A descriptor is any class that implements the descriptor protocol, that is, one or more of the methods below:
    1. __get__(self, obj, type=None) -> object;
    2. __set__(self, obj, value) -> None;
    3. __delete__(self, obj) -> None;
    4. __set_name__(self, owner, name).
  • If a class implements only __get__, it is a non-data descriptor. If it also implements __set__ or __delete__, it is a data descriptor.
  • Lookup chain: the order in which Python accesses attributes.
    1. __get__ of a data descriptor;
    2. the object's __dict__;
    3. __get__ of a non-data descriptor;
    4. the class' __dict__;
    5. the __dict__ of the object's parent class;
    6. repeat the previous step until there are no more base classes;
    7. raise AttributeError.
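A minimal sketch of a Protocol used structurally (the Greeter/Person names are illustrative): Person never inherits from Greeter, yet a static checker accepts it as one because it has the required attribute and method.

```python
from typing import Protocol


class Greeter(Protocol):
    name: str

    def greet(self) -> str: ...


class Person:
    # No inheritance from Greeter: conformance is purely structural
    def __init__(self, name: str):
        self.name = name

    def greet(self) -> str:
        return f"Hello, {self.name}"


def welcome(g: Greeter) -> str:
    return g.greet()


print(welcome(Person("Ana")))  # Hello, Ana
```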


Dictionary expansion in function calls

'''
Expanding a dictionary in a function call can also replace
positional arguments.
'''

def f(p1,n1=None,n2=None):
    print(f"{p1},{n1},{n2}")

d1 = {"p1":"positional", "n1":"named 1", "n2":"named 2"}

# Calling with a dict expansion only
f(**d1)
# positional, named 1, named 2

# This raises an error
f(p1="new positional", **d1)
# multiple values for keyword argument p1

# This one also fails
f("does not work", **d1)
# multiple values for argument p1

# Also this one
f(n1="new named 1", **d1)
# multiple values for keyword argument n1
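One workaround, sketched below: merge the override into the dictionary before expanding it, instead of passing the keyword argument alongside **d1.

```python
def f(p1, n1=None, n2=None):
    return f"{p1},{n1},{n2}"


d1 = {"p1": "positional", "n1": "named 1", "n2": "named 2"}

# The merged dict contains each key exactly once, so no
# "multiple values" error is raised
result = f(**{**d1, "n1": "overridden"})
print(result)  # positional,overridden,named 2
```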

Python Importing System

Modules - Python Tutorial Python Import System

There are some subtleties in the Python import system that I should master.

My initial model for the importing system is the following:

  • Package: A collection of modules (you need to put an __init__.py in it in order for Python to recognize it as a package)
  • Module: A collection of python instructions (usually grouped in a .py file)

It is based on the filesystem analogy in which directories are packages and modules are files. But this is not quite correct. From the Python Import System documentation page:

It’s important to keep in mind that all packages are modules, but not all modules are packages. Or put another way, packages are just a special kind of module. Specifically, any module that contains a __path__ attribute is considered a package.

Import examples

# In this form, you have to use the complete name:
# package.subpackage.module.function_A()
import package.subpackage.module

# In this form, you can call function_A or function_B directly
from package.subpackage.module import function_A, function_B

Import a package as a namespace

Consider the following file hierarchy:

danoan/journal_manager/commands
    __init__.py
    journal_commands/
        __init__.py
        activate.py
        create.py
        deactivate.py

Let us say that I want to access the modules in the journal_commands from a jm namespace. That is,

import danoan.journal_manager.commands.journal_commands as jm
jm.activate()

Then I need to add import statements in the __init__.py file of journal_commands package. That is,

# __init__.py
from .activate import activate
from .create import create
from .deactivate import deactivate

Somewhat odd behaviour

There is an odd behaviour though. Assume we have the __init__.py as the one above.

import danoan.journal_manager.commands.journal_commands as jm
from danoan.journal_manager.commands.journal_commands import activate

If we check the globals() function we get

jm: <module 'danoan.journal_manager.commands.journal_commands' from '/home/daniel/Projects/Git/journal-manager/src/danoan/journal_manager/commands/journal_commands/__init__.py'>

'activate': <function activate at 0x7fe10eabc3a0>

I was expecting activate to be resolved to the module activate. Indeed, it is resolved to the module if the __init__.py of the journal_commands package is empty.

It is not that odd if you recall that after

import danoan.journal_manager.commands.journal_commands as jm

the import statements in the __init__.py are executed, and the functions imported there collide with the module names, which are the same. What happens is that the import overwrites the entry that pointed to the module so that it points to the function instead.

If we write like this:

# __init__.py
from .activate import activate_journal
from .create import create_journal
from .deactivate import deactivate_journal

no collision occurs. jm.activate_journal is a function and activate is a module.

Multiprocessing and Signal Handling

In order to correctly terminate the processes I started during the build subcommand with the --with-http-server flag (namely the node http-server and the entr file monitor), I had to register handlers for the SIGINT and SIGTERM signals, both in the Python and in the Bash scripts.

import multiprocessing
import signal

t1 = multiprocessing.Process(
    target=node_wrapper.start_server, args=[http_server_folder.joinpath("init.js")]
)
t2 = multiprocessing.Process(target=app_call.start, args=[file_monitor_script])

t1.start()
t2.start()

def terminate_processes(sig, frame):
    print("Terminating http server")
    t1.terminate()
    print("Terminating file-monitor")
    t2.terminate()

signal.signal(signal.SIGINT, terminate_processes)
signal.signal(signal.SIGTERM, terminate_processes)

t1.join()
t2.join()
And in the Bash script:

function handler_signal_int()
{
    echo "Exiting file-monitor"
    exit 0
}

trap handler_signal_int SIGINT

Pypi-server

Setting up the Pypi-server

Set up a service in your docker-compose.yml

version: '3.7'

services:
  pypi-server:
    image: pypiserver/pypiserver:latest
    ports:
      - 4962:8080
    volumes:
      - type: bind
        source: /Users/capitu/Services/pypi-server/auth
        target: /data/auth
      - type: volume
        source: pypi-server
        target: /data/packages
    command: -P /data/auth/.htpasswd -a update,download,list /data/packages
    restart: always

volumes:
  pypi-server:

After starting the server, you should be able to access the index at the address capitu_home:4962/packages

Setting up authentication

Create the authentication files with htpasswd

cd /Users/capitu/Services/pypi-server/auth
htpasswd -sc .htpasswd <SOME-USERNAME>
Note

htpasswd is a utility tool to manage user authentication in web servers. It stores usernames and their passwords (digested with SHA-1 in the case above, because of the -s flag) in a text file.

Uploading your package

Build the distribution with your front-end tool (for example, build)

pyproject-build .

We are using twine to upload the package to the server:

pipx install twine
twine upload --repository-url http://192.168.1.14:4962 dev/extract-sdist/output/dist/*

More information can be found here.

Sphinx

Sphinx Templates

It is possible to modify the output generated by sphinx-apidoc by using templates.

  • apidoc: Sphinx extension that provides directives able to render information from Python source code.
  • sphinx-apidoc: tool that uses the apidoc Sphinx extension to generate API-like documentation.