[Unpopular opinion] Don’t use poetry for Python dependencies

Even though Poetry has gained popularity for Python project management in recent years, and I have used it myself in a few projects, like any tool it may not be the perfect fit for every project or team. Here I have summarised a few arguments that can be made against using Poetry:

  • Learning Curve: For teams or individuals already familiar with other Python packaging and dependency management tools such as pip and virtualenv, or even pipenv, adopting Poetry can introduce a learning curve. Understanding Poetry’s way of managing dependencies, environments, and packages takes time, effort, and a few mistakes along the way.
  • Overhead for Simple Projects: Poetry provides a comprehensive solution that can be overkill for very simple projects. For small scripts or applications with minimal dependencies, the overhead of managing a Poetry environment may not be justified. Do you have an API with 5-20 dependencies? Don’t use Poetry. It doesn’t make any sense.
  • Performance Concerns: Poetry’s dependency resolution process can be slower than some alternatives, particularly for projects with a large number of dependencies. This can impact the speed of continuous integration builds or the responsiveness of development workflows. Personally, I have been in situations where adding a new package and rebuilding the lock file took me more than an hour.
  • Migration Effort for Existing Projects: Migrating an existing project to Poetry from another system can require non-trivial effort. This includes not only technical changes to how dependencies are managed and packaged, but also updating any related documentation, developer guides, and CI/CD pipelines. I faced this challenge and it took a long time to migrate all the projects to Poetry and to get it right. In the meantime, we also had to maintain two systems for the same projects.

While Poetry does offer some advantages (and the Internet is full of pro-Poetry arguments), weighing these potential drawbacks can help you determine whether it is the right choice for your situation.

Don’t repeat the logic in the unit tests

There are many mistakes you can make, as a software engineer, when you define and write the unit tests for your software.

One of the most common I have seen is repeating the logic from the tested function/method inside the unit test itself:

### constants.py
HARD_UPPER = 10
SOFT_LOWER = 5
INTERMEDIATE_VALUES = [5, 6, 7, 8, 9, 10]

### functions.py
from constants import HARD_UPPER, SOFT_LOWER, INTERMEDIATE_VALUES

def my_func(x):
	if x in INTERMEDIATE_VALUES:
		return x + 1

	if x > HARD_UPPER:
		return x + x

	if x < SOFT_LOWER:
		return x ** x

Let’s imagine we want to test this dummy function, my_func, and we do it this way:

from functions import my_func

def test_myfunc():
    test_values = [4, 7, 8, 10]
    for x in test_values:
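        # the expected value is recomputed with the same rules as my_func -- this is the mistake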
        if x in [5, 6, 7, 8, 9, 10]:
            expected = x + 1
        if x > 10:
            expected = x + x
        if x < 5:
            expected = x ** x

        assert expected == my_func(x)

It looks nice. It tests the boundaries and it seems to cover all the intervals. But let’s imagine someone goes into constants.py and makes this change: increases SOFT_LOWER by 1 and removes 5 from INTERMEDIATE_VALUES.

SOFT_LOWER = 6
INTERMEDIATE_VALUES = [6, 7, 8, 9, 10]

If we run our tests, everything is green, but some results are no longer the expected ones. Take my_func(5): before, 5 was in INTERMEDIATE_VALUES and the result was 6. Now 5 falls under the condition x < SOFT_LOWER, so the result is 5 ** 5 = 3125.

Of course, the above is just a silly example where I copy/pasted the logic from the target function into the test. The easier fix is simply to hardcode the boundaries and some intermediate values, like:

def test_myfunc():
    assert my_func(4) == 256   # x < SOFT_LOWER: 4 ** 4
    assert my_func(5) == 6     # lower boundary of INTERMEDIATE_VALUES: 5 + 1
    assert my_func(10) == 11   # upper boundary of INTERMEDIATE_VALUES: 10 + 1
    assert my_func(11) == 22   # x > HARD_UPPER: 11 + 11
    assert my_func(8) == 9     # an intermediate value: 8 + 1

Now, with the changed constants, we can see the test failing for x=5:

>       assert my_func(5) == 6
E       assert 3125 == 6
E        +  where 3125 = my_func(5)

This is the approach to take when the border values really matter and we want to be sure the developer is conscious of the change (they will see the tests failing). Such a case can be, for example, the tax rate applied (we don’t want to change the VAT value for a country too often, right?) or the maximum number of connected devices (e.g. Netflix).
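To make that concrete, here is a minimal sketch, assuming a hypothetical VAT_RATE constant and a price_with_vat helper (neither is part of the examples above): the expected amount is hardcoded in the test, so any change to the rate makes the test fail and forces a review.

### pricing.py (hypothetical example)
VAT_RATE = 0.19

def price_with_vat(net_price):
    return round(net_price * (1 + VAT_RATE), 2)

### test_pricing.py
from pricing import price_with_vat

def test_price_with_vat():
    # 0.19 is deliberately hardcoded: if someone changes VAT_RATE,
    # this assert fails and forces a conscious review of the change
    assert price_with_vat(100) == 119.0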

If we can argue that the values are not so sensitive, we could import the constants directly and use them in the test instead of hardcoding them (e.g. the time when the weekly report email is sent to the team’s PM, which someone might change by mistake from 5 PM to 5 AM).
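A minimal sketch of that second case, assuming a hypothetical REPORT_HOUR constant and a should_send_report helper: the test imports the constant, so a harmless change of the hour does not break the build.

### reporting.py (hypothetical example)
REPORT_HOUR = 17  # weekly report goes out at 5 PM

def should_send_report(hour):
    return hour == REPORT_HOUR

### test_reporting.py
from reporting import REPORT_HOUR, should_send_report

def test_should_send_report():
    # the exact hour is not business critical, so the test follows the constant
    assert should_send_report(REPORT_HOUR) is True
    assert should_send_report(REPORT_HOUR - 1) is False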

Backup database in S3

A simple cron script that backs up a MySQL database to AWS S3. Our script is called mysqlbackup.sh and it looks like this:

#!/bin/bash
DB="razvantudorica"
NOW=$(date +"%m_%d_%Y")
BACKUPFILE="${DB}_${NOW}.sql.gz"
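# dump the database and compress it on the fly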
mysqldump --login-path=razvantudorica --databases $DB | gzip > $BACKUPFILE
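# upload the compressed dump to the S3 bucket using the "backuper" AWS profile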
aws s3 cp $BACKUPFILE s3://backup.razvantudorica.net/$BACKUPFILE --profile backuper
rm $BACKUPFILE

And now a few explanations about the script.

First of all, we need to have the awscli command installed.

Afterwards, as you can see, the database password is not hardcoded into the script. We can set up the password with mysql_config_editor, which stores authentication credentials in an encrypted login file named .mylogin.cnf.

For our example database, razvantudorica, and database user myuser, we can run:

mysql_config_editor set --login-path=razvantudorica --host=localhost --user=myuser --password

The next step is to configure the S3 bucket and the AWS credentials.

  • Create a bucket in S3; in our example it is called backup.razvantudorica.net.
  • Create the IAM credentials and save them in ~/.aws/credentials as
[backuper]
aws_access_key_id=AK... 
aws_secret_access_key=...
  • And in ~/.aws/config add
[profile backuper]
region=eu-west-1
output=json
  • The last step is to test our script. If no error occurs when running it and the backup file is uploaded successfully to S3, then everything is correct and we can add it to the crontab.
  • Run crontab -e and add this line:
0  1 * * * /root/mysqlbackup.sh >> /var/log/mysqlbackup.log

Unknown database type enum requested

Using the Symfony (5) console command to create a new migration based on my entities, I encountered this error:

php bin/console doctrine:migrations:diff

Unknown database type enum requested, Doctrine\DBAL\Platforms\MySQL57Platform may not support it.

The simplest solution is to add mapping_types in config/packages/doctrine.yml, which tells Doctrine DBAL to map the enum database type to its string type:

doctrine:
    dbal:
        # ...
        mapping_types:
            enum: string