Tag Archives: python

Cleaner Code With Functional Programming

I’m a Technical Team Leader at Université catholique de Louvain and one of my responsibilities is to analyse pull requests sent by my team mates and all other contributors of OSIS project. Each pull request is analysed according to some acceptance criteria, which determine if the pull request is accepted right away, accepted after completing requests for changes or refused. This process was taking too long because of excessive requests for changes, revealing too many problems to be fixed before the merges.

To gain some time on pull requests, I’ve decided to invest part of my time training the team to improve the general sense of maintainability, stability and reusability of the code. We’ve started with some pressure to change the established culture by following Robert C. Martin’s Clean Code video series. Uncle Bob, as he’s well known, seems to have radical ideas, but he is quite good at convincing us that he is not that radical at all. This initiative has been very valuable and the quality of recent pull requests has increased substantially.

Training sessions are now a weekly commitment. Nowadays, I’m reinforcing what we have learned during the clean code training, but more focused on the technologies we use. Recently, I’ve brought an interesting problem to be solved in Python using Test-Driven Development (TDD):

Given a list of programmers and the maximum number of team members, let’s write an algorithm to randomly distribute programmers into teams. The maximum number of members is greater than 1. The minimum number of members is implicitly equal to 2. The list of programmers should contain at least two elements. A programmer can be member of only one team or no team at all but never member of 2 or more teams. The return should be a list of lists where the lists represent the teams.

We started writing a failing test and then just enough code to pass it. In this loop we ended up writing 7 tests to check the algorithm. Here is what we’ve got:

import random
import unittest

class TeamBuildingTest(unittest.TestCase):
  def test_list_contains_programmers(self):
    programmers = []
    teams = build_teams(programmers, 2)
    self.assertFalse(teams)

  def test_list_with_a_single_programmer(self):
    programmers = ['John']
    teams = build_teams(programmers, 2)
    self.assertFalse(teams)

  def test_minimal_team_size(self):
    programmers = ['John', 'Mary']
    teams = build_teams(programmers, 1)
    self.assertEqual(len(teams), 1)
    teams = build_teams(programmers, 0)
    self.assertEqual(len(teams), 1)

  def test_multiple_teams(self):
    programmers = ['John', 'Mary', 'Carl', 'Smith']
    teams = build_teams(programmers, 2)
    self.assertEqual(len(teams), 2)

  def test_odd_size_of_programmers_teams(self):
    programmers = ['John', 'Mary', 'Carl', 'Smith', 'Luc', 
                   'Paul', 'Ringo']
    teams = build_teams(programmers, 3)
    self.assertEqual(len(teams), 2)

  def test_minimal_and_maximal_team_size(self):
    programmers = ['John', 'Mary', 'Carl', 'Smith', 'Luc', 
                   'Paul', 'Ringo', 'Adam']
    teams = build_teams(programmers, 3)
    self.assertEqual(len(teams), 3)
    self.assertEqual(len(teams[0]), 3)
    self.assertEqual(len(teams[1]), 3)
    self.assertEqual(len(teams[2]), 2)

  def test_random_teams(self):
    programmers = ['John', 'Mary', 'Carl', 'Smith', 'Luc', 
                   'Paul']
    first_round = build_teams(programmers, 2)
    second_round = build_teams(programmers, 2)
    self.assertNotEqual(first_round, second_round)

We arrived together to the following solution:

def build_teams(programmers, size):
  if not programmers or len(programmers) < 2:
    return []
  else:
    return partition_teams(shuffle_programmers(programmers),
                           size if size > 1 else 2)

def shuffle_programmers(programmers):
  shuffled_programmers = list(programmers)
  random.shuffle(shuffled_programmers)
  return shuffled_programmers

def partition_teams(programmers, size=2):
  teams = []
  team = []

  for i, programmer in enumerate(programmers):
    if i == 0 or i % size != 0:
      team.append(programmer)
    else:
      teams.append(team)
      team = []
      team.append(programmer)
  if len(team) > 1:
    teams.append(team)
  return teams

To run all of it yourself, just put all the code in a file named “team_building.py” and run the tests using the following command in the same folder of the file:

$ python3 -m unittest -f 'team_building.py'

When we were done with the code we realised we didn’t do a good job in the function partition_teams(). It was written in the imperative style, with more variables and less meaningful controls. It’s difficult to extract another function from it without breaking other programming rules. However, with a 100% test coverage, we could rethink the problem and refactor the code without breaking the algorithm. Look at what we’ve got:

def partition_teams(progs, s):
  return [progs[x:y] for x in range(0, len(progs), s)
                     for y in range(s, len(progs) + s, s)
                     if x < y and y - x == s and 
                        len(progs[x:y]) > 1]

The new partition_teams() function uses list comprehension, a functional programming concept implemented in Python and many other languages nowadays. We’ve initially thought about nesting maps, filters and reduces, but it seemed to be heading towards complexity. List comprehension was proposed to address exactly the nested cases of higher order list operations, making the code more expressive.

The point about this list comprehension is to manipulate list slicing intervals to get what we want from the list of programmers. Given a list of 9 programmers from which we expect to extract pair programming teams, the first for produces the sequence 0, 2, 4, 6, and 8 and the second for produces the sequence 2, 4, 6, 8, and 10. When they execute we have a y for each x, producing the combinations (0, 2), (0, 4), (0, 6), (0, 8) … (4, 2), (4, 4), (4, 6) … (8, 6), (8, 8), (8, 10). Considering the rules of slicing, combinations like (4, 2) and (8, 8) won’t help us to solve the problem. So we exclude them with the conditions (x < y) and (y - x == s). It partitions the programmers properly, but it isn't immune to teams of 1 person. Since this kind of team doesn't exist, we include the condition (len(progs[x:y]) > 1) to take them out.

Notice that it has more semantics than the imperative solution because we get a dataset and we keep working with datasets in mind, while the imperative solution is more value-oriented and requires more control. Functional programming fascinates me and I'm glad to work with such a powerful language.

How a Python Script Can Launch a Clojure Application

If you ever need a Python Script that starts a Clojure application, do something else and then stops the application then this post is for you. The script below considers that you have a Clojure application that is self-contained in a single jar file. All the dependencies are there and you run it using java -jar [jar-file.jar]. If you have something different from that, then make sure the Clojure application can run via console from where the Python script is located. Learn the code below by reading the inline comments.

# Copy, paste and adapt this code in a file located in the root
# of your Clojure project, in the same level of the project.clj 
# file.
import os
import signal
import subprocess
import time
import traceback

# Reusable function used to kill a subprocess whenever needed. 
# It is used twice in the code.
def _kill_subprocess(subp):
    os.killpg(os.getpgid(subp.pid), signal.SIGTERM)

# If you want, you can even build your self-contained jar file 
# before executing it.
subprocess.call(['lein', 'ring', 'uberjar'])

# Executes the jar file as a subprocess, so it will be alive 
# as long as the Python script keeps executing.
subp = subprocess.Popen(
    'java -jar target/proj-0.1.0-standalone.jar', 
    shell=True, 
    preexec_fn=os.setsid)

# Waits the application to fully launch before doing something 
# else. You have to adapt the seconds below for the needs of 
# your application.
time.sleep(12)

# Write some Python code that does something while the Clojure
# application is running. When it finishes then the Clojure
# application will also finish. 
try:
    do_something_important()
except:
    traceback.print_exc()
    _kill_subprocess(subp)
    exit(1)

# Kills the subprocess to stop the Clojure application.
_kill_subprocess(subp)
exit(0)

That’s a use case, but if you need something slightly different from that let me know in the comments below.

Book Review: Programming Collective Intelligence

When the volume of data managed by our application significantly grows, we start thinking about ways of extracting information from the databases. We know that there are a lot of hidden information there, but we don’t know exactly what they are and how to get them. Year after year, people have been working on solutions such as data mining, data-warehouse, business intelligence, and others to figure out those hidden information. At the same time, people also look for ways of using those new information not only for reporting, but also to give constant feedback to the application in order to achieve better results over time. This looks very much to a self-learning mechanism, that we can formally call machine learning.

Nowadays, a new dimension are absolutely relevant when applying machine learning on data analysis: the user’s behaviour and preferences as individuals and as part of the collectivity. This is clearly the influence of social networks on all sort of applications. They are exploring new possibilities of improving the user experience, but how can we be part of that party? Do we need to get a PhD degree to follow the wave? Is it finally possible to make the subject accessible to ordinary programmers? No, we don’t need a PhD degree and yes, it’s finally possible to make our applications smarter. This is what I’ve noticed while reading the book Programming Collective Intelligence, written by Toby Segaran.

At the beginning, I felt uneasy with the chosen programming language – Python – but I was gradually getting used to it because Python is not that hard to understand. In any case, I think this book would reach a larger audience by writing the examples in Java, which is constantly fighting with C for the first position in a well known ranking of popularity. On the other hand, the book targets more websites/portals and we know that Java is not really the first choice when companies decide to develop an application for a larger audience. Independent on the programming language we master, we can always convert what we’ve understood from Python to our favorite language. At the end, the success when implementing those machine learning algorithms come from our ability to solve computational problems, not from our ability of learning new programming languages.

On which concerns the content, the book brings together a large variety of real world problems from websites and services that we frequently use. It concludes reviewing and comparing the described algorithms, helping the reader to easily visualize the applicability of the different methods and to decide which one is the best for the problem we are actually facing.

This book has a new and better approach to present complex subjects such as data mining, machine learning, and statistics. By contextualizing the subject with real scenarios, the author improved the didactic in an unprecedented manner, since this is not the case in academic books. Actually, Programming Collective Intelligence can also be used by undergraduate and graduate students, since there are exercises at the end of each chapter. This is great because sometimes we finish the study of a chapter without realizing what else can be done with those algorithms. This is not common in this kind of book (non-academic) and yet very appreciated.