Category Archives: Development

Cleaner Code With Functional Programming

I’m a Technical Team Leader at Université catholique de Louvain and one of my responsibilities is to analyse pull requests sent by my team mates and all other contributors of OSIS project. Each pull request is analysed according to some acceptance criteria, which determine if the pull request is accepted right away, accepted after completing requests for changes or refused. This process was taking too long because of excessive requests for changes, revealing too many problems to be fixed before the merges.

To gain some time on pull requests, I’ve decided to invest part of my time training the team to improve the general sense of maintainability, stability and reusability of the code. We’ve started with some pressure to change the established culture by following Robert C. Martin’s Clean Code video series. Uncle Bob, as he’s well known, seems to have radical ideas, but he is quite good at convincing us that he is not that radical at all. This initiative has been very valuable and the quality of recent pull requests has increased substantially.

Training sessions are now a weekly commitment. Nowadays, I’m reinforcing what we have learned during the clean code training, but more focused on the technologies we use. Recently, I’ve brought an interesting problem to be solved in Python using Test-Driven Development (TDD):

Given a list of programmers and the maximum number of team members, let’s write an algorithm to randomly distribute programmers into teams. The maximum number of members is greater than 1. The minimum number of members is implicitly equal to 2. The list of programmers should contain at least two elements. A programmer can be member of only one team or no team at all but never member of 2 or more teams. The return should be a list of lists where the lists represent the teams.

We started writing a failing test and then just enough code to pass it. In this loop we ended up writing 7 tests to check the algorithm. Here is what we’ve got:

import random
import unittest

class TeamBuildingTest(unittest.TestCase):
  def test_list_contains_programmers(self):
    programmers = []
    teams = build_teams(programmers, 2)
    self.assertFalse(teams)

  def test_list_with_a_single_programmer(self):
    programmers = ['John']
    teams = build_teams(programmers, 2)
    self.assertFalse(teams)

  def test_minimal_team_size(self):
    programmers = ['John', 'Mary']
    teams = build_teams(programmers, 1)
    self.assertEqual(len(teams), 1)
    teams = build_teams(programmers, 0)
    self.assertEqual(len(teams), 1)

  def test_multiple_teams(self):
    programmers = ['John', 'Mary', 'Carl', 'Smith']
    teams = build_teams(programmers, 2)
    self.assertEqual(len(teams), 2)

  def test_odd_size_of_programmers_teams(self):
    programmers = ['John', 'Mary', 'Carl', 'Smith', 'Luc', 
                   'Paul', 'Ringo']
    teams = build_teams(programmers, 3)
    self.assertEqual(len(teams), 2)

  def test_minimal_and_maximal_team_size(self):
    programmers = ['John', 'Mary', 'Carl', 'Smith', 'Luc', 
                   'Paul', 'Ringo', 'Adam']
    teams = build_teams(programmers, 3)
    self.assertEqual(len(teams), 3)
    self.assertEqual(len(teams[0]), 3)
    self.assertEqual(len(teams[1]), 3)
    self.assertEqual(len(teams[2]), 2)

  def test_random_teams(self):
    programmers = ['John', 'Mary', 'Carl', 'Smith', 'Luc', 
                   'Paul']
    first_round = build_teams(programmers, 2)
    second_round = build_teams(programmers, 2)
    self.assertNotEqual(first_round, second_round)

We arrived together to the following solution:

def build_teams(programmers, size):
  if not programmers or len(programmers) < 2:
    return []
  else:
    return partition_teams(shuffle_programmers(programmers),
                           size if size > 1 else 2)

def shuffle_programmers(programmers):
  shuffled_programmers = list(programmers)
  random.shuffle(shuffled_programmers)
  return shuffled_programmers

def partition_teams(programmers, size=2):
  teams = []
  team = []

  for i, programmer in enumerate(programmers):
    if i == 0 or i % size != 0:
      team.append(programmer)
    else:
      teams.append(team)
      team = []
      team.append(programmer)
  if len(team) > 1:
    teams.append(team)
  return teams

To run all of it yourself, just put all the code in a file named “team_building.py” and run the tests using the following command in the same folder of the file:

$ python3 -m unittest -f 'team_building.py'

When we were done with the code we realised we didn’t do a good job in the function partition_teams(). It was written in the imperative style, with more variables and less meaningful controls. It’s difficult to extract another function from it without breaking other programming rules. However, with a 100% test coverage, we could rethink the problem and refactor the code without breaking the algorithm. Look at what we’ve got:

def partition_teams(progs, s):
  return [progs[x:y] for x in range(0, len(progs), s)
                     for y in range(s, len(progs) + s, s)
                     if x < y and y - x == s and 
                        len(progs[x:y]) > 1]

The new partition_teams() function uses list comprehension, a functional programming concept implemented in Python and many other languages nowadays. We’ve initially thought about nesting maps, filters and reduces, but it seemed to be heading towards complexity. List comprehension was proposed to address exactly the nested cases of higher order list operations, making the code more expressive.

The point about this list comprehension is to manipulate list slicing intervals to get what we want from the list of programmers. Given a list of 9 programmers from which we expect to extract pair programming teams, the first for produces the sequence 0, 2, 4, 6, and 8 and the second for produces the sequence 2, 4, 6, 8, and 10. When they execute we have a y for each x, producing the combinations (0, 2), (0, 4), (0, 6), (0, 8) … (4, 2), (4, 4), (4, 6) … (8, 6), (8, 8), (8, 10). Considering the rules of slicing, combinations like (4, 2) and (8, 8) won’t help us to solve the problem. So we exclude them with the conditions (x < y) and (y - x == s). It partitions the programmers properly, but it isn't immune to teams of 1 person. Since this kind of team doesn't exist, we include the condition (len(progs[x:y]) > 1) to take them out.

Notice that it has more semantics than the imperative solution because we get a dataset and we keep working with datasets in mind, while the imperative solution is more value-oriented and requires more control. Functional programming fascinates me and I'm glad to work with such a powerful language.

How a Python Script Can Launch a Clojure Application

If you ever need a Python Script that starts a Clojure application, do something else and then stops the application then this post is for you. The script below considers that you have a Clojure application that is self-contained in a single jar file. All the dependencies are there and you run it using java -jar [jar-file.jar]. If you have something different from that, then make sure the Clojure application can run via console from where the Python script is located. Learn the code below by reading the inline comments.

# Copy, paste and adapt this code in a file located in the root
# of your Clojure project, in the same level of the project.clj 
# file.
import os
import signal
import subprocess
import time
import traceback

# Reusable function used to kill a subprocess whenever needed. 
# It is used twice in the code.
def _kill_subprocess(subp):
    os.killpg(os.getpgid(subp.pid), signal.SIGTERM)

# If you want, you can even build your self-contained jar file 
# before executing it.
subprocess.call(['lein', 'ring', 'uberjar'])

# Executes the jar file as a subprocess, so it will be alive 
# as long as the Python script keeps executing.
subp = subprocess.Popen(
    'java -jar target/proj-0.1.0-standalone.jar', 
    shell=True, 
    preexec_fn=os.setsid)

# Waits the application to fully launch before doing something 
# else. You have to adapt the seconds below for the needs of 
# your application.
time.sleep(12)

# Write some Python code that does something while the Clojure
# application is running. When it finishes then the Clojure
# application will also finish. 
try:
    do_something_important()
except:
    traceback.print_exc()
    _kill_subprocess(subp)
    exit(1)

# Kills the subprocess to stop the Clojure application.
_kill_subprocess(subp)
exit(0)

That’s a use case, but if you need something slightly different from that let me know in the comments below.

How To Figure Out Easter Sunday Every Year?

Planning a trip in a high touristic season is hard. Everything is expensive and expected to be crowded. But that’s something we can’t avoid because our vacation must coincide with school holidays. So, by now, we are already planning our vacation during Easter break to reach a fair budget in a nice place.

But, like every geek, I’ve got immediately distracted by a mystery I’ve faced right away: when Easter is going to happen next year? This is one of those holidays that changes every year, like carnival, and I wonder who determines it or how it is calculated. Like every serious procrastinator, I immediately dived into this quest. Holidays planning can wait a little bit.

Wikipedia defines Easter as a moveable feast because it follows lunar cycles instead of Gregorian or Julian calendars. I’m not going into historical details, but ancient people have determined that it comes to be the first Sunday after the full moon that occurs after the spring equinox. The equinox is a day when time is split equally into 12 hours each of light and darkness. However, the equinox doesn’t happen in the same date every year. So, they fixed March 21st to simplify calculation all over the globe.

It’s quite simple to calculate by observation. Just go outside and look at the sky for a full moon after March 21st and do the trick. But I need to know sooner than that to be able to plan our holidays. I have to figure out how to calculate it.

The goal of the calculation is simple: find a date in the Gregorian calendar that corresponds to the state of the lunar cycle in a particular date of the Gregorian calendar. But the means to achieve that are not that simple. I guess I have to calculate lunar phases and when they happen in the Gregorian calendar to compare with March 21st, then we find the first Sunday after that.

After some search on the internet, I found it quite challenging. The calculation of lunar phases is complicated and my wife was pressuring me to stop procrastinating and start planning our trip. When I was about to lose my hope, I found a sample of the book “Practical Astronomy with your Calculator or Spreadsheet”, by Peter Duffett-Smith and Jonathan Zwart, that explains in its first chapter a method devised in 1876 which first appeared in Butcher’s Ecclesiastical Calendar, and which is valid for all years from 1583 onwards. The calculation is quite simple but absolutely enigmatic. Take a look at the
following table considering `year = 2009` as input:

Step Integer part Remainder
(/ x y) (quot x y) (rem x y)
Divide year by 19 a = 14
Divide year by 100 b = 20 c = 9
Divide b by 4 d = 5 e = 0
Divide (b + 8) by 25 f = 1
Divide (b - f + 1) by 3 g = 6
Divide (19a + b - d - g + 15) by 30 h = 20
Divide c by 4 i = 2 k = 1
Divide (32 + 2e + 2i - h - k) by 7 l = 1
Divide (a + 11h + 22l) by 451 m = 0
Divide (h + l - 7m + 114) by 31 n = 4 p = 11
The Easter day falls on day = (p + 1) = 12, month = n = 4 and
year = 2009. Therefore 12/4/2009.

What the hell?! How can anyone make any sense of that?! The disturbing thing is that it works and it’s actually explained in the Book of Common Prayer (1662). I’m completely overwhelmed by curiosity, but I have to leave it for another time. For now, I will simply explain how I’ve implemented that in Clojure.

First, let’s write some failing unit tests to make sure we have our expectations fulfilled. As the reference explains, the algorithm only works for years equal or greater than 1583, so the first test will assure the code throws an exception otherwise.

(ns cosmos.easter-test
  (:require [clojure.test :refer :all]
            [cosmos.easter :as easter]))

(deftest test-minimal-year
  (testing "exception if a year lower than 1583 is informed"
    (is (thrown? IllegalArgumentException 
                 (easter/calculate-easter-date 1582)))))

Another test takes some examples of Easter dates from existing calendars to compare with the results. One of the chosen years must be a leap year just to verify that it doesn’t affect the calculation.

(deftest test-known-easter-dates
  (testing "compares known easter dates with the output"
    (is (= (easter/calculate-easter-date 2000) 
           {:day 23 :month 4 :year 2000}))
    (is (= (easter/calculate-easter-date 2008) 
           {:day 23 :month 3 :year 2008}))
    (is (= (easter/calculate-easter-date 2017) 
           {:day 16 :month 4 :year 2017}))))

Now, let’s write the production code, fully based on the table above, to pass those tests.

(ns cosmos.easter)

(defn calculate-easter-date [year]
  (if (< year 1583)
    (throw (IllegalArgumentException. 
               "Year must be greater than 1582"))
    (let [a (rem  year 19)
          b (quot year 100)
          c (rem  year 100)
          d (quot b 4)
          e (rem  b 4)
          f (quot (+ b 8) 25)
          g (quot (+ b (- f) 1) 3)
          h (rem  (+ (* 19 a) b (- d) (- g) 15) 30)
          i (quot c 4)
          k (rem  c 4)
          l (rem  (+ 32 (* 2 e) (* 2 i) (- h) (- k)) 7)
          m (quot (+ a (* 11 h) (* 22 l)) 451)
          n (quot (+ h l (- (* 7 m)) 114) 31)
          p (rem  (+ h l (- (* 7 m)) 114) 31)]
      {:day (+ p 1) :month n :year year})))

Isn't it amazing?! I wonder what was the reasoning process of the author to come out with such algorithm. Was is a trial-error approach? Who knows. At least, I've got the Easter date right (24/04/2017) and now I can go back to our holiday planning. Wait a minute... what are we going to do in the carnival?! Huuum...

You can find the source code of this post in my GitHub repository cosmos.

Does Your Name Sums Up To A Prime Number?

I just finished a wonderful book! The Curious Incident Of The Dog In The Night-time is a masterpiece, written by Mark Haddon, a famous English children’s book writer who decided to write this book for adults.

That’s a sequence of curious events in the life of a 15-year-old boy, called Christopher, with Asperger syndrome. It’s narrated by himself in a very unique way, telling his efforts to write a book about a small crime that has happened in his street while intercalating the story with his gifted mathematical and logical insights.

A curious thing is that chapters are numbered after prime numbers, rather than conventional successive numbers. The last chapter is 233, which is the 51° prime number. In the chapter 19, he describes a simple algorithm to figure out prime numbers. In his words:

“… you write down all the positive whole numbers in the world. Then you take away all the numbers that are multiples of 2. Then you take away all the numbers that are multiples of 3. Then you take away all the numbers that are multiples of 4 and 5 and 6 and 7 and so on. The numbers that are left are prime numbers”

In one of his insights, he used to get someone’s name, give each letter a value from 1 to 26 and sum the values of each letter of the name to check whether the total is a prime number. So, I thought it would be fun to explore this unique way of seeing things using Clojure. First, we create a map of letters and numbers, where the letters are keys in a map and numbers are their respective values:

(def numbered-alphabet {\a 1  \b 2  \c 3  \d 4  \e 5  \f 6 
                        \g 7  \h 8  \i 9  \j 10 \k 11 \l 12
                        \m 13 \n 14 \o 15 \p 16 \q 17 \r 18
                        \s 19 \t 20 \u 21 \v 22 \w 23 \x 24
                        \y 25 \z 26 \space 0})

The keys are literal characters and the numbers are positive integers. Using the map, we sum the characters of a name:

(defn sum-characters [name]
  ; lower-casing the name to keep the map small
  (let [name (clojure.string/lower-case name)]
    ; given a sequence of characters, reduce it to the sum
    ; of the respective numbers.
    (reduce #(+ (if (char? %1) 
                    (get numbered-alphabet %1) 
                    %1)
                (get numbered-alphabet %2)) 
            (seq name))))

(sum-characters "hildeberto")
>> 98

Is 98 a prime number? No, because it’s even and even numbers have at least 3 divisors: 1, 2 and itself. The algorithm that applies this one and other rules is the following:

(defn is-prime [x]
  (if (or (= x 2) (= x 3))
    true
    ; eliminate even numbers
    (if (even? x)
      false
      ; Number is odd, so it deserves some more attention.
      ; After eliminating even numbers, the maximum divisor
      ; of a number is less than its square root
      (let [max-div (Math/ceil (Math/sqrt x))]
        (loop [i 3]
	  (if (zero? (rem x i))
              false
	      (if (> i max-div)
                  ; it is a prime number
                  true
                  (recur (inc i)))))))))

(is-prime (sum-characters "Hildeberto"))
>> false

That’s a pity my own name doesn’t sum up to a prime number, but the book gave some names to test the code:

(is-prime (sum-characters "Jesus Christ"))
>> true
(is-prime (sum-characters "Scooby Doo"))
>> true
(is-prime (sum-characters "Sherlock Holmes"))
>> true
(is-prime (sum-characters "Doctor Watson"))
>> true

Wait a minute… he uses first and last names! So, let me test with mine:

(is-prime (sum-characters "Hildeberto Mendonca"))
>> true

Yeeeeeesssss! I’m prime-numered!!!

Fluffy Encryption of Numbers Within URLs

In the REST web services world, a resource is anything that’s important enough to be referenced as a thing in itself. It is something that can be stored on a computer and represented as a stream of bits such as a document, a row in a database, or the result of running an algorithm. The book RESTful Web APIs explains very well that every resource has to have at least one URI, which is the name and address of that resource. The URL to a resource in a typical web application looks like {database-table-name}/{record-id}. For example: /blogs/myblog/entries/134 goes from the general to the specific, from a list of blogs to a particular blog, to the entries in that blog, to a particular entry.

I don’t know about you, but when I look at those URLs and see those record ids I feel very uncomfortable. I’m not sure if it has security implications, but it’s strange to let everybody know that they are dealing with the product number 14 or the order number 35. Maybe I’m being paranoid, but it is worth sharing a trick to replace those ids by fake ones.

The method is simple. We will need two secret numbers that should be configurable for each installation of an application. The secret numbers compose a formula to encrypt a number, as illustrated in Clojure:

(def secret-n1 345)
(def secret-n2 3)

(defn encrypt [id] 
  (bit-xor (* id secret-n1) 
           secret-n2))

(encrypt 35)
>> 12072

The id is multiplied by the first secret number (secret-n1) and the result is used to perform a binary XOR (bitwise exclusive or) with the second secret number (secret-n2). The result is a number that gives no clue about the id it is hiding. Here is an example of applying the function encrypt:

; Original URL
(def id 35)
(def url-entries "http://mywebsite.com/myblog/entries/%d")
(def url-entry (format url-entries id))
(format "Original URL: %s" url-entry)
>> Original URL: http://mywebsite.com/myblog/entries/35

; Modified URL
(def url-entry (format url-entries (encrypt id)))
(format "Modified URL: %s" url-entry)
>> Modified URL: http://mywebsite.com/myblog/entries/12072

Notice that 35 is not exposed anymore. In its place we have 12072, which is not a real id or it’s not related to the current record. But how can we recover the original id after processing the request with 12072? Now comes the decrypt function to recover the original id:

(defn decrypt [request-id] 
  (/ (bit-xor request-id secret-n2) 
     secret-n1))

(decrypt 12072)
>> 35

The secret numbers are used here as well but inverting the calculation. This time, we perform the binary XOR first, with the id that comes in the request and the second secret number (secret-n2), and we divide (instead of multiply) the result by the first secret number (secret-n1).

These ids are maybe meaningless to concern us so much, but what about a global user id or a social security number or even a phone number? This method can protect them all.

The only downside I have identified for this technique is that in the event of compromising the security of the secret numbers, all URLs will change when at least one secret number changes. It means that these URLs are not permanent, thus useless to bookmark them. This is not a serious concern, but you should think about the implications before adopting it.