When the volume of data managed by our application significantly grows, we start thinking about ways of extracting information from the databases. We know that there are a lot of hidden information there, but we don’t know exactly what they are and how to get them. Year after year, people have been working on solutions such as data mining, data-warehouse, business intelligence, and others to figure out those hidden information. At the same time, people also look for ways of using those new information not only for reporting, but also to give constant feedback to the application in order to achieve better results over time. This looks very much to a self-learning mechanism, that we can formally call machine learning.

Nowadays, a new dimension are absolutely relevant when applying machine learning on data analysis: the user’s behaviour and preferences as individuals and as part of the collectivity. This is clearly the influence of social networks on all sort of applications. They are exploring new possibilities of improving the user experience, but how can we be part of that party? Do we need to get a PhD degree to follow the wave? Is it finally possible to make the subject accessible to ordinary programmers? No, we don’t need a PhD degree and yes, it’s finally possible to make our applications smarter. This is what I’ve noticed while reading the book Programming Collective Intelligence, written by Toby Segaran.

lrg.jpg

At the beginning, I felt uneasy with the chosen programming language – Python – but I was gradually getting used to it because Python is not that hard to understand. In any case, I think this book would reach a larger audience by writing the examples in Java, which is constantly fighting with C for the first position in a well known ranking of popularity. On the other hand, the book targets more websites/portals and we know that Java is not really the first choice when companies decide to develop an application for a larger audience. Independent on the programming language we master, we can always convert what we’ve understood from Python to our favorite language. At the end, the success when implementing those machine learning algorithms come from our ability to solve computational problems, not from our ability of learning new programming languages.

On which concerns the content, the book brings together a large variety of real world problems from websites and services that we frequently use. It concludes reviewing and comparing the described algorithms, helping the reader to easily visualize the applicability of the different methods and to decide which one is the best for the problem we are actually facing.

This book has a new and better approach to present complex subjects such as data mining, machine learning, and statistics. By contextualizing the subject with real scenarios, the author improved the didactic in an unprecedented manner, since this is not the case in academic books. Actually, Programming Collective Intelligence can also be used by undergraduate and graduate students, since there are exercises at the end of each chapter. This is great because sometimes we finish the study of a chapter without realizing what else can be done with those algorithms. This is not common in this kind of book (non-academic) and yet very appreciated.