Tag Archives: database

Book Review: Programming Collective Intelligence

When the volume of data managed by our application significantly grows, we start thinking about ways of extracting information from the databases. We know that there are a lot of hidden information there, but we don’t know exactly what they are and how to get them. Year after year, people have been working on solutions such as data mining, data-warehouse, business intelligence, and others to figure out those hidden information. At the same time, people also look for ways of using those new information not only for reporting, but also to give constant feedback to the application in order to achieve better results over time. This looks very much to a self-learning mechanism, that we can formally call machine learning.

Nowadays, a new dimension are absolutely relevant when applying machine learning on data analysis: the user’s behaviour and preferences as individuals and as part of the collectivity. This is clearly the influence of social networks on all sort of applications. They are exploring new possibilities of improving the user experience, but how can we be part of that party? Do we need to get a PhD degree to follow the wave? Is it finally possible to make the subject accessible to ordinary programmers? No, we don’t need a PhD degree and yes, it’s finally possible to make our applications smarter. This is what I’ve noticed while reading the book Programming Collective Intelligence, written by Toby Segaran.

At the beginning, I felt uneasy with the chosen programming language – Python – but I was gradually getting used to it because Python is not that hard to understand. In any case, I think this book would reach a larger audience by writing the examples in Java, which is constantly fighting with C for the first position in a well known ranking of popularity. On the other hand, the book targets more websites/portals and we know that Java is not really the first choice when companies decide to develop an application for a larger audience. Independent on the programming language we master, we can always convert what we’ve understood from Python to our favorite language. At the end, the success when implementing those machine learning algorithms come from our ability to solve computational problems, not from our ability of learning new programming languages.

On which concerns the content, the book brings together a large variety of real world problems from websites and services that we frequently use. It concludes reviewing and comparing the described algorithms, helping the reader to easily visualize the applicability of the different methods and to decide which one is the best for the problem we are actually facing.

This book has a new and better approach to present complex subjects such as data mining, machine learning, and statistics. By contextualizing the subject with real scenarios, the author improved the didactic in an unprecedented manner, since this is not the case in academic books. Actually, Programming Collective Intelligence can also be used by undergraduate and graduate students, since there are exercises at the end of each chapter. This is great because sometimes we finish the study of a chapter without realizing what else can be done with those algorithms. This is not common in this kind of book (non-academic) and yet very appreciated.

Yasmim Security: User Authentication using JDBC Realm

In the previous post, I introduced you to the Yasmim application. Today, I’m going into details on its implementation, starting from a very important non-functional requirement: the user authentication. Since I’m using a JEE6 Application server, I’m going to explain how we did it using the security infrastructure provided by the container.

    JDBC Realm
    This is a configuration entirely made in the application server. It may demand some time, but at least your application is going to be free from the basic security complexity. Yasmim uses JDBC Realm because we stored the user data in a database, which could be a third party database (single sign-on for multiple applications in the context of an organization) or the database of the application (users and groups embedded to be independent from the organizational context).

    The figure below proposes a data structure for users and groups. It is more complex and realistic than most examples found on the web, but it follows basically the same principle: a group contains several users and a user can be part of several groups. We created a N-N relationship between the tables account_group and user_account, using the intermediary table user_group.

    The script to create this structure is available in the repository. The beginning of the script creates the structure above in a MySQL database. For the purpose of this application, you can create the database with just these three tables. If you don’t know how to create a MySQL database, follow this post. You will need a connection poll pointing to this database. If you don’t know how to create a connection pool, I also wrote a post explaining this.

    The sql scripts below consider the model above and they are used to populate the tables for test purposes. The first one inserts 2 users in the database: John Smith and Mary Allen. Their password is secret but it is not readable because a MD5 algorithm was applied on it. The realm will need encoded passwords because we have specified the MD5 as the diggest algorithm in the realm configuration detailed later on. The column confirmation_code is not relevant now, so we defined it as null.

    insert into user_account values
    insert into user_account values

    Then, we insert the basic groups admins and users. The group users was defined as the default for registered users, but this attribute is not relevant either here. Maybe we can explore it in a future post.

    insert into access_group values
    ('admins','Administrative Group', null);
    insert into access_group values
    ('users','Default User Group', 1);

    Finally, we associate the users with the groups. Mary Allen is part of the groups admins and users and John Smith is part of the group users. So, Mary can do everything John can do, but John can’t do everything that Mary can.

    insert into user_group values
    ('admins', 'maryallen@acme.com');
    insert into user_group values
    ('users', 'maryallen@acme.com');
    insert into user_group values
    ('users', 'johnsmith@acme.com');

    Follow the steps below to configure the JDBC Realm:

    1. Go to the Glassfish admin console (http://localhost:4848/), navigate to Configuration > Security > Realms and press “New…”;
    2. Name: yasmim-realm – it will be used to refer to this realm in the application (tag realm-name in the file web.xml);
    3. Class Name: com.sun.enterprise.security
    4. Properties specific to this Class (using the model above as a reference):
      1. JAAS Context: jdbcRealm – identifier for the login module;
      2. JNDI: jdbc/yasmim – the data source JNDI name pointing to the connection pool where the tables are available;
      3. User Table: user_account – user table according to the model above;
      4. User Name Column: user_id – column where the user name is stored;
      5. Password Column: password – column where the password is stored;
      6. Group Table: user_group – notice that the group table is not the access_group, but the intermediary table user_group. access_group is used to store further details about the groups;
      7. Group Name Column: group_id – this is the foreign key to the access_group table, thus this is the id of the group;
      8. Digest Algorithm: MD5 – of course you are not going to save the password as it is. This is actually embarrassing for you to know the password of your users. So, let’s inform to the realm that the password was scrambled by the MD5 algorithm; and
      9. Encoding: Base64 –  the encoding used by the MD5 algorithm.
    That’s all for the server-side. Now, we are going to configure the application to use this realm.

    Application Configuration

    We have to configure the files web.xml and sun-web.xml. An example of web.xml configured for this purpose is available in the Yasmim repository. Here, some comments about the configuration:

    Since some pages are going to be protected, we’ll need an access denied page to alert the user:


    On the application side, the access to resources of the system is given to roles, that can be assumed by users and groups. Roles are declared this way:


    This security constraint states that only users with the role admin can access the content of the /admin folder. These users can perform GET and POST http methods only.

    <display-name>Admin Constraints</display-name>

    This second constraint states that only users with the role user can access the content of the /media folder. The tag url-pattern is flexible enough to consider several complex situations, but you may find some difficulty if the name of the files and folders are not so well organized.

    <display-name>Users Constraints</display-name>

    Now, we define which authentication method will be presented to the user. We chose the most common one, which is the FORM method. yasmim-realm is the name of the realm we have just created. /login.xhtml is the authentication page and /login_error.xhtml is shown in case of authentication failure.


    In addition to the web.xml file, the sun-web.xml file is used to associate the application roles with the user groups. It is a practical way to associate roles to several users at the same time. The file is located in the WEB-INF folder, together with web.xml, and its content is shown below:

    <sun-web-app error-url="">

    In this case, the role admin is associated with the group admins and the role user is associated with the groups admins and users.

    Implementation of the Authentication Page

    There are, at least, three pages to consider: login.xhtml, login_error.xhtml and access_denied.xhtml. The login.xhtml page was declared in the web.xml file as the page that collects the authentication data from the user though a standardized form. The form contains the username and password fields and the submit command to post those data to the server. The action of the form and the name of each field are pre-defined. Look at the code below:

    <form method="post" action="j_security_check">
    <h:panelGrid columns="2">
    <h:outputLabel for="j_username" value="E-mail"/>
    <input type="text" name="j_username"/>

    <h:outputLabel for="j_password" value="Password"/>
    <input type="password" name="j_password"/>

    <h:outputText value=" "/>
    <h:panelGrid columns="2">
    <input type="submit" name="submit" value="Sign in"/>
    <h:button outcome="index" value="Cancel"/>

    This is a JSF 2.0 page with Facelets, but it can be a simple HTML page too. The important thing is the form action, which is j_security_check, and the username and password fields, which are named respectively as j_username and j_password. When the user submit the data the web container will handle it in a special way without interfering in the logic of your application. If you don’t need any other fancy feature, the work is done here.

    According to the authentication and authorization data in the database and what was declared in the application configuration, Once authenticated, Mary Allen has full access to the content of the folders admin and media and John Smith has access only to the content of the folder media. To save your time, take the Yasmim source code as a starting point and explore other possibilities.

    Creating a Connection Pool to PostgreSQL on Glassfish V3

    I recently created a new connection pool to PostgreSQL on Glassfish and I would like to share the steps I followed with you. Obviously, you need PostgreSQL and Glassfish installed on your machine and a database already created in PostgreSQL. If you didn’t configure your new PostgreSQL installation yet, follow the steps I described at this previous post and come back here to continue with the connection pool. The necessary steps are:

    1. We need the PostgreSQL JDBC Driver, since Glassfish and its deployed applications are writen in Java. Drivers are available for download at http://jdbc.postgresql.org. For this experiment choose the JDBC4 driver.
    2. Download the driver file postgresql-<version>.jdbc4.jar and copy it to the diretory [glassfish_home]/glassfish/domains/domain1/lib/.
    3. Restart Glassfish in order to make it load the new database driver. I thought that adopting an OSGI architecture Glassfish would never need restarts again, but I was wrong. At least, the restarting process is faster than V2.
    4. Enter in the administration console and go to Resources/JDBC/Connection Pools.
    5. Create a new connection pool with the name [database_name]Pool, select the resource type javax.sql.ConnectionPoolDataSource, select the database vendor PostgreSQL and click next.
    6. Select the datasource classname org.postgresql.ds.PGConnectionPoolDataSource and inform the following additional properties:
      Password=******* 😉
      PortNumber=5432 (this is the default port but make sure that you are using the correct one)
    7. Click Finish to save the new connection pool.
    8. Go to the list of connection pools again and select the new one that you just created.
    9. Click on Ping to check if the connection was correctly configured. The message “Ping Succeeded” means that the connection is working fine.
    10. In order to be able to use this connection pool in JEE applications, we have to create a JNDI name for it. Go to Resources/JDBC/JDBC Resources.
    11. Click on New and set the JNDI Name jdbc/[database_name], select the connection pool created above and click Ok to finish. This JNDI name will be used by applications to access the PostgreSQL database.

    These instructions may work with Glassfish V2 as well, since its database configuration is quite similar.

    I’m being very specific in terms of chosen technologies, but if you have a slightly different configuration and these steps are not working yet, please comment your issues below, describing also your current configuration/context. Maybe, we can figure it out.

    Good Ideas for Kenai That Somebody Else Should be Doing Instead

    As you already know, Oracle is planning to deactivate Kenai for community developers. Yes, it is sad but let’s move on and spread our code before it is destroyed. It is also time to make public some ideas I’ve been sharing with Sun’s folks last year. I have criticized Oracle for some decisions they have made, that’s true, but I don’t want to criticize and shut up. I have to propose something better and realistic to show that they are lazy thinkers, growing by buying other people’s innovation, not actually doing their own.

    I want to discuss 2 ideas. Each one solves a particular need of entrepreneur developers.

    1. I need integrated services to manage my software project and also to deploy it: It would be a mix of SourceForge with Amazon EC2, or Assembla with Google App Engine. Once my code is committed to the server and an integration test is performed, then my application would be automatically deployed in a server. Today, we need at least two different service providers to make it happen and it means more bureaucracy and waste of time. If Oracle or RedHat adopts something like this, the payment for such services would financially support their open-source projects, such as OpenSolaris, Glassfish, JBoss and MySQL.
    2. I’m an expert in my own open-source product, but I need help with its infrastructure: I have a complex and large JEE5 application and my expertise allows me to provide good support in terms of application features and bug fixing. As a matter of fact, my application is using Glassfish as application server and MySQL as database. However, all I know about these technologies are enough to develop and deploy the application. If a critical error occurs, avoiding the application to run because of an external and unexpected problem, probably my own knowledge is not enough. In order to solve that, Oracle or RedHat would make a partnership with developers, providing services that complements developer’s services. Considering RedHat, the customer would pay directly to them for the services and RedHat would transfer part of the payment to the developer, who provided part of the service too. With this deal, every developer becomes a potential retailer of RedHat’s services. That’s one more solution to increase investments on open-source projects.
    Both ideas above were explained to Sun engineers some months ago (At the time I was thinking that IBM would by Sun :D).  To sell open source products, companies should innovate, however Sun was good when innovating their products, but not their selling and services processes. These parts of their business were always traditional. RedHat, on the other hand, has far better selling and services processes for their open source products than Sun because they are agile, have simple procedures and minimal distance between the customer and the specialist. That’s why they are one of the only companies to get profit from open-source.

    Companies that already offer software project management (track system, version control, wiki, etc.), such as AtlassianCollabnetGoogle or even Oracle with Kenai, have strategic advantage because they already have something done, with large experience on it. I only wish that people out there get this idea and make it happen. This is not what I love to do, so I won’t appropriately exploit it. But I’m confident that these are good ideas and I hope to blog about any future service that provides such features and I’m probably going to be one of the first customers of the pioneer.

    Creating a New MySQL Database for Dummies

    I’m writing this post for self reference because I always forget how I’ve created my last MySQL database. Well, I’m sharing because it could be useful for you too. I consider that you already have MySQL 5.0 or superior installed and running normally. I also consider that you configured the path of your operating system, pointing to the bin folder where all MySQL commands are located. If you don’t fulfill this prerequisites, you can easily find solutions for each one using Google. It depends on your operating systems, so I would never be complete here.

    I’m going to use the basics, avoiding specific tools that can accelerate the process, because if you fulfill the prerequisites then I know you have these basics. So, execute the command below to start:

    c:> mysql -u root -p

    It will create a client authenticated session to access MySQL. “-u” means that you are passing the user of the session in the command line and “-p” means that you want to type the password right after the command has been executed. The user “root” has enough rights to create the database, but we will not use it all the time. Once authenticated, type the command below to create the database and a dedicated user for it:

    mysql> create database book_shop;
    mysql> create user 'bs_user'@'localhost' identified by ’(password)’;
    mysql> use book_shop;
    mysql> grant all privileges on book_shop.*
    to 'bs_user'@'localhost';
    mysql> flush privileges;

    The database will be created, a new user too and finally we give all privileges for this user to operate the new database. To check if the database was appropriately created, execute the following command:

    mysql> show databases;

    Check if the database “book_shop” is in the list. It should be ;). Then we have to sign out from this root’s session and open a new session for the new user to create the database structure. Please, follow the sequence of commands below:

    mysql> quit;
    c:> mysql -u bs_user -p
    mysql> use book_shop;
    mysql> source (path of the database script)/script.sql;

    Well, that’s it! You should use the new user, “bs_user”, in your application avoiding the super powerful root guy. My intention is not to be complete on this post, but helpful. Maybe you can find a lot of additional good practices out there but this is the fastest technique I recommend you to do in order to start developing your project.

    Any question? You know… comment below 😉 !