Sunday, May 24, 2009

Table Generation Strategy Revised

The JEE platform (EJB 3.0) offers, since its previous version (EJB 2.1), a feature to generate database tables during the application deployment. This feature is currently part of the Java Persistence API (JPA), which can be used even outside the JEE environment, like in desktop applications. My case is exactly a desktop application where I'm using an embedded database to simplify as much as possible the application installation and configuration.

When I started working with this embedded database I have configured my persistence.xml file to "create" the database tables during the deployment time, based on annotations made in the entity classes. This option generates a DDL (Data Definition Language) file that is processed by the JPA framework to generate the tables. Unfortunately, if you change your entity classes, this strategy won't update the existing tables, creating inconsistencies with the data model. Unfortunately, the application will crash. There are other two options: a) "delete and create", which will drop the entire database structure to be created again; and b) "do nothing" and let the developer find a way to manage the database structure. "delete and create" keeps the consistency between entity classes and data model but it loses all the data persisted so far. It might be useful for tests, but the developer should remember to change this option when delivering a new version to end-users, which is the addition of one more point of failure. "do nothing" implies in more work for developers, who will take the responsibility to manage the data structure. Based on the two shameful possibilities ("create" and "delete and create"), "do nothing" seems to be the best and unique option available for serious projects.

Today I'm using the option "do nothing", keeping in mind my constant concern for simplifying the life of end users when installing and using the application. So, every time the application is started I check if the database exists. If not, I execute a ddl script bundled inside the main jar file. I'm using the following code:

public static void main(String[] args) {
{...}
boolean applicationPropertiesLoaded = false;
int attempts = 0;
do {
try {
attempts++;
loadApplicationProperties();
applicationPropertiesLoaded = true;
}
catch(Exception e) {
logger.warning("Application properties"+
"not found. Trying to create the database");
createDatabase();
if(attempts == 2) {
logger.severe(
"Problem to load application properties.");
}
}
} while(!applicationPropertiesLoaded ||
(attempts < 2));
{...}
}

The code above tries to access the database, executing a basic operation like loading the application properties. If the database doesn't exist an exception is thrown and the respective catch block is executed. Inside the catch block the method "createDatabase()" is invoked to create the database. This process is performed at least 2 times, since the application properties should be loaded anyway. See the "createDatabase()" method below:

private static void createDatabase() {
logger.info("Creating the database"+
" for the first time");
List sqlCommands =
loadSQLCommands(Main.class
.getResourceAsStream(
"/META-INF/create-database.sql"));
EntityManagerFactory emf =
Persistence.createEntityManagerFactory(
AbstractEntity.ENTITY_MANAGER);
EntityManager em = emf.createEntityManager();
EntityTransaction et = null;
String sqlCommand = null;
try {
et = em.getTransaction();
et.begin();
for(int i = 0;i < sqlCommands.size();i++) {
sqlCommand = sqlCommands.get(i);
em.createNativeQuery(sqlCommand).executeUpdate();
logger.info("database update: " + sqlCommand);
}
initializeDatabase(em);
}
catch(Exception ex) {
logger.log(Level.WARNING,
"Problem to execute the sql command: "+
sqlCommand, ex);
et.rollback();
}
finally {
et.commit();
em.close();
}
}

The code above accesses the file "create-database.sql" embedded inside the application jar, extracts a list of sql commands, executes one by one, and initializes the database, inserting some initial default data. I list below the method "loadSQLCommands()" that opens the embedded file and reads its content:

public static List loadSQLCommands(
InputStream stream) {
List sqlCommands = new ArrayList();
try {
BufferedReader reader = new BufferedReader(
new InputStreamReader(stream));
StringBuffer sb = new StringBuffer();
String line = null;
while((line = reader.readLine()) != null) {
sb.append(line.trim());
}
StringTokenizer st =
new StringTokenizer(sb.toString(), ";");
while(st.hasMoreTokens()) {
sqlCommands.add(st.nextToken());
}
return sqlCommands;
} catch (MalformedURLException ex) {
logger.log(Level.SEVERE,
"Malformed URI of the file.", ex);
} catch (FileNotFoundException fnfe) {
logger.log(Level.WARNING,
"Database script not found.", fnfe);
} catch (IOException ioe) {
logger.log(Level.WARNING,
"Problem to read the script.", ioe);
}
return null;
}

Finally, the method 'initializeDatabase()", invoked at the end of the method "createDatabase()", is listed below:

public static void initializeDatabase(
EntityManager em) {
logger.info("Initializing the database"+
"for the first time.");
List sqlCommands = loadSQLCommands(
Main.class.getResourceAsStream(
"/META-INF/initialize-database.sql"));
for(String sqlCommand:sqlCommands) {
em.createNativeQuery(sqlCommand).executeUpdate();
logger.info("data update: " + sqlCommand);
}
}

With all this code above we could solve only the database creation process. There are still more code to update the database when needed. Because this post became so big, I'm going to describe the update process in a future post in this blog. Maybe, I missed some point here because there are so many details. So, if you had problems to implement it, please leave your comments below and I will try to complement the content to fulfill your needs. Keep following me!

15 comments:

  1. This solution is really ugly, though. You must rely on a platform (dbms) .sql script, breaking the JPA/eclipseLink convenience.
    It would be much better if one could 'backup' the data before dropping the database, and then load it into the new database. Maybe with the create option that could be achieved, by having two dbs, the old and the new, and then merge them.
    Would that be possible?
    ReplyDelete
  2. The problem is not that simple. If we backup the data and make the updates in the database, the previous data have the old structure and they cannot be recovered to the new one. We cannot rely on a specific dbms because JPA gives the freedom to use any DBMS and we are not going to lose this advantage, that can be useful in the future. Any possible merge would be worse to implement because every database update would demand a different merging strategy. Do you know any other solutions that can address these constraints in a better way?
    ReplyDelete
  3. I agree with the concern you raised. However, I still feel it'd be possible to have (pseudo-)automatic code that would do the translation of the schemas. I wish eclipseLink provided it.
    ReplyDelete
  4. EclipseLink only provides the auto-generate strategy, which is more useful for tests and cannot be used in production. Any change in the model means a good bye to the data. Btw, there is a pattern called Active Record, proposed by Martin Fowler, that states something about model and data updates. But this is something for a future post. :)
    ReplyDelete
  5. I'm not sure how Active Record solves/relates to the problem.
    However, I thought of the problem (which I share) and apparently there are also these options:
    1. catch the exceptions after you change the Entity objects, and there execute your update scripts (and possibly convince eclipseLink that now water is under the bridge).

    2. Applies only to apps with a data warehouse: in the rare event in which you will need to update the schema, then you could drop and delete, and have the sync module find out that the client needs data, all of which in the warehouse. The sync module will have beeen updated for the new schema.

    What do you think of these solutions?
    What do you think of these solutuio
    ReplyDelete
  6. My comments for each one of your points:

    1: if the entity classes are not consistent with the database model exceptions may occur when the user interacts with the features using that entities, and unless I implement a really intricate code, the user will consequently face a failure. To avoid that, we perform all updates (sql scripts) during the application initialization (main method), before any kind of user interaction.

    2: We are working with an embedded database, as written in the second paragraph. So, we are not considering any complex database deployment like a data warehouse.
    ReplyDelete
  7. Where do you get AbstractEntity.ENTITY_MANAGER from? Coudl you post/attach a SSCCE?
    ReplyDelete
  8. Ouch! I've created the AbstractEntity class, which is an abstract class that contains the constant ENTITY_MANAGER. This constant is a string that contains the name of the persistence unit. The name of the persistence unit is declared in the file persistence.xml, as shown below:

    persistence-unit name="usi4biz-pu" transaction-type="RESOURCE_LOCAL"

    In this case, the persistence unit name is "bookstore-pu". You can put this string directly, but a constant would make a possible maintenance faster.
    ReplyDelete
  9. That's a good idea, I was also upset with the constant. However, having it in the modifiable .xml is not also so great.

    I expect your create-database.sql to contain vendor-specific create table/trigger/procedure. Correct?

    I'd like a SSCCE to play with. I've not found any around.
    ReplyDelete
  10. Using create-database.sql to match what EclipseLink would have generated, is error-prone, and not lazy. Why not still use the create table generation option, and apply your code only to updates? In your main method you check if the db structure has changed, and if so, execute sql updating commands. That's the same approach you propose, but used only for updates.

    PS: if you deploy the application with Java Web Start, you could have the update code be executed
    only once, when you make the update.
    ReplyDelete
  11. Yes. The create-database.sql file is written according to vendor-specific language and features.

    You can find a create-database.sql file here http://github.com/htmfilho/Yasmim/blob/master/yasmim-ejb/src/conf/database-create.sql as example.
    ReplyDelete
  12. and what's your say on my last comment/strategy suggestion/improvement?
    Your feedback is valued.
    ReplyDelete
  13. The create table generation option drops and creates a new table if any annotated entity class is changed. Then we lose all previous data stored. We have no chance to apply scripts only for updates because the generation option just don't wait for it. The drop/create table will happen at the initialization process.

    That's why I'm doing it separetedly. Because I need the chance to execute the update script only.
    ReplyDelete
  14. http://wiki.eclipse.org/Using_EclipseLink_JPA_Extensions_(ELUG)%23Using_EclipseLink_JPA_Extensions_for_Schema_Generation&usg=AFQjCNG_yfrLrgEFsZozLeSCs9WqQZRN5g&ei=ThTTS_qPIoac-ga-pdH6Ag&sa=X&oi=section_link&resnum=1&ct=legacy&ved=0CAsQygQ:

    create-tables – EclipseLink will attempt to execute a CREATE TABLE SQL for each table. If the table already exists, EclipseLink will follow the default behavior of your specific database and JDBC driver combination (when a CREATE TABLE SQL is issued for an already existing table). In most cases an exception is thrown and the table is not created.

    With this option no data is lost. One would have to catch (maybe) that exception, and then execute the update code.

    Also, when is the initialization process? Before main is executed? Any resource where that is explained?
    I wonder how that is for Java Web Start.
    ReplyDelete
  15. That's right! Actually, I was describing the "Drop and Create" table generation strategy. The one that guarantees the consistency of the model, but data will be lost. The "Create" strategy really produces exceptions in case of inconsistencies.

    Yes. It is possible to create the database using the "Create" strategy and update it using the exception scope. I'm not sure if doing the update in the occurrence of an exception would work appropriately (it would be the subject of another post), but at least, using the script, we can create the database exactly as we need (considering vendor-dependent features). That's the advantage of this approach.

    The initialization occurs when the persistence unit is used for the first time. Thus, during or after the main method, but not before. The Java Web Start just updates files. It has no influence in a local database.
    ReplyDelete