Monday, January 05, 2009

Embedded Databases in Java

As I am developing "Better Time Machine :-) ©", I have been evaluating various embedded databases. Each one has its good and bad sides. I even tried JDBM (ripped out of W3C Jigsaw and slightly fixed inside; it is just a few classes). All I need is a very fast, embeddable engine (meaning a very small footprint and low memory consumption).

Here is a list of what I've tried (since the software is written in Java, I tried 100% pure Java solutions):
  • Apache Derby (AKA JavaDB)
  • Oracle Berkeley DB for Java
  • db4o for Java (sort of ZODB for Python, but much better)
  • W3C JDBM (ripped off from Jigsaw)
  • Some JNI wrappers around GDBM (JDBM and JavaDBM).
  • H2

So, in short, you want to play with these rubber toys only when your database is nothing more than storage for a reasonable amount of data used by a desktop application, a daemon, or server software (be careful with concurrent write access here). In any other situation, you definitely want a solid server database. Use PostgreSQL if you want good stuff for free, MySQL if you want to screw yourself, or Oracle if you can afford it.

If you want objects, use db4o. It is a very neat database, but the footprint is not small at all: about 30M (but see the update below). Think about it. You can also go with W3C JDBM: store the entry key as a string and the entire object as a byte array of the serialized object. Crappy, but it works well in certain cases, if you do not need funky searches (otherwise go back to db4o).
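
The JDBM trick above (string key, whole object stored as a serialized byte array) can be sketched roughly like this. It is only an illustration under assumptions: the class name SerializedStore is made up, and a plain in-memory HashMap stands in for the real JDBM store, whose API is not shown here. Only standard Java serialization is used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: string keys mapped to serialized objects.
public class SerializedStore {
    // Assumption: this map stands in for the persistent key/value store.
    // A real JDBM-style store would write the same byte arrays to disk.
    private final Map<String, byte[]> records = new HashMap<>();

    // Turn any Serializable object into a byte array.
    static byte[] toBytes(Serializable value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        }
        return buffer.toByteArray();
    }

    // Rebuild the object from its serialized bytes.
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    // Store the serialized object under a string key.
    public void put(String key, Serializable value) throws IOException {
        records.put(key, toBytes(value));
    }

    // Look up by exact key only; returns null when the key is unknown.
    public Object get(String key) throws IOException, ClassNotFoundException {
        byte[] bytes = records.get(key);
        return bytes == null ? null : fromBytes(bytes);
    }

    public static void main(String[] args) throws Exception {
        SerializedStore store = new SerializedStore();
        ArrayList<String> files = new ArrayList<>();
        files.add("notes.txt");
        files.add("photo.jpg");
        store.put("backup:2009-01-05", files);
        System.out.println(store.get("backup:2009-01-05"));
    }
}
```

The only way to find anything here is by its exact key, which is why this approach stops being fun as soon as you need funky searches.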

If you want embedded SQL, then definitely H2. It is an RDBMS from the author of HSQLDB, simply faster and better. It has lots of features and is very nicely coded. Performance is very high; it supports very large databases and in-memory databases, and it works in server mode. The footprint is very small and it is 100% pure Java open source software. It also works with ORMs, if you need such a thing. But I am not sure how you could install 120M of ORM, write cumbersome XML mappings, use weird HQL (in the case of Hibernate, for example) and stay happy because your database engine is just a 500K library... Go for the real stuff: use plain JDBC with a few more lines of code and plain SQL, which you can write best! It will make your application a hundred times smaller and at least twice as fast, not to mention the perfect maintainability of the code, since it is small, simple, and easily understood by anybody.

Apache Derby is the last thing you need. It is actually IBM Cloudscape, if you remember that one. It has so many bugs and is coded so nastily that open-sourcing it and changing its name was the only way to hope someone would use it. Its performance is the slowest and its footprint is twice that of H2. Additionally, the database might easily get corrupted or just suddenly stop booting. Apache Derby is a "No Go": avoid it.

Happy New Year! :)

Update: Hooray, I have an alpha version of my "Better Time Machine © :-)" working! And it works really amazingly for me: it saves disk space and is fast, customizable, yet very simple to use. I am going to put it in the public domain once I am sure it will not break your backups. :-)

One more update: the db4o footprint is not 30MB, sorry. So do not think about it. :-)


Maik Jablonski said...

The footprint of db4o isn't 30MB, that's the size of the download-package (sources and documentation included). The actual footprint of db4o is less than 1MB...

Vagaus said...


I'm wondering about the footprint you talked about. Are you referring to the amount of memory used by the Db4o libraries? Or is this footprint related to the amount of memory that is required to run your application with Db4o?

In the first case I'd expect a footprint in the range of 1.5 to 2.0 MB (the size of the jars).


BM said...

Maik, Adriano: uhm, yes. I was blogging at 1:00AM, my bad. :) And db4o is really great stuff, actually.