March 17, 2016

Crate.io - Part 1

So I started playing a bit with crate.io while I waited for the power to come back on here at the office. Crate is powered as a NoSQL data model with implied typing and optimistic locking. What makes crate unique is that it attempts to provide the aggregation functionality commonly found in RDBMS. However the eventually consistent non-transactional data model is very foreign. It has things like full text search and geometric querying functionality, which is nice but nothing to write home about.

The data types it claims to handle add the additional array and object types not found in traditional SQL systems. PostgreSQL does support these types but they are not without their limitations. Crate's ability to handle JSON type input seems a great deal more natural than has been my experience with PSQL 9.4+, which was fairly awful.

Let's create the following table:

create table test (
  data object,
  label string
  );
 now we make an insert with this data:
insert into test values(
  {"z"= 1},
  'some test'
  );
 Now if I attempt to use insert another row with a data type that is different from the previous row there is  a problem:
insert into test values (
  {"z"= [1, 2 ,3]},
  'baba bobo'
  )

yields:

SQLActionException[Validation failed for data['z']: Invalid long]

now that may be logical...what crate does is for the complex data types (object and arrays) it detects the data types of the inutted values and creates a schema around those values. There are other data storage engines that perform similarly. Creating a schema on the fly. This makes querying fast and more efficient, especially where the actual database is written in a format that is much more type rigid.

Now this was an unexpected and annoying issue:

insert into test values (
  {"mixed_array"= [1, 'two', 3.0]},
  'mix it up!'
  )

yields:

SQLActionException[Validation failed for data: Mixed dataTypes inside a list are not supported]

The strongly typed Java roots of Crate may be apparent from these few limitations. The type flexibility within arrays is not an uncommon convention in JSON. As for the rigidity of the schema, I imagine that for performance reasons Crate initially detects inputted types and creates a schema to adhere to it. This is not uncommon for some NoSQL databases that attempt to have what I've seen called gradual typing.

Crate supports blob types, like MySQL and others. Blobs are supposed to allow for binary storage of data. Crate doesn't point out any clear limitation.

I want to quickly summarize my findings so far. I have not reviewed crate for performance, high availability and reliability and many other things for that matter. My initial focus was evaluating its support for JSON and having "dynamic schema" while using a SQL variant. It is non-transactional system that utilizes optimisic locking. If you want to store your JSON which may have a "fuzzy" schema or hybrid schema you may run into problems. Crate locks into its perceived schema based on inputted data. If your JSON is consistent and you want to support database aggregation (the way it should be) crate may be for you.

Bottom line: Looks like a promising solution for dealing with data that has a schema with limited fluidity. Has many of the features you would expect from an RDMBS with the scalability of the newer NoSQL variants. Warrants further investigation and looks "optimistic".