April 05, 2016

Data Evolution to Revolution

I apologize in advance, as this is an attempt to solidify some ideas I have been having into something a tad more cohesive. Let’s start small. I am not sure if this is a chicken-and-egg sort of thing, but I think it just may be. Does the evolution of technology yield more data, which in turn demands more complex and intricate mechanisms to capture, evaluate, utilize, and wield that data? Advances in technology bring about more data, more quickly and more accurately. Strides forward in technology enable greater insight into data that may have existed for years, maybe even decades or more. Most of these technologies are merely building on top of their predecessors, taking things one step further and refining them. Occasionally there are novel ideas that truly disrupt the pace and direction, shattering preconceptions and misconceptions.

We have grown very little in our “PC” age in terms of true advancement. We have sleeker, smaller, faster infrastructure. We have smart watches, phones, tablets and more. We have buzzwords like “cloud” and “IoT” that people love to throw around. Anyone today can make an “app” and think that it is novel, that it will change the world. We are so far from true advancement at the computing level that the word “intelligence” hardly applies. I would not pretend to be an expert on anything to do with AI or machine learning. I do, however, know that we have neither the precision nor the speed to come remotely close to anything of substance. We are playing Go and Jeopardy, we are writing cookbooks, and more. True creativity is alien to our creations. We are doing nothing more than creating formidable copy-cats. Sure, a machine may consume many different approaches to chess or some other topic; ultimately it is illustrating a path that will attempt to “beat” its opponent. I am not enough of a philosopher or scientist to evaluate the state of that level of comprehension. It is certainly complex and well structured, and it may beat a human. It is, however, a very far cry from the human intellect.

Now that I have gotten this far, I can say with a bit more clarity what I am trying to illustrate: computers and technology evolve constantly. Through their evolution they yield more and more data. New technologies are needed to understand and discover all of the depths of that data. Ultimately we are burrowing further into the data we initially acquired. Nothing new, only uncovered.

You cannot teach a computer to act like a man, but you can empower a computer with the collected knowledge of a man and enable it to utilize that knowledge.

The inspiration for this rant is that I have been toying with the notion of creating a layer that will assist in abstracting data persistence away from the application developer. This layer won’t understand your data itself; it will only understand how its pieces relate to each other, the types of data involved, and the types of analysis performed on that data.

The primary difference between TCP and UDP is fairly simple and can illustrate the beginning of this abstraction: TCP has error checking and is used when order matters. If you care about speed and nothing else, UDP is appropriate. TCP is heavier but reliable for data integrity, order, and consistency. I’m sure I’m not the first to draw a parallel to the traditional RDBMS, your SQL variation, versus the newer NoSQL variant. SQL is typically characterized as transactional, strictly schematized, relational, and challenging to scale horizontally, whereas the NoSQL variant is typically assumed to be non-transactional, schemaless, and easy to scale horizontally. There are of course the newer variants, often dubbed NewSQL, which take a hybrid approach, attempting to combine the best qualities of the aforementioned technologies into a hell of a solution. There have also been advancements in technologies specific to very large file stores, JSON documents, and full-text-searchable data. Suffice it to say, there is no one-size-fits-all solution. Depending on the software needs, business requirements, and many other considerations, there may be dozens of different factors that contribute to a proposed design. The DBA used to be the overlord with his stored procedures and locked-down tables, but not every problem can be solved with a stored procedure. With an ever-growing technological world and more and more data emerging, a strategy to stay afloat, let alone ahead of the curve, seems faint and distant. I think that with the right approach it can be done, and it can change the nature of how we interact with data today.
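
As a quick illustration of that schema contrast, here is a minimal sketch in Java, not tied to any particular database; the order record, its fields, and the class names are all invented purely for the example. The same hypothetical order is held once as a fixed, typed row and once as a schemaless document.

import java.util.HashMap;
import java.util.Map;

public class SchemaContrast {

    // Relational mindset: every row carries exactly these typed columns,
    // enforced up front by the schema (think CREATE TABLE orders ...).
    static final class OrderRow {
        final long id;
        final String customer;
        final double total;

        OrderRow(long id, String customer, double total) {
            this.id = id;
            this.customer = customer;
            this.total = total;
        }
    }

    public static void main(String[] args) {
        OrderRow row = new OrderRow(1L, "acme", 99.95);

        // Document mindset: the "schema" is whatever keys this particular
        // document happens to carry; two documents in the same collection
        // need not share a single field.
        Map<String, Object> doc = new HashMap<>();
        doc.put("_id", 1L);
        doc.put("customer", "acme");
        doc.put("items", new String[] {"widget", "gadget"}); // no column required

        System.out.println(row.customer + " -> " + doc.keySet());
    }
}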

Spring Data hopes to provide a layer of abstraction when accessing data from different sources, varying from SQL sources to NoSQL variants, in-memory sources, and others. With Spring Data, the @Entity represents the top-level data we are dealing with; the other piece is the @Repository, the abstraction layer that interacts directly with the store engine that is configured. Spring Data can support many different types of store engines, but ultimately the onus lies upon the application architect to decide what the repository connects to. Imagine if there were a layer that determined how the defined data would be persisted, evolving over time.
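
To make that concrete, here is a minimal sketch of those two building blocks, assuming a JPA-backed configuration; the Customer entity and the findByLastName query are hypothetical examples I made up, not part of the idea itself.

import java.util.List;

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

import org.springframework.data.repository.CrudRepository;

// The entity describes the top-level data being dealt with.
@Entity
public class Customer {

    @Id
    @GeneratedValue
    private Long id;

    private String firstName;
    private String lastName;

    protected Customer() {
        // required by JPA
    }

    public Customer(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }
}

// The repository is just an interface; Spring Data supplies the
// implementation against whichever store engine is configured.
interface CustomerRepository extends CrudRepository<Customer, Long> {
    List<Customer> findByLastName(String lastName);
}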

The relationship between an FPGA and the processing it performs is similar to that between this new layer and the data it will persist. With an FPGA, the gates are constructed according to the task at hand and how the module was programmed to react under those circumstances. Similarly, this new layer, which I am going to dub an “Adaptable Data Source Gateway,” will utilize the different components at its disposal based on the data inputted and the configured priorities.
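
As a rough sketch of what I mean, and nothing more than a guess at one possible shape, the gateway might route each piece of data to one of the engines at its disposal based on a profile of the data and a configured priority. Every name below (DataProfile, StoragePriority, StorageEngine, and the gateway itself) is invented for illustration.

import java.util.List;

// The configured priority for this deployment.
enum StoragePriority { CONSISTENCY, WRITE_SPEED }

// A coarse description of the data handed to the gateway.
final class DataProfile {
    final String name;
    final boolean relational;
    final boolean largeBinary;

    DataProfile(String name, boolean relational, boolean largeBinary) {
        this.name = name;
        this.relational = relational;
        this.largeBinary = largeBinary;
    }
}

// One pluggable component per underlying store engine.
interface StorageEngine {
    boolean supports(DataProfile profile, StoragePriority priority);
    void persist(DataProfile profile, Object value);
}

final class AdaptableDataSourceGateway {
    private final List<StorageEngine> engines;
    private final StoragePriority priority;

    AdaptableDataSourceGateway(List<StorageEngine> engines, StoragePriority priority) {
        this.engines = engines;
        this.priority = priority;
    }

    // Route to the first engine that claims it can serve this data under the
    // configured priority, loosely like an FPGA routing work to the gates
    // programmed for it.
    void persist(DataProfile profile, Object value) {
        for (StorageEngine engine : engines) {
            if (engine.supports(profile, priority)) {
                engine.persist(profile, value);
                return;
            }
        }
        throw new IllegalStateException("No engine fits " + profile.name);
    }
}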

Here is a high-level overview of how this may be accomplished. The comparison to an FPGA only goes so far; an FPGA doesn’t “automatically” change its design, rather it changes its design based on its programming. To add this kind of functionality with regard to data, it will be necessary to maintain a list of possible storage engine types (engine interfaces) as well as implementations of those interfaces. We will also need rules that identify which data requirements are best suited to each interface. Ultimately we need a metadata system that will allow developers to describe the data from afar, allowing the system to gradually grasp the data’s needs.
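
Continuing the speculation, the metadata piece could be as simple as an annotation the developer places on the data, paired with a small rule table that maps those hints to an engine type. Again, everything here (DataTraits, EngineType, EngineRules, the Invoice example) is a hypothetical sketch of one way it could look, not a finished design.

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// The kinds of engine interfaces the gateway would know how to talk to.
enum EngineType { RELATIONAL, DOCUMENT, BLOB_STORE, SEARCH_INDEX }

// Developer-facing metadata: describe the data "from afar" without picking
// an engine directly.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface DataTraits {
    boolean transactional() default false;
    boolean fullTextSearch() default false;
    boolean largeFiles() default false;
}

// The rules: which data requirements are best suited to which interface.
final class EngineRules {
    static EngineType choose(DataTraits traits) {
        if (traits.largeFiles())     return EngineType.BLOB_STORE;
        if (traits.fullTextSearch()) return EngineType.SEARCH_INDEX;
        if (traits.transactional())  return EngineType.RELATIONAL;
        return EngineType.DOCUMENT;  // schemaless default
    }
}

// Example: the invoice only states its requirements; the rules decide that it
// belongs in a relational engine.
@DataTraits(transactional = true)
class Invoice {}

class Demo {
    public static void main(String[] args) {
        DataTraits traits = Invoice.class.getAnnotation(DataTraits.class);
        System.out.println(EngineRules.choose(traits)); // prints RELATIONAL
    }
}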

I have some ideas for the secret ingredients that will make this possible. I am going to toy with them and do some rapid prototyping, and I hope to write more soon about my findings. As always, any input is appreciated!