
Data Evolution to Revolution

I apologize in advance, as this is an attempt to solidify some ideas I have been having into something a tad more cohesive. Let's start small. I'm not sure if this is a chicken-and-egg sort of thing, but I think it just may be: the evolution of technology yields more data, and thus more complex and intricate mechanisms are needed to capture, evaluate, utilize, and wield that data. Advances in technology bring about more data, more quickly and more accurately. Strides forward in technology enable greater insight into data that may have existed for years, maybe even decades or more. Most of these technologies are merely building on top of their predecessors, taking things one step further and refining them. Occasionally there are novel ideas that truly disrupt the pace and direction, shattering preconceptions and misconstrued understandings.

We have grown very little in our "PC" age in terms of true advancement. We have sleeker, smaller, faster infrastructure. We have smart watches, phones, tablets, and more. We have the buzzwords "cloud" and "IoT" that people love to throw around. Anyone today can make an "app" and think that is novel, that it will change the world. We are so far from true advancement at the computing level that the word "intelligence" hardly applies. I would not pretend to be an expert on anything to do with AI or machine learning. I do, however, know that we have neither the precision nor the speed to come remotely close to anything of substance. We are playing "Go" and Jeopardy, we are writing cookbooks, and more. True creativity is alien to our creations. We are doing nothing more than creating formidable copycats. Sure, a machine may consume many different approaches to chess or some other topic; ultimately it is tracing a path that will attempt to "beat" its opponent. I am not enough of a philosopher or scientist to evaluate that level of comprehension. It is certainly complex and well structured, and it may beat a human. It is, however, a very far cry from the human intellect.

Now that I have gotten this far, I can say with a bit more clarity what I am trying to illustrate: computers and technology evolve constantly. Through their evolution they yield more and more data. New technologies are needed to understand and discover all the depths of that data. Ultimately we are burrowing further into the data we initially acquired. Nothing new, only uncovered.

You cannot teach a computer to act like a man, but you can empower a computer with the collected knowledge of a man and enable it to utilize that knowledge.

The inspiration for this rant is that I have been toying with the notion of creating a layer that will assist in abstracting data persistence away from the application developer. This layer won't understand your data itself, only how the pieces relate to one another, the types of data, and the types of analysis performed on that data.

The primary difference between TCP and UDP is fairly simple and can illustrate the beginning of this abstraction: TCP has error checking and is used when order matters, while UDP is appropriate if you care about speed and nothing else. TCP is heavy but reliable for data integrity, order, and consistency. I'm sure I'm not the first to draw a parallel to the traditional RDBMS, your SQL variation, versus the newer NoSQL variant. SQL is typically characterized as transactional, strictly schema-bound, relational, and challenging to scale horizontally, whereas the NoSQL variant is typically assumed to be non-transactional, schemaless, and easy to scale horizontally. There are of course the newer variants, often dubbed NewSQL, which exhibit a hybrid approach, attempting to take the best qualities of the aforementioned technologies and provide a hell of a solution. There have also been advancements in technologies specific to very large file stores, JSON documents, and full-text searchable data. Suffice it to say, there is no one-size-fits-all solution. Depending on the software needs, business requirements, and much else, there may be dozens of different factors that contribute to a design. The DBA used to be the overlord with his stored procedures and locked-down tables, but not every problem can be solved with a stored procedure. With an ever-growing technological world and more and more data emerging, a strategy to stay afloat and ahead of the curve seems faint and distant. I think that with the right approach it can be done, and it can change the nature of how we interact with data today.

Spring Data hopes to provide a layer of abstraction when accessing data from different sources, varying from SQL sources to NoSQL variants, in-memory sources, and others. With Spring Data, the @Entity represents the top-level data you are dealing with, while the @Repository is the abstraction layer that interacts directly with the store engine that is configured. Spring Data can support many different types of store engines, but ultimately the onus lies upon the application architect to decide what the repository connects to. Imagine if there were a layer that determined how the defined data would be persisted, evolving over time.
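To make that concrete, here is a minimal sketch of the two pieces just mentioned. The Sample entity and the findByName method are illustrative names of my own, not anything the framework prescribes:

```java
// A minimal Spring Data sketch: the @Entity describes the data,
// the repository abstracts the store that backs it.
import java.util.List;

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

import org.springframework.data.repository.CrudRepository;

@Entity
public class Sample {
    @Id
    @GeneratedValue
    private Long id;

    private String name;

    protected Sample() { } // JPA needs a no-arg constructor

    public Sample(String name) { this.name = name; }

    public Long getId() { return id; }
    public String getName() { return name; }
}

interface SampleRepository extends CrudRepository<Sample, Long> {
    // Spring Data derives the query from the method name; which store
    // actually serves it (JPA, Mongo, etc.) is a configuration choice.
    List<Sample> findByName(String name);
}
```

The application code only ever talks to SampleRepository; swapping the backing engine is, in principle, a matter of configuration rather than code.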

The relationship between an FPGA and processing is similar to that between this new layer and the data it will persist. With an FPGA, the gates are constructed according to the task at hand and how the module was programmed to react under the given circumstances. Similarly, this new layer, which I am going to dub the "Adaptable Data Source Gateway," will utilize the different components at its disposal based on the data it is given and the configured priorities.

Here is a high-level overview of how this may be accomplished. The comparison to an FPGA only goes so far: an FPGA doesn't "automatically" change its design; rather, it changes its design based on its programming. To add this functionality with regard to data, it will be necessary to maintain a list of possible storage engine types (engine interfaces) as well as implementations of those interfaces. We will also need rules to identify which data requirements are best suited to each interface. Ultimately we need a metadata system that allows developers to describe the data from afar, letting the system gradually grasp the data's needs. A rough sketch of these pieces follows.
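Everything in this sketch is hypothetical; none of these types exist in any library, and the names (StorageEngine, DataProfile, AdaptableDataSourceGateway) are placeholders for the concepts above:

```java
// Hypothetical sketch of the "Adaptable Data Source Gateway" idea.
import java.util.List;

/** Developer-supplied metadata describing the data "from afar". */
class DataProfile {
    final boolean transactional;
    final boolean fullTextSearch;
    final long expectedVolume;

    DataProfile(boolean transactional, boolean fullTextSearch, long expectedVolume) {
        this.transactional = transactional;
        this.fullTextSearch = fullTextSearch;
        this.expectedVolume = expectedVolume;
    }
}

/** One storage engine type: relational, document, full-text, and so on. */
interface StorageEngine {
    /** The "rule": can this engine serve these data requirements? */
    boolean supports(DataProfile profile);

    void persist(String key, Object value);

    Object load(String key);
}

/** Routes data to the best-suited engine based on its profile. */
class AdaptableDataSourceGateway {
    private final List<StorageEngine> engines;

    AdaptableDataSourceGateway(List<StorageEngine> engines) {
        this.engines = engines;
    }

    StorageEngine select(DataProfile profile) {
        // Naive first-match selection; a real version would score engines
        // and adapt its choices as it observes the data over time.
        for (StorageEngine engine : engines) {
            if (engine.supports(profile)) {
                return engine;
            }
        }
        throw new IllegalStateException("No engine fits this profile");
    }
}
```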

I have some ideas for the secret ingredients that will make this possible. I am going to toy with these ideas and do some rapid prototyping, and I hope to write more soon about my findings. As always, any input is appreciated!

SCP, SMB and more

Recently I was asked to add additional connection protocols as a means to submit samples to be analyzed by our automated forensic analysis platform. We had our UI, a REST API, and multiple REST clients (C#, Java, and Python). These are all standard, but all require integration or manual intervention. Protocols like FTP, SFTP, SCP, SMB, and many others are used to transfer files for everyday use; they have commercially available clients, and many operating systems have built-in support as well. My challenge was providing a smart, powerful, and flexible integration.

I knew that there are several open servers available to receive files. In order to integrate into our architecture, something would need to consume the uploaded files. Additionally, we would have to handle the authentication and authorization aspects necessary to upload to the server. We can't just create users for every system and have processes reading directories, as that is clunky, error prone, and not scalable. I opted for an alternate approach.

Our core components are mostly written in Java, and I was looking for a solution that would integrate directly or by means of JNI. I began with SCP and immediately found the Apache project MINA, which provides a complete Java SSHD/SFTP/SCP solution from soup to nuts. SCP/SFTP uploads are intended to be written to disk, but with a little ingenuity I was able to cut out that step entirely and stream directly into our system without ever writing to disk. We were already using Spring Security for our authentication. While I wasn't going to take the time to extend Spring Security to handle the SSH protocol, I did utilize the ThreadLocal security context, SecurityContextHolder. This connected the authentication mechanism that MINA provides with the data transfer, identifying the user based on the security context that was set up, and it let me keep using the rest of the application I had already secured. The rest of the system thought the request came from HTTP, or didn't really care. Ideally I would extend some interfaces in Spring Security and actually bind the protocol, but that would be a nice add-on to recommend to the Spring Integration team, who already support SFTP. Click here to view the gist for the SCP integration.

Some of this code just extends ScpHelper and ScpCommand. This provided an easy way to access my existing authentication service and set up the security context, roughly along the lines sketched below.
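The gist has the actual code; this sketch only illustrates the bridging idea. The SpringSecurityPasswordAuthenticator class is an illustrative name of mine, and the PasswordAuthenticator package can vary between MINA SSHD versions:

```java
// Sketch: bridging MINA SSHD authentication into Spring Security so
// downstream code (which normally serves HTTP) can identify the user.
import org.apache.sshd.server.auth.password.PasswordAuthenticator;
import org.apache.sshd.server.session.ServerSession;
import org.springframework.security.authentication.AuthenticationManager;
import org.springframework.security.authentication.UsernamePasswordAuthenticationToken;
import org.springframework.security.core.Authentication;
import org.springframework.security.core.context.SecurityContextHolder;

class SpringSecurityPasswordAuthenticator implements PasswordAuthenticator {
    private final AuthenticationManager authenticationManager; // existing Spring Security setup

    SpringSecurityPasswordAuthenticator(AuthenticationManager authenticationManager) {
        this.authenticationManager = authenticationManager;
    }

    @Override
    public boolean authenticate(String username, String password, ServerSession session) {
        try {
            Authentication auth = authenticationManager.authenticate(
                    new UsernamePasswordAuthenticationToken(username, password));
            // Store the result in the ThreadLocal context so the rest of
            // the secured application treats this like any other request.
            SecurityContextHolder.getContext().setAuthentication(auth);
            return auth.isAuthenticated();
        } catch (Exception e) {
            return false;
        }
    }
}
```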

SCP was out of the way, but SMB was the more challenging integration. I didn't find nearly as much on the topic, and there are a lot more complications to handle. SCP/SFTP are safeguarded by SSH, which, much like the TLS that makes HTTPS secure, uses public/private key encryption; after the initial handshake, all data sent over the wire is encrypted. This facilitates authentication in comparison to many other protocols out there. Much to my surprise, and somewhat naively, I was hoping that I would be able to utilize the stored credentials, already encrypted and protected, that we use via Spring Security. Instead, SMB usually sends hashed credentials that are compared against credentials the server already holds. This is a typical challenge-response practice: passing challenge data so as to conceal the secret and prevent forged credentials. I had to resort to storing the MD4 hash digest of each user's password so it could be compared against the hashed password the client provides.
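For reference, the hash in question is the NT hash: an MD4 digest of the UTF-16LE encoded password. Here is a minimal sketch using Bouncy Castle (the JDK's MessageDigest has no standard MD4 provider); the NtHash class name is my own:

```java
// Sketch: compute the NT hash (MD4 over the UTF-16LE password) that
// SMB challenge-response schemes are ultimately compared against.
import java.nio.charset.StandardCharsets;

import org.bouncycastle.crypto.digests.MD4Digest;

final class NtHash {
    static byte[] of(String password) {
        byte[] input = password.getBytes(StandardCharsets.UTF_16LE);
        MD4Digest md4 = new MD4Digest();
        md4.update(input, 0, input.length);
        byte[] out = new byte[md4.getDigestSize()];
        md4.doFinal(out, 0);
        return out;
    }
}
```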

The library I used was developed by Alfresco and is called JLAN. It is on the older side and scarcely maintained. Sadly, its documentation was only slightly better than you'd expect to find from a JBoss product: there is a developer's guide and an installation guide, and for general usage they may be fine, but for what I was planning to do it was a tad more challenging. Some software engineers try to protect their future code by marking anything and everything final, making things very rigid and hard to extend. They give you access to only a very small selection of methods and may not even document those well. I wanted a way to hook in the UserAuthenticationService that we used in the SCP service. My challenge was that the SecurityConfigSection would only let you specify the UsersInterface by giving a String naming the class that implements said interface. That class is instantiated and made completely inaccessible thereafter. This made accessing my Spring-managed bean very difficult, bordering on impossible. Usually I would have @ComponentScan pick up the class and either @Autowired the interface or use the BeanFactory to retrieve it dynamically. Instead, I came up with a simple but really nice approach to handle this.

In my @Configuration class I set the BeanFactory on an enum, making it statically accessible. Thus, even our annoying UsersInterface implementation can take advantage of our managed beans without having to deal with any final mess.
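Roughly, the trick looks like this; BeanFactoryHolder and SmbConfig are illustrative names of mine:

```java
// Sketch: expose the Spring BeanFactory through an enum singleton so
// classes Spring never constructed (JLAN instantiates the UsersInterface
// implementation reflectively from a class name) can still reach beans.
import org.springframework.beans.factory.BeanFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Configuration;

enum BeanFactoryHolder {
    INSTANCE;

    private BeanFactory beanFactory;

    void set(BeanFactory beanFactory) { this.beanFactory = beanFactory; }

    static <T> T getBean(Class<T> type) {
        return INSTANCE.beanFactory.getBean(type);
    }
}

@Configuration
class SmbConfig {
    @Autowired
    SmbConfig(BeanFactory beanFactory) {
        // Capture the factory at startup, before JLAN spins up.
        BeanFactoryHolder.INSTANCE.set(beanFactory);
    }
}
```

The JLAN-instantiated UsersInterface implementation can then call BeanFactoryHolder.getBean(UserAuthenticationService.class) even though Spring never managed it.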

After I worked out those Spring bean issues, I still had to deal with the frustrations of learning the ins and outs of the SMB protocol. This approach also allows for a transfer that requires no writes to disk and authentication that flows through the existing system. Look here for a gist of the general approach. If you really want to use the JLAN library, be aware that Alfresco sells an enterprise license and probably supports a great deal of options with it.

Here is a snippet of what you will want to add to a POM (Maven) to play around with these code samples.
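Something along these lines should do. The versions are examples, and since JLAN is not in Maven Central, its coordinates here are a guess; you may need to install the jar into a local or internal repository:

```xml
<!-- Apache MINA SSHD: the Java SSH/SFTP/SCP server used above -->
<dependency>
    <groupId>org.apache.sshd</groupId>
    <artifactId>sshd-core</artifactId>
    <version>1.7.0</version> <!-- example version; match whatever your code targets -->
</dependency>

<!-- Alfresco JLAN for SMB. Not in Maven Central, so these coordinates
     are a guess; you may need to install the jar yourself. -->
<dependency>
    <groupId>org.alfresco.jlan</groupId>
    <artifactId>alfresco-jlan-embed</artifactId>
    <version>5.0</version>
</dependency>
```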

More to come!