This is the third article in a larger series of interviews on diverse aspects of data science technology and its practical implementation in modern businesses. Today's interviewee: Paulo Baena, software developer at dataWerks.

With desktop or DC virtualization, an organization reduces its demand for hardware. How does data virtualization make the life of an IT manager simpler?

IT managers bear not only budget responsibility; they also have to ensure that the applications in their IT environment remain functional and flexible, and that the IT department never loses control over them. The diversity of applications is generally difficult to tame. Add to that the pressure to analyse ever larger amounts of data at ever higher speed, and you get a problem more difficult than a small budget.

Data virtualization gives some relief here. As in software design, if we abstract business data into a separate layer, new projects no longer need to care about capturing and normalizing the raw data. There are no complex macros and no multiple data ingress points to consider. Projects can be built in a more generic way, so they are developed and implemented faster and run more stably.

Getting back to your question: if you deploy a data virtualization solution, you gain more control over data-based IT projects and get better application stability in less time and for less money. So yes, there must also be some “hard savings”.

Does data virtualization impose new requirements on my IT infrastructure or on my operations?

At least with dataWerks, we don’t need to change anything in the applications or in the underlying infrastructure. If a business data source has a sufficient number of free connectors and the network has enough free bandwidth, we’re generally operational. However, we see that data sanity is very often an issue. For example, some business processes don’t really care that their dates and timestamps are stored as text, while others are sensitive to that. Data virtualization means a unified interpretation of data; otherwise the data can’t be correlated intelligently and the solution doesn’t deliver its maximum value.
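To illustrate the point about dates stored as text, here is a minimal Python sketch of what a "unified interpretation" of timestamps could look like. It is not dataWerks code: the function name `to_utc_datetime`, the list `KNOWN_FORMATS`, and the rule that values without a time zone are treated as UTC are all assumptions made for the example.

```python
from datetime import datetime, timezone

# Hypothetical text formats; a real project would collect these from its sources.
KNOWN_FORMATS = ("%Y-%m-%d %H:%M:%S", "%d.%m.%Y %H:%M", "%Y-%m-%d")


def to_utc_datetime(value):
    """Return a timezone-aware UTC datetime for a datetime or a text timestamp."""
    if isinstance(value, datetime):
        parsed = value
    else:
        text = str(value).strip()
        parsed = None
        # Try each known text layout until one matches.
        for fmt in KNOWN_FORMATS:
            try:
                parsed = datetime.strptime(text, fmt)
                break
            except ValueError:
                continue
        if parsed is None:
            raise ValueError(f"unrecognized timestamp: {value!r}")
    # Assumption for this sketch: values without a time zone are already UTC.
    if parsed.tzinfo is None:
        return parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)
```

With this in place, a source that stores `"01.03.2017 12:30"` as text and a source that stores a native timestamp both yield the same comparable value, which is what makes correlating the two possible.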

I’d say that in our projects quite some time is spent on achieving a sufficient level of data maturity. This includes a lot of learning, both on our side and on the customer’s.

At first glance you may see only the relatively high fixed costs of developing data-normalizing transformers, but keep in mind that connecting a new data source of a known type comes at low marginal cost. On a larger scale, this approach pays off very well.
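The cost argument can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the dataWerks design: the registry `TRANSFORMERS`, the source types `crm_csv` and `erp_json`, the field names, and the `SOURCES` list are all hypothetical. The fixed cost is writing one transformer per source type; adding another source of a known type is a single configuration entry.

```python
# Hypothetical registry: one normalizing transformer per source *type*.
TRANSFORMERS = {}


def transformer(source_type):
    """Register a function that maps a raw record to the unified schema."""
    def register(func):
        TRANSFORMERS[source_type] = func
        return func
    return register


@transformer("crm_csv")
def normalize_crm_row(raw):
    # Fixed cost: written once for every CRM-style CSV export.
    return {"customer_id": int(raw["CustID"]), "name": raw["Name"].strip()}


@transformer("erp_json")
def normalize_erp_record(raw):
    # Fixed cost: written once for every ERP-style JSON feed.
    customer = raw["customer"]
    return {"customer_id": int(customer["id"]), "name": customer["label"].strip()}


# Marginal cost: a new source of a known type is one more entry here.
SOURCES = [
    {"name": "crm_emea", "type": "crm_csv"},
    {"name": "crm_apac", "type": "crm_csv"},  # added later, no new code needed
    {"name": "erp_main", "type": "erp_json"},
]


def normalize(source, raw):
    """Look up the transformer for the source's type and apply it."""
    return TRANSFORMERS[source["type"]](raw)
```

Here `crm_apac` reuses the transformer already written for `crm_emea`, so the second CRM source costs almost nothing to connect.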

How easy is the implementation of a data virtualization solution in a typical project?

It varies quite a bit! Typically, the ease of implementing a project is directly proportional to how complete your data virtualization product is. Accumulating enough coded logic to formalize (and normalize) the data is the biggest implementation challenge. You know, at the end of the day all data is either strings or longs, maybe bitmaps or metadata; they are all a commodity. The true magic is knowing how you want to analyze the data, so that you can plan how to abstract it and treat it simply and universally.

It’s very common that customers aren’t clear about how they want to interpret their data. Getting data readiness to an acceptable level complicates any project, regardless of how small.
