This is the fourth article in a larger series of interviews on diverse aspects of Data Science technology and its practical implementation in modern businesses. Today's interviewee: Reginaldo Soares, software developer at dataWerks.
Thinking of modern enterprise IT organizations, how much attention should they pay to data virtualization?
Data virtualization is pretty crucial for IT. See, companies capture single events or combinations of events, many of them business-relevant, in separate silos. That's not particularly bad; it's just a natural result of how an application infrastructure grows over time. But separate data collection systems make data analysis complex. As an IT manager, you need to teach users how to access the collected data and how to work with it. Even for data professionals, recognizing data patterns is cumbersome.
Data virtualization presents all data in a single virtual place. It's a great method for combining data and a key to business intelligence. Do you remember the data mart, which was a new concept back in the 2000s? It was a single source of consolidated historical data, ready for analysis. You can see data virtualization as related to that, but it handles the full diversity of data, is fast, and works at much lower cost.
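To make the idea of a "single virtual place" concrete, here is a minimal sketch in Python. It assumes two hypothetical silos (an in-memory SQLite table standing in for a CRM system and a CSV export standing in for an order system) and a hand-written `revenue_by_customer` view function; all names and data are illustrative. The point is that the view reads both sources at query time instead of copying them into a warehouse first. Real data virtualization platforms add connectors, query optimization, and caching that this toy omits.

```python
import csv
import io
import sqlite3

# Hypothetical silo 1: a CRM database (in-memory SQLite for illustration only).
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

# Hypothetical silo 2: an order export kept as CSV by a different system.
ORDERS_CSV = """customer_id,amount
1,120.0
2,80.5
1,30.0
"""


def customers():
    """Read customers from the CRM silo on demand."""
    for cid, name in crm.execute("SELECT id, name FROM customers"):
        yield {"id": cid, "name": name}


def orders():
    """Read orders from the CSV silo on demand."""
    for row in csv.DictReader(io.StringIO(ORDERS_CSV)):
        yield {"customer_id": int(row["customer_id"]), "amount": float(row["amount"])}


def revenue_by_customer():
    """A 'virtual view': joins both silos at query time, without a physical copy."""
    names = {c["id"]: c["name"] for c in customers()}
    totals = {}
    for order in orders():
        name = names.get(order["customer_id"], "unknown")
        totals[name] = totals.get(name, 0.0) + order["amount"]
    return totals


print(revenue_by_customer())  # {'Acme': 150.0, 'Globex': 80.5}
```

The consumer of `revenue_by_customer` never needs to know that the data lives in two differently shaped systems, which is the user-facing simplification described above.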
Whatever you look at (data virtualization, data hubs, database federation), each technology has its own definition and strengths, but IT organizations evaluate and use them all from a single viewpoint: how can I leverage data storage technologies for better data analysis?
What does the term “data-driven enterprise” mean to you?
Data-driven companies use information as a key mechanism of their business. Today, large companies treat information with the same dedication they once paid to the materials and parts on their conveyor belts. The advertising industry is built entirely on data: it's no longer about having the best message, but about recognizing receptive groups and individuals and addressing them instantly. Logistics businesses (take Zalando as an example) see themselves as technology companies today, because with slow or unreliable data processing they couldn't stay on top of the complexity of their delivery and payment channels.
Today, having the computing capacity for data pattern analysis is more powerful for a business than, say, having optimized process descriptions. Way more powerful! This is why the data-driven enterprise is nothing less than the next industrial revolution. Data collection, data analysis, the ability to forecast events, all in real time: the data-driven business model is really attractive to many industries and to companies of different profiles.
How should an enterprise select the most future-proof data technology?
What has proven itself as a universal technology, adaptable to changes in data processing? Looking at the past, that's relational databases! And they are still future-proof, because they are so difficult to uproot and replace with a totally different technology. I'm convinced that RDBs will be broadly used for years to come.
However, if you want to achieve something new, you may find that no suitable legacy technology exists. Thanks to Google and LinkedIn, who mastered this challenge some time ago, we now have new quasi-standard data technologies. If you sketch a data strategy for your company and analyze the constantly shifting IT tool market, you'll recognize Hadoop/Hive, MapReduce, and Spark as the tools of choice for the data processing engine. You can watch these technologies blending together into more integrated, ready-to-go tools. Once you are familiar with the quasi-standard data processing technologies, you'll realize their limitations and see much more clearly which way to move on.
As for data storage technologies, modern NoSQL (non-relational) databases like MongoDB or Cassandra focus on specific data problems. There is no universal advice here. If you start with an RDB, you're on the safe side.