Data Services is the Answer! Yes, but…
In our last post, we talked about using a data warehouse strategy as one of the ways to break down data silos across multiple departments and systems. Building a data warehouse is a traditional way to tackle the data silo problem. However, successful projects take serious organizational commitment and months of development time. With data moving faster than ever and business teams increasingly looking for the ability to rapidly experiment with analytics, the wait time for a data warehouse often leads business teams to move on before the warehouse can even be deployed. As a result, business and IT teams are looking for other ways to unify data across disparate data sources.
The second option is to build a data services layer where data engineers can query disparate repositories, including unstructured and structured data, to build blended data sets for business teams. There are many advantages to this approach over building a data warehouse.
Benefits of Data Services vs. Data Warehouse Approach
No moving data
Arguably the most significant benefit of a data services strategy over a data warehouse strategy is that you no longer need to build ETL processes or custom extract and load jobs to wrangle data from your various sources, transform it into a relational structure and load it into your data warehouse. Data services, by their nature, do not move data from the source systems and allow you to blend data to create virtual datasets. For example, you can pull data from your MySQL database, blend it with data from your Cassandra data store and create a new data set for use in analytics. However, there is another factor to consider. If you’re looking at data virtualization solutions, many still require data to be transformed into a common relational structure before it can be used. Data has evolved beyond well-understood relational models, and forcing it to conform adds cost and complexity, so choose your solution wisely.
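To make the idea of a virtual dataset concrete, here is a minimal Python sketch of blending at read time. It is an illustration under stated assumptions, not how any particular product works: an in-memory SQLite database stands in for MySQL, a list of dictionaries stands in for records from a store such as Cassandra, and the table, field and function names (`customers`, `blended_events`, `purchase_totals`) are hypothetical.

```python
import sqlite3

# Stand-in for a relational source such as MySQL (hypothetical table and columns).
relational = sqlite3.connect(":memory:")
relational.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
relational.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)

# Stand-in for a NoSQL source: records whose fields vary and may be nested.
events = [
    {"customer_id": 1, "type": "login", "device": {"os": "ios"}},
    {"customer_id": 2, "type": "purchase", "amount": 40.0},
    {"customer_id": 1, "type": "purchase", "amount": 15.5, "coupon": "SPRING"},
]


def blended_events(conn, docs):
    """Yield each event joined to its customer name at read time.

    Nothing is written to a new store and the events keep their
    original shape; the blended rows exist only while iterating.
    """
    names = dict(conn.execute("SELECT id, name FROM customers"))
    for doc in docs:
        yield {"customer": names.get(doc["customer_id"]), **doc}


def purchase_totals(rows):
    """Sum purchase amounts per customer over the virtual dataset."""
    totals = {}
    for row in rows:
        if row["type"] == "purchase":
            totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return totals


totals = purchase_totals(blended_events(relational, events))
print(totals)
```

The point of the sketch is that the join happens when the data is read, so there is no load job to maintain and the semi-structured records are never flattened into fixed columns.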
Free advice: Make sure the data services layer in your solution natively supports (no drivers to install) unstructured and semi-structured formats from NoSQL and REST API sources so you can avoid the need to transform your data and shoehorn it back into a relational structure. Even if you are only using structured data today, that may not be the case tomorrow, as most new data that is interesting to explore is semi-structured or unstructured.
Experimentation and agility
Most business teams and leaders understand that analytics can make the difference between profit and loss, or between beating the competition and taking a beating. As analytics becomes critical to a company’s ability to compete in the future, agility in building new data pipelines also becomes critical. With a data services layer that natively integrates with unstructured and structured data sources, you give your data teams the ability to rapidly discover and experiment without the overhead of updating schemas and ETL processes. Unshackled from a pre-defined schema, they can transition to an iterative, agile development model for building data analytics products and work closely with the business to rapidly experiment and refine. Analytics products built in this manner are much more effective in moving business teams towards data-driven decision making because they deliver exactly what teams need much faster. If business teams have to wait weeks or months for their change requests to be acted upon, they will have moved on by the time the change arrives.
Free advice: Ask yourself how difficult it is to add a field, a table, or a new data source to your existing analytics architecture. If the answer is, “I’d rather have a root canal,” then you might have a problem. Your data services layer should resolve this problem, not contribute to it. Be sure you’re not adding barriers to experimentation by forcing conformance to a relational structure when you have semi-structured or unstructured data sources in your stack.
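As a rough illustration of what "easy to add a field" can look like when data is read in its native shape, consider the short Python sketch below. The records, field names and the `count_by` helper are hypothetical and assume records arrive as plain dictionaries.

```python
from collections import Counter


def count_by(records, field):
    """Count records by the value of a field, skipping records that lack it."""
    return Counter(r[field] for r in records if field in r)


records = [
    {"order": 1, "region": "east"},
    {"order": 2, "region": "west"},
]

# A source starts sending a new field. No schema migration or ETL change
# is required before the field can be explored.
records.append({"order": 3, "region": "east", "channel": "mobile"})

by_region = count_by(records, "region")
by_channel = count_by(records, "channel")
print(by_region, by_channel)
```

Older records that lack the new field are simply skipped, so existing analyses keep working while the team experiments with the new one.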
Reduced cost of ownership
All things considered, the data architecture when using a data services layer should be less complex than that of a data warehouse simply because you are not moving data and no pre-defined schemas are used. With reduced complexity comes a reduction in costs to build and maintain the architecture as you need fewer resources to develop the data pipeline and the cost to make changes is relatively low.
Free advice: I can’t emphasize enough that simplification goes out the window as soon as you start transforming unstructured data back into a relational structure, so this benefit assumes native integration with no use of drivers, etc. Building the integration may sound difficult, but there are tools out there that have already solved the native integration problem. We are one of them, but there are others.
In our humble opinion, the need to move, flatten, transform and apply structure to unstructured data should be a thing of the past. We are evidence that there is an emerging wave of new analytics tools that are leading the way to the future of data analytics where business self-service, experimentation, and data agility thrive.
Come catch the wave with us!