This autumn, my colleagues at Agile Data Engine and I talked with the data platform teams of dozens of international enterprise organizations across multiple industries. The discussions revolved around what is keeping these teams busy, what kinds of requests and expectations they are facing in 2022, and how to be successful in the complex setting most enterprise organizations find themselves in. A lot is going on for sure, and the requirements and expectations for a well-functioning data platform team have evolved considerably over the past couple of years – and they keep evolving.
1. The value of DataOps is widely recognized
We hear a lot of talk about how agility, stability, and automation are the goals for data solution development and operations. It's not only data engineering functions that talk about these goals, but also BI and operational analytics teams. DataOps is the connecting methodology that enables them to reach those goals.
Just as DevOps can have a significant effect on software development projects, DataOps has a transformative effect on teams working with data solutions. Here are some real-life examples of the improvements that well-functioning, DataOps-driven teams have experienced:
- Monthly data platform production deployments rise from single digits to triple digits.
- The time spent keeping the platform up to date and running drops by tens to hundreds of working hours per month.
- Senior developers report that projects that used to take weeks or months now take days or weeks.
There are also more unusual benefits of DataOps: I've heard of managers and developers whose quality of sleep improved dramatically after joining a DataOps-driven data platform team. The story doesn't say whether their sleep became better than the average Joe's or whether they simply stopped having nightmares about 5 AM emergency calls over blank management reports. Should this be listed as a benefit in the job ad? In any case, there seem to be a lot of DataOps-related initiatives emerging across a multitude of organizations.
2. Benchmarking the level of data engineering operations is difficult but worth it
Benchmarking enterprise data engineering operations might seem like a waste of time, given the unique nature of each organization and the complexity of the operations. Yet organizations have turned out to be quite interested in how others succeed in their data engineering operations.
I’m not only talking about platform architecture, use cases, or modeling methodologies. Organizations want to hear actual figures: the number of production deployments per month, the size of the data models, the number of rows loaded daily, and so on. Discussions quickly move toward the composition of the data engineering team(s) and their operating methodologies.
One reason for the benchmarking might be the growing demand for more complex data solutions that require processing ever-larger data sets; managers are trying to understand whether others are facing the same challenges. Another could be that recruiting senior data professionals is – and will remain – very difficult, and managers are looking for new ideas on how to scale their operations.
This in-depth benchmarking is a welcome development. There is always value in exchanging ideas, reviewing the maturity of current operations, and trying to outpace one's dearest competitor.
3. The complexity of analytical use cases is on the rise
There is a fairly clear tendency: as cloud data warehouses (DWs) have become the industry standard, the requirements for analytics solutions have become more complex. In the best (or worst?) case, teams work with integrations to hundreds of legacy systems and monstrous amounts of unstructured data. The data should be available 24/7 in a standardized format to support decision-making, with constantly evolving reporting requirements.
I’m not sure whether I should thank or blame cloud DW vendors for this. The elasticity of the cloud is, of course, a promise fulfilled, but it comes at a cost. Beyond the monthly invoices, the working hours and calendar time spent getting things done can be surprisingly high if the data platform is built inefficiently or the operations don’t support efficient use of the platform.
Those were the days when electricity was still cheap, nobody really cared if the on-premises servers were running at 80%+ CPU usage all year round, and Excel was the ultimate – and the only – enterprise data platform. And to be clear, I’m not saying Excel can’t be the ultimate data platform anymore. Just remember to… well, let’s just say there are certainly some benefits to modern enterprise cloud data platforms.
Now I’m getting off track! Let’s jump back to 2022.
Today, we are seeing a lot of near-real-time analytical requirements, operational analytics use cases, and organizations going ‘360 everything’ with integrated data platforms. Self-service analytics is also taking huge leaps (sometimes too big – not all end users want to become data engineers).
To sum up – the maturity of data operations in enterprise organizations varies quite a lot. Thanks to methodologies such as DataOps, it’s possible to catch up with the more mature, data-driven organizations. At the same time, almost everyone is talking about the lack of resources and talent, and organizations are thinking of new ways to scale productivity. Can anyone guess which we’ll see first: a ChatGPT software developer or a ChatGPT data engineer?
Have a nice rest of 2022, and make good data models!