Lauri Koobas

So far, we have covered how to think about the data platform, its vision, its mission, and which kinds of humans might be best suited to get it started. Let’s now explore how things evolve as we begin implementing these ideas.

In reality, it’s a hard problem, because we are attempting to balance the technical readiness of the data platform with the shifting needs of analytics in a growing company in a changing environment. Even if we succeed in finding talented people at the right time, the variety of data tools on the market almost guarantees that new hires will need to learn some of them on the job. Some level of disappointment is built in by default, and it makes sense to set our expectations accordingly. With that said, here is how I see things unfold.

Evolution of the platform:

Prototype / minimum viable product and test users; no value generated.

A couple of datasets exist on the data platform and can be queried. It seems like a small step, but it usually hides the complexity of the initial setup: comparing and choosing the tool(s), proper networking and security, connecting the tools to the existing data, building a pipeline (possibly a tool of its own), and giving a couple of users access and teaching them to test it. The platform is not yet reliable enough to be used.
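To make the “building a pipeline” part concrete, here is a minimal sketch of the kind of load job this stage typically starts with. Everything in it is hypothetical (the source URL, the table name, and SQLite standing in for the warehouse); the point is how small the code is compared to the surrounding work of networking, security, and access.

```python
# A minimal MVP-stage load job: pull raw records from one source and
# append them to one queryable table. SQLite stands in for the real
# warehouse, and the source URL is hypothetical.
import json
import sqlite3
import urllib.request

SOURCE_URL = "https://api.example.com/v1/events"  # hypothetical source
WAREHOUSE = "warehouse.db"                        # stand-in for the warehouse

def extract() -> list[dict]:
    """Fetch the raw records from the source system."""
    with urllib.request.urlopen(SOURCE_URL) as resp:
        return json.load(resp)

def load(records: list[dict]) -> None:
    """Append records as raw JSON blobs, one per row."""
    con = sqlite3.connect(WAREHOUSE)
    con.execute("CREATE TABLE IF NOT EXISTS raw_events (payload TEXT)")
    con.executemany(
        "INSERT INTO raw_events (payload) VALUES (?)",
        [(json.dumps(r),) for r in records],
    )
    con.commit()
    con.close()

if __name__ == "__main__":
    load(extract())
```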

This usually takes a couple of months, and the only pressures are time and technical complexity. User expectations are low, and users are not yet disappointed.

Core functionality and early adopters; little value generated.

Pipelines (still) mostly work, as the amount of data and the number of sources are small. A few analysts can use the platform for some of their work, since the data they request gets added and is usually up to date. Ideally, this stage is coordinated with a team that works on a somewhat isolated dataset, which increases the odds of success.
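“Usually up to date” is exactly what early adopters start checking, so this is also when basic freshness monitoring pays off. Below is a hedged sketch; the table names, the SLAs, and the assumption that every table carries a `loaded_at` UTC timestamp are all illustrative.

```python
# A sketch of a freshness check: alert when a dataset has not been
# updated within its allowed staleness. Assumes each table has a
# loaded_at column storing ISO-8601 UTC timestamps (illustrative).
import sqlite3
from datetime import datetime, timedelta, timezone

FRESHNESS_SLA = {
    "raw_events": timedelta(hours=6),   # illustrative SLAs
    "raw_orders": timedelta(days=1),
}

def stale_tables(db_path: str = "warehouse.db") -> list[str]:
    """Return the datasets whose newest row is older than the SLA allows."""
    con = sqlite3.connect(db_path)
    now = datetime.now(timezone.utc)
    stale = []
    for table, max_age in FRESHNESS_SLA.items():
        newest = con.execute(f"SELECT MAX(loaded_at) FROM {table}").fetchone()[0]
        loaded = (datetime.fromisoformat(newest).replace(tzinfo=timezone.utc)
                  if newest else None)
        if loaded is None or now - loaded > max_age:
            stale.append(table)
    con.close()
    return stale

if __name__ == "__main__":
    for table in stale_tables():
        print(f"WARNING: {table} is stale")  # a real setup would page or post to chat
```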

Reaching this stage usually also takes a couple of months. The type of stress changes: operating the platform with real users adds a lot of ad-hoc requests, introducing context switching and significantly slowing down building. User expectations tend to exceed actual capacity, leading to disappointment. Hiring more analysts, or otherwise forcibly increasing the number of active users, only produces more disappointment faster. Proceeding to the next step requires additional people.

Core product and early majority; serious value generated. 

Pipelines work, even in more complex situations involving quickly changing or extensive datasets. Multiple teams use the platform, and some analysts may be able to do almost all of their work on it. Unless the team is adequately staffed, all work becomes ad-hoc work. The growing number of users creates more complex challenges, which take more time to solve. Early adopters turned advocates take on some of the educational and hand-holding burden.
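What “more complex situations” usually means in practice is that full reloads stop being feasible and loads become incremental. A sketch under illustrative assumptions (the endpoint, the `updated_at` watermark column, and the table schema are all hypothetical):

```python
# A sketch of incremental ("watermark") loading for large or quickly
# changing datasets: pull only rows newer than the newest one already
# loaded, instead of reloading everything. All names are hypothetical.
import json
import sqlite3
import urllib.request

def incremental_load(db_path: str = "warehouse.db") -> None:
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id TEXT PRIMARY KEY, updated_at TEXT, payload TEXT)"
    )
    # The watermark is the newest timestamp we have already loaded.
    watermark = con.execute(
        "SELECT COALESCE(MAX(updated_at), '1970-01-01') FROM orders"
    ).fetchone()[0]
    url = f"https://api.example.com/v1/orders?updated_since={watermark}"
    with urllib.request.urlopen(url) as resp:
        rows = json.load(resp)
    # Upsert, so re-delivered rows simply overwrite their earlier versions.
    con.executemany(
        "INSERT OR REPLACE INTO orders VALUES (?, ?, ?)",
        [(r["id"], r["updated_at"], json.dumps(r)) for r in rows],
    )
    con.commit()
    con.close()
```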

Reaching this stage takes months to years. The stress keeps evolving: awareness of the data platform spreads, creating unrealistic expectations about features and the speed of adoption. Existing users start hitting the limitations of both the software and the data, as working with raw data gets overwhelming and error-prone. Almost everyone is struggling, even though quite a lot of value is being generated.

Complete product and late majority; data as a competitive advantage. 

Everything works: data scientists don’t hate their lives, the company’s core systems produce data specifically for analytics, modeling-layer / single-source-of-truth solutions cover most domains, data quality is assumed instead of hoped for, and so on.
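“Assumed instead of hoped for” means quality is enforced, not checked by hand. A minimal sketch of that idea: declarative checks that run after every load, so downstream users never have to verify the basics themselves. Table and column names are illustrative.

```python
# A sketch of automated data quality checks: each check is a query with
# an expected result, run after every load. Names are illustrative.
import sqlite3

CHECKS = [
    ("orders has no null customer ids",
     "SELECT COUNT(*) FROM orders WHERE customer_id IS NULL", 0),
    ("orders has no duplicate ids",
     "SELECT COUNT(*) - COUNT(DISTINCT order_id) FROM orders", 0),
]

def run_checks(db_path: str = "warehouse.db") -> list[str]:
    """Run every check and return the names of the ones that failed."""
    con = sqlite3.connect(db_path)
    failures = [name for name, sql, expected in CHECKS
                if con.execute(sql).fetchone()[0] != expected]
    con.close()
    return failures

if __name__ == "__main__":
    failed = run_checks()
    assert not failed, f"data quality checks failed: {failed}"
```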

Reaching this stage takes years of focused investment, effort, and leadership; most companies don’t even attempt it. Should it be reached, the stress shifts to security and regulatory compliance, as many people and systems handle a large amount of data. Perhaps also to deciding how to spend all the profit that comes from being a market leader 🙂

Invisibility

I would tentatively add one final step here: invisibility. This is the stage where the level of automation, the ubiquity of data-based solutions, and cultural penetration (or data literacy) are so high that “data work” once again becomes a narrow specialization. In the 1970s, everyone was the mechanic of their own car; in the 2020s, problems are infrequent, and we use an app to get someone to fix them for us.

People

Growing the data platform team is unavoidable, but the growth should be carefully directed toward more automation and tooling. It’s the only way to serve more users and increase analytics speed (see the previous article). In practice, this means a split between the development and the operation of the data platform. Building automation and tooling cannot happen under a constant barrage of ad-hoc requests; neither can all such requests be ignored until a sufficiently robust general solution is built to cover each specific case. Thus, the best move is to hire someone who enjoys helping users and who suggests and implements ways to automate the process.
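To illustrate what “automating the process” can look like, here is a hedged sketch of turning the most common ad-hoc request (“please add this dataset”) into self-service: users submit a small config file, and the platform registers the source instead of an engineer doing it by hand. The config schema and the registry table are hypothetical.

```python
# A sketch of self-service source onboarding: a user-submitted config is
# validated and written to a registry that a scheduler would later read.
# The config schema and the registry table are hypothetical.
import json
import sqlite3

REQUIRED_FIELDS = ("name", "url", "schedule", "owner")

def register_source(config_path: str, db_path: str = "warehouse.db") -> None:
    """Validate a source config and upsert it into the source registry."""
    with open(config_path) as f:
        cfg = json.load(f)
    for field in REQUIRED_FIELDS:
        if field not in cfg:
            raise ValueError(f"missing required field: {field}")
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS source_registry "
        "(name TEXT PRIMARY KEY, url TEXT, schedule TEXT, owner TEXT)"
    )
    con.execute(
        "INSERT OR REPLACE INTO source_registry VALUES (?, ?, ?, ?)",
        tuple(cfg[f] for f in REQUIRED_FIELDS),
    )
    con.commit()
    con.close()
```

A scheduler would then pick up each registry row and run the corresponding load job; the point is that each new request costs the requester a few minutes and the platform team no context switch.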

One final aspect not mentioned so far is managing the costs of infrastructure and tools. It’s a genuine hygiene factor: giving it attention from time to time is helpful, and ignoring it for too long will hurt badly. Doing it, however, is work that needs to be planned, executed, and revisited. Tracking and attributing costs is also non-obvious, as spending is split across a constellation of providers, each with its own pricing model and billing cadence.
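The first practical step is usually normalizing each provider’s billing export into one ledger that can be attributed to teams. A sketch, assuming CSV exports with made-up file and column names per provider:

```python
# A sketch of unifying costs across providers: map each provider's
# billing export into one per-team ledger. File paths and column names
# are made up for illustration.
import csv
from collections import defaultdict

PROVIDER_COLUMNS = {
    "warehouse_bill.csv": {"team": "project_label", "usd": "amount_usd"},
    "pipeline_tool_bill.csv": {"team": "workspace", "usd": "total_cost"},
}

def monthly_cost_by_team() -> dict[str, float]:
    """Sum every provider's export into a single per-team cost view."""
    totals: dict[str, float] = defaultdict(float)
    for path, cols in PROVIDER_COLUMNS.items():
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                totals[row[cols["team"]]] += float(row[cols["usd"]])
    return dict(totals)

if __name__ == "__main__":
    for team, usd in sorted(monthly_cost_by_team().items()):
        print(f"{team}: ${usd:,.2f}")
```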

Conclusion

It took decades for product and software engineering to learn to work well with each other, and that was just one group talking to one other group. Now that we add “data people” as a third group, only the bravest will venture into the unknown of a unified product-engineering-data organization.
