What is a Data Platform
How to think about it, why and when, and how might you get started?
Many articles about the data warehouse, data platform, and associated buzzwords exist. Most of these talk about a state – “why it’s good to have X.” Very few discuss either the assumptions made, the required previous states, or when it’s not appropriate to strive for X altogether. One important aspect I can’t recall seeing at all is the type of skills, team, and culture/structure that would be appropriate for each of those ideal states.
This series of articles attempts to walk the path of an organization that wants to go from almost no data platform to having widespread use of trusted data and supporting the analytics function.
As with all advice of this kind, it depends. There is an incredible article (by Tristan Handy of dbt fame) about what the analytics function should look like for companies of various sizes. It makes sense to look at data platforms the same way, except that it depends less on the size of the company than on the number of people using data. They are the main customers of a data platform in the first place.
Like analytics, the platform side is context-dependent – within the same organization, various domains can have vastly differing experiences. Marketing may have multiple analysts working with close to real-time dashboards, while customer support may still be emailing Excel sheets around to understand customer satisfaction.
One might say that tooling bypasses these discrepancies as it benefits everyone equally, but that claim rests on several assumptions that might not hold equally everywhere:
- data – it exists at the point where the tools can be used. Connecting a fantastic BI tool to an empty warehouse isn’t helpful.
- skills – the people who would benefit from using the tools know how to use the tools. Self-service takes a lot of hand-holding.
- processes – there is a feedback loop about the appropriateness of the tools. The tooling should also be reviewed as the business and analytics requirements change.
While tools are essential, they alone are insufficient to elevate the whole org. Please keep this in mind while getting bombarded by SaaS marketing, all of which promises to solve everything and then some.
Level 0 – no data engineer
For quite a long while, you can do very well without having a dedicated person to provide support for analysts. The need is usually triggered by the evolution of the software in the rest of the company. When you have an architecture where data is stored in more than one place, it will create some complexities for analysis. For example, you may use an external SaaS tool for marketing. It works very well until you want to automatically populate some customer segmentation data into that tool from another SaaS service. Accesses will need to be configured and scheduled; hopefully, some monitoring is attached to it, and maybe it will even get documented before that guy goes on vacation for a month.
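To make that example a bit more concrete, here is a minimal sketch of what such a sync job might look like. Everything in it is an assumption made for illustration: `sync_segments`, `fetch_segments` and `push_segment` are hypothetical names, not the API of any real SaaS tool. The point is only that even a tiny job benefits from a documented owner, logging that can feed monitoring, and a summary of what happened.

```python
import logging
from typing import Callable, Dict

logger = logging.getLogger("segment_sync")


def sync_segments(
    fetch_segments: Callable[[], Dict[str, str]],
    push_segment: Callable[[str, str], None],
) -> Dict[str, int]:
    """Copy customer segments from one tool to another.

    Owner: <name the person or team here>
    Schedule: <how and when this runs, e.g. nightly via cron>

    fetch_segments and push_segment are hypothetical stand-ins for
    whatever client code talks to the two tools in question.
    """
    stats = {"pushed": 0, "failed": 0}
    for customer_id, segment in fetch_segments().items():
        try:
            push_segment(customer_id, segment)
            stats["pushed"] += 1
        except Exception:
            # Log and continue so one bad record does not stop the whole run.
            logger.exception("Failed to push segment for %s", customer_id)
            stats["failed"] += 1
    # A single summary line is easy to alert on later.
    logger.info("Sync finished: %s", stats)
    return stats


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # In-memory stand-ins for the two tools, just to show the call shape.
    source = {"c1": "high_value", "c2": "churn_risk"}
    destination: Dict[str, str] = {}
    sync_segments(lambda: source, destination.__setitem__)
```

Passing the two tools in as plain functions keeps the sketch independent of any vendor and makes it easy to try out with in-memory data before pointing it at anything real.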
It can be done without a data engineer, especially if you have one of those unicorn full stack data scientists working for you. The drawback is that the longer this goes on, the worse the mess gets. Consequently, finding someone to help clean it up will get more challenging as everything is terrible and nothing works. Think about it – that situation is a tough sell for any sensible candidate, especially as they would need to do it alone for a while.
The main point to keep in mind at this level is to document as much as possible. The documentation will be out of date quickly, but at least there will be a description of how things were, giving direction on where to start digging.
How to start thinking about it?
The primary function of the data platform is to provide data and tools for analysts. There may also be a regulatory or business need to provide some reporting, but it usually doesn’t start there. At first, it won’t be the kind of data platform you read about in the articles. All you have is a lot of pain and a vague understanding that someone should do something about it, but the current people don’t know how.
Whatever the setup currently is, it can be considered the data platform, because there are processes in place, reports are being created, decisions are being made, etc. It just doesn’t meet the needs of the organization as is.
In the beginning, the most value will probably be gained by focusing on enabling – making the data appear, move and be accessible for use. Tooling will come later as it’s meant to increase the efficiency of the data users and directly depends on how many of those users there are.
Why should you care?
On a fundamental level – because someone in your organization who was hired to do a different job is also keeping this stuff working. At least find out who that is and talk to them. Maybe it’s all solid, and they enjoy doing it, and it’s OK for a while longer. Or perhaps they left a year ago, and nobody knows. Either way, it’s worth knowing.
Or maybe you have a few tools that have been bought over the years, all of which promise to solve your data-related needs. And still, nothing works.
When should you hire someone?
Refer to the article linked at the beginning that discusses analytics in various company stages. If you have one analyst, you don’t need a data engineer yet as that one analyst is probably competent – they were hired as the first data person, after all.
Once you plan to hire a few more data users or expand the set of software, it’s a good time to talk to someone about it. Bear in mind that finding your first data engineer is probably at least as complex and time-consuming as finding your first analyst.
One aspect of timing the hire is the reason you are hiring. If the main goal is only to lessen the pain of a couple of analysts, then you might be able to wait a little longer (like a month or two).
On the other hand, if you plan to develop into a data-driven organization that might consider analytics a competitive advantage, then it makes sense to prioritize it. Probably even to the point of continuing to invest in this capacity ahead of growth, so that it aids the expansion that is to follow.
Third, the timing depends on your product roadmap. If little new functionality is planned, it might be slightly easier to get on top of things on the data side. Should you instead plan to rapidly add new features, all of which will want guidance from analytics (as they should), then that would also put more strain on building the data platform.
In summary – if you are reading this article, then it’s not too early to start looking for a data engineer 🙂
What to expect?
It depends on how much you already have in place and how much of a mess it is. In the ideal case, the first data engineer will build up a clean data warehouse, achieving the critical mass needed for analytics quickly and efficiently. Even in the perfect case, there will be a period (a few weeks or months) where work is in progress but doesn’t quite produce value yet.
In almost all other cases, no additional direct value will be generated in the beginning as the existing things will need to be untangled before being replaced with more appropriate ones. The users will also need time to adjust and so on. On a positive note, though – your analysts will probably stick around a while longer (in the hot job market) when there is the hope of help from a data engineer.
There is quite a bit of lead time built into this role, and thus the hiring decision should be made sooner rather than later.
OK, but how?
That’s the tricky part, isn’t it 🙂
Ideally, you’ll find a generalist in this fairly narrow field – someone who can debug and fix whatever you already have, is excellent at communicating, has a vision for the platform, is good at change management, and hopefully understands the business.
Or, you can hire me to help your existing team to prioritize and execute in the most efficient way.