Main goal of Data Platform
A framework for thinking about a data platform – we need to flesh out some principles to build upon to continue the series.
A vision and a mission describe the reasons for existing, what to do, and how. Writing them means finding a balance between being concise and saying everything, between being open-ended and having boundaries, and between being general and still actionable.
First, I’d like to address why it even makes sense to think about the vision and mission of a data platform. It’s such a new concept that maybe it’s too early or unnecessary? The concept’s novelty is precisely why it’s the perfect time to define some context and set some limits. Otherwise, we’ll end up in a situation where what we build expands to touch everything and good luck then handing parts of it back to other teams.
Let’s start with choosing between being reactive vs. proactive. If a function of an organization is always reactive, then resourcing that function only happens when there is a severe lack of supply in the face of overwhelming demand. It’s also known as a constant stream of urgent work, quick fixes instead of proper solutions, and unhappy humans. The sad part is that getting out of this vicious circle requires time to think, which is already in very short supply. Been there, done that, definitely don’t want to repeat it.
Proactive it is then. We need a plan or a roadmap; for that, we need a goal – there’s a higher chance of getting somewhere if we have a destination in mind. No need to worry too much about the specific purpose, though, as the minute we start talking about it, there will be feedback. It’s worth emphasizing – all this thinking and work on vision, mission, and goals would come to nothing if nobody knew about it. Communication is vital, and verbalizing these thoughts early is recommended. Bonus points for doing it throughout the process and asking for feedback along the way! 🙂
Based on the goals and the roadmap, we can estimate the resources needed – humans with specific skills, software vendors with certain functionalities, etc. Most importantly, we can see the things we still have to learn. It’s all good, though – that’s what being proactive is about.
It brings us to the following question: does this work need to be visible, or is the goal to become invisible? It sounds weird, so let me expand on this.
The accounting department/function is usually invisible in the sense that it’s so good that nobody has to worry about it or know its internal workings. The default state is that things work correctly and on time – salaries, bills, and taxes are paid, everything is nice and legal, accounting is not a bottleneck for growth, etc. Same with janitors, lawyers, and whoever keeps the fridge stocked with milk for the coffee.
Product development is the other end of the spectrum of visibility as it needs to be seen by everyone for alignment or at least awareness. If the product direction were to be hidden and nobody knew the people doing it, there would be no direction.
In my opinion, a “data platform” should be closer to the invisible end of this continuum. It cannot be completely invisible, though, as being data-driven carries the expectation that many people will interact with data in some capacity. Thus, it seems closer to people operations or management in the sense that seamless enablement and human focus are critical.
So we have a somewhat visible part of the organization with a human focus attempting to automate some data-related stuff.
I propose this mission statement:
“We help data guide the way”
It’s what the data platform team does.
Given these requirements, we can now start to think more clearly about who should be our first hire in this area. We’d like them to have:
- excellent communication skills, of course, as whatever they build won’t be useful unless it’s also used. Helping users will be a large part of the job for a long time, so it should not be overlooked, and it should be something they like to do.
- generalist data-related technical skills as they need to start building some structure within the mess that already exists.
That first person will need a lot of context, delivered compactly. Working on the right thing is paramount as it’s trivial to spend whole days and weeks on problems that might have a simple temporary workaround. Experience and pragmatism are definitely of high value.
How to fail less with the first hire?
The first aspect I’d like to touch on is the technical choice of build vs. buy and some guidance on tools. Not that the choices are made beforehand, but that there is clarity on the (cloud) platform to be used, budget, vendor selection process, and other considerations. Same with headcount plans – it makes a significant difference whether there is a plan to hire a team of 6 within a few months or maybe hire one other person next year.
In the first couple of weeks, there should be some meetings set up in addition to the regular things (direct lead, closest stakeholders, etc.):
- legal/infosec – state of sensitive data and boundaries around it
- CTO or someone else who owns the adoption of new vendors
- head of product – the roadmap for the next year – are there mostly enhancements of current features or massive expansion into novel things, all of which will need analytics?
The aim of those meetings is twofold – to explain what a data platform is and what’s going to happen when, but also to avoid making a wrong turn right out of the gate.
What else to keep in mind?
Everyone benefits from having someone to discuss problems and solutions with. Especially in this case, as data engineering work is systems integration work and the challenges are somewhat unique. Things like how to store microsecond precision data in a system with millisecond precision. Or what would be a valuable way to partition some of the larger datasets. Or how to create a proper audit trail for the sensitive data. Or what happens when they go on vacation or fall sick?
My point is that the inability of a single person in a specific function to talk about these things can be a problem. The first solution is that the direct lead should be interested and collaborative and have time to help. The second solution is to hire another data engineer. And the third solution, in case the first two are not an option, is to encourage and support the single data engineer to attend some meetups and conferences.
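To make one of those challenges concrete, here is a minimal sketch of one possible answer to the microsecond vs. millisecond question: keep the millisecond timestamp in the column the target system supports and carry the leftover microseconds in a separate integer column. The function names `split_micros` and `join_micros` are made up for illustration, and the two-column layout is an assumption on my part, not a recommendation for any specific database.

```python
from datetime import datetime, timedelta, timezone

# Reference point for epoch arithmetic (timezone-aware on purpose).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def split_micros(ts: datetime) -> tuple[int, int]:
    """Split a timezone-aware timestamp into (epoch milliseconds, leftover microseconds).

    Hypothetical helper: the first value fits a millisecond-precision column,
    the second (0-999) goes into an extra integer column.
    """
    delta = ts - EPOCH
    # Integer arithmetic only, so no float rounding sneaks in.
    total_us = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    return divmod(total_us, 1000)


def join_micros(epoch_ms: int, extra_us: int) -> datetime:
    """Rebuild the original microsecond-precision timestamp from the two columns."""
    return EPOCH + timedelta(microseconds=epoch_ms * 1000 + extra_us)


if __name__ == "__main__":
    original = datetime(2024, 1, 2, 3, 4, 5, 123456, tzinfo=timezone.utc)
    ms, us = split_micros(original)
    print(ms, us)
    print(join_micros(ms, us) == original)
```

The trade-off is an extra column and a bit of logic on every read, in exchange for not losing precision. Whether that is worth it is exactly the kind of thing that is easier to decide with a second person to talk to.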
What next?
It feels like a good place to introduce how I see the progression of value in the area of data engineering/platform.
The number of humans working on data engineering:
- nobody – analytics and reporting are painful and hard to trust
- 1 person – less pain for others. Tools are chosen and pipelines are built, but most things are ad-hoc, not documented, and fail often.
- 2 people – less pain for ourselves. It’s necessary to start packaging, documenting, automating, and all those cool things just to work within a team. There is still no slack, but from time to time, it makes sense to refactor an odd choice made earlier. All the work is still around making data available.
- 3 people – do some planning. Think about roadmaps, frameworks, features, and costs. At this point, whoever goes on vacation can leave the laptop in the office. Most of the work is still making data available and timely, but some time can be spent on actual forward-looking planning.
- 4 people – start adding a little value on the analytics side. Focus on making the life of the main customers (analysts) a little easier. Start adding more observability, lineage, testing, and possibly an additional tool for some specific use case. There is still no slack, but there might be enough cognitive capacity across the team to support a subset of users with something that helps them directly.
- 5 people – start adding a little value on the product/engineering side. Finally, we are covering most of the things related to the pipelines and destinations and can put some focus on the source side – product owners and engineers who are producing the data in the first place. It’s less about tooling and more about communication, education, code reviews, feedback loops, and other proactive approaches to uplevel that part of the organization regarding working with data.
- 6 people – add resiliency. At this point, we have a good part of the company built up, supporting the vast majority of reporting and analytics, deeply integrated with many diverse internal and external systems. It also means that if things break, then the impact will be felt. An on-call rotation, as well as specific efforts on other non-functional requirements of any software system, like scalability and many others, might be needed. (relevant article)
The next article will be about the “how” part – “we help data guide the way … by x, y, z”.