Main goal of Data Platform

Part 2 of 5: Data Platform Series

Data Platform Series

  1. What is data platform
  2. Main goal of Data Platform (current)
  3. How that goal is achieved
  4. Why is it important
  5. How it evolves

“We help data guide the way”

It’s what the data platform team does.

Given these requirements, we can now start to think more clearly about who should be our first hire in this area. We’d like them to have:

  • excellent communication skills, of course, as whatever they build won’t be useful unless it’s also used. Helping users remains a large part of the job for a long time, so it should be something they enjoy, and it should not be overlooked.
  • generalist data-related technical skills, as they need to start building some structure within the mess that already exists.

That first person will need a lot of context, delivered compactly. Working on the right thing is paramount, as it’s trivial to spend whole days and weeks on problems that might have a simple temporary workaround. Experience and pragmatism are definitely of high value.

How to fail less with the first hire?

The first aspect I’d like to touch on is the technical choice of build vs. buy and some guidance on tools. The point is not that the choices are made beforehand, but that there is clarity on the (cloud) platform to be used, the budget, the vendor selection process, and other considerations. The same goes for headcount plans: it makes a significant difference whether the plan is to hire a team of 6 within a few months or maybe one other person next year.

In the first couple of weeks, some meetings should be set up in addition to the regular ones (direct lead, closest stakeholders, etc.):

  • legal/infosec – state of sensitive data and boundaries around it
  • CTO or someone else who owns the adoption of new vendors
  • head of product – the roadmap for the next year: is it mostly enhancements of current features or a massive expansion into novel things, all of which will need analytics?

The aim of those meetings is twofold: to explain what a data platform is and what is going to happen when, and to avoid making a wrong turn right out of the gate.

What else to keep in mind?

Everyone benefits from having someone to discuss problems and solutions with. That is especially true here, as data engineering is systems integration work and its challenges are somewhat unique: how to store microsecond-precision data in a system with millisecond precision, what would be a valuable way to partition some of the larger datasets, or how to create a proper audit trail for the sensitive data. And what happens when they go on vacation or fall sick?

My point is that a single person in a specific function having nobody to talk to about these things can be a problem. The first solution is for the direct lead to be interested, collaborative, and have time to help. The second solution is to hire another data engineer. The third solution, in case the first two are not an option, is to encourage and support the single data engineer to attend some meetups and conferences.
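
To make the microsecond question mentioned above a bit more concrete, here is a minimal sketch in Python. It assumes the target system can only hold an integer count of milliseconds since the Unix epoch, and it shows one option among several: keeping the sub-millisecond remainder in a second column so nothing is lost. The names `split_micros` and `join_micros` and the two-column layout are hypothetical, chosen for illustration only.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def split_micros(ts: datetime) -> tuple[int, int]:
    """Split a timezone-aware timestamp into (epoch_millis, leftover_micros).

    epoch_millis fits a millisecond-precision column; leftover_micros
    (0..999) is meant for a hypothetical companion column.
    """
    if ts.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    delta = ts - EPOCH
    # Integer arithmetic avoids the float rounding of ts.timestamp().
    total_micros = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    return divmod(total_micros, 1_000)


def join_micros(epoch_millis: int, leftover_micros: int) -> datetime:
    """Rebuild the original UTC timestamp from the two stored integers."""
    total_micros = epoch_millis * 1_000 + leftover_micros
    return EPOCH + timedelta(microseconds=total_micros)


# Round trip: the value read back equals the value written.
ts = datetime(2024, 1, 2, 3, 4, 5, 123456, tzinfo=timezone.utc)
millis, micros = split_micros(ts)
assert join_micros(millis, micros) == ts
```

Whether a second column is worth the extra complexity, compared with simply truncating or storing the raw value as a string, is exactly the kind of trade-off that is easier to settle with someone to talk it through.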

What next?

It feels like a good place to introduce how I see the progression of value in the area of data engineering/platform.

The number of humans working on data engineering:

  • nobody – analytics and reporting are painful and hard to trust
  • 1 person – less pain for others. Tools are chosen and pipelines are built, but most things are ad-hoc, not documented, and fail often.
  • 2 people – less pain for ourselves. It’s necessary to start packaging, documenting, automating, and all those cool things just to work within a team. There is still no slack, but from time to time, it makes sense to refactor an odd choice made earlier. All the work is still around making data available.
  • 3 people – do some planning. Think about roadmaps, frameworks, features, and costs. At this point, whoever goes on vacation can leave the laptop in the office. Most of the work is still making data available and timely, but some time can be spent on actual forward-looking planning.
  • 4 people – start adding a little value on the analytics side. Focus on making the life of the main customers (analysts) a little easier. Start adding more observability, lineage, testing, and possibly an additional tool for some specific use case. There is still no slack, but there might be enough cognitive capacity across the team to support a subset of users with something that helps them directly.
  • 5 people – start adding a little value on the product/engineering side. Finally, we are covering most of the things related to the pipelines and destinations and can put some focus on the source side – product owners and engineers who are producing the data in the first place. It’s less about tooling and more about communication, education, code reviews, feedback loops, and other proactive approaches to uplevel that part of the organization regarding working with data.
  • 6 people – add resiliency. At this point, we have a good part of the company built up, supporting the vast majority of reporting and analytics, deeply integrated with many diverse internal and external systems. It also means that if things break, then the impact will be felt. An on-call rotation, as well as specific efforts on other non-functional requirements of any software system, like scalability and many others, might be needed. (relevant article)

The following article will be about the how part – “we help data guide the way … by x, y, z”.