What is a data platform
Data Platform Series
- 1. What is a data platform (current)
- 2. Main goal of Data Platform
- 3. How that goal is achieved
- 4. Why is it important
- 5. How it evolves
Level 0 – no data engineer
For quite a long while, you can do very well without a dedicated person supporting the analysts. The need is usually triggered by the evolution of the software in the rest of the company. Once your architecture stores data in more than one place, analysis gets more complicated. For example, you may use an external SaaS tool for marketing. It works very well until you want to automatically populate that tool with customer segmentation data from another SaaS service. Access will need to be configured and the transfer scheduled; hopefully, some monitoring is attached to it, and maybe it will even get documented before the person who built it goes on vacation for a month.
It can be done without a data engineer, especially if you have one of those unicorn full-stack data scientists working for you. The drawback is that the longer this goes on, the worse the mess gets. Consequently, finding someone to help clean it up gets harder, because by then everything is terrible and nothing works. Think about it: that situation is a tough sell for any sensible candidate, especially as they would need to do it alone for a while.
The main point to keep in mind at this level is to document as much as possible. The documentation will be out of date quickly, but at least there will be a description of how things once were, which gives a direction on where to start digging.
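To make the example above a bit more concrete, here is a minimal sketch of what such a glue job might look like when it is at least logged and documented. Everything in it is an assumption made for illustration: the `Segment` record, the `sync_segments` function, and the `fake_fetch` and `fake_push` stand-ins are hypothetical and do not correspond to any real SaaS API. In practice the two callables would wrap whatever clients your tools provide, and a scheduler would run the job.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("segment_sync")


@dataclass(frozen=True)
class Segment:
    """One customer-to-segment assignment (hypothetical shape)."""
    customer_id: str
    segment: str


def sync_segments(
    fetch: Callable[[], Iterable[Segment]],
    push: Callable[[Segment], None],
) -> Dict[str, int]:
    """Copy segments from the source service into the marketing tool.

    Returns counts of sent and failed records so that a scheduler or
    an alerting hook can react when something goes wrong.
    """
    stats = {"sent": 0, "failed": 0}
    for seg in fetch():
        try:
            push(seg)
            stats["sent"] += 1
        except Exception:
            # Keep going, but leave a trace for whoever looks at it next.
            log.exception("could not push customer %r", seg.customer_id)
            stats["failed"] += 1
    log.info("sync finished: %s", stats)
    return stats


# In-memory stand-ins for the two services, for demonstration only.
def fake_fetch() -> List[Segment]:
    return [Segment("c1", "loyal"), Segment("c2", "at_risk"), Segment("", "new")]


received: List[Segment] = []


def fake_push(seg: Segment) -> None:
    if not seg.customer_id:
        raise ValueError("missing customer id")
    received.append(seg)


result = sync_segments(fake_fetch, fake_push)
```

The point of the sketch is not the code itself but what surrounds it: the docstring says what the job does, failures are logged instead of swallowed, and the returned counts give monitoring something to check.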
How to start thinking about it?
The primary function of the data platform is to provide data and tools for analysts. There may also be a regulatory or business need to provide some reporting, but it usually doesn’t start there. At first, it won’t be the kind of data platform you read about in articles. All you have is a lot of pain and a vague understanding that someone should do something about it, but the current people don’t know how.
Whatever the setup currently is, it can be considered the data platform, because there are processes in place, reports are being created, decisions are being made, and so on. It just doesn’t meet the needs of the organization as it is.
In the beginning, the most value will probably be gained by focusing on enabling: making the data appear, move, and be accessible for use. Tooling will come later, as it’s meant to increase the efficiency of the data users, and its value depends directly on how many of those users there are.
Why should you care?
On a fundamental level – because someone in your organization who was hired to do a different job is also keeping this stuff working. At least find out who that is and talk to them. Maybe it’s all solid, and they enjoy doing it, and it’s OK for a while longer. Or perhaps they left a year ago, and nobody knows. Either way, it’s worth knowing.
Or maybe you have a few tools that have been bought over the years, all of which promise to solve your data-related needs. And still, nothing works.
When should you hire someone?
Refer to the article linked at the beginning that discusses analytics in various company stages. If you have one analyst, you don’t need a data engineer yet as that one analyst is probably competent – they were hired as the first data person, after all.
Once you plan to hire a few more data users or expand the set of software, it’s a good time to talk to someone about it. Bear in mind that finding your first data engineer is probably at least as complex and time-consuming as finding your first analyst.
First, the timing depends on why you are hiring. If the main goal is only to lessen the pain of a couple of analysts, then you might be able to wait a little longer (a month or two, say).
Second, if you plan to develop into a data-driven organization that might consider analytics a competitive advantage, then it makes sense to prioritize the hire. It probably even makes sense to keep investing in this capacity ahead of growth, so that it supports the expansion that is likely to follow.
Third, the timing depends on your product roadmap. If not much new functionality is planned, it might be slightly easier to get on top of things on the data side. If you instead plan to rapidly add new features, all of which will want guidance from analytics (as they should), then that will also put more strain on building the data platform.
In summary – if you are reading this article, then it’s not too early to start looking for a data engineer 🙂
What to expect?
It depends on how much you already have in place and how much of a mess it is. In the ideal case, the first data engineer will build up a clean data warehouse, achieving the critical mass needed for analytics quickly and efficiently. Even in the perfect case, there will be a period (a few weeks or months) where work is in progress but doesn’t quite produce value yet.
In almost all other cases, no additional direct value will be generated in the beginning, as the existing things will need to be untangled before being replaced with more appropriate ones. The users will also need time to adjust. On a positive note, though, your analysts will probably stick around a while longer (even in a hot job market) when there is hope of help from a data engineer.
There is quite a bit of lead time built into this role, so the hiring decision should be made sooner rather than later.
OK, but how?
That’s the tricky part, isn’t it 🙂
Ideally, you’ll find a generalist in this fairly narrow field – someone who can debug and fix whatever you already have, is excellent at communicating, has a vision for the platform, is good at change management, and hopefully understands the business.
Or you can hire me to help your existing team prioritize and execute in the most efficient way.