Lauri Koobas

We help data guide the way… how?

I believe that we need to be educators and connectors, first and foremost. At this point, the whole analytics world is still very new to almost everyone – analytics in the sense of data-driven decision-making, not basic monitoring. We have barely reached the early adopters, compared to how much of the world is tech-based and producing data.

We have to demonstrate the value of the new approach. It involves quite a bit of change, which is always a tough sell. It also places some pretty high expectations on us – in addition to learning technical systems well enough to integrate them, we need to understand various company functions well enough to connect them, too. Examples include nudging a product owner to think of analytical questions up front, convincing a developer to store timestamps in UTC instead of local time so that the data can be used later, and teaching an analyst about data partitioning so that their queries run faster.
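To make the UTC example concrete, here is a minimal sketch in Python. The helper name `to_utc_iso` and the sample events are my own illustration, not taken from any particular system, and the two fixed offsets stand in for a real time zone database. It shows two events that share the same local wall-clock reading on the night the clocks go back, and that only become distinguishable once they are stored in UTC.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets used for illustration only (assumption): Estonian summer
# time (UTC+3) and winter time (UTC+2). A real system would use a tz database.
EEST = timezone(timedelta(hours=3), "EEST")
EET = timezone(timedelta(hours=2), "EET")


def to_utc_iso(dt: datetime) -> str:
    """Return an ISO 8601 string in UTC for a timezone-aware datetime."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        # A naive timestamp cannot be converted safely: its offset is unknown.
        raise ValueError("naive datetime: original time zone is unknown")
    return dt.astimezone(timezone.utc).isoformat()


# Two hypothetical events, both showing 03:30 on the local wall clock,
# one before and one after the clocks were set back by an hour.
first = datetime(2023, 10, 29, 3, 30, tzinfo=EEST)
second = datetime(2023, 10, 29, 3, 30, tzinfo=EET)

# Stored as local time without an offset, the two are indistinguishable.
local_first = first.replace(tzinfo=None).isoformat()
local_second = second.replace(tzinfo=None).isoformat()
print(local_first == local_second)  # True

# Stored as UTC, the order and the one-hour gap are preserved.
print(to_utc_iso(first))   # 2023-10-29T00:30:00+00:00
print(to_utc_iso(second))  # 2023-10-29T01:30:00+00:00
```

The helper rejects naive timestamps on purpose: guessing the offset later is exactly the problem that storing UTC up front avoids.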

It might seem like overreaching our mandate or taking on too big a task, but anything less has failed to have a lasting effect. We have gone through some hype cycles with “big data” and “artificial intelligence” touted as a miracle cure and fountain of youth for your business. In reality, though, they over-promised and under-delivered. Spreading butter on a puzzle doesn’t make it a sandwich.

So, the communication part is necessary, covering the proactive and visible aspects discussed in the previous post. But it’s insufficient – we still lack principles that would help us choose our tools and processes.

Drawing from many years of personal experience across different companies, tools, and cultures, I would prioritize safety and friendliness for users when interacting with data. My first choice will always be a tool where I can play around without anything going irreversibly wrong. Given that most people work with data and related tools only infrequently, we can’t expect everyone to have high proficiency with said tools. The best onboarding experience is the one that is not needed because everything is evident, including the Undo functionality.

As with many things, learning in data work starts with running and then modifying a premade example, and then continues with copying what’s already there instead of building it from scratch every time. That’s true for all components, be it a report, bar chart, data pipeline, or infrastructure piece. We should continue to care for what we create after we have made it – that way, we can be sure that whatever we copy for that new request follows the best practices. Besides, it also feels nice to act professionally, like any other craftsman.

To summarize, I propose the following as the mission for the data platform:

We help data guide the way by teaching humans to create and use it. We provide tools and processes that are safe and friendly. We lead by example, utilizing best practices and keeping our own house clean.

It doesn’t include many of the often specified non-functional requirements (scalability, cost, etc.) as I believe these are both self-evident in the “best practices” and highly variable over time.

 

Level 2, the second data engineer

The “level” refers to the number of humans working on data engineering:

2 people – less pain for ourselves. It’s necessary to start packaging, documenting, automating, and all those cool things to work within a team. There is still no slack, but from time to time, it makes sense to refactor an odd choice made earlier. All the work is still around making data available.

Much is already covered in this whole post, but I will add some additional notes.

There is an immediate and substantial communication overhead when going from one to two. It’s like going from living alone to living with a partner – many things need to be explained and agreed upon. On the other hand, more stuff will get done, and there is double the experience to draw on for developing better solutions.

One aspect of communication is that ownership and responsibility become essential topics. The question is not so much who owns what, but how you get from one person owning everything to two people sharing and caring. One possible approach is to plan and hire in such a way that an essential piece of the platform will be the primary responsibility of the second person. That way, they will naturally become the owner of it. Of course, there are other ways, but the main point is that it should be an open discussion before the second person starts, ideally even before the job ad is written.

As I’m hinting here, planning is becoming more critical as time goes on and the environment changes. By the time a company goes from zero to two data engineers, the number of analysts and product people, the revenue, and many other things have probably multiplied. That usually also means that the hectic early changes will become less frequent, and planning will begin to make sense.

The following article will be about the “why” part, the vision statement.
