The Lost Data Consumer

June 14, 2022

Over the last year and a half, I’ve talked to hundreds of businesses in almost every industry about their data problems and data stacks, in pursuit of product-market fit with Weld. I’ve gained many insights into business operations in general, but one key observation is that the data world is dangerously polarized right now.

Why? I see it being driven by a few main trends:

  • Modern data tools have so much to offer that companies get overwhelmed by the setup complexity and lose sight of why they started the data journey in the first place.
  • Because of the tooling complexity, ownership of (commercial) data tooling has shifted to Engineering / R&D, who have a hard time driving business value from the data stack because data engineering gets confused with business intelligence.
  • There’s a lack of data analysts who are able to both make the tooling work and understand the business.

These trends often drive misalignment of incentives in departments and end up creating data silos.

In the polarization process, I see 3 main exaggerated personas:

  1. The Business Value Purist
  2. The Infrastructure Junkie
  3. The Lost Data Consumer

They tend to have a hard time understanding each other - you’ll probably recognize all three personas.

The Business Value Purist

Business Value Purists barely know SQL. They deliver business results. They used to spend time in Microsoft Excel until their boss made them get a Mac. They probably worked in consulting or banking, and now work in data. They don’t seem to care whether data is clean or correct. They love shadow IT, they want answers to questions, and they want them fast. They add things like ‘built dashboards that generated $420 million in revenue’ to their resume. They claim to know a lot about “Big Data, ML & AI”.

Do they know what a materialized view or a dbt macro is? Definitely not. They write a 5,000-line spaghetti SQL query composed of Stack Overflow snippets and call it “Marketing Master Data”.

The Infrastructure Junkie

Infrastructure Junkies love technology. Scalable and fast technology. They write blog posts in Markdown about the theoretical future of data science, where everything is perfectly tested, all tools work seamlessly together, and the modern data stack turns into one large happy family of 25+ pieces of infrastructure. You’ve probably read their blog posts, fallen asleep halfway through, then seen a VC say the article was predicting the future, and shared the blog post with your network anyway. They use terms like ‘streaming’, ‘headless BI’, ‘sub-second queries’, ‘metrics layer’, ‘governance’, ‘observability’, ‘Jinja’, and ‘CI/CD’. Some of them say that ‘BI is dead’.

The Lost Data Consumer

Lost Data Consumers often work in the commercial department. They are aware of the power of data and are usually forced into doing their own ad hoc analysis. They fiddle around in HubSpot Analytics and Google Sheets, but often end up with the same conclusion: ‘I don’t have the data’. They are constantly asked for performance data by the leadership team and desperately try to convince their ‘Analytics Engineer’ to finally build a dashboard. The first metrics arrive, and the business definitions are wrong. It’s now 2 days until the board meeting, and the Analytics Engineer says something about a ‘pull request’, ‘dbt’, and ‘CI/CD’, and that it will take a few sprints to fix the metric. That obviously doesn’t work, so they decide to boot up their own Google Sheet powered by Supermetrics.


The ongoing back-and-forth between these three personas sounds something like this:

“Infrastructure Junkies are wasting time and money implementing technology without any lens towards business value.” - Business Value Purists

“Business Value Purists are creating tech debt.” - Infrastructure Junkies

“I don’t have any data.” - Lost Data Consumers

Some thought leaders in data seem to consider the polarization solved by ‘Data Mesh’, that is, distributed data operations, but I think it remains to be seen how resource-heavy that solution is.

How will this polarization develop going forward? I think it’s hard to tell, but hopefully, as data tooling matures and the complexity of data infrastructure decreases, companies will shift their focus almost solely towards the use cases of data and how to make metrics trustworthy across departments. This will also enable all three personas to shine and focus on what they are best at.