AI, the data gap, and known unknowns
Author
Anna Stankovski Clark
Last updated
8 June 2025
In the push for sustainable transport and smarter cities, data is gold. But not just any data. As I've written before on the difference between Data and Data, we need data on how people actually travel. Real, representative, and relevant data. The right data is so important that I would argue that filling the data gap is itself sustainability work.
Why data matters for sustainability
If we want to build a transport system that is fair, efficient, and sustainable, we need to understand how people actually move. That means gathering data not just about car traffic but also about walking, cycling, micromobility, and public transport: areas where data is still surprisingly (?) sparse, patchy, or biased.
Too often, we hear things like:
“It’s too difficult to collect it, so we model it.”
“Some data is better than no data.”
These may sound pragmatic, but they reflect a deeper problem: when the underlying data is incomplete or skewed, the resulting models will be too. As data scientists often put it: 💩in, 💩out.
The dark side of data models (looking at you, AI): Lessons from Weapons of Math Destruction
In her brilliant book Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, data scientist Cathy O’Neil warns about the dangers of blindly trusting mathematical models. She introduces the term WMDs—Weapons of Math Destruction—to describe algorithms that are:
Opaque: Impossible to understand or audit from the outside
Unregulated: Built and deployed without public scrutiny
Harmful at scale: Affecting millions and reinforcing systemic bias
These models often masquerade as neutral, but in reality, they embed human assumptions and historical injustices. As O’Neil puts it:
“Models are opinions embedded in mathematics.”
This matters in the world of transport planning too, because we know we have huge biases in our data. For example, if you were to ask an AI planning algorithm about transport investment, it would be likely to prioritise a road widening scheme over new cycle infrastructure, because we have plenty of data and example models for the former but not for the latter.
In sustainability work, we must scrutinise not only the models, but the values and gaps that underpin them. Otherwise, we risk building systems that are efficient for those already served—and invisible to everyone else.
AI is just maths (with a confidence problem)
A common misconception is that AI can “fill in the blanks” where data is lacking. But in reality, AI doesn’t create new knowledge—it amplifies patterns already present in the input data. And when that data is biased, the AI confidently replicates those biases.
Take Strava Metro, for instance. It’s a tool used by cities to understand cycling behaviour. But the data comes almost exclusively from fitness-oriented cyclists who use the Strava app—often MAMILs (middle-aged men in lycra). Using this data to plan city-wide cycling infrastructure risks entrenching inequities and ignoring entire groups of everyday cyclists.
Having lots of data about one group (say, car drivers) doesn’t mean we understand transport as a whole. In fact, the absence of data on users of other modes is itself a barrier to meaningful progress in sustainable transport.
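To make the Strava point concrete, here is a minimal Python sketch of how a skewed sample distorts even a simple average. Every number in it is invented for illustration, and the group names ("sport", "everyday"), the sample sizes, and the population shares are all assumptions, not figures from Strava Metro or any real survey. It compares a naive mean over the sample with a mean reweighted to the assumed population shares (a basic form of post-stratification).

```python
# Illustrative only: every number below is invented for this sketch.
# Two hypothetical groups of cyclists, with how many of each appear in
# an app-based sample and their average trip length.
sample = {
    "sport": {"n": 900, "mean_trip_km": 40.0},
    "everyday": {"n": 100, "mean_trip_km": 4.0},
}

# Assumed share of each group among all cyclists (hypothetical).
population_share = {"sport": 0.2, "everyday": 0.8}


def naive_mean(sample):
    """Average trip length if we take the sample at face value."""
    total_n = sum(group["n"] for group in sample.values())
    weighted_sum = sum(group["n"] * group["mean_trip_km"] for group in sample.values())
    return weighted_sum / total_n


def reweighted_mean(sample, population_share):
    """Average trip length with each group weighted by its population share."""
    return sum(
        population_share[name] * group["mean_trip_km"]
        for name, group in sample.items()
    )


naive = naive_mean(sample)
adjusted = reweighted_mean(sample, population_share)
print(f"naive: {naive:.1f} km, reweighted: {adjusted:.1f} km")
```

Note the catch: the reweighting only works if we already know the population shares and have at least some observations from every group. That knowledge has to come from representative data collection in the first place; no amount of maths can supply it afterwards.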
When is data “good enough”?
This is a thorny but essential question. In an ideal world, we’d have rich, real-time, representative data on every mode of travel and every type of traveller. But we don’t. So what counts as “good enough”?
We need to ask:
What are the known biases in this dataset?
Who is missing from the picture?
Is the data collection method transparent and improvable?
Can we triangulate or combine sources to create a more balanced view?
One thing is clear: “some data” is not always better than none. Poor data can create false confidence and lead to wrong decisions.
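The question "who is missing from the picture?" can be given a rough first pass in code. The Python sketch below compares each group's share of a dataset with its share of the population and flags groups that are under-represented or absent. The group labels, the counts, the population shares, and the 0.5 threshold are all hypothetical choices made for this example, not an established standard.

```python
def representation_ratios(sample_counts, population_share):
    """Return sample share divided by population share for each group.

    A ratio of 1.0 means the group is represented in proportion,
    below 1.0 means under-represented, and 0.0 means entirely missing.
    """
    total = sum(sample_counts.values())
    if total == 0:
        raise ValueError("sample is empty")
    return {
        group: (sample_counts.get(group, 0) / total) / share
        for group, share in population_share.items()
    }


def under_represented(ratios, threshold=0.5):
    """List groups whose ratio falls below the (arbitrary) threshold."""
    return sorted(group for group, ratio in ratios.items() if ratio < threshold)


# Hypothetical dataset: counts of travellers per group in the sample.
sample_counts = {"men_18_64": 750, "women_18_64": 200, "under_18": 50}

# Hypothetical population shares; "over_64" never appears in the sample.
population_share = {
    "men_18_64": 0.3,
    "women_18_64": 0.3,
    "under_18": 0.2,
    "over_64": 0.2,
}

ratios = representation_ratios(sample_counts, population_share)
flagged = under_represented(ratios)
print(flagged)
```

A check like this only catches gaps along the dimensions we thought to list. Groups we never defined stay invisible, which is why transparent collection methods matter as much as the audit itself.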
The unsexy but crucial job of closing the data gap
Creating a sustainable transport system isn’t just about building bike lanes or shifting to electric vehicles. It’s also about building better datasets. That means funding and prioritising inclusive, ground-up data collection. This is not only the responsibility of public agencies—it’s something private companies, consultancies, and developers must take seriously as well.
And here's the hard truth: data collection isn’t glamorous. It doesn’t demo well. It doesn’t get the headlines. But it’s the invisible scaffolding that supports every smart, fair, AI-powered decision.
Yet, when you look at where investment goes in the AI sector, the emphasis is clear: generative AI, LLMs, autonomous systems, AI assistants, and predictive analytics dominate. Meanwhile, initiatives focused on improving the quality, representativeness, and ethics of data collection get far less attention—despite being foundational to all the rest.
Time to invest in the uncool stuff
If we want AI to drive real progress, we need to invest in the “uncool” essentials: representative sampling, inclusive surveys, and ethical data infrastructure. The payoff is not only societal. It also includes economic benefits for companies and individuals, and more effective sustainability reporting. Good data leads to better decisions and better returns.
At Travalytics, we see representative data collection as sustainability work. It is an investment not only in accuracy but in equity, and in realising the full benefits of a future sustainable transport system.
Because without the right data, we’re not modelling the future, we’re just repeating the past.