The problem with language: Why we need a climate concept store

by Harrison Pim, Data Scientist

Climate policy concepts defy simple categorisation.

Consider the concept of “sea level rise” and the ways it might be mentioned in climate policy documents: should sea level rise be labelled as a type of marine hazard, or as a slow onset event? How about the tangle of policy instruments deployed to advance or hinder climate action: is it useful to distinguish between the application of subsidies and the removal of subsidies as two different instruments? Is a “carbon tax” a market-based or an economic instrument?

Most of our early experiments in sorting this conceptual data into meaningful taxonomies relied on conventional software tools like spreadsheets and documents, which were passed back and forth within our team of policy experts. While the prototypes were promising, they also revealed the limitations of those simple knowledge management patterns. The interconnectedness of concepts and subdomains demanded a more sophisticated approach to knowledge representation - one that could accommodate the true complexity of the conceptual space, while remaining accessible and maintainable.

While our policy experts wrestled with the taxonomies, our data science team was responsible for building ‘classifiers’ to automatically identify mentions of those concepts in our documents. We did manage to develop some great tools through direct collaboration, like our targets classifier, which was trained to find mentions of targets set by governments or organisations in their published policies. However, we knew that we would struggle to scale this process to the hundreds of concepts necessary to give a full picture of the climate policy landscape. Every change to a concept’s definition would require a change to its corresponding classifier, and vice versa, creating a bottleneck in our ability to develop new tools.

Solving these problems would require a different approach.

CPR’s solution builds on established foundations

To help smooth the handoff between data science and policy work, we’ve built a new place to store our concepts (creatively named “the concept store”). The concept store is built on Wikibase, the same technology that powers Wikidata. This strategic choice gives us access to more than a decade of community-driven best practices for collaboration. Rather than reinventing the wheel, we're building on a platform that has already been applied to many of the knowledge representation challenges we face.

Wikibase is particularly well-suited to our needs, as it can naturally represent the web-like nature of climate policy concepts, where ideas are connected through multiple pathways rather than confined to rigid, disconnected hierarchies. That flexibility allows us to capture the true complexity of relationships between concepts while maintaining a structure that's both machine-readable and human-understandable.
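To make that web-like structure concrete, here is a minimal sketch (not CPR's actual data model) of why a graph fits better than a tree. Using the sea level rise question from earlier, a single concept can sit under several broader concepts at once, and every pathway upward remains traversable. All labels and relations here are illustrative.

```python
# A minimal sketch of concepts as a graph rather than a strict hierarchy:
# one concept may have several "broader" concepts, and ideas connect
# through multiple pathways. Labels here are illustrative only.
from collections import defaultdict

broader: defaultdict[str, set[str]] = defaultdict(set)

def add_broader(concept: str, parent: str) -> None:
    """Record that `concept` has `parent` as a broader concept."""
    broader[concept].add(parent)

# "sea level rise" can be both a marine hazard AND a slow onset event -
# a tree would force us to pick exactly one parent.
add_broader("sea level rise", "marine hazard")
add_broader("sea level rise", "slow onset event")
add_broader("marine hazard", "climate hazard")
add_broader("slow onset event", "climate hazard")

def ancestors(concept: str) -> set[str]:
    """Walk every 'broader' pathway upward, collecting all ancestors."""
    seen: set[str] = set()
    stack = list(broader[concept])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(broader[parent])
    return seen

# "climate hazard" is reachable via two distinct pathways.
print(sorted(ancestors("sea level rise")))
# → ['climate hazard', 'marine hazard', 'slow onset event']
```

Because the graph never forces a single parent, the "marine hazard or slow onset event?" question dissolves: the answer can simply be both.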

Going further: Designing for complexity and scale

While Wikibase provides the technical foundation, the key to making it work in our domain was designing the right data model. The rules which define how our concepts are structured (you may hear us use the word “ontology” to describe this) strike a delicate balance: enough structure to ensure consistency and interoperability, with enough flexibility to remain adaptable to the evolving landscape of climate policy. We will go into more detail in our next post; stay tuned!

Why this matters: New models for interdisciplinary collaboration

The concept store is a significant evolution in how our policy and data science teams work together. Rather than needing constant coordination and alignment between disciplines, the concept store creates clear interfaces between different types of work. That separation allows each team to focus on their core expertise, while contributing to the same goal.

Our policy team can focus on what they do best: deep domain research, defining concepts, and ensuring that our knowledge graph accurately reflects the evolving complexity of climate policy. They can work at their own pace, using familiar tools and methods, without needing to understand the technical details of classifier development or machine learning pipelines.

Meanwhile, our data science team can concentrate on building and improving classifiers, confident that the underlying concept definitions and relationships are well-structured and maintained. The consistent data model means we can develop standardised approaches to classifier development and evaluation, making the process much more efficient and scalable.
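One hypothetical way (not CPR's actual code) to picture that standardised approach: if every classifier is parameterised by a concept record pulled from the store, then updating the concept in the store changes classifier behaviour without touching classifier code. The `Concept` and `KeywordClassifier` names below are illustrative assumptions.

```python
# A hypothetical sketch of a standardised classifier interface, where each
# classifier is built from a concept record as it might live in the store.
# Names and fields are illustrative, not CPR's actual API.
from dataclasses import dataclass, field

@dataclass
class Concept:
    """A concept record: an identifier, a preferred label, and alternatives."""
    wikibase_id: str
    preferred_label: str
    alternative_labels: list[str] = field(default_factory=list)

@dataclass
class KeywordClassifier:
    """A deliberately simple classifier: flags text mentioning any label."""
    concept: Concept

    def predict(self, text: str) -> bool:
        labels = [self.concept.preferred_label, *self.concept.alternative_labels]
        return any(label.lower() in text.lower() for label in labels)

# Policy experts can add an alternative label in the concept store, and the
# classifier picks it up without any code change on the data science side.
concept = Concept("Q42", "sea level rise", ["rising sea levels"])
clf = KeywordClassifier(concept)
print(clf.predict("Plans to adapt to rising sea levels"))  # → True
print(clf.predict("A new carbon tax policy"))              # → False
```

Real classifiers would of course be far richer than keyword matching; the point of the sketch is the interface boundary, where the concept definition is data rather than code.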
