A Fourth Wave of Open Data?
Exploring the Spectrum of Scenarios for Open Data and Generative AI
Since late 2022, generative AI services and large language models (LLMs) have transformed how many individuals access and process information. However, two questions remain underexplored: how generative AI and LLMs can be augmented with open data from official sources, and how generative AI can make open data more accessible, potentially enabling a Fourth Wave of Open Data.
For these reasons, the Open Data Policy Lab decided to explore the possible intersections between open data from official sources and generative AI.
Our report, “A Fourth Wave of Open Data? Exploring the Spectrum of Scenarios for Open Data and Generative AI” (May 2024), provides a new framework and recommendations to support open data providers and other interested parties in making open data “ready” for generative AI.
Spectrum of Scenarios Framework
The report includes five distinct scenarios in which open data from official sources and generative AI can intersect. Each of these scenarios includes case studies from the field and a specific set of requirements that open data providers can focus on to become ready for a scenario.
With this approach, open data providers can make progress toward individual scenarios and, in the long term, become ready for all of them.
Recommendations for Advancing Open Data and Generative AI
While developing the “Spectrum of Scenarios”, we learned a lot about the different requirements behind open data for generative AI. Drawing on these lessons, we developed the following data governance and management recommendations to help improve access to, and gain greater insights from, open data.
Enhance Transparency and Documentation: Improving transparency and documentation of open data can not only foster ethical and responsible use throughout the data lifecycle, but can also help data holders and users to better evaluate the lineage, quality and impact of the output.
Uphold Quality and Integrity: Upholding data quality and integrity is key when thinking about advancing generative AI for open data inference and insight generation, data augmentation, and open-ended exploration.
Promote Interoperability and Standards: Improving the interoperability of data and promoting the adoption of shared data and metadata standards would address many long-standing pain points in the open data ecosystem that prevent the efficient and effective use of open data.
Improve Accessibility and Usability
Address Ethical Considerations: As the uses of open data expand in light of developments in the generative AI ecosystem, it is more important than ever before to take action to protect data subjects and prevent harm.
Visit our growing Observatory of real-world examples where open data is combined with generative AI.
The Observatory features 80+ examples across domains and geographies, ranging from supporting legal work within the Government of France to helping researchers across the African continent navigate cross-border data sharing laws.
The examples include generative AI chatbots to improve access to services, conversational tools to help analyze data, datasets to improve the quality of the AI output, and more.
If you have any questions or feedback or would like to share how this framework helped you, please contact us at datastewards@thegovlab.org.
The Open Data Policy Lab
The Open Data Policy Lab is a resource hub supporting decision-makers at the local, state and national levels as they work toward accelerating the responsible reuse and sharing of open data for the benefit of society and the equitable spread of economic opportunity.
The GovLab
The Governance Lab (GovLab), based at NYU’s Tandon School of Engineering, seeks to improve people’s lives by changing how we govern. We believe that increased availability and use of data and new ways to leverage the capacity, intelligence, and expertise of people can improve the way we make decisions and solve public problems.