The Subjects and Stages of AI Dataset Development



Presented by the Daniels Fund Ethics Initiative Collegiate Program at Colorado Law and Silicon Flatirons.

The computer science and social science research communities have paid increasing attention to the datasets used to train and build AI technologies, but legal scholarship has paid less. Both Large-Scale Language Datasets (LSLDs) and Large-Scale Computer Vision Datasets (LSCVDs) have been at the forefront of these discussions, owing to recent controversies over facial recognition technologies and debate over the use of publicly available text to train massive models that generate human-like text. Many of these datasets serve as “benchmarks” for developing models used in both academic and industry research, while others are used solely for training models. The process of developing LSLDs and LSCVDs is complex and contextual, involving dozens of decisions about what kinds of data to collect, label, and train a model on, as well as how to make the data available to other researchers. However, little attention has been paid to mapping and consolidating the legal issues that arise at different stages of this process: when the data is being collected, after the data is used to build and evaluate models and applications, and when that data is distributed more widely.

Join Alex Hanna as she discusses her recent paper.


Can’t make it in person? Register for virtual attendance to watch the livestream via Zoom Webinar. Unique join links will be distributed approximately 24 hours prior to the event.

Artificial Intelligence, or AI, is increasingly in use by both the government and the private sector, from the allocation of government benefits to the creation of smart contracts. However, poorly designed AI systems can lead to significant harms, including but not limited to discrimination. The question of how to regulate and build ethical AI is central. This lecture series will emphasize the practical applications of AI technology and ways to ensure principle-based ethics are a key focus of both development and regulation.

This is one of four sessions scheduled for fall 2022 and spring 2023, each with a different featured speaker. Follow the links below to register for each individual date you’d like to attend:

All times Mountain.


03/20/23 5:15pm - 6:00pm
Dinner Reception

@ Wolf Law Building, Boettcher Hall

Light dinner and refreshments provided for registrants.

03/20/23 6:00pm - 7:30pm
Lecture, Discussion, Q&A

@ Wolf Law Building, Classroom 204 & Livestream

  • Alex Hanna — Presenter
    Director of Research, The Distributed AI Research Institute
  • Morgan Klaus Scheuerman — Commenter
    PhD Student in Information Science, University of Colorado Boulder
