The Subjects and Stages of AI Dataset Development

Tags: Artificial Intelligence

Presented by the Daniels Fund Ethics Initiative Collegiate Program at Colorado Law and Silicon Flatirons.

The computer science and social science research communities have paid increasing attention to the datasets used to train and build AI technologies, but legal scholarship has paid less. Both Large-Scale Language Datasets (LSLDs) and Large-Scale Computer Vision Datasets (LSCVDs) have been at the forefront of these discussions, owing to recent controversies over facial recognition technologies and to debate over the use of publicly available text to train massive models that generate human-like text. Many of these datasets serve as “benchmarks” for developing models used in both academic and industry research, while others are used solely for training models. The process of developing LSLDs and LSCVDs is complex and contextual, involving dozens of decisions about what kinds of data to collect, label, and train a model on, as well as how to make the data available to other researchers. However, little attention has been paid to mapping and consolidating the legal issues that arise at different stages of this process: when the data is being collected, after the data is used to build and evaluate models and applications, and when that data is distributed more widely.

Join Alex Hanna as she discusses her recent paper.

Can’t make it in person? Register for virtual attendance to watch the livestream via Zoom Webinar. Unique join links will be distributed approximately 24 hours prior to the event.

Artificial Intelligence, or AI, is increasingly in use by both the government and the private sector, from the allocation of government benefits to the creation of smart contracts. However, poorly designed AI systems can lead to significant harms, including but not limited to discrimination. The question of how to regulate and build ethical AI is central. This lecture series will emphasize the practical applications of AI technology and ways to ensure principle-based ethics are a key focus of both development and regulation.

This is one of four sessions scheduled for fall 2022 and spring 2023, each with a different featured speaker. Follow the links below to register for each individual date you’d like to attend:

All times Mountain.

Sessions

03/20/23 5:15pm - 6:00pm

Dinner Reception

@ Wolf Law Building, Boettcher Hall

Light dinner and refreshments provided for registrants.

03/20/23 6:00pm - 7:30pm

Lecture, Discussion, Q&A

@ Wolf Law Building, Classroom 204 & Livestream

Alex Hanna — Presenter
Director of Research, The Distributed AI Research Institute
Morgan Klaus Scheuerman — Commenter
PhD Student in Information Science, University of Colorado Boulder


03/20/23 5:15pm - 7:30pm

Wolf Law Building, Classroom 204 & Livestream

Parking Map/Instructions

Registration is closed


#siliconflatirons

The State of Colorado Supreme Court Board of Continuing Legal & Judicial Education has accredited this event as a continuing legal education seminar for a total of 1 general credit. Credits in Colorado are based on a 50-minute credit hour. Session intros, breaks, keynotes, and Q&A sessions are not eligible for credit.

Course ID: 826977
