
Data Science Distinguished Lecture Series

Cornell’s Center for Data Science for Enterprise and Society’s Data Science Distinguished Lecture Series invites top-tier faculty and industry leaders from around the world who are making groundbreaking contributions in data science.  Invitees are curated and selected by an advisory board to the Center.  The lectures are a cross-campus event, where each seminar is co-hosted with a flagship lecture series at Cornell, working cooperatively with the Bowers College of Computing and Information Science (including the departments of Computer Science, Information Science, and Statistics and Data Science), the Schools of Operations Research & Information Engineering and Electrical & Computer Engineering in the College of Engineering, and the Departments of Mathematics and of Economics in the College of Arts and Sciences.  The mission of the Center is to provide a focal point for data science work, both methodological and application domain-driven.  The audience for these lectures spans all of Cornell and, given the breadth of the audience, the talks are meant to be accessible to a wide range of graduate students and faculty working in data science.

Spring 2024: Data Science Distinguished Lecture Series

Thursday, March 7
G01 Biotech, 4:15 p.m.
Jelena Bradic

Co-sponsored with the Department of Statistics and Data Science


Exploring Robustness: Bridging Theory and Practice for Enhanced Discovery

ABSTRACT  This talk seeks to redefine the boundaries of statistical robustness. For too long, the field has languished in the shadows of contamination models, adversarial constructs, and outlier management—approaches that, while foundational, scarcely scratch the surface of the potential that model misspecification offers. Our research reveals a fundamental link between robustness and causality, initiating an innovative era in data science. This era is defined by how causality enhances robustness, and in turn, how effectively applied robustness opens up unprecedented opportunities for scientific exploration.

BIO  Jelena Bradic is a Professor at the University of California, San Diego, where she specializes in statistics and data science within the Department of Mathematics and the Halicioglu Data Science Institute. Her research focuses on developing robust statistical methods that are resistant to model misspecification, with particular emphasis on high-dimensional data analysis, causal inference, and machine learning applications. Bradic holds a Ph.D. from Princeton University and has made significant contributions to the field of statistics. She is co-Editor-in-Chief of the ACM/IMS Journal of Data Science, the first interdisciplinary journal between the ACM and the IMS, and the recipient of several prestigious awards, including the Wijsman Lecture (2023), a Discussion Paper in the Journal of the American Statistical Association (2020), and a Hellman Fellowship, recognizing her as a leading figure in statistical science.

Tuesday, February 6
11:45 a.m.
Gates G01
Stefanie Jegelka
co-sponsored with the Department of Computer Science
Learning with Structure: Graphs, Geometry and Generalization

One grand goal of machine learning is to design widely applicable and resource-efficient learning models that are robust, even under commonly occurring distribution shifts in the input data. A promising step towards this goal is to understand and exploit “structure” in the input data, latent space, model architecture and output. In this talk, I will illustrate examples of exploiting different types of structure in two main areas: representation learning on graphs and learning with symmetries.

First, graph representation learning has found numerous applications, including drug and materials design, traffic and weather forecasting, recommender systems and chip design. In many of these applications, it is important to understand (and improve) the robustness of relevant deep learning models. Here, we will focus on approaches to estimate and understand out-of-distribution predictions on graphs, e.g., training on small graphs and testing on large graphs with different degrees.

Second, in many applications, e.g. in chemistry, physics, biology or robotics, the data have important symmetries. Modeling such symmetries can help data efficiency and robustness of a model. Here, we will see an example of such a modeling task — neural networks on eigenvectors — and its benefits. Moreover, a formal, general analysis quantifies how symmetries improve data efficiency.

BIO  Stefanie Jegelka is a Humboldt Professor at TU Munich and an Associate Professor in the Department of EECS at MIT. Before joining MIT, she was a postdoctoral researcher at UC Berkeley, and obtained her PhD from ETH Zurich and the Max Planck Institute for Intelligent Systems. Stefanie has received a Sloan Research Fellowship, an NSF CAREER Award, a DARPA Young Faculty Award, the German Pattern Recognition Award, a Best Paper Award at ICML and an invitation as sectional lecturer at the International Congress of Mathematicians. She has co-organized multiple workshops on (discrete) optimization in machine learning and graph representation learning, and has served as an Action Editor at JMLR and a program chair of ICML 2022. Her research interests span the theory and practice of algorithmic machine learning, in particular, learning problems that involve combinatorial, algebraic or geometric structure.

Monday, February 5
4:45 p.m.
Klarman Hall, Rhodes-Rawlings Auditorium
Moon Duchin, Tufts University

UNIVERSITY LECTURE

Algorithms, Race, and Redistricting: Can Computers Find Fairness? 

This is a University Lecture, co-sponsored with the Center for Data Science for Enterprise and Society and the Jeb E. Brooks School of Public Policy.


Today’s Supreme Court is unmistakably inclined to reject the use of race-conscious measures in law and policy — as Chief Justice Roberts memorably put it, “The way to stop discrimination on the basis of race is to stop discriminating on the basis of race.” The 2023 term saw high-profile challenges to the use of race data in college admissions and in political redistricting. On the gerrymandering front, the state of Alabama asked the court to adopt a novel standard using algorithms to certify race-neutrality, on the principle that computers don’t know what you don’t tell them. But do blind approaches find fairness? In this talk, Professor Duchin will review the very interesting developments of the last few decades — and the last few months! — on algorithms, race, and redistricting.

BIO  Moon Duchin is a Professor of Mathematics and a Senior Fellow in the Tisch College of Civic Life at Tufts University. Her pure mathematical work is in geometry, topology, groups, and dynamics, while her data science work includes collaborations in civil rights, political science, law, geography, and public policy on large-scale projects in elections and redistricting. She has recently served as an expert in redistricting litigation in Wisconsin, North Carolina, Alabama, South Carolina, Pennsylvania, Texas, and Georgia.  Her work has been recognized with an NSF CAREER grant, a Guggenheim Fellowship, and a Radcliffe Fellowship, and she is a Fellow of the American Mathematical Society. 

Friday, February 2
3:45 p.m. Gates G01
Moon Duchin, Tufts University

A Systems View of Fair Elections

This talk is part of the CAM Colloquium and Cornell’s Center for Data Science for Enterprise and Society’s Data Science Distinguished Lecture Series

ABSTRACT: The mathematical attention to voting systems has come in waves: the first wave was bound up with the development of probability, the second wave was axiomatic, and the third wave is computational. There are many beautiful results giving guarantees and obstructions when it comes to the provable properties of systems of election. But the axioms and objectives from this body of work are not a great match for the practical challenges of 21st century democracy. I will discuss ideas for bringing the tools of modeling and computation into closer conversation with the concerns of policymakers and reformers in the voting rights sphere. In particular, I’ll take a close look at ranked choice voting (or, as Politico recently called it, “the hottest political reform of the moment”).

Fall 2023: Data Science Distinguished Lecture Series


Monday, October 23
3:45 p.m. Gates Hall Room 114
Bin Yu, UC Berkeley

Veridical Data Science Toward Trustworthy AI

Co-sponsored with the Department of Statistics and Data Science and the CS Theory Seminar

“AI is like nuclear energy – both promising and dangerous.”
Bill Gates, 2019

Data Science is central to AI and has driven most of the recent advances in biomedicine and beyond. Human judgment calls are ubiquitous at every step of a data science life cycle (DSLC): problem formulation, data cleaning, EDA, modeling, and reporting. Such judgment calls are often responsible for the “dangers” of AI by creating a universe of hidden uncertainties well beyond sample-to-sample uncertainty.
To mitigate these dangers, veridical (truthful) data science is introduced based on three principles: Predictability, Computability and Stability (PCS). The PCS framework and documentation unify, streamline, and expand on the ideas and best practices of statistics and machine learning. In every step of a DSLC, PCS emphasizes reality checks through predictability, considers computability up front, and takes into account expanded uncertainty sources, including those from data curation/cleaning and algorithm choice, to build more trust in data results. PCS will be showcased through collaborative research in finding genetic drivers of a heart disease, stress-testing a clinical decision rule, and identifying a microbiome-related metabolite signature for possible early cancer detection.

BIO  Bin Yu is Chancellor’s Distinguished Professor and Class of 1936 Second Chair in Statistics, EECS, and Computational Biology at UC Berkeley. Her recent research focuses on statistical machine learning practice, algorithms, and theory; veridical data science for trustworthy AI; and interdisciplinary data problems in neuroscience, genomics, and precision medicine. She is a member of the U.S. National Academy of Sciences and the American Academy of Arts and Sciences. She was a Guggenheim Fellow, Tukey Memorial Lecturer of the Bernoulli Society, and Rietz Lecturer of the Institute of Mathematical Statistics (IMS), and won the E. L. Scott Award given by the Committee of Presidents of Statistical Societies (COPSS). She delivered the IMS Wald Lectures and the COPSS Distinguished Achievement Award and Lecture (DAAL, formerly the Fisher Lecture) at the Joint Statistical Meetings (JSM) in August 2023. She holds an Honorary Doctorate from the University of Lausanne. She served on the inaugural scientific advisory board of the UK Turing Institute of Data Science and AI, and is serving on the editorial board of PNAS and as a senior advisor at the Simons Institute for the Theory of Computing at UC Berkeley.


Thursday, October 5, 2023
4:30 pm, Statler Auditorium
Fei-Fei Li, Stanford University

What We See and What We Value: AI With a Human Perspective

ABSTRACT In this talk, Dr. Li will present her research with students and collaborators to develop intelligent visual machines using machine learning and deep learning methods. The talk will focus on how neuroscience and cognitive science inspired the development of algorithms that enabled computers to see what humans see and how we can develop computer algorithms and applications to allow computers to see what humans don’t see. Dr. Li will also discuss social and ethical considerations about what we do not want to see or do not want to be seen, and corresponding work on privacy computing in computer vision, as well as the importance of addressing data bias in vision algorithms. She will conclude by discussing her current work in smart cameras and robots in healthcare as well as household robots as examples of AI’s potential to augment human capabilities.

BIO  Dr. Fei-Fei Li is the inaugural Sequoia Professor in the Computer Science Department at Stanford University and Co-Director of Stanford’s Human-Centered AI Institute. She served as the Director of Stanford’s AI Lab from 2013 to 2018. From January 2017 to September 2018, she was Vice President at Google and served as Chief Scientist of AI/ML at Google Cloud. Dr. Fei-Fei Li obtained her B.A. degree in physics from Princeton in 1999 with High Honors and her Ph.D. degree in electrical engineering from California Institute of Technology (Caltech) in 2005.

Dr. Li’s current research interests include cognitively inspired AI, machine learning, deep learning, computer vision, and AI+healthcare, especially ambient intelligent systems for healthcare delivery. She has also worked on cognitive and computational neuroscience. Dr. Li has published over 200 scientific articles in top-tier journals and conferences and is the inventor of ImageNet and ImageNet Challenge, a critical large-scale dataset and benchmarking effort that has contributed to the latest developments in deep learning and AI. A leading national voice for advocating diversity in STEM and AI, she is co-founder and chairperson of the national non-profit AI4ALL, aimed at increasing inclusion and diversity in AI education. Dr. Li is an elected Member of the National Academy of Engineering (NAE), the National Academy of Medicine (NAM) and American Academy of Arts and Sciences (AAAS).


Tuesday, September 26, 2023
Stefan Wager, Stanford University

Treatment Effects in Market Equilibrium

Co-sponsored with the ORIE Colloquium

ABSTRACT When randomized trials are run in a marketplace equilibrated by prices, interference arises. To analyze this, we build a stochastic model of treatment effects in equilibrium. We characterize the average direct (ADE) and indirect treatment effect (AIE) asymptotically. A standard RCT can consistently estimate the ADE, but confidence intervals and AIE estimation require price elasticity estimates, which we provide using a novel experimental design. We define heterogeneous treatment effects and derive an optimal targeting rule that meets an equilibrium stability condition. We illustrate our results using a freelance labor market simulation and data from a cash transfer experiment.

BIO Stefan Wager is an associate professor of Operations, Information, and Technology at the Stanford Graduate School of Business, and an associate professor of Statistics (by courtesy). His research lies at the intersection of causal inference, optimization, and statistical learning. He is particularly interested in developing new solutions to problems in statistics, economics and decision making that leverage recent advances in machine learning.


Wednesday, August 30
4:20 p.m. Gates Hall G01 (reception 3:15, Gates 3rd floor lounge)
Andrew Piper, McGill University

VIDEO RECORDING

Computational Narrative Understanding and the Human Desire to Make-Believe
This is a University Lecture, co-sponsored with the IS Colloquium and the Center for Data Science.

BIO Andrew Piper is Professor and William Dawson Scholar in the Department of Languages, Literatures, and Cultures at McGill University. He directs the Bachelor of Arts and Science program at McGill and is editor of the Journal of Cultural Analytics. His work focuses on using the tools of data science, machine learning, and natural language processing to study human storytelling. He is the director of .txtlab, a laboratory for cultural analytics, and author most recently of Enumerations: Data and Literary Study (Chicago, 2018) and Can We Be Wrong? The Problem of Textual Evidence in a Time of Data (Cambridge, 2020).

ABSTRACT Narratives play an essential role in shaping human beliefs, fostering social change, and providing a sense of personal meaning, purpose and joy. Humans are in many ways primed for narrative. In this talk, I will share new work from my lab that leverages emerging techniques in computational narrative understanding to study human storytelling at large scale. What are the cues that signal to readers or listeners that narrative communication is happening? How do imaginary stories differ from true ones and what can this tell us about the value of fictional storytelling for everyday life? How might we imagine large-scale narrative observatories to measure public and political health and well-being? As we face growing skepticism around the purpose of humanistic study, this talk will argue that data-driven and fundamentally inter-disciplinary approaches to the study of storytelling can help restore public confidence in the humanities and initiate new pathways for research that address pressing public needs.

Spring 2023: Data Science Distinguished Lecture Series

Friday, March 10
Kristian Lum
University of Chicago Data Science Institute

Defining, Measuring, and Reducing Algorithmic Amplification

Co-sponsored with the AI Seminar and IS Colloquium

BIO Kristian Lum is an Associate Research Professor at the University of Chicago Data Science Institute. Previously, she was a Senior Staff Machine Learning Researcher at Twitter, where she led research on the Machine Learning Ethics, Transparency, and Accountability (META) team. She is a founding member of the ACM Conference on Fairness, Accountability, and Transparency and has served in various leadership roles since its inception, growing this community of scholars and practitioners who care about the responsible use of machine learning systems. She is a recent recipient of the COPSS Emerging Leader Award and an NSF Kavli Fellow. Her research looks into (un)fairness of predictive models, with particular attention to those used in criminal justice settings.

ABSTRACT  As people consume more content delivered by recommender systems, it has become increasingly important to understand how content is amplified by these recommendations. Much of the dialogue around algorithmic amplification implies that the algorithm is a single machine learning model acting on a neutrally defined, immutable corpus of content to be recommended. However, there are several other components of the system that are not traditionally considered part of the algorithm that influence what ends up on a user’s content feed. In this talk, I will enumerate some of these components that influence algorithmic amplification and discuss how these components can contribute to amplification and simultaneously confound its measurement. I will then discuss several proposals for mitigating unwanted amplification, even when it is difficult to measure precisely.


Wednesday, May 10, 3:30–4:30 p.m., Gates 114
Jessica Hullman
Northwestern University

Toward Robust Data Visualization for Inference

Co-sponsored with the IS Colloquium

BIO Dr. Jessica Hullman is the Ginni Rometty Associate Professor of Computer Science at Northwestern University. Her research addresses challenges that arise when people draw inductive inferences from data interfaces. Hullman’s work has contributed visualization techniques, applications, and evaluative frameworks for improving data-driven inference in applications like visual data analysis, data communication, privacy budget setting, and responsive design. Her current interests include theoretical frameworks for formalizing and evaluating the value of a better interface, and elicitation of domain knowledge for data analysis. Hullman’s work has been awarded best paper awards at top visualization and HCI venues. She is the recipient of a Microsoft Faculty Fellowship (2019) and NSF CAREER, Medium, and Small awards as PI, among others.

ABSTRACT  Research and development in computer science and statistics have produced increasingly sophisticated software interfaces for interactive visual data analysis, and data visualizations have become ubiquitous in communicative contexts like news and scientific publishing. However, despite these successes, our understanding of how to design robust visualizations for data-driven inference remains limited. For example, designing visualizations to maximize perceptual accuracy and users’ reported satisfaction can lead people to adopt visualizations that promote overconfident interpretations. Design philosophies that emphasize data exploration and hypothesis generation over other phases of analysis can encourage pattern-finding over sensitivity analysis and quantification of uncertainty. I will motivate alternative objectives for measuring the value of a visualization, and describe design approaches that better satisfy these objectives. I will discuss how the concept of a model check can help bridge traditionally exploratory and confirmatory activities, and suggest new directions for software and empirical research.

Fall 2022: Data Science Distinguished Lecture Series

November 15, 2022, 4:15 p.m.: Center for Data Science for Enterprise & Society, Data Science Distinguished Lecture Series

Nonlinear Optimization in the Presence of Noise

This talk is co-sponsored with the School of Operations Research and Information Engineering

Speaker: Jorge Nocedal

Date: Tuesday, November 15, 2022

Time/Location:
4:15 p.m. , Rhodes Hall, Room 253
Reception 3:45 p.m., Rhodes 258

Abstract:  We begin by presenting three case studies that illustrate the nature of noisy optimization problems arising in practice. They originate in atmospheric sciences, machine learning, and engineering design. We wish to understand the source of the noise (e.g. a lower fidelity model, sampling or reduced precision arithmetic), its properties, and how to estimate it. This sets the stage for the presentation of our goal of redesigning constrained and unconstrained nonlinear optimization methods to achieve noise tolerance.

Bio:  Jorge Nocedal is the Walter P. Murphy Professor in the Department of Industrial Engineering and Management Sciences at Northwestern University. He studied at UNAM (Mexico) and Rice University. His research is in optimization, both deterministic and stochastic, with emphasis on very large-scale problems.  He is a SIAM Fellow, was awarded the 2012 George B. Dantzig Prize and the 2017 Von Neumann Theory Prize, for contributions to theory and algorithms of nonlinear optimization. He is a member of the US National Academy of Engineering.

October 24, 2022: Center for Data Science for Enterprise & Society, Data Science Distinguished Lecture Series

Information Provision in Markets

Speaker: Nicole Immorlica, Microsoft Research

This talk is co-sponsored with Computer Science, Information Science, and ORIE

Time:  3:45 – 4:45 p.m.

Location: 114 Gates Hall

Reception prior at 3:15 in 122 Gates Hall

Abstract: Tech-mediated markets give individuals an unprecedented number of opportunities.  Students no longer need to attend their neighborhood school; through school choice programs they can apply to any school in their city.  Tourists no longer need to wander streets looking for restaurants; they can select among them on an app from the comfort of their hotel rooms.  Theoretically, this increased access improves outcomes.  However, realizing these potential gains requires individuals to be able to navigate their options and make informed decisions.  In this talk, we explore how markets can help guide individuals through this process by providing relevant information.  We first discuss a school choice market where students must exert costly effort to learn their preferences.  We show that posting exam score cutoffs breaks information deadlocks, allowing students to efficiently evaluate their options.  We next study a recommendation app which can selectively reveal past reviews to users. We show that it’s possible to facilitate learning across users by creating a hierarchical network structure in which early users explore and late users exploit the results of this exploration.

Bio: Nicole Immorlica received her PhD in 2005 from MIT, joined Northwestern University as a professor in 2008, and joined Microsoft Research New England (MSR NE) in 2012 where she currently leads the economics and computation group.  She is the recipient of a number of fellowships and awards including the Sloan Fellowship, the Microsoft Faculty Fellowship and the NSF CAREER Award.  She has been on several boards including SIGecom, SIGACT, the Game Theory Society, and OneChronos; is an associate editor of Operations Research, Games and Economic Behavior and Transactions on Economics and Computation; and was program committee member and chair for several ACM, IEEE and INFORMS conferences in her area.  In her research, Nicole uses tools and modeling concepts from computer science and economics to explain, predict, and shape behavioral patterns in various online and offline markets. She is known for her work on mechanism design, market design and social networks.

Spring 2022: Data Science Distinguished Lecture Series

Sponsored by the Center for Data Science for Enterprise & Society

School Choice in Chile
Speaker: José Correa
This talk is co-sponsored with the School of Operations Research and Information Engineering (ORIE)
Date: Monday, April 25th
Reception: 9:30 a.m., Rhodes 258
Talk: 10 – 11 a.m., Rhodes 253

Watch the video of Jose’s talk

More information

Policy Gradient Descent for Control: Global Optimality and Convex Parameterization
Speaker: Maryam Fazel, University of Washington
This talk is co-sponsored with the Center for Applied Mathematics (CAM)
Date: Friday, May 6
3:45 p.m., Rhodes 253

Watch the video of Maryam’s talk

More information