DOUBLE THE EXCITEMENT! Chris Ré and Ben Y. Zhao will give keynotes!

The Innovative Ideas in Data Science (IID) workshop aims to provide a venue for researchers and practitioners from both academia and industry to discuss innovative, thought-provoking, and visionary ideas in data science. The emphasis is on potentially disruptive research directions that challenge current research agendas and suggest future ones.

Many workshops associated with The Web Conference have become more like mini-conferences themselves. Our vision for IID is complementary: we will seek early-stage work on blue-sky, high-risk/high-reward research, where the authors can benefit from community feedback.


Program & Attending IID

IID will be a half-day workshop on Monday, April 20, at The Web Conference 2020. As the conference has gone online, and to maximize universal benefit, we decided to start the IID workshop at 14:00 GMT+00:00 (i.e., 7am Pacific time, 10am Eastern time, 4pm Central European time, 10pm Taiwan time; all on April 20). With these times, people across the world will be able to attend.

The workshop is open to anyone who would like to attend, free of charge. This way, the pandemic has at least one positive side effect: spreading knowledge in addition to viruses. If you are not registered through the conference, register here (we have 200 slots).

We are proud to have Amazon as the Headline supporter of IID 2020!

In total, 7 papers were accepted at IID 2020, for either oral or poster presentation. Proceedings coming soon.

Times are in the GMT+00:00 time zone
14:00 Welcome
14:05 Featured Papers (10 minutes each, including questions)
Privacy, Altruism, and Experience: Estimating the Perceived Value of Internet Data for Medical Uses
Gilie Gefen, Omer Ben-Porat, Moshe Tennenholtz and Elad Yom-Tov
Internet-human infrastructures: Lessons from Havana's StreetNet
Abigail Jacobs and Michaelanne Dye
Expanding the scope of reproducibility research through data analysis replications
Jake Hofman, Dan Goldstein, Siddhartha Sen and Forough Poursabzi-Sangdeh
14:35 Short creativity activity
14:55 Keynote 1
Ben Y. Zhao

User-centric Privacy in an ML-Ubiquitous Society

(speaker bio and abstract in the Keynotes section below)
15:25 Keynote 2
Chris Ré
Observational Supervision & Analyst Exhaust

(speaker bio and abstract in the Keynotes section below)
15:55 Fireside chat with Ben Y. Zhao and Chris Ré
16:20 Poster spotlight talks (3 min each)
Training Machine Learning Models With Causal Logic
Ang Li, Suming J. Chen, Jingzheng Qin and Zhen Qin
Learning Multi-granular Quantized Embeddings for Large-Vocab Categorical Features in Recommender Systems
Wang-Cheng Kang, Derek Zhiyuan Cheng, Ting Chen, Xinyang Yi, Dong Lin, Lichan Hong and Ed H. Chi
Methods to Evaluate Temporal Cognitive Biases in Machine Learning Prediction Models
Christopher G. Harris
Can Celebrities Burst Your Bubble?
Tuğrulcan Elmas, Kristina Hardi, Rebekah Overdorf and Karl Aberer
16:35 Closing
16:40 Virtual Poster Session

Video Recording!


Call for Papers

Many workshops associated with The Web Conference have become more like mini-conferences themselves. Our vision for IID is complementary: we will seek early-stage work on blue-sky, high-risk and high-reward research, where the authors can benefit from community feedback. In particular, we will focus on short position papers, up to 4 pages long. We are especially (but not exclusively) looking for papers that do at least one of the following:

  • identify fundamental open questions (including both new questions and old-but-forgotten questions)
  • introduce a new way of thinking
  • offer a constructive critique of current research agendas
  • reframe or debunk existing work
  • report unexpected early results
  • suggest promising but unproven ideas
  • propose novel evaluation methods

An ideal submission is one that (1) is likely to stimulate discussion and (2) has the potential to open up a new line of research. We do not require full evaluations; well-reasoned arguments or preliminary evaluations could be sufficient.

Important Dates

If authors want their paper to appear in the proceedings:
Submission: Friday, 24 January 2020, 23:59 Anywhere-on-Earth time
Notification: Monday, 10 February 2020
Camera-ready: Monday, 17 February 2020
Workshop: Monday, 20 April 2020

If authors do not want their paper to appear in the proceedings:

Submission: Friday, 21 February 2020, 23:59 Anywhere-on-Earth time
Notification: Friday, 6 March 2020
Workshop: Monday, 20 April 2020

Submission Information

All papers will be peer-reviewed in a single-blind process. We welcome novel research papers, work-in-progress papers, and visionary papers.

Submissions must be in PDF, written in English, and no more than 4 pages long (not including references). Shorter papers are welcome. Please format your paper using the ACM SIG conference proceedings template (use sample-sigconf.pdf as the template), available here.

For accepted papers, at least one author must attend the workshop to present the work.

For paper submission, proceed to the IID 2020 submission website.


Keynotes

Chris Ré
Stanford
Observational Supervision & Analyst Exhaust
Christopher (Chris) Ré is an associate professor in the Department of Computer Science at Stanford University. He is a member of the Stanford AI Lab and is affiliated with the Statistical Machine Learning Group. His recent work aims to understand how software and hardware systems will change as a result of machine learning, along with a continuing, petulant drive to work on math problems. Research from his group has been incorporated into scientific and humanitarian efforts, such as the fight against human trafficking, as well as into products from technology and enterprise companies. He cofounded a company, based on his research into machine learning systems, that was acquired by Apple in 2017. More recently, he cofounded SambaNova Systems based, in part, on his work on accelerating machine learning. He received the SIGMOD Dissertation Award in 2010, an NSF CAREER Award in 2011, an Alfred P. Sloan Fellowship in 2013, a Moore Data Driven Investigator Award in 2014, the VLDB Early Career Award in 2015, the MacArthur Foundation Fellowship in 2015, and an Okawa Research Grant in 2016. His research contributions have spanned database theory, database systems, and machine learning, and his work has won best-paper awards at a premier venue in each area: PODS 2012, SIGMOD 2014, and ICML 2016, respectively.

As machine learning systems become more embedded in our daily lives, there is an opportunity for them to learn passively from our interactions with them. This talk discusses some rough ideas that we have been exploring, including supervision obtained by instrumenting analyst software, eye trackers used with radiologists and other subject-matter experts, and a foray into consumer devices. As an example, gaze data is rich: it not only reveals salient portions of an image or video, but the psychology literature suggests that gaze can also convey more subtle cues, such as confidence. Moreover, trained analysts often have routine patterns, and deviation from those patterns is significant. We are exploring the circumstances in which one can practically and provably learn from this style of supervision with minimal or no conventional supervision. This talk will be short on results and long on other people's ideas that I've found interesting.
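As a concrete illustration of how such "analyst exhaust" might become a supervision signal, here is a minimal, purely hypothetical Python sketch (not from the talk; the function name, grid size, and threshold are all illustrative assumptions). It bins eye-tracker fixations into a dwell-time heatmap over an image and thresholds the heatmap into coarse weak labels:

```python
import numpy as np

def gaze_to_weak_labels(fixations, image_shape, grid=8, dwell_threshold=0.5):
    """Bin eye-tracker fixations into a dwell-time heatmap and
    threshold it into coarse weak labels over an image.

    fixations: iterable of (x, y, duration_sec) gaze fixations.
    image_shape: (height, width) of the viewed image in pixels.
    Returns a (grid, grid) array of 0/1 weak labels, where 1 marks
    cells whose total dwell time exceeds `dwell_threshold` seconds.
    """
    h, w = image_shape
    heatmap = np.zeros((grid, grid))
    for x, y, duration in fixations:
        row = min(int(y / h * grid), grid - 1)
        col = min(int(x / w * grid), grid - 1)
        heatmap[row, col] += duration  # accumulate dwell time per cell
    return (heatmap >= dwell_threshold).astype(int)

# Example: three fixations on a 512x512 image, two of them clustered
# in the upper-left region the analyst kept returning to.
fixations = [(40, 60, 0.4), (45, 50, 0.3), (400, 430, 0.1)]
print(gaze_to_weak_labels(fixations, (512, 512)))  # a single 1 at the dwelled-on cell
```

The resulting mask could then serve as a noisy positive label for a downstream model, in the spirit of learning from observational signals rather than conventional annotation.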
Ben Y. Zhao
University of Chicago
User-centric Privacy in an ML-Ubiquitous Society
Ben Zhao is the Neubauer Professor of Computer Science at the University of Chicago. He received his PhD from UC Berkeley (2004) and his BS from Yale (1997). He is an ACM Distinguished Scientist, and a recipient of the NSF CAREER Award, MIT Technology Review's TR-35 Award (Young Innovators Under 35), ComputerWorld Magazine's Top 40 Tech Innovators award, a Google Faculty Award, and the IEEE ITC Early Career Award. His work has been covered by media outlets such as Scientific American, the New York Times, the Boston Globe, the LA Times, MIT Tech Review, and Slashdot. He has published more than 160 papers in the areas of security and privacy, networked systems, wireless networks, data mining, and HCI (H-index 65). He served as TPC co-chair for the World Wide Web Conference (WWW 2016) and the ACM Internet Measurement Conference (IMC 2018), and is General Co-Chair for ACM HotNets 2020.
The impact of deep learning and its many applications on our lives is undeniable. Today, much of the work in the ML community is focused on developing techniques and algorithms to make models more powerful. Yet as ML models become more powerful, there is increasing evidence that they are slowly eroding the privacy of the individuals they affect. Governments, companies, and even nation states can use online data to build powerful classifiers that track and identify us, usually without any warning or notification to the targets (you and me). For example, the NY Times recently profiled Clearview AI, a company building facial recognition models of millions of citizens without their knowledge or authorization, simply by scraping their photos from social networks and other public sources. In this talk, I'm going to argue that we have crossed a line, where the balance of power has definitively shifted towards data-rich entities like companies and nation states and away from individual citizens. There is a real need to develop user-centric privacy protections against deep learning classifiers that try to restore the balance and increase privacy protections for individuals. I will talk briefly about Fawkes, our new work that introduces user-side tools to perturb your own images in imperceptible ways such that, if they are captured and used to build a facial recognition model of you, the resulting model misclassifies you as someone else. Fawkes works with 96%-100% effectiveness against state-of-the-art facial recognition systems from Amazon, Microsoft, and Face++. I'll wrap up by talking about other work in this direction and why this line of research is critical for user privacy moving forward.
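To convey the flavor of the cloaking idea, here is a toy Python sketch, emphatically not the Fawkes implementation: the linear "feature extractor", image size, and all parameters below are illustrative assumptions. It applies a small, norm-bounded perturbation that pulls an image's feature embedding toward a different identity while leaving the pixels essentially unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a face-recognition feature extractor: a random
# linear map from a 32x32 image to a 128-dim "identity" embedding.
# (Fawkes targets deep feature extractors; this is only a stand-in.)
W = rng.normal(size=(128, 32 * 32))

def features(img):
    return W @ img.ravel()

def cloak(img, target_img, eps=0.03, step_size=0.005, steps=40):
    """Projected-gradient sketch of cloaking: find a perturbation with
    max-norm <= eps that pulls img's embedding toward target_img's."""
    delta = np.zeros_like(img)
    target = features(target_img)
    for _ in range(steps):
        # Gradient of 0.5 * ||features(img + delta) - target||^2 w.r.t.
        # delta; analytic here because `features` is linear.
        residual = features(img + delta) - target
        grad = (W.T @ residual).reshape(img.shape)
        # Signed gradient step, then project back into the eps-box so
        # the change stays imperceptible.
        delta = np.clip(delta - step_size * np.sign(grad), -eps, eps)
    return np.clip(img + delta, 0.0, 1.0)

me = rng.random((32, 32))            # stand-in for one of your photos
someone_else = rng.random((32, 32))  # a different identity
cloaked = cloak(me, someone_else)
print(np.abs(cloaked - me).max())    # <= eps: visually indistinguishable
```

The point is only the shape of the objective: tiny, bounded pixel changes that produce large shifts in feature space, so a model trained on cloaked photos associates your identity with the wrong features.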

Organizers

Dafna Shahaf
The Hebrew University of Jerusalem
Robert West
EPFL
Ashton Anderson
University of Toronto
Jie Tang
Tsinghua University
Contact us at:
iid-workshop (at) googlegroups.com

Sponsors & Supporters

Amazon (Headline supporter)

Program Committee

Paul Bennett (Microsoft Research)
Emre Kiciman (Microsoft Research)
Maxime Peyrard (EPFL)
Julian McAuley (UCSD)
David Garcia (CSH)
Alex Libov (Amazon)
Adam Kalai (Microsoft Research)
Johan Ugander (Stanford University)
Alex Peysakhovich (Facebook)
Jake Hofman (Microsoft Research)
Dan Goldstein (Microsoft Research)