BDSS-IGERT Speaker Series - Aylin Caliskan (Princeton)

"A Story of Discrimination and Unfairness: Bias in Word Embeddings"
When: Jan 27, 2017, 12:15 PM to 1:30 PM
Where: B001 Sparks -- The Databasement
Contact Phone: 814-267-2720
Attendees: All interested members of the PSU community are welcome to attend.

Aylin Caliskan is a Postdoctoral Research Associate and a CITP Fellow at Princeton University. Her work spans two main realms, security and privacy, and draws on machine learning and natural language processing. She currently works on big-data-driven discrimination and inference through machine learning. She also has ongoing research on privacy-preserving information disclosure and contextual integrity.

In her previous work, she demonstrated that de-anonymization is possible through analyzing linguistic style in a variety of textual media, including social media, cyber criminal forums, source code, and executable binaries. She is extending her work to develop countermeasures against de-anonymization. Aylin's other research interests include designing privacy-enhancing tools to prevent unnecessary private information disclosure while quantifying and characterizing human privacy behavior. She holds a PhD in Computer Science from Drexel University and a Master of Science in Robotics from the University of Pennsylvania.

Presentation: "A Story of Discrimination and Unfairness: Bias in Word Embeddings"

Artificial intelligence and machine learning are in a period of astounding growth. However, there are concerns that these technologies may be used, either with or without intention, to perpetuate the prejudice and unfairness that unfortunately characterizes many human institutions. Here we show for the first time that human-like semantic biases result from the application of standard machine learning to ordinary language—the same sort of language humans are exposed to every day. We replicate a spectrum of standard human biases as exposed by the Implicit Association Test and other well-known psychological studies. We replicate these using a widely used, purely statistical machine-learning model—namely, the GloVe word embedding—trained on a corpus of text from the Web. Our results indicate that language itself contains recoverable and accurate imprints of our historic biases, whether these are morally neutral as towards insects or flowers, problematic as towards race or gender, or even simply veridical, reflecting the status quo for the distribution of gender with respect to careers or first names. These regularities are captured by machine learning along with the rest of semantics. In addition to our empirical findings concerning language, we also contribute new methods for evaluating bias in text, the Word Embedding Association Test (WEAT) and the Word Embedding Factual Association Test (WEFAT). Our results have implications not only for AI and machine learning, but also for the fields of psychology, sociology, and human ethics, since they raise the possibility that mere exposure to everyday language can account for the biases we replicate here.
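The WEAT statistic described above compares how strongly two sets of target words (e.g., flower names vs. insect names) associate with two sets of attribute words (e.g., pleasant vs. unpleasant terms), using cosine similarity between word vectors. As a rough illustration of the idea, here is a minimal sketch of the differential-association effect size, using hypothetical toy 2-D vectors in place of real GloVe embeddings (the function names and example vectors are illustrative, not from the paper's code):

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def association(w, A, B):
    """s(w, A, B): mean cosine of w to attribute set A minus mean cosine to B."""
    return (sum(cosine(w, a) for a in A) / len(A)
            - sum(cosine(w, b) for b in B) / len(B))

def weat_effect_size(X, Y, A, B):
    """Cohen's-d-style effect size: how differently target sets X and Y
    associate with attribute sets A and B, normalized by the pooled
    standard deviation of the per-word association scores."""
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    scores = sx + sy
    mean_all = sum(scores) / len(scores)
    std = sqrt(sum((s - mean_all) ** 2 for s in scores) / (len(scores) - 1))
    return ((sum(sx) / len(sx)) - (sum(sy) / len(sy))) / std

# Toy example: target X points toward attribute A, target Y toward B,
# so the effect size should be strongly positive.
A = [(1.0, 0.0)]          # attribute set A (e.g., "pleasant")
B = [(0.0, 1.0)]          # attribute set B (e.g., "unpleasant")
X = [(1.0, 0.1)]          # target set X (e.g., "flowers")
Y = [(0.1, 1.0)]          # target set Y (e.g., "insects")
print(weat_effect_size(X, Y, A, B))
```

In the paper, X, Y, A, and B would be sets of actual GloVe vectors for the word lists drawn from the Implicit Association Test literature, and significance would be assessed with a permutation test over the target words; this sketch shows only the effect-size computation.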

Further details

A light lunch will be available starting at noon. All interested members of the Penn State community are welcome to attend.

Directions to the Databasement in Sparks can be found here

The Big Data Social Science Speaker Series is sponsored by the Social Sciences Research Institute (SSRI), with additional support from the Big Data Social Science (BDSS) IGERT, the Graduate Program in Social Data Analytics (SoDA), and the Quantitative Social Science Initiative (QuaSSI).