[Bear with us ... more detailed and organized information about ongoing research, opportunities, and resources from C-SoDA is forthcoming.]


C-SoDA Award Accelerator Program Winning Proposals

2018-19 winners to be announced January 2019.


Centers, Labs, and Projects of C-SoDA Faculty and Students

  • Friendly Cities Lab
  • IGaP Lab: Interdependence in Government and Policy
  • QuantDev: The Quantitative Development Group
  • GeoVISTA Center
  • Population Research Institute
  • Intelligent Information Systems Laboratory
  • McCourtney Institute for Democracy Mood of the Nation Poll
  • Geoinformatics and Earth Observation Laboratory
  • Applied Cognitive Science Lab
  • Militarized Interstate Disputes (MID) Project
  • LISA: Laboratory for Intelligent Systems and Analytics
  • Design Analysis Technology Advancement (DATA) Lab
  • Media Effects Research Lab
  • Center for Big Data and Discovery Informatics
  • Machine Learning and Programming Languages Lab
  • PSU NLP LAB: The Natural Language Processing Lab at Penn State
  • Computational Social Dynamics Lab
  • Crowd-AI Lab
  • Human Language Technologies Lab
  • MAP Lab: Measurement and Applied Psychology Lab
  • The Methodology Center
  • Criminal Justice Research Center
  • Databrary
  • ChoroPhronesis
  • PIKE: Penn State Information, Knowledge, and Web
  • LIONS Center: Center for Cyber-Security, Information Privacy, and Trust
  • Center for Life Course & Longitudinal Studies

Research Themes

C-SoDA & BDSS faculty and students conduct research about a wide variety of data arising from human interaction and what and how we can learn from it, including

  • text data / natural language processing,
  • social networks,
  • spatial and geographic data, intensive longitudinal data,
  • image data, video data, computer vision,
  • deep learning / neural nets, machine learning, Bayesian statistics,
  • algorithms, algorithmic bias, privacy,
  • measurement, research design, causal inference,
  • crowdsourcing, citizen science,
  • visualization and visual analytics,
all in the context of understanding important social phenomena.

Student Research under BDSS-IGERT

Since 2012, BDSS-IGERT PhD student trainees and affiliates have co-authored over 200 papers and related scientific products (data, software, etc.) that collectively have garnered roughly 2000 citations. For details on this research, please see the BDSS-IGERT Google Scholar Page or follow the thematic links. (Please see individual faculty pages for details on social data analytics research in the broader PSU community.)

BDSS-IGERT Research Rotations

Much of this research was catalyzed by the required 'research rotations', which were part of the two-year experience for each trainee. Interested faculty could host one or more IGERT trainees in their lab or research group for 1-2 semesters to work on research questions with both big data and social science aspects (recent examples here). These were largely arranged through the 'matchmaking' event held each fall semester. The first was held in September 2014, and the last in September 2017.

Externship Project Examples

Team Project Examples

Read about the time when a team of BDSS students and faculty entered a Kaggle machine learning contest sponsored by the U.S. Census Bureau and used a social scientific approach to break the contest:

Read about projects undertaken by student teams in IGERT classes like SoDA 501, SoDA 502, and GeoSocial Visual Analytics:

SoDA Student Dissertations

C-SoDA / BDSS Research
Collaborating remotely: an evaluation of immersive capabilities on spatial experiences and team membership (International Journal of Digital Earth, 2018)
Citizen monitoring during hazards: validation of Fukushima radiation measurements (GeoJournal, 2018)
Text preprocessing for unsupervised learning: Why it matters, when it misleads, and what to do about it (Political Analysis, 2018)
General and specific utility measures for synthetic data (Journal of the Royal Statistical Society: Series A, 2018)
The evolution of youth friendship networks from 6th to 12th grade: School transitions, popularity and centrality (Social Networks & the Life Course, 2018)
Falling behind: lingering costs of the high school transition for youth friendships and grades (Sociology of Education, 2018)
Feature selection methods for optimal design of studies for developmental inquiry (Journals of Gerontology: Series B, 2018)
More than counting: An intraindividual variability approach to categorical repeated measures (Journals of Gerontology: Series B, 2018)
Learning simpler language models with the differential state framework (Neural Computation, 2017)
Event ordering with a generalized model for sieve prediction ranking (Natural Language Processing, 2017)
Statistical modeling of the default mode brain network reveals a segregated highway structure (Scientific Reports, 2017)
Piecewise latent variables for neural variational text processing (EMNLP, 2017)
Adversary resistant deep neural networks with an application to malware detection (KDD, 2017)
Damage assessment of the urban environment during disasters using volunteered geographic information (Big Data for Regional Science, 2017)
Decision-making in policy governed human-autonomous systems teams (IEEE Smart World Congress, 2017)
Multi-scale FCN with cascaded instance aware segmentation for arbitrary oriented word spotting in the wild (IEEE Computer Vision and Pattern Recognition, 2017)
Smart Library: Identifying books on library shelves using supervised deep learning for scene text reading (Digital Libraries, 2017)
Stochastic weighted graphs: Flexible model specification and simulation (Social Networks, 2017)
NFL draft profiles are full of racial stereotypes. And that matters for when quarterbacks get drafted (Washington Post, 2017)
Statistical models for incorporating data from routine HIV testing of pregnant women at antenatal clinics into HIV/AIDS epidemic estimates (AIDS, 2017)
Incorporation of hierarchical structure into estimation and projection package fitting with examples of estimating subnational HIV/AIDS dynamics (AIDS, 2017)
Validating Safecast data by comparisons to a U.S. Department of Energy Fukushima Prefecture aerial survey (Journal of Environmental Radioactivity, 2017)
Unifying adversarial training algorithms with data gradient regularization (Neural Computation, 2017)
CompanionViz: Mediated platform for gauging canine health and enhancing human-pet interactions (International Journal of Human-Computer Studies, 2017)
Propensity score weighting for a continuous exposure with multilevel data (Health Services and Outcomes Research Methodology, 2016)
Immersive analytics for multi-objective dynamic integrated climate-economy (DICE) models (ACM Interactive Surfaces and Spaces, 2016)
Bag of what? Simple noun phrase extraction for text analysis (EMNLP, NLP + Computational Social Science, 2016)
Using prerequisites to extract concept maps from textbooks (ACM Information and Knowledge Management, 2016)
Racially differentiated language in NFL scouting reports (KDD Large Scale Sports Analytics, 2016)
Inference on the effects of observed features in latent space models for networks (SSRN preprint, 2016)
C-SoDA / BDSS Research - More…