SoDA Workshop Series - Introduction to Data Science with R (Workshop 17: Big Data and High Performance Computing in Social Science Research)

Big Data & High Performance Computing in Social Science Research
When: Mar 12, 2018, from 01:30 PM to 02:30 PM
Where: B001 Sparks (The DataBasement)
Attendees: All interested members of the PSU community are welcome to attend.

In this workshop we will briefly wrap up our discussion of more advanced web scraping techniques and then launch into an overview of big data and HPC in social science research. Our main goal will be to get everyone acquainted with the strategies and tools available for working with large and complex datasets in R (and at Penn State), and when to make use of them. More specifically, we will discuss efficient programming, parallelization, programming languages, and computer hardware, and how all of these fit together to accomplish complex data management tasks.

General Information about the Workshop Series

Do you want to develop the skills to program and manage data using R? If so, this workshop series is for you! We will be meeting (almost) weekly for an hour throughout the semester to cover everything from basic R programming up through big data analytics and high performance computing. The series starts with several weeks introducing R and basic R programming, so no prior experience is required (only a laptop). We will then move on to a series of workshops on reading in, cleaning, transforming, and combining multiple complex datasets (including text and social network data) using our newfound R programming skills. Once we have the basics of data management down, we will cover web-based data collection, both from traditional web pages and from the Twitter API. Finally, we will get into performance and scalability issues and go over the steps for accessing the ICS cluster resources at Penn State.

These workshops will be offered on (most) Mondays during the Spring 2018 semester from 1:30 to 2:30 PM in Sparks B001 (The DataBasement). Directions to the DataBasement are available here: http://bdss.psu.edu/pdf-folder/finding-the-sparks-databasement. Bring a laptop!

The instructor for the workshop is Matt Denny, who can be contacted at mdenny@psu.edu.

Materials (including slides, video tutorial, and pictorial tutorial) for this and past workshops are available on the workshop website: https://github.com/matthewjdenny/SoDA-Workshop-Series-Introduction-to-Data-Science.
