Computational Social Science (CSS)
Computational Social Science (CSS) is usually defined as the development and application of computational methods to complex, typically large-scale, human (sometimes simulated) behavioral data.
The field of CSS has exploded in prominence over the past decade, as recent years have brought a surge of data from social media sites such as Facebook and Twitter, the digitization of massive governmental administrative records, and novel opportunities for online field experiments, among others. This influx of new data sources has coincided with the emergence of innovative forms of data analysis, often inspired by ideas from machine learning.
Yet computational social science is more than just large repositories of digital data and the computational methods needed to construct and analyze them. It also represents a convergence of different fields with different ways of thinking about and doing science.
Together, these opportunities hold enormous potential to help understand and address some of the world’s most pressing problems.
They also raise important new questions about privacy and ethics.
Integrating explanation and prediction in computational social science
Jake M. Hofman, Duncan J. Watts, Susan Athey, Filiz Garip, Thomas L. Griffiths, Jon Kleinberg, Helen Margetts, Sendhil Mullainathan, Matthew J. Salganik, Simine Vazire, Alessandro Vespignani & Tal Yarkoni.
Abstract: Computational social science is more than just large repositories of digital data and the computational methods needed to construct and analyse them. It also represents a convergence of different fields with different ways of thinking about and doing science. The goal of this Perspective is to provide some clarity around how these approaches differ from one another and to propose how they might be productively integrated. Towards this end we make two contributions. The first is a schema for thinking about research activities along two dimensions—the extent to which work is explanatory, focusing on identifying and estimating causal effects, and the degree of consideration given to testing predictions of outcomes—and how these two priorities can complement, rather than compete with, one another. Our second contribution is to advocate that computational social scientists devote more attention to combining prediction and explanation, which we call integrative modelling, and to outline some practical suggestions for realizing this goal.
Monitoring hiring discrimination through online recruitment platforms
Dominik Hangartner, Daniel Kopp & Michael Siegenthaler
Abstract: Women (compared to men) and individuals from minority ethnic groups (compared to the majority group) face unfavourable labour market outcomes in many economies, but the extent to which discrimination is responsible for these effects, and the channels through which they occur, remain unclear. Although correspondence tests—in which researchers send fictitious CVs that are identical except for the randomized minority trait to be tested (for example, names that are deemed to sound ‘Black’ versus those deemed to sound ‘white’)—are an increasingly popular method to quantify discrimination in hiring practices, they can usually consider only a few applicant characteristics in select occupations at a particular point in time. To overcome these limitations, here we develop an approach to investigate hiring discrimination that combines tracking of the search behaviour of recruiters on employment websites and supervised machine learning to control for all relevant jobseeker characteristics that are visible to recruiters. We apply this methodology to the online recruitment platform of the Swiss public employment service and find that rates of contact by recruiters are 4–19% lower for individuals from immigrant and minority ethnic groups, depending on their country of origin, than for citizens from the majority group. Women experience a penalty of 7% in professions that are dominated by men, and the opposite pattern emerges for men in professions that are dominated by women. We find no evidence that recruiters spend less time evaluating the profiles of individuals from minority ethnic groups. Our methodology provides a widely applicable, non-intrusive and cost-efficient tool that researchers and policy-makers can use to continuously monitor hiring discrimination, to identify some of the drivers of discrimination and to inform approaches to counter it.
Mobility network models of COVID-19 explain inequities and inform reopening
Serina Chang, Emma Pierson, Pang Wei Koh, Jaline Gerardin, Beth Redbird, David Grusky & Jure Leskovec.
Abstract: The coronavirus disease 2019 (COVID-19) pandemic markedly changed human mobility patterns, necessitating epidemiological models that can capture the effects of these changes in mobility on the spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Here we introduce a metapopulation susceptible–exposed–infectious–removed (SEIR) model that integrates fine-grained, dynamic mobility networks to simulate the spread of SARS-CoV-2 in ten of the largest US metropolitan areas. Our mobility networks are derived from mobile phone data and map the hourly movements of 98 million people from neighbourhoods (or census block groups) to points of interest such as restaurants and religious establishments, connecting 56,945 census block groups to 552,758 points of interest with 5.4 billion hourly edges. We show that by integrating these networks, a relatively simple SEIR model can accurately fit the real case trajectory, despite substantial changes in the behaviour of the population over time. Our model predicts that a small minority of ‘superspreader’ points of interest account for a large majority of the infections, and that restricting the maximum occupancy at each point of interest is more effective than uniformly reducing mobility. Our model also correctly predicts higher infection rates among disadvantaged racial and socioeconomic groups solely as the result of differences in mobility: we find that disadvantaged groups have not been able to reduce their mobility as sharply, and that the points of interest that they visit are more crowded and are therefore associated with higher risk. By capturing who is infected at which locations, our model supports detailed analyses that can inform more-effective and equitable policy responses to COVID-19.
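To make the susceptible–exposed–infectious–removed structure named in the abstract above concrete, here is a minimal Python sketch of a single-population, discrete-time SEIR model. It is not the authors' metapopulation model: it leaves out the mobility network, census block groups, and points of interest entirely. The function name `simulate_seir` and all parameter values are illustrative assumptions, not values from the paper.

```python
def simulate_seir(population, initial_exposed, beta, sigma, gamma, days):
    """Simulate a single well-mixed population with one update per day.

    beta  : transmission rate per day (assumed value, not from the paper)
    sigma : rate at which exposed people become infectious (1 / latent period)
    gamma : rate at which infectious people are removed (1 / infectious period)
    Returns a list of (S, E, I, R) tuples, one per day, starting at day 0.
    """
    s = float(population - initial_exposed)
    e = float(initial_exposed)
    i = 0.0
    r = 0.0
    history = [(s, e, i, r)]
    for _ in range(days):
        # Flows between compartments, computed from the state at the start of the day.
        new_exposed = beta * s * i / population
        new_infectious = sigma * e
        new_removed = gamma * i
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_removed
        r += new_removed
        history.append((s, e, i, r))
    return history


if __name__ == "__main__":
    # Hypothetical parameters chosen only to produce a visible epidemic curve.
    trajectory = simulate_seir(population=1000, initial_exposed=10,
                               beta=0.5, sigma=0.25, gamma=0.2, days=100)
    peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][2])
    print(f"Infectious count peaks on day {peak_day}")
```

Roughly speaking, a metapopulation version of the kind the abstract describes would keep separate compartments for each neighbourhood and let the transmission term depend on who visits which location at which hour. This sketch collapses all of that into the single constant `beta`.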
Computational social science: Obstacles and opportunities
David M. J. Lazer, Alex Pentland, Duncan J. Watts, Sinan Aral, Susan Athey, Noshir Contractor, Deen Freelon, Sandra Gonzalez-Bailon, Gary King, Helen Margetts, Alondra Nelson, Matthew J. Salganik, Markus Strohmaier, Alessandro Vespignani, Claudia Wagner
Abstract: The field of computational social science (CSS) has exploded in prominence over the past decade, with thousands of papers published using observational data, experimental designs, and large-scale simulations that were once unfeasible or unavailable to researchers. These studies have greatly improved our understanding of important phenomena, ranging from social inequality to the spread of infectious diseases. The institutions supporting CSS in the academy have also grown substantially, as evidenced by the proliferation of conferences, workshops, and summer schools across the globe, across disciplines, and across sources of data. But the field has also fallen short in important ways. Many institutional structures around the field—including research ethics, pedagogy, and data infrastructure—are still nascent. We suggest opportunities to address these issues, especially in improving the alignment between the organization of the 20th-century university and the intellectual requirements of the field.
Meaningful measures of human society in the twenty-first century
David Lazer, Eszter Hargittai, Deen Freelon, Sandra Gonzalez-Bailon, Kevin Munger, Katherine Ognyanova & Jason Radford
Abstract: Science rarely proceeds beyond what scientists can observe and measure, and sometimes what can be observed proceeds far ahead of scientific understanding. The twenty-first century offers such a moment in the study of human societies. A vastly larger share of behaviours is observed today than would have been imaginable at the close of the twentieth century. Our interpersonal communication, our movements and many of our everyday actions, are all potentially accessible for scientific research; sometimes through purposive instrumentation for scientific objectives (for example, satellite imagery), but far more often these objectives are, literally, an afterthought (for example, Twitter data streams). Here we evaluate the potential of this massive instrumentation—the creation of techniques for the structured representation and quantification of human behaviour—through the lens of scientific measurement and its principles. In particular, we focus on the question of how we extract scientific meaning from data that often were not created for such purposes. These data present conceptual, computational and ethical challenges that require a rejuvenation of our scientific theories to keep up with the rapidly changing social realities and our capacities to capture them. We require, in other words, new approaches to manage, use and analyse data.