Just Published! iSA: supervised aggregated sentiment analysis of social media


iSA: a fast, scalable and accurate algorithm for sentiment analysis of social media content

Information Sciences (Journal IF: 4.038)

Co-authors: Luigi Curini & Stefano M. Iacus

Replication material: www.sciencedirect.com

What is worth remembering:

  • The new algorithm iSA for sentiment/opinion analysis is presented
  • iSA is fast, scalable, accurate and language independent
  • iSA remains stable even when the number of classes/opinions is large, and it allows for cross-tabulation
  • iSA works in the case of random and non-random sampling


We present iSA (integrated Sentiment Analysis), a novel algorithm designed for opinion analysis on social networks and the Web 2.0 sphere (Twitter, blogs, etc.), i.e. developed for digital environments characterized by an abundance of noise compared to the amount of information. Instead of performing an individual classification and then aggregating the predicted values, iSA directly estimates the aggregated distribution of opinions. Based on supervised hand-coding rather than NLP techniques or ontological dictionaries, iSA is a language-agnostic algorithm (it relies on human coders’ abilities). iSA exploits a dimensionality reduction approach which makes it scalable, fast, memory efficient, stable and statistically accurate. The cross-tabulation of opinions is possible with iSA thanks to its stability. Through empirical analysis it will be shown when iSA outperforms machine learning techniques of individual classification (e.g. SVM, Random Forests, etc.) as well as the only other alternative for aggregated sentiment analysis, known as ReadMe.
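To make the idea of estimating the aggregate directly more concrete, here is a minimal sketch of the general principle behind aggregated estimation. It is not the iSA algorithm itself. It assumes that the share of each word profile S in the unlabeled texts can be written as P(S) = Σ_D P(S|D) P(D), where P(S|D) is measured on the hand-coded training set, and it solves for the opinion shares P(D) by least squares. The function names and all the numbers are hypothetical and chosen only for illustration.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def estimate_class_shares(p_s_given_d, p_s):
    """Least-squares solution of p_s = p_s_given_d * p_d for the class shares p_d."""
    rows, k = len(p_s_given_d), len(p_s_given_d[0])
    # Normal equations: (M^T M) p_d = M^T p_s
    mtm = [[sum(p_s_given_d[r][i] * p_s_given_d[r][j] for r in range(rows))
            for j in range(k)] for i in range(k)]
    mtp = [sum(p_s_given_d[r][i] * p_s[r] for r in range(rows)) for i in range(k)]
    raw = solve_linear(mtm, mtp)
    clipped = [max(v, 0.0) for v in raw]  # shares cannot be negative
    total = sum(clipped)
    return [v / total for v in clipped]   # shares must sum to one


# Hypothetical toy data: 4 word profiles (rows) and 3 opinion classes (columns).
# Each column is P(S | D) as it would be measured on a hand-coded training set.
p_s_given_d = [
    [0.6, 0.1, 0.2],
    [0.2, 0.5, 0.1],
    [0.1, 0.3, 0.2],
    [0.1, 0.1, 0.5],
]
true_shares = [0.5, 0.3, 0.2]  # unknown in practice; used here to simulate P(S)
p_s = [sum(row[j] * true_shares[j] for j in range(3)) for row in p_s_given_d]

estimated_shares = estimate_class_shares(p_s_given_d, p_s)
print([round(v, 3) for v in estimated_shares])
```

No individual text is ever classified in this sketch: only the aggregate shares are estimated, which is the distinction the abstract draws against classify-then-aggregate approaches.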

Call for Papers! Big Data, Digital Data, Textual Data. Milan 15-17 Sept 2016


Big Data, Digital Data, Textual Data: Restructuring Political Science? 

Chairs: Andrea Ceron & Luigi Curini (Università degli Studi di Milano)

Where and When: Milan, 15-17 September 2016.

Deadline to submit abstract: 5 June 2016

Link to: panel description. Submit here!

Call for Papers:

This panel is open to scholars from very different fields, ranging from political science to communication, computer science and information technology. The aim is to gather papers that adopt up-to-date statistical methods (including text analysis techniques) to analyze large-N collections of digital data, either in textual or non-textual form. Any application of Big Data analysis (i.e. open data, social media data, or any large digital textual data) to the study of political institutions or of public opinion dynamics is particularly welcome, but the panel also accepts papers on other topics linked with politics and society. Secondary analyses of Big Data performed through traditional statistical techniques are suitable too, particularly if these studies integrate different types of data (e.g. survey data and sentiment analysis) or combine datasets from multiple sources (e.g. roll call votes, manifesto data, data on conflicts, pieces of news, etc.). We accept case studies and longitudinal analyses related to Italy or to any other country, as well as cross-sectional comparative analyses that cover several countries (related to the present or to the past). Thanks to such contributions, the panel aims to show how the “Big Data revolution” can allow us to solve puzzles involving traditional political science topics (e.g. legislative politics, coalition governments, electoral campaigns, accountability and responsiveness, peace and conflicts, democratization, collective action, agenda setting, etc.).


Political science is undergoing a complex threefold process of revolution, which can be summarized under the label of “Big Data revolution”. Political science is radically changing, from using sparse datasets produced by isolated scholars that work alone, to building up collaborative, interdisciplinary, lab-style research teams that analyze increasing quantities of diverse, highly informative data. Such transformation, from studying problems to solving them, can explain why – at least in some countries – “the influence of quantitative social science (including the related technologies, methodologies, and data) on the real world has been growing fast” (King 2014).

Big Data (i.e. large-N digital or textual data) certainly play a crucial role in such transformation and can contribute to restructuring political science. This process, in fact, benefits from different sources of data that are more and more available to scholars: 1) open data, provided by public or private organizations; 2) a wide array of textual data, produced by political institutions, which are increasingly available in a digital format; 3) digital data, in textual and non-textual form, generated by a growing crowd composed of Internet users and social media users (encompassing citizen-to-citizen and citizen-to-elite interactions, online news, and top-down elite communication).

Such “Big Data revolution” is not only related to data sources. The evolution of our societies toward a “digital world” is a necessary premise. However, the methodological contribution of information technology, which allows us to gather and store huge quantities of data and to process them at an incredibly fast rate, and the new developments in statistics and political methodology, particularly in the field of text analysis (Grimmer and Stewart 2013), are also important in bringing about such transformation.

Indeed, the recent improvements in automated and supervised text analysis techniques dramatically reduce the costs of analyzing large collections of textual data and allow scholars to study politics and political conflicts through the analysis of written and spoken words. In this regard, a wide range of techniques is now increasingly used by political scientists. These methods range from scaling techniques – like Wordscores (Laver, Benoit and Garry 2003) and Wordfish (Slapin and Proksch 2008) – that measure similarities and differences between political actors, to topic models (Grimmer 2010; Quinn 2010), which allow scholars to identify the topics discussed in a text.

These techniques can greatly enhance our knowledge of the functioning of political institutions, particularly when applied to large digital data gathered by collective research groups such as the Comparative Agendas Project (Baumgartner, Green-Pedersen and Jones 2006) or the Comparative Manifesto Project (Lehmann et al. 2015).

The broadening of Internet penetration and the increasing number of citizens worldwide who are active on social networking sites like Facebook and Twitter (30% of the world population in 2015) pushed this revolution further. In this new “digital world” citizens share information and opinions online, thereby generating a large amount of data about their tastes and attitudes. The evolution of sentiment analysis (Hopkins and King 2010; Ceron, Curini and Iacus 2016) allows researchers to extract information from these rich sources.

This information can then be successfully exploited to study more in depth the formation and evolution of public opinion (Schober et al. 2016) – particularly by integrating sentiment analysis with traditional survey data (Couper 2013) – in order to study political mobilization (Bennett and Segerberg 2011) or to nowcast and forecast elections (Ceron et al. 2014; Gayo-Avello 2013).


Baumgartner, Frank R., Christoffer Green-Pedersen, and Bryan D. Jones, eds. 2006. Comparative Studies of Policy Agendas. Special issue of the Journal of European Public Policy 13 (7).

Bennett, W.L., and A. Segerberg. 2011. Digital media and the personalization of collective action: Social technology and the organization of protests against the global economic crisis. Information, Communication & Society 14(6):770–799.

Ceron, Andrea, Luigi Curini, and Stefano M. Iacus. 2016. Social Media and Politics: Nowcasting and Forecasting Elections with Big Data. London: Ashgate, forthcoming.

Ceron, Andrea, Luigi Curini, Stefano M. Iacus, and Giuseppe Porro. 2014. “Every Tweet Counts? How Sentiment Analysis of Social Media Can Improve Our Knowledge of Citizens’ Political Preferences with an Application to Italy and France.” New Media & Society 16:340–58.

Couper, Mick P. 2013. “Is the Sky Falling? New Technology, Changing Media, and the Future of Surveys.” Survey Research Methods 7(3):145–56.

Gayo-Avello, D. 2013. A meta-analysis of state-of-the-art electoral prediction from Twitter data. Social Science Computer Review 31(6):649–679.

Grimmer, J., and B.M. Stewart. 2013. Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis 21(3):267–297.

Hopkins, Daniel, and Gary King. 2010. Extracting systematic social science meaning from text. American Journal of Political Science 54(1):229–47.

King, G. 2014. Restructuring the social sciences: Reflections from Harvard’s Institute for Quantitative Social Science. PS: Political Science & Politics 47(1):165–172.

Laver, Michael, Kenneth Benoit, and John Garry. 2003. Extracting policy positions from political texts using words as data. American Political Science Review 97(02):311–31.

Lehmann, P., T. Matthieß, N. Merz, S. Regel, and A. Werner. 2015. Manifesto Corpus. Version: 2015a. Berlin: WZB Berlin Social Science Center.

Quinn, Kevin. 2010. How to analyze political attention with minimal assumptions and costs. American Journal of Political Science 54(1):209–28.

Schober, Michael F., Josh Pasek, Lauren Guggenheim, Cliff Lampe, and Frederick G. Conrad. 2016. Social media analyses for social measurement. Public Opinion Quarterly 80(1):180–211.

Slapin, Jonathan, and Sven-Oliver Proksch. 2008. A scaling model for estimating time-series party positions from texts. American Journal of Political Science 52(3):705–22.

Just Published! Competing Principals 2.0? Facebook, Renzi and the 2013 Head of State Election


Competing Principals 2.0? The impact of Facebook in the 2013 selection of the Italian Head of State

Italian Political Science Review / Rivista Italiana di Scienza Politica

Acknowledgments: Alberto Fragapane and Alessandra Cremonesi for their contribution to data collection. Two anonymous leaders of the former right-wing PD minority faction for providing ‘expert’ information on the factional affiliation of PD MPs.

Replication material: 

andreaceron.com/publications OR http://thedata.harvard.edu/dvn/dv/ipsr-risp

What is worth remembering:

  • Facebook pressure did not affect MPs’ propensity to express public dissent over the party line
  • Contrary to conventional wisdom, inexperienced MPs selected through primaries did not conform to social media requests
  • Social media were not (yet) a new ‘competing principal’
  • More traditional ‘principals’ played a role: factional membership, seniority, primary
  • ‘Sentimeter’ guys were right
  • It’s not so easy to publish ‘negative’ findings!


Motivated by the literature on ‘competing principals’, this article studies the effect of interactive social networking sites on the behavior of politicians. For this purpose, 12,455 comments posted on the Facebook walls of 423 Italian MPs have been analyzed to assess whether Facebook played a role in the selection of the Italian Head of State in 2013, enhancing responsiveness. The statistical analysis reveals that the pressure exerted through social media did not affect MPs’ propensity to express public dissent over the party line, which is instead affected by more traditional ‘principals’ and factors: seniority, primary elections, and factional membership.

Just Published! Public Policy & Mobilization of Online Public Opinion


The “Social Side” of Public Policy: Monitoring Online Public Opinion and Its Mobilization During the Policy Cycle

Policy & Internet

Co-author: Fedra Negri

Acknowledgments: Voices from the Blogs for providing data


What is worth remembering:

  • We found similarities between 1) Survey data, 2) online Sentiment, 3) online Government Consultation
  • Social media data can disclose citizens’ reaction to public policies
  • Social media data can capture stakeholders’ mobilization and de-mobilization processes


This article addresses the potential role played by social media analysis in promoting interaction between politicians, bureaucrats, and citizens. We show that in a “Big Data” world, the comments posted online by social media users can profitably be used to extract meaningful information, which can support the action of policymakers along the policy cycle. We analyze Twitter data through the technique of Supervised Aggregated Sentiment Analysis. We develop two case studies related to the “jobs act” labor market reform and the “#labuonascuola” school reform, both formulated and implemented by the Italian Renzi cabinet in 2014–15. Our results demonstrate that social media data can help policymakers to rate the available policy alternatives according to citizens’ preferences during the formulation phase of a public policy; can help them to monitor citizens’ opinions during the implementation phase; and capture stakeholders’ mobilization and de-mobilization processes. We argue that, although social media analysis cannot replace other research methods, it provides a fast and cheap stream of information that can supplement traditional analyses, enhancing responsiveness and institutional learning.

Just Published! e-Campaigning and Valence Issues in EU Elections 2014


e-Campaigning in the 2014 European elections. The emphasis on valence issues in a two-dimensional multiparty system

Party Politics (Journal IF: 1.830)

Co-author: Luigi Curini

Replication material: andreaceron.com/publications


What is worth remembering:

  • Parties that are closer to many rivals adopt more valence campaigning
  • In two dimensions this effect should be stronger for ‘positive’ valence campaigning than for ‘negative’ valence campaigning
  • In two dimensions negative campaigning can benefit many other parties apart from the one that performs it (for proximity reasons)
  • In two dimensions there is an incentive to tone down the debate
  • e-Campaigning on Twitter provides a novel and valuable source of information on political issues


The article explores the relationship between the incentives of parties to campaign on valence issues and the ideological proximity between one party and its competitors. Building on the existing literature, we provide a novel theoretical model that investigates this relationship in a two-dimensional multiparty system. Our theoretical argument is then tested focusing on the 2014 European electoral campaign in the five largest European countries, through an analysis of the messages posted by parties on their official Twitter accounts. Our results highlight an inverse relationship between a party’s distance from its neighbors and its likelihood to emphasize valence issues. However, as suggested in our theoretical framework, this effect is statistically significant only with respect to positive valence campaigning. Our findings have implications for the literature on valence competition, electoral campaigns, and social media.

Just Published! Twitter vs Media: First and Second level Agenda Setting in Italy


First and Second Level Agenda-Setting in the Twitter-Sphere. An Application to the Italian Political Debate

Journal of Information Technology & Politics

Co-authors: Luigi Curini & Stefano M. Iacus

Acknowledgments: Voices from the Blogs for providing data

What is worth remembering:

  • We analyze agenda-setting focusing on two salient issues in the Italian political debate: austerity and the public funding of parties (related to Euro-skepticism and anti-politics)
  • We compared Twitter and the Online News
  • Using a Lead-Lag statistical technique we find that mass media still retain First-Level Agenda-Setting power: they influence the Twitter-attention toward an issue
  • Journalists can act as watch-dogs as their action can promote further (public) discussion also on anti-establishment issues
  • Using Supervised Sentiment Analysis we find that mass media do not exert Second-Level Agenda-Setting: They do not influence the Twitter-attitudes toward an issue
  • We found a citizen-elite divide between the opinions expressed on SNS and the slant spread by the media elite


The rise of Social Network Sites re-opened the debate on the ability of traditional media to influence public opinion and act as agenda-setters. To address this debate, the present paper investigates first-level and second-level agenda-setting effects in the online environment by focusing on two heated Italian political debates (the reform of public funding of parties and the debate over austerity). By employing innovative and efficient statistical methods like lead-lag analysis and supervised sentiment analysis, we compare the attention devoted to each issue and the content spread by online news media and Twitter users. Our results show that online media keep their first-level agenda-setting power even though we find a marked difference between the slant of online news and the Twitter sentiment.
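As a rough illustration of what a lead-lag question looks like in practice, the sketch below uses a simple lagged cross-correlation as a stand-in. It is not the estimator used in the paper. It looks for the shift at which one attention series best matches the other; the two series and the function names are hypothetical and made up for illustration only.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def best_lag(leader, follower, max_lag):
    """Return (lag, correlation) maximizing corr(leader[t], follower[t + lag]).

    A positive lag means that `leader` moves first and `follower` reacts later.
    """
    scores = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = leader[:len(leader) - lag], follower[lag:]
        else:
            x, y = leader[-lag:], follower[:len(follower) + lag]
        scores[lag] = pearson(x, y)
    lag = max(scores, key=scores.get)
    return lag, scores[lag]


# Hypothetical daily attention counts: tweets echo the news two days later.
news = [3, 5, 2, 8, 6, 1, 7, 4, 9, 2, 5, 3]
tweets = [0, 0] + news[:-2]

lag, corr = best_lag(news, tweets, max_lag=3)
print(lag, round(corr, 3))
```

In this toy setting a positive best lag would be read as online news leading Twitter attention, which is the kind of first-level agenda-setting pattern described above.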

Just Published! Social media electoral forecast: State-of-the-art


Using Social Media to Forecast Electoral Results: A Review of the State of the Art

Italian Journal of Applied Statistics – Statistica Applicata

Co-authors: Luigi Curini & Stefano M. Iacus

Replication material: andreaceron.com/publications

Acknowledgments: Voices from the Blogs for providing data

What is worth remembering:

  • Many scholars tried to predict elections using social media
  • Some methods are better than others
  • Supervised sentiment analysis seems the best choice
  • Predictions are more accurate in countries with Proportional Representation


The paper discusses the advantages of using Supervised Aggregated Sentiment Analysis (SASA) of social media to forecast electoral results and presents an extension of the ReadMe method (Hopkins and King, 2010), which is particularly suitable to addressing a large number of categories (e.g. parties) providing lower standard errors. We analyze the voting intention of social media users in several elections held between 2011 and 2013 in France, Italy, and the United States. We then compare 80 electoral forecasts made using these or other techniques of data-mining and sentiment analysis. The comparison shows that the choice of the method is crucial. Electoral forecasts are also more accurate in countries with higher Internet penetration and given the presence of electoral systems based on proportional representation.