United we Stand?

The hot topic of Climate Change, through the lens of the UN General Debate

Miguel Ramos Lopes https://www.linkedin.com/in/miguelrlopes/
2019-10-17

Context & Inspiration

The United Nations General Assembly (UNGA) is the main deliberative, policy-making and representative body of the UN. It is the only principal organ in which all member states have equal representation, enshrined in the principle of one vote per country [1].

Every year since its first session, held in 1946 with only 51 nations in attendance, representatives of UN member states gather at the annual sessions of the General Assembly. The centerpiece of the UNGA is the General Debate, a forum in which government officials and senior diplomats address the assembly in the form of a speech, delivering statements that reflect their country’s perspective on pressing issues in foreign policy.

Given the truly global nature of the UNGA, which brings together 193 member states and 2 permanent observer states [2], the General Debate provides great insight into the affairs that shape the contemporary diplomatic context. It covers subjects such as poverty and economic growth, peace and security, human rights violations and threats to sustainable development. Among these topics, climate change stands out: given its urgency and its potential impact on Earth’s ecosystems, human health and civilization itself, it overshadows all the others.

While climate change is not new to the planet we inhabit, as stated in numerous scientific studies on global temperature change over documented climate cycles (Petit et al. 1999), the level of climate disruption stemming from human influence now threatens to introduce seemingly small but critical deviations from the stable regime that allowed human societies to develop and prosper.

Having established the significance of the statements delivered at the UN General Debate, which effectively convey each country’s foreign policy interests and priorities on the world stage, in this blogpost I will conduct an exploratory analysis of the UN General Debate text corpus. The analysis focuses on the topic of climate change, with the aim of answering three key questions:

Insights uncovered throughout the analysis will culminate in the assessment of one final landmark question: to what extent are the nations of the world truly united in the face of the threat posed by climate change?

Content & Methodology

The original transcription of the UN General Debate text corpus was collected by Alexander Baturo, Niheer Dasandi, and Slava Mikhaylov, who presented the findings of their work in the paper “Understanding State Preferences With Text As Data: Introducing the UN General Debate Corpus” (Baturo, Dasandi, and Mikhaylov 2017).

The dataset used in this analysis was retrieved from Kaggle [3], where it was uploaded and curated by Rachael Tatman. It covers the complete text corpus of statements from 1970 (Session 25) to 2015 (Session 70). The data is contained in one master table, with the following fields:

The analysis was conducted in R, following the tidytext principles defined by Julia Silge and David Robinson in the book “Text Mining with R” (Silge and Robinson 2016). The R Markdown file containing the fully reproducible source code is hosted on GitHub.

Exploratory Analysis

This section presents a selection of key summary metrics of the UN General Debate text corpus, allowing the reader to get acquainted with the dataset and establishing context for the analysis.

One of the first questions that naturally arises when exploring the data at hand is how rich the text corpus is. From the 25th to the 70th session, a total of 7507 statements were delivered by 199 distinct countries, observer states and supranational bodies. Since the first session included in the dataset took place over 45 years ago, how has the number of speeches per year changed over time?

Held in 1970, the General Debate of the 25th session of the United Nations General Assembly saw 70 speeches, delivered by the heads of state, government officials or diplomatic representatives in attendance. This number rose over time, surpassing 150 for the first time in 1987. The highest number of interventions was recorded in 2012, when all 193 member states and the two permanent observer states addressed the assembly, amounting to 195 speeches. In 2015, the last year in the dataset, 193 speeches were delivered, a number that coincides with the current total of UN member states.
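As a rough illustration of how these per-year counts can be obtained, the sketch below groups statements by year. The original analysis was written in R; this is a plain-Python (standard library only) stand-in, and it assumes each statement is a `(session, year, country, text)` tuple, mirroring the columns of the token table below. The rows are toy placeholders, not corpus data.

```python
from collections import Counter

# Toy rows shaped like the master table: (session, year, country, text).
# These are illustrative placeholders, not real corpus data.
speeches = [
    (25, 1970, "ALB", "May I first convey to our President..."),
    (25, 1970, "AAA", "Placeholder text."),
    (70, 2015, "ALB", "Placeholder text."),
]


def speeches_per_year(rows):
    """Count how many statements were delivered in each year, sorted by year."""
    counts = Counter(year for _session, year, _country, _text in rows)
    return dict(sorted(counts.items()))


def distinct_speakers(rows):
    """Number of distinct countries, observer states or bodies that spoke."""
    return len({country for _session, _year, country, _text in rows})


print(speeches_per_year(speeches))  # {1970: 2, 2015: 1}
print(distinct_speakers(speeches))  # 2
```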

The number of words per speech is an auxiliary quantitative metric that conveys how the General Debate sessions have changed over time. Computing the word count of a given statement is a simple task in the tidytext framework: one need only tokenize the text into individual words and count them. The outcome of the tokenization step is illustrated in the table below, taking the first few words of the Albanian speech delivered in 1970 as an example.

session year country speech token
25 1970 ALB May I first convey to our President… may
25 1970 ALB May I first convey to our President… i
25 1970 ALB May I first convey to our President… first
25 1970 ALB May I first convey to our President… convey
25 1970 ALB May I first convey to our President… to
25 1970 ALB May I first convey to our President… our
25 1970 ALB May I first convey to our President… president
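The tokenization in the table can be approximated outside R. The sketch below is a minimal Python imitation of what tidytext’s `unnest_tokens()` does by default (lowercase the text, split it into words, drop punctuation); `unnest_tokens` here is a hypothetical helper named after the R function, and its regex is an assumption that will not match tidytext’s tokenizer on every edge case (non-ASCII letters, for example, are not handled). The word count of a speech is then simply the number of rows.

```python
import re


def tokenize_words(text):
    """Rough imitation of tidytext's default word tokenizer: lowercase the text,
    keep runs of letters/digits (allowing in-word apostrophes), drop punctuation."""
    return re.findall(r"[a-z0-9]+(?:['’][a-z0-9]+)*", text.lower())


def unnest_tokens(session, year, country, speech):
    """Hypothetical helper: one (session, year, country, token) row per word."""
    return [(session, year, country, token) for token in tokenize_words(speech)]


rows = unnest_tokens(25, 1970, "ALB", "May I first convey to our President,")
for row in rows:
    print(row)
print("word count:", len(rows))  # word count: 7
```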

The evolution of speech length, measured as the word count of each country’s statement, is presented in the graph below in the form of one box plot [5] per year. A smoothed curve was fitted to the data to help the reader interpret the trend.

Speech length has clearly decreased over time, from a median of approximately 4000 words in the 1970s and early 1980s to roughly 2000 words in recent years. This change was driven by reforms to the General Debate’s code of conduct, which in 1997 introduced a voluntary guideline of 20 minutes per statement [6], since shortened to 15 minutes per intervention.
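For readers curious about what each box in the plot encodes, the sketch below computes the box-plot summary described in note 5 for each year’s word counts. It is an assumed Python re-implementation, not ggplot2 itself; `statistics.quantiles(..., method="inclusive")` interpolates linearly between order statistics, in the same way as R’s default quantile type.

```python
import statistics
from collections import defaultdict


def box_stats(values):
    """Box-plot summary in the style described in note 5: hinges at the first and
    third quartiles, whiskers reaching the most extreme values within 1.5 * IQR of
    the hinges, and anything beyond flagged as an outlying point.
    Needs at least two values (older Pythons raise StatisticsError otherwise)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "lower_whisker": min(inside),
        "q1": q1,
        "median": median,
        "q3": q3,
        "upper_whisker": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


def yearly_box_stats(word_counts):
    """word_counts: iterable of (year, words_in_one_speech) -> {year: box_stats}."""
    by_year = defaultdict(list)
    for year, n_words in word_counts:
        by_year[year].append(n_words)
    return {year: box_stats(counts) for year, counts in sorted(by_year.items())}


print(box_stats([1, 2, 3, 4, 100])["outliers"])  # [100]
```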

Although the time limit aims to keep the debate concise, it is not always adhered to: Libyan leader Muammar Gaddafi’s “rambling and unscripted” 2009 speech [7], for instance, lasted an impressive 90 minutes. The longest statement delivered at the UN General Debate during the years covered by this dataset belongs to Cuba: in 1979, Fidel Castro delivered an address of 11500 words. The record for the longest speech ever delivered at the UN General Debate also belongs to the illustrious Cuban leader, who in 1960 reportedly spoke for nearly four and a half hours [8].

Having uncovered Cuba’s tendency to indulge in long addresses at the UN General Debate, a follow-up question naturally emerges: which countries consistently deliver the longest, and shortest, speeches? The visualization below aims to answer this question, presenting the top and bottom 8 countries ranked by median speech word count.

Cuba rounds off the top 5 most talkative countries, behind the former state of Czechoslovakia. Ireland takes first place, with the highest median word count during the period covered by the dataset. At the other end of the spectrum, the Southeast Asian nation of Brunei, admitted to the UN in 1984, has the lowest median speech word count, having never delivered an address of more than 2000 words.
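The ranking itself is a group-by followed by a median. A minimal Python sketch, assuming `(country, word_count)` pairs as input and using made-up country codes in the example (the post’s own computation was done in R):

```python
import statistics
from collections import defaultdict


def rank_by_median_length(speeches, n=8):
    """speeches: iterable of (country, word_count) pairs.
    Returns (top, bottom): the n countries with the highest median word count
    (longest first) and the n with the lowest (shortest first).
    If there are fewer than 2n countries, the two lists overlap."""
    by_country = defaultdict(list)
    for country, n_words in speeches:
        by_country[country].append(n_words)
    medians = {c: statistics.median(v) for c, v in by_country.items()}
    ranked = sorted(medians.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n], ranked[::-1][:n]


top, bottom = rank_by_median_length(
    [("AAA", 100), ("AAA", 300), ("BBB", 50), ("CCC", 500)], n=1
)
print(top, bottom)  # [('CCC', 500)] [('BBB', 50)]
```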

Thus far, the focus has been on purely quantitative metrics that describe how the UN General Debate has evolved over time. Extending the scope of the exploratory analysis to the lexicon of the text corpus makes it possible to uncover the actual topics that shaped the General Debate.

The first step in exploring the lexicon of the dataset is listing the most frequently mentioned words across the period considered. A routine pre-processing step in text analytics when counting word frequencies is to remove stopwords, i.e. extremely common words that carry little meaning, such as “the”, “of” and “to”. The graph below shows the 20 most frequent words in the General Debate text corpus from 1970 to 2015, after removing stopwords following the tidy approach detailed in (Silge and Robinson 2016).

As expected, the list is dominated by words that are common to most speeches, such as “nations”, “united”, “international”, “world” and “countries”. These words, along with “assembly” and “session” (ranked #21), add little context to the analysis, as they feature in most opening statements and closing remarks; they are inconsequential and should, by extension, be treated as stopwords in this context. The words highlighted as meaningful, however, convey the first qualitative insights into the most prominent topics discussed at the UN General Debate. “Peace” emerges ahead of “development”, “security” and “economic”, suggesting that the main purpose of the organization, to maintain international peace and security as enshrined in Article 1 of the UN Charter [9], is reflected in the content of the debate.
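The two-stage filtering described above (a general-purpose stopword list, then the corpus-specific “inconsequential” words) can be sketched as follows. This is a Python stand-in for the tidytext `anti_join(stop_words)` step; the tiny stopword set is an illustrative assumption, whereas tidytext’s `stop_words` table has over a thousand entries, and the simple `[a-z]+` tokenizer is a simplification.

```python
import re
from collections import Counter

# Tiny illustrative stand-in for a general-purpose stopword lexicon.
GENERIC_STOPWORDS = {"the", "of", "to", "and", "a", "in", "is", "that", "we", "our", "for", "this"}
# Words the post treats as "inconsequential" for this particular corpus.
DOMAIN_STOPWORDS = {"nations", "united", "international", "world", "countries", "assembly", "session"}


def top_words(texts, n=20, extra_stopwords=DOMAIN_STOPWORDS):
    """Most frequent words across texts after removing both stopword sets."""
    stop = GENERIC_STOPWORDS | set(extra_stopwords)
    counts = Counter(
        token
        for text in texts
        for token in re.findall(r"[a-z]+", text.lower())
        if token not in stop
    )
    return counts.most_common(n)


texts = ["The United Nations must secure peace.", "Peace and development for the world."]
print(top_words(texts, n=1))  # [('peace', 2)]
```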

In the quest to unravel the topics that dominated the UN General Debate sessions through the years, it is preferable to aggregate by decade rather than to count words at the more granular year level. Aggregating the data into ten-year periods (five, in the case of the 2010s) allows the reader to ascertain which affairs warranted prolonged discussion at the General Assembly, and thus truly shaped foreign policy discussion, on a human time scale: the decade. However, single-word frequency counts at decade level are likely to yield results similar to those observed for the complete period covered by the dataset. In text mining and analytics it is common practice to extend the tokenization process beyond single words to sequences known as n-grams; in this case, greater insight into the actual topics and events can be gained by exploring the occurrence of prominent word pairs, or bigrams. As with single-word tokens, bigrams containing stopwords (including those labelled as “inconsequential”) were excluded from the dataset.
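The bigram step can be sketched in the same spirit: build consecutive word pairs, discard any pair containing a stopword, and keep the most frequent pairs per decade (six, in the post). This Python version is a stand-in for tidytext’s `unnest_tokens(..., token = "ngrams", n = 2)`; the stopword set is a small placeholder, and this sketch forms pairs across sentence boundaries.

```python
import re
from collections import Counter, defaultdict

# Small placeholder stopword set (assumption).
STOPWORDS = {"the", "of", "to", "and", "a", "in", "is", "that", "we", "us", "our", "for"}


def bigrams(text):
    """Consecutive word pairs, after lowercasing and stripping punctuation."""
    words = re.findall(r"[a-z]+", text.lower())
    return list(zip(words, words[1:]))


def decade(year):
    """1975 -> 1970; 2012 -> 2010 (the 2010s bucket only spans 2010-2015 here)."""
    return year // 10 * 10


def top_bigrams_by_decade(speeches, k=6, stopwords=STOPWORDS):
    """speeches: iterable of (year, text) -> {decade: [(bigram, count), ...]}."""
    counts = defaultdict(Counter)
    for year, text in speeches:
        for first, second in bigrams(text):
            if first in stopwords or second in stopwords:
                continue  # drop any pair that contains a stopword
            counts[decade(year)][f"{first} {second}"] += 1
    return {d: c.most_common(k) for d, c in sorted(counts.items())}


toy = [(1975, "The Middle East and the Middle East."),
       (2012, "Climate change threatens us. Climate change is real.")]
print(top_bigrams_by_decade(toy, k=1))
# {1970: [('middle east', 2)], 2010: [('climate change', 2)]}
```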

The plot below shows the top six bigrams per decade, from the 1970s to the early 2010s.

Now this is a rather interesting visualization.

Conflicts in the Middle East were at the forefront of the General Debate in the 1970s. This was a period of notable turmoil and outright war in the region, during which Israeli armies fought opposing forces from Palestine, Egypt and other Arab countries for control over disputed territories. Political instability and armed conflicts prevailed throughout the 1980s, 90s and 2000s, with the text corpus of the General Debate echoing the concerns expressed at the General Assembly. The Middle East remains a tumultuous region, but the bigram “middle east” dropped out of the top six word pairs in the early 2010s.

The 1980s were dominated by speeches condemning the South African Apartheid system, known for racial segregation, discrimination and violence towards the native African population. This topic had been heavily debated in the previous decade, but its relevance on the world stage diminished in the 1990s, as Apartheid legislation was repealed in June 1991 and open democratic elections were held in April 1994, famously won by Nelson Mandela.

The transition between the 1980s and the 1990s was rich in talks concerning the Soviet Union and the Cold War, with the importance of peace keeping being reiterated several times following the fall of the Berlin Wall in November 1989. The two most prominent bigrams in this decade were Human Rights and the UN’s Security Council, two topics that have alternately dominated the General Debate over the years.

As the world entered the new millennium, special attention was paid to sustainable development, with member states striving to improve health and education, reduce inequality, and spur economic growth while preserving natural resources. This endeavour, already prominent in the 2000s, culminated in 2015 with the adoption of the 2030 Agenda for Sustainable Development, a set of 17 Sustainable Development Goals (SDGs) to be achieved by 2030 [10]. The Agenda provides a shared blueprint to which all member states should adhere in order to address global challenges in a joint effort.

Climate change emerged as a hot topic in the 2000s, ranking as the third most prominent bigram of this period. The trend continued into the 2010s, with climate change becoming the second most frequent bigram during the first five years of the decade, on a list topped by the ever-present topic of Human Rights.

Yearly mentions of climate change at the UN General Debate, along with a selection of climate-related bigrams, are presented in greater detail in the plot below.

The first mention of climate change in the corpus came from Canadian Prime Minister Brian Mulroney in 1988, who stated:

[…] The signature a year ago in Montreal of the Protocol on the protection of the ozone layer is a landmark example of what nations working together can accomplish. I urge all States which have not yet done so to sign and ratify the Protocol without delay. The increasingly urgent question of global warming and climate change received serious attention at the International Conference on the Changing Atmosphere in Toronto last June. […]

Despite a gradual increase over the subsequent sessions, the number of mentions per year remained well below 100 until 2007. A great surge was observed at the 62nd session of the General Assembly, driven by the theme of the debate, “Responding to climate change” [11]. In his opening speech, the President of the UN General Assembly stressed that “climate change and its dramatic effects are increasingly visible and increasingly violent. The irony is that those least responsible for it will suffer most”. The extent to which this statement holds true will become evident over the course of this analysis.

Mentions of climate change peaked two years later, with the topic referenced 773 times in 2009. Considering that 189 speeches were delivered at the 64th session of the General Debate, this amounts to about four mentions per speech on average (773 / 189 ≈ 4.1). The number of mentions dropped for three consecutive years after this peak, the highest in the period covered by the dataset, but the topic remains at the forefront of the discussion and seems to be picking up momentum once again.

The inclusion of six climate-related bigrams in the previous graph further illustrates the relative importance given to climate change. While mentions of global warming and greenhouse effect / gas(es) seem to be correlated with mentions of climate change, both bigrams lag behind by one order of magnitude. The topic of fossil fuel(s) receives little attention compared to what would be expected given the discussion surrounding clean and renewable energy sources. The Paris Climate / Agreement was only adopted in December 2015, and signed and ratified by member states from 2016 onwards, hence the small number of references to the consensus-building document in the years leading up to 2015. The Kyoto Protocol, its predecessor, was mentioned, on average, in only one third of all speeches delivered in 2007.

Lastly, we turn to the ozone hole / layer topic. This issue dominated climate discussion in the nineties and, with the ozone layer now on track for full recovery by 2060 [12], it provides a beacon of hope for the challenges posed by climate change: the international community once rallied to face a similar threat, with tangible consequences that were widely publicized in the media. We must do the same again.
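Yearly counts like those in the plot can be approximated with regular expressions that allow for the phrase variants discussed above (for example “greenhouse effect” or “greenhouse gases”). The patterns below are assumptions written for illustration, not the post’s exact matching rules, and the sketch is plain Python rather than the R used in the analysis.

```python
import re
from collections import Counter, defaultdict

# Illustrative patterns (assumptions) for the phrase families in the plot.
PATTERNS = {
    "climate change": r"\bclimate\s+change\b",
    "global warming": r"\bglobal\s+warming\b",
    "greenhouse effect / gas(es)": r"\bgreenhouse\s+(?:effect|gas(?:es)?)\b",
    "fossil fuel(s)": r"\bfossil\s+fuels?\b",
    "paris climate / agreement": r"\bparis\s+(?:climate|agreement)\b",
    "kyoto protocol": r"\bkyoto\s+protocol\b",
    "ozone hole / layer": r"\bozone\s+(?:hole|layer)\b",
}


def yearly_mentions(speeches, patterns=PATTERNS):
    """speeches: iterable of (year, text) -> {year: Counter(label -> mentions)}."""
    compiled = {label: re.compile(p, re.IGNORECASE) for label, p in patterns.items()}
    mentions = defaultdict(Counter)
    for year, text in speeches:
        for label, rx in compiled.items():
            mentions[year][label] += len(rx.findall(text))
    return dict(mentions)


counts = yearly_mentions([(2009, "Climate change, climate change and greenhouse gases.")])
print(counts[2009]["climate change"])  # 2
```

Dividing a year’s count by its number of speeches gives the per-speech average, e.g. 773 / 189 for 2009.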

Having plotted the mentions of climate change over time and interpreted the relative importance given to this topic in comparison to related climate bigrams, the scope of the analysis can now be narrowed to the research question that prompted this publication: to what extent are the nations of the world truly united in the face of the threat posed by climate change?

A tentative answer to this open-ended question, drawn from the UN General Debate text corpus, should consider the total number of mentions of climate change made by each country over time, a metric presented in the plot below. It is important to underline that, while the number of mentions of climate change at the UN General Debate does not tell the full story of a given country’s commitment to and involvement in tackling this challenge, it can certainly help to gauge the degree to which a nation is concerned about the topic.

Each dot represents one country, sorted along the horizontal axis by total number of mentions, with a random jitter to avoid overlaps. Points are coloured in one of six colours, ranging from dark blue (0 to 5 mentions) to dark red (more than 100 mentions). This palette helps the reader differentiate between the clusters of countries that emerge from the varying number of mentions: the majority mentioned climate change 6 to 25 times, some nations seem unfazed by the problem, and others are deeply concerned. The latter group is made up of four Asia-Pacific island nations, with the Federated States of Micronesia being the country with the most mentions. This cluster is followed by the countries coloured in dark orange, a group that, with the exception of Cambodia, is also made up exclusively of island nations.

The same data is presented below in the form of a map, with countries coloured according to the same palette. The map is interactive, supporting pan and zoom while showing the number of mentions of climate change as the user hovers over each country.

Visualizing the data on the familiar planisphere further conveys how critical the consequences of climate change will be for island nations. These territories, already difficult to pinpoint on the map without the help of the red pins, stand to be partly or fully submerged by rising sea levels. The number of environmental refugees will undoubtedly increase as the water claims villages and cities near the shore, a phenomenon that will extend to coastlines across the globe if emission reduction targets and behavioural changes are not swiftly met.
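The per-country totals and six colour classes can be sketched as below. Only the 0 to 5, 6 to 25 and more-than-100 classes are stated in the text; the intermediate breakpoints (50 and 75) are hypothetical placeholders chosen purely so the example has six classes, and the country codes are made up.

```python
import bisect
from collections import Counter

# Upper bound of each colour class. 5 and 25 (and the ">100" class) come from
# the post; 50 and 75 are HYPOTHETICAL placeholders for the unstated middle classes.
UPPER_BOUNDS = [5, 25, 50, 75, 100]
LABELS = ["0-5", "6-25", "26-50", "51-75", "76-100", ">100"]


def totals_by_country(mentions):
    """mentions: iterable of (country, mentions_in_one_speech) -> Counter of totals."""
    totals = Counter()
    for country, n in mentions:
        totals[country] += n
    return totals


def mention_class(total):
    """Map a total mention count to one of the six colour classes."""
    return LABELS[bisect.bisect_left(UPPER_BOUNDS, total)]


totals = totals_by_country([("AAA", 3), ("AAA", 4), ("BBB", 0), ("CCC", 120)])
print({c: mention_class(t) for c, t in totals.items()})
# {'AAA': '6-25', 'BBB': '0-5', 'CCC': '>100'}
```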

The visualizations presented up to this point have helped establish the prevalence of climate change at the General Debate, per country and over time, showing that UN member states give varying degrees of importance to this subject. As mentioned, this observation rests on the assumption that the total number of mentions is positively correlated with the level of urgency each country expresses towards the problem. To complete the assessment of the proposed triad of research questions, the blogpost culminates with an analysis of how sentiment towards climate change at the UN General Debate has shifted over time.

Sentiment Analysis

Computational sentiment analysis is based on the premise that the overall emotive content of a text snippet (be it a sentence, speech, collection of documents or even a book) can be approximated by the cumulative contribution of its individual words.

Dictionary-based methods, the simplest implementations of sentiment analysis algorithms, rely on previously defined sentiment lexicons. These are compiled by linguists and researchers who have assigned positive or negative sentiment polarity scores, along with overarching emotions like joy, anger, sadness or trust, to both general-purpose and domain-specific words in many languages. In the context of text analytics and natural language understanding, these dictionaries can be used to infer whether a text excerpt carries a positive or negative feeling, or is perhaps characterized by some other nuanced emotion. An overview of the most common sentiment lexicons and their applications can be found in chapter two of the work by Silge and Robinson (Silge and Robinson 2016). In their book the authors note that general-purpose lexicons are typically used as unigram (single-word token) lookups in tabular, inner-join workflows, which do not account for the effect of qualifiers (adverbs like “very”, negators like “not”, and so forth) on the contribution of polarized words.
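To make this limitation concrete, here is a minimal Python sketch of a plain unigram lexicon lookup, the equivalent of an inner join between tokens and a sentiment dictionary. The five-word lexicon is a toy assumption, not any published lexicon; note how the negated sentence receives the same score as the affirmative one.

```python
import re

# Toy polarity lexicon (assumption); real lexicons hold thousands of entries.
LEXICON = {"good": 1, "hope": 1, "progress": 1, "threat": -1, "disaster": -1}


def lexicon_score(text, lexicon=LEXICON):
    """Sum the polarity of every token found in the lexicon: a plain unigram
    lookup that ignores qualifiers such as 'not' or 'very'."""
    return sum(lexicon.get(word, 0) for word in re.findall(r"[a-z]+", text.lower()))


print(lexicon_score("Climate change is a threat."))      # -1
print(lexicon_score("Climate change is not a threat."))  # -1, negation ignored
```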

Aware of the relevance of such modifiers and of the consequences their omission might have on the output of the analysis, Tyler Rinker developed the sentimentr package to compute text sentiment at sentence level quickly, while incorporating the effect of qualifiers, or valence shifters as the author refers to them [13]. The output produced by sentimentr is an unbounded sentiment polarity score at sentence level: each sentence is attributed a negative or positive value whose magnitude reflects how strongly the sentence conveys negative or positive sentiment. Sentences scored as 0 are considered neutral.
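The idea of valence shifters can be illustrated with a deliberately simplified scorer: negators in a short window before a polarized word flip its sign, amplifiers boost it, and the sentence total is scaled by the square root of its word count. This is a toy sketch loosely inspired by the description above, not sentimentr’s algorithm; the word lists, window size and weight are all assumptions.

```python
import math
import re

# All word lists and parameters below are illustrative assumptions.
POLARITY = {"threat": -1.0, "disaster": -1.0, "hope": 1.0, "progress": 1.0}
NEGATORS = {"not", "never", "no"}
AMPLIFIERS = {"very", "extremely", "truly"}
WINDOW = 3        # number of preceding words inspected for shifters
AMP_WEIGHT = 0.8  # extra weight added per amplifier found in the window


def shifted_sentence_score(sentence):
    """Toy valence-shifter scoring of one sentence (unbounded, 0 = neutral)."""
    words = re.findall(r"[a-z']+", sentence.lower())
    if not words:
        return 0.0
    total = 0.0
    for i, word in enumerate(words):
        if word not in POLARITY:
            continue
        context = words[max(0, i - WINDOW):i]
        n_negators = sum(w in NEGATORS for w in context)
        n_amplifiers = sum(w in AMPLIFIERS for w in context)
        score = POLARITY[word] * (1 + AMP_WEIGHT * n_amplifiers)
        if n_negators % 2 == 1:  # an odd number of negators flips the polarity
            score = -score
        total += score
    # Scale by sentence length so longer sentences are not automatically more extreme.
    return total / math.sqrt(len(words))


print(shifted_sentence_score("Climate change is a threat"))      # negative
print(shifted_sentence_score("Climate change is not a threat"))  # positive
```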

As for identifying the segments of speech to include in the sentiment analysis, the decision was made to restrict the scope to sentences containing mentions of climate change. This process, which effectively creates a window of interest composed of sentences explicitly pertaining to climate change, is illustrated below [14].

The sentimentr algorithm was thus applied at sentence level to the window of interest of every speech. For statements with more than one sentence in the window of interest, the overall polarity score was computed as the average of the scores assigned to each sentence. Five instances were chosen to portray the unbounded scores attributed to remarks on this topic, delivered at five different sessions of the General Debate:

The process of computing the polarity score was extended to every statement concerning climate change, allowing for the sentiment expressed towards this topic over time to be quantified. The resulting scores are presented in the plot below, along with a smoothed curve that conveys the evolution of the average score per session of the UN General Debate.
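The window-of-interest and averaging steps can be sketched as follows, in plain Python rather than the R (sentimentr) pipeline used in the post. Sentence splitting is a naive regex, any sentence scorer can be plugged in (the `toy_score` function is a hypothetical placeholder), and the yearly aggregate is a simple mean, whereas the plot uses a smoothed curve.

```python
import re
from collections import defaultdict
from statistics import mean

SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
TOPIC = re.compile(r"\bclimate\s+change\b", re.IGNORECASE)


def window_of_interest(speech):
    """Keep only the sentences that mention climate change.
    Naive regex splitting: abbreviations such as 'Mr.' will split wrongly."""
    sentences = SENTENCE_END.split(speech.strip())
    return [s for s in sentences if TOPIC.search(s)]


def speech_polarity(speech, score_sentence):
    """Average sentence score over the window of interest, or None if the
    speech never mentions the topic."""
    window = window_of_interest(speech)
    if not window:
        return None
    return mean(score_sentence(s) for s in window)


def yearly_average(scored):
    """scored: iterable of (year, polarity or None) -> {year: mean polarity}."""
    by_year = defaultdict(list)
    for year, score in scored:
        if score is not None:
            by_year[year].append(score)
    return {year: mean(v) for year, v in sorted(by_year.items())}


def toy_score(sentence):
    """Hypothetical placeholder scorer, for illustration only."""
    return -1.0 if "threat" in sentence.lower() else 1.0


speech = "We welcome you. Climate change is a threat! Peace matters. Is climate change real?"
print(window_of_interest(speech))
print(speech_polarity(speech, toy_score))  # 0.0
```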

Each point, representing the score of a given country’s statement in a given year, is colour-coded according to the total number of mentions of climate change introduced in previous visualizations. Every year features a wide, polarized spread of statement scores, with the resulting yearly average being marginally positive up until the early 2000s, when it decayed slightly to a neutral level. The same data is depicted in the plot below, faceted by cluster of total mentions, in an attempt to assess whether the trend differs meaningfully for any specific group of countries.

The following cluster-specific trends are noticeable:

Closing Remarks & Future Work

This blogpost focused on the subject of climate change at the United Nations General Debate, covering the period from 1970 to 2015. Three research questions were answered through the analysis of the text corpus, establishing that:

Whilst the world appeared fragmented in 2015, greater convergence is expected by 2020. In recent years the relevance of climate change in the mainstream media has grown tangibly, evolving into a truly global topic that commands the attention of politicians and the general public alike, dominating election programmes and news outlets. Prominent figures have emerged and the cause has been championed by younger generations. These recent developments are bound to have influenced the tone of the General Debate, which will be the focus of a subsequent blogpost.

Baturo, Alexander, Niheer Dasandi, and Slava J Mikhaylov. 2017. “Understanding State Preferences with Text as Data: Introducing the UN General Debate Corpus.” Research & Politics 4 (2): 2053168017712821. https://journals.sagepub.com/doi/full/10.1177/2053168017712821.

Petit, Jean-Robert, Jean Jouzel, Dominique Raynaud, Narcisse I Barkov, J-M Barnola, Isabelle Basile, Michael Bender, et al. 1999. “Climate and Atmospheric History of the Past 420,000 Years from the Vostok Ice Core, Antarctica.” Nature 399 (6735): 429. https://www.nature.com/articles/20859.

Silge, Julia, and David Robinson. 2016. “Tidytext: Text Mining and Analysis Using Tidy Data Principles in R.” The Journal of Open Source Software 1 (3): 37. https://www.tidytextmining.com/.


  1. the General Assembly’s composition, functions, powers, voting, and procedures are inscribed in Chapter IV of the United Nations Charter, available on this link: http://www.un.org/en/sections/un-charter/chapter-iv/index.html

  2. full list of countries and observer states available on this link: http://www.un.org/en/member-states/index.html, including date of admission

  3. https://www.kaggle.com/unitednations/un-general-debates

  4. ISO 3166 Alpha-3 country code, more information in the following link: https://www.iso.org/iso-3166-country-codes.html

  5. the lower and upper hinges correspond to the first and third quartiles (the 25th and 75th percentiles). The upper whisker extends from the hinge to the largest value no further than 1.5 * IQR from the hinge (where IQR is the inter-quartile range, or distance between the first and third quartiles). The lower whisker extends from the hinge to the smallest value at most 1.5 * IQR of the hinge. Data beyond the end of the whiskers are called “outlying” points and are plotted individually - from the ggplot2 reference guide: https://ggplot2.tidyverse.org/reference/geom_boxplot.html

  6. press-release detailing the reform can be found on this link: https://www.un.org/press/en/1997/19970718.GA9285.html

  7. as reported by The Guardian: https://www.theguardian.com/world/2009/sep/23/muammar-gaddafi-general-assembly-speech

  8. as documented on this link: http://ask.un.org/faq/37127, along with other notably long speeches

  9. full charter available on this link: https://www.un.org/en/sections/un-charter/chapter-i/index.html

  10. the complete list of SDGs can be found on this link: https://www.un.org/sustainabledevelopment/sustainable-development-goals/

  11. full list of debate themes can be found on this link: https://www.un.org/en/ga/sessions/regular.shtml

  12. https://news.un.org/en/story/2018/11/1024842

  13. the author provides substantial evidence into the benefits of incorporating valence shifters, along with a comprehensive description of the algorithm’s inner workings on the package’s GitHub page – available on this link: https://github.com/trinker/sentimentr

  14. rendered using the ggpage package - available on this link: https://emilhvitfeldt.github.io/ggpage/

Citation

For attribution, please cite this work as

Lopes (2019, Oct. 17). Citizen Data Scientist: United we Stand?. Retrieved from https://citizendatascientist.github.io/posts/2019-08-04-united-we-stand/

BibTeX citation

@misc{lopes2019united,
  author = {Lopes, Miguel Ramos},
  title = {Citizen Data Scientist: United we Stand?},
  url = {https://citizendatascientist.github.io/posts/2019-08-04-united-we-stand/},
  year = {2019}
}