

  1. Quotebank, an open corpus of 178 million quotations attributed to the speakers who uttered them, extracted from 162
    million English news articles published between 2008 and 2020. Quotebank is publicly available at
  2. Metadata about the speakers from Wikidata (speaker_attributes.parquet). From this dataset we assigned each speaker their party affiliation, age, gender, ethnic group, academic degree, and religion.
  3. Total annual greenhouse gas (GHG) emissions in the United States from 2008 to 2019, from the EPA, broken down by sector (transportation, electricity generation, industry, agriculture, commercial, residential, and U.S. territories), together with the total across all sectors. This data is accessible at


  • Information Retrieval
    We used a basic regular-expression search to retrieve climate-related quotations.
  • Sentiment Analysis
    We applied NLTK's VADER sentiment analyzer to label the attitude expressed in each quotation.
  • Observational Study
    Because confounders can bias a direct comparison between groups, we chose Inverse Probability of Treatment Weighting (IPTW) to better estimate the causal effect. IPTW is similar to propensity score matching (PSM) and provides an estimate of the Average Treatment Effect (ATE), which quantifies the difference between the treatment and control groups.
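
The retrieval and weighting steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code. The keyword pattern and the names `CLIMATE_PATTERN`, `is_climate_related`, and `iptw_ate` are hypothetical. The propensity scores are assumed to have been estimated beforehand (for example with a logistic regression on the speaker attributes), and the normalized form of the IPTW estimator is an assumption, since the exact variant used is not stated.

```python
import re

# Hypothetical keyword pattern; the expressions actually used are not given.
CLIMATE_PATTERN = re.compile(
    r"\b(?:climate change|global warming|greenhouse gas(?:es)?|carbon emissions?)\b",
    re.IGNORECASE,
)


def is_climate_related(quotation: str) -> bool:
    """Return True if the quotation matches the illustrative keyword pattern."""
    return CLIMATE_PATTERN.search(quotation) is not None


def iptw_ate(treated, outcome, propensity):
    """Estimate the ATE with normalized inverse probability of treatment weights.

    treated:    0/1 treatment indicators, one per unit
    outcome:    numeric outcomes (for example, sentiment scores)
    propensity: estimated P(treated = 1 | covariates), each strictly in (0, 1)
    """
    # Treated units are weighted by 1/p, control units by 1/(1 - p).
    w_treated = [t / p for t, p in zip(treated, propensity)]
    w_control = [(1 - t) / (1 - p) for t, p in zip(treated, propensity)]

    # Weighted mean outcome in each pseudo-population.
    mean_treated = sum(w * y for w, y in zip(w_treated, outcome)) / sum(w_treated)
    mean_control = sum(w * y for w, y in zip(w_control, outcome)) / sum(w_control)
    return mean_treated - mean_control
```

For instance, `iptw_ate([1, 1, 0, 0], [1.0, 0.5, 0.0, -0.5], [0.5] * 4)` reduces to the plain difference in group means, because equal propensity scores give every unit the same weight. Normalizing by the sum of the weights keeps the estimate stable when a few units have propensities close to 0 or 1.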

About Us

  • Jingqi Liu

  • Maocheng Xu

  • Yinghui Jiang

  • Yu Zhou