Dataset

Quotebank is an open corpus of 178 million quotations, each attributed to the speaker who uttered it, extracted from 162
million English news articles published between 2008 and 2020. Quotebank is publicly available at https://doi.org/10.5281/zenodo.4277311.
Speaker metadata from Wikidata (speaker_attributes.parquet): from this dataset, we assigned each speaker a party affiliation, age, gender, ethnic group, academic degree, and religion.
Total annual greenhouse gas (GHG) emissions in the United States from 2008 to 2019, from the EPA, broken down by sector (transportation, electricity generation, industry, agriculture, commercial, residential, and U.S. territories), together with the total GHG emissions across all sectors. This data is accessible at https://cfpub.epa.gov/ghgdata/inventoryexplorer/#allsectors/allsectors/allgas/econsect/all.

Methods

Information Retrieval
We retrieved climate-related quotations with a basic search engine built on regular expressions.
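
A minimal sketch of regular-expression retrieval is given here for illustration. The keyword list and the function names (`CLIMATE_PATTERN`, `is_climate_related`, `filter_quotations`) are hypothetical; the patterns actually used in the project are not listed in this section.

```python
import re

# Hypothetical keyword list, chosen only for illustration.
CLIMATE_PATTERN = re.compile(
    r"\b(?:climate change|global warming|greenhouse gas(?:es)?|carbon emissions?)\b",
    re.IGNORECASE,
)


def is_climate_related(quotation: str) -> bool:
    """Return True if the quotation contains at least one climate keyword."""
    return CLIMATE_PATTERN.search(quotation) is not None


def filter_quotations(quotations):
    """Keep only the quotations that match the climate pattern."""
    return [q for q in quotations if is_climate_related(q)]
```

Word boundaries (`\b`) keep the pattern from matching inside longer words, and `re.IGNORECASE` makes the search insensitive to capitalization.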
Sentiment Analysis
We applied NLTK's VADER sentiment analyzer to label the attitude expressed in each quotation.
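
The labeling step might look like the sketch below. It assumes the conventional VADER cut-offs on the compound score (at or above 0.05 positive, at or below -0.05 negative, otherwise neutral); the thresholds used in this project are not stated, and the helper names `label_from_compound` and `vader_label` are hypothetical.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score in [-1, 1] to a coarse sentiment label.

    The 0.05 threshold is an assumed convention, not a value from this project.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def vader_label(text: str) -> str:
    """Score a quotation with NLTK's VADER analyzer and return its label.

    Requires the lexicon to be installed once: nltk.download("vader_lexicon").
    """
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    return label_from_compound(analyzer.polarity_scores(text)["compound"])
```

When labeling many quotations, it would be cheaper to create one `SentimentIntensityAnalyzer` and reuse it rather than constructing it on every call.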
Observational Study
Because confounders can bias a direct comparison, we chose inverse probability of treatment weighting (IPTW) to better estimate the causal effect. IPTW is similar to propensity score matching (PSM) and provides an estimate of the average treatment effect (ATE), which quantifies the difference between the treatment and control groups.
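
To make the weighting concrete, here is a small sketch of an IPTW estimate of the ATE. It rests on assumptions not taken from this project: a single discrete confounder, propensity scores estimated as the treated fraction within each confounder stratum (a regression model would be the usual choice with several covariates), and the normalized form of the estimator. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def estimate_propensity(confounder, treated):
    """Estimate P(T=1 | X=x) as the treated fraction within each stratum of X."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n_treated, n_total]
    for x, t in zip(confounder, treated):
        counts[x][0] += t
        counts[x][1] += 1
    return [counts[x][0] / counts[x][1] for x in confounder]


def iptw_ate(treated, outcome, propensity):
    """Normalized IPTW estimate of the average treatment effect."""
    if any(e <= 0.0 or e >= 1.0 for e in propensity):
        # Positivity: every unit needs a nonzero chance of either treatment status.
        raise ValueError("propensity scores must lie strictly between 0 and 1")
    # Treated units are weighted by 1/e, control units by 1/(1 - e).
    w1 = [t / e for t, e in zip(treated, propensity)]
    w0 = [(1 - t) / (1 - e) for t, e in zip(treated, propensity)]
    mean_treated = sum(w * y for w, y in zip(w1, outcome)) / sum(w1)
    mean_control = sum(w * y for w, y in zip(w0, outcome)) / sum(w0)
    return mean_treated - mean_control


# Toy data: stratum "A" has a higher baseline outcome and is treated more often,
# so the raw difference in group means overstates the effect of treatment.
confounder = ["A", "A", "A", "A", "B", "B", "B", "B"]
treated = [1, 1, 1, 0, 1, 0, 0, 0]
outcome = [12, 12, 12, 10, 2, 0, 0, 0]

propensity = estimate_propensity(confounder, treated)
ate = iptw_ate(treated, outcome, propensity)
```

In this toy data the treatment adds 2 to every outcome. The unweighted difference in group means is 7, while the weighted estimate recovers a value of about 2, because reweighting balances the two strata across the treatment and control groups.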