Project 1: Climate Change
Climate change and temperature anomalies
If we wanted to study climate change, we can find data on the Combined Land-Surface Air and Sea-Surface Water Temperature Anomalies in the Northern Hemisphere at NASA’s Goddard Institute for Space Studies. The tabular data of temperature anomalies can be found here
To define temperature anomalies you need to have a reference, or base, period which NASA clearly states that it is the period between 1951-1980.
For each month and year, the dataframe shows the deviation of temperature from the normal (expected). Further the dataframe is in wide format.
tidyweather <- weather %>%
# Drop columns that are not needed
select(Year:Dec) %>%
# Pivot into longer format with columns 2:13 that contain the monthly data
pivot_longer(cols = 2:13,
names_to = "Month",
values_to = "delta")
Plotting Information
Let us plot the data using a time-series scatter plot, and add a trendline.
tidyweather <- tidyweather %>%
# use ymd function from lubridate package to create a date variable
mutate(date = ymd(paste(as.character(Year), Month, "1")),
# Create month and year variable in the lubridate package
month = month(date, label=TRUE),
year = year(date))
# Create plot with smoothing line
ggplot(tidyweather, aes(x=date, y = delta))+
geom_point()+
geom_smooth(color="red") +
theme_bw() +
labs (
title = "Monthly Weather Anomalies"
)
Is the effect of increasing temperature more pronounced in some months?
Use facet_wrap()
to produce a separate scatter plot for each month,
again with a smoothing line. Your chart should human-readable labels;
that is, each month should be labeled “Jan”, “Feb”, “Mar” (full or
abbreviated month names are fine), not 1
, 2
, 3
.
# Create plot
ggplot(tidyweather, aes(x = date, y = delta))+
geom_point() +
geom_smooth(color="red") +
theme_bw() +
labs (
title = "Weather Anomalies by month") +
# One plot per month
facet_wrap(~month)
It is sometimes useful to group data into different time periods to
study historical data. For example, we often refer to decades such as
1970s, 1980s, 1990s etc. to refer to a period of time. NASA calculates
a temperature anomaly, as difference form the base period of 1951-1980.
The code below creates a new data frame called comparison
that groups
data in five time periods: 1881-1920, 1921-1950, 1951-1980, 1981-2010
and 2011-present.
We remove data before 1800 and before using filter
. Then, we use the
mutate
function to create a new variable interval
which contains
information on which period each observation belongs to. We can assign
the different periods using case_when()
.
comparison <- tidyweather %>%
filter(Year>= 1881) %>% # remove years prior to 1881
#create new variable 'interval', and assign values based on criteria below:
mutate(interval = case_when(
Year %in% c(1881:1920) ~ "1881-1920",
Year %in% c(1921:1950) ~ "1921-1950",
Year %in% c(1951:1980) ~ "1951-1980",
Year %in% c(1981:2010) ~ "1981-2010",
TRUE ~ "2011-present"
))
Inspect the comparison
dataframe by clicking on it in the
Environment
pane.
Now that we have the interval
variable, we can create a density plot
to study the distribution of monthly deviations (delta
), grouped by
the different time periods we are interested in. Set fill
to
interval
to group and colour the data by different time periods.
ggplot(comparison, aes(x = delta, fill = interval)) +
geom_density() +
labs (
title = "Density plot for monthly deviations"
)
So far, we have been working with monthly anomalies. However, we might
be interested in average annual anomalies. We can do this by using
group_by()
and summarise()
, followed by a scatter plot to display
the result.
#creating yearly averages
average_annual_anomaly <- tidyweather %>%
group_by(Year) %>% #grouping data by Year
# use `na.rm=TRUE` to eliminate NA (not available) values
summarise(avg_anomalies = mean(delta, na.rm = TRUE))
#plotting the data:
ggplot(average_annual_anomaly, aes(x = Year, y = avg_anomalies)) +
geom_point() +
# Fitting the best line using LOESS method
geom_smooth(color = "red", method = "loess") +
# Set theme to have white background with black frame
theme_bw() +
labs(title = "Average annual anomalies", y = "Average anomaly")
Confidence Interval for delta
NASA points out on their website that
A one-degree global change is significant because it takes a vast amount of heat to warm all the oceans, atmosphere, and land by that much. In the past, a one- to two-degree drop was all it took to plunge the Earth into the Little Ice Age.
Your task is to construct a confidence interval for the average
delta since 2011, both using a formula and using a bootstrap simulation
with the infer
package.
formula_ci <- comparison %>%
filter(interval == "2011-present") %>%
# drop na values to remove 2022 values that are in the future and are not yet available
drop_na() %>%
# Calculate avg, standard deviation and count from the data points
summarise(avg = mean(delta, na.rm = TRUE),
sd = sd(delta),
count = n(),
# Calculate standard error based on standard deviation and count
se = sd / sqrt(count),
# Use t distribution because population standard deviation is not known
# Use 0.975 to have tails for 2.5% of values on both sides --> results in 95% confidence interval
t_critical = qt(0.975, count-1),
lower_boundary = avg - se * t_critical,
upper_boundary = avg + se * t_critical)
#print out formula_CI
formula_ci
## # A tibble: 1 × 7
## avg sd count se t_critical lower_boundary upper_boundary
## <dbl> <dbl> <int> <dbl> <dbl> <dbl> <dbl>
## 1 1.07 0.266 139 0.0226 1.98 1.02 1.11
# Calculate confidence interval using bootstrap simulation
# Set seed to ensure reproducibiliity
set.seed(1)
formula_ci_bootstrap <- comparison %>%
filter(interval == "2011-present") %>%
# Specify for which variable we want to construct a C.I.
specify(response = delta) %>%
# Generate 1000 samples from the same data using bootstrap method (sampling with replacement)
generate(reps = 1000, type = "bootstrap") %>%
# Calculate the mean of the 1000 bootstrapped samples
calculate(stat = "mean") %>%
# Calculate the confidence interval based on the 1000 means from the bootstrapped samples
get_confidence_interval(level = 0.95, type = "percentile")
formula_ci_bootstrap
## # A tibble: 1 × 2
## lower_ci upper_ci
## <dbl> <dbl>
## 1 1.02 1.11
What is the data showing us?
The data is showing us that there has been a significant increase in temperatures. We are 95% confident that the average increase in temperatures is between 1.02 and 1.11. For the t-distribution, we took all monthly deltas and calculated the mean and the standard deviation of this. With that, we were able to calculate the standard error and the t-value and estimate the confidence interval. For the bootstrapping simulation, we generated 1000 bootstrapped samples. The code then generated an approximate sampling distribution and calculated the confidence interval based on that. The results suggest that we have a global change of temperatures of more than 1 degree. If one also consider the average annual anomaly plot (see task before), one can clearly see that we are in a critical situation: The rise is significant and is currently increasing dramatically.