---
title: "Estimate Trophic Position - One Source Model - Multiple Groups"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Estimate Trophic Position - One Source Model - Multiple Groups}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Our Objectives

The purpose of this vignette is to learn how to estimate trophic position for multiple species or groups using stable isotope data ($\delta^{13}C$ and $\delta^{15}N$). We can estimate trophic position using a one source model based on equations from [Post (2002)](ttps://esajournals.onlinelibrary.wiley.com/doi/abs/10.1890/0012-9658%282002%29083%5B0703%3AUSITET%5D2.0.CO%3B2).

## Trophic Position Model

The equation for a one source model consists of the following:

$$
\text{Trophic Position} = \lambda + \frac{(\delta^{15}N_c - \delta^{15}N_b)}{\Delta N}
$$

Where $\lambda$ is the trophic position of the baseline (e.g., `2`), $\delta^{15}N_c$ is the $\delta^{15}N$ of the consumer, $\delta^{15}N_b$ is the mean $\delta^{15}N$ of the baseline, and $\Delta N$ is the trophic enrichment factor (e.g., 3.4).

To use this model with a Bayesian framework, we need to rearrange this equation to the following:

$$
\delta^{15}N_c = \delta^{15}N_b + \Delta N \times (\text{Trophic Position} - \lambda) 
$$

The function `one_source_model()` uses this rearranged equation.

## Vignette structure

First we need to organize the data prior to running the model. To do this work we will use [{dplyr}](https://dplyr.tidyverse.org/) and [{tidyr}](https://tidyr.tidyverse.org/) but we could also use [{data.table}](https://rdatatable.gitlab.io/data.table/).

When running the model we will use [{trps}](https://benjaminhlina.github.io/trps/) and [{brms}](https://paulbuerkner.com/brms/) and iterative processes provided by [{purrr}](https://purrr.tidyverse.org/).

Once we have run the model we will use [{bayesplot}](https://mc-stan.org/bayesplot/) to assess models and then extract posterior draws using [{tidybayes}](http://mjskay.github.io/tidybayes/). Posterior distributions will be plotted using [{ggplot2}](https://ggplot2.tidyverse.org/) and [{ggdist}](https://mjskay.github.io/ggdist/) with colours provided by [{viridis}](https://sjmgarnier.github.io/viridis/).

## Load packages

First we load all the packages needed to carry out the analysis.

```{r setup, message=FALSE}
{
library(bayesplot)
library(brms)
library(dplyr)
library(ggplot2)
library(ggdist)
library(grid)
library(purrr)
library(tidybayes)
library(tidyr)
library(trps)
library(viridis)
}
```


## Working with multiple groups

In [{trps}](https://benjaminhlina.github.io/trps/) we have a data set that has consumer and baseline data already joined for two ecoregions (`combined_iso`) using the same methods in 
[getting started with trps](trps.html). 
Let's look at this data frame.

### Organize data - multiple groups

```{r}
combined_iso
```

We can see that this data frame has isotope data for a second baseline (dreissenids; `d13c_b2` and `d15n_b2`) as well as the mean values for both baselines (`c1`-`n2`). These columns for the second baseline are useful when estimating trophic position using a two source model but we do not need them for this analysis and they can be removed.

We can also confirm that this data set has one species, lake trout.

```{r}
unique(combined_iso$common_name)
```

collected from two ecoregions in Lake Ontario.

```{r}
unique(combined_iso$ecoregion)
```

Let's remove the columns we don't need, `d13c_b2`, `d15n_b2`, `c2`, `n2`, and add $\lambda$ to the data frame (`l1`). To do so we make a `name` column that will be the two groups we have, `common_name` and `ecoregion` pasted together. We are doing this to make the iterative processes easier.

```{r}
combined_iso_update <- combined_iso %>% 
  dplyr::select(-c(d13c_b2, d15n_b2, c2, n2)) %>% 
  mutate(
    l1 = 2, 
    name = paste(ecoregion, common_name, sep = "_")
  ) %>% 
  dplyr::select(id, common_name, ecoregion, name, d13c:l1)
```

Let's view our completed data set.

```{r}
combined_iso_update
```

This example data is now ready to be analyzed.

## Estimate trophic position - multiple groups

We will use similar structure used in [getting started with trps](trps.html) 
to model trophic position, however, we first `split()` the data into a list 
for all groups and then use `map()` from [{purrr}](https://purrr.tidyverse.org/) 
to run the model for each group.

You will notice that the `brm()` call is 
exactly the same as 
when we ran the model for one group. 
The only difference here is when using `map()`, 
the `data` argument in `brm()` needs to be 
replaced with `.x` to tell `brm()` where 
to get the data.

### Model - multiple groups

Let's run the model!

```{r}
m1 <- combined_iso_update %>% 
  split(.$name) %>% 
  map( ~ brm(
    formula = one_source_model(),
    prior = one_source_priors(),
    stanvars = one_source_priors_params(),
    data = .x,
    family = gaussian(),
    chains = 2,
    iter = 4000,
    warmup = 1000,
    cores = 4,
    seed = 4,
    control = list(adapt_delta = 0.95)
  ), 
  .progress = TRUE
  )
```

### Model output - multiple groups

Let's look at the summary of both models.

```{r}
m1
```

We can see that $\hat R$ is 1, meaning that the variance among and within chains are equal (see [{rstan} docmentation on $\hat R$](https://mc-stan.org/rstan/reference/Rhat.html)) and that ESS is quite large for both groups. Overall, this means that both models are converging and fitting accordingly.

### Trace plots - multiple groups

Let's look at the trace plots and distributions. We use `iwalk()` instead of `map()`, as `iwalk()` invisibly returns `.x` which is handy when you want to call a function (e.g., `plot()`) for its side effects rather than its returned value. I have also added `grid.text()` from `{grid}` to add the group names to each plot.

```{r}
m1 %>% 
  iwalk(~ {
    plot(.x) 
    grid.text(.y, x = 0.50, y = 0.98)
  })
```

We can see that the trace plots look "grassy" meaning the model is converging!

## Posterior draws - multiple groups

Let's again look at the summary output from the model.

```{r}
m1
```

We can see that, for lake trout from the `Anthropogenic` ecoregion, $\Delta N$ is estimated to be `3.38` with `l-95% CI` of `2.88`, and `u-95% CI` of `3.87`. If we move down to trophic position (`tp`) we see trophic position is estimated to be `4.82` with `l-95% CI` of `4.44`, and `u-95% CI` of `5.29`.

We can see that, for lake trout from the `Embayment` ecoregion, $\Delta N$ is estimated to be `3.37` with `l-95% CI` of `2.89`, and `u-95% CI` of `3.86`. If we move down to trophic position (`tp`) we see trophic position is estimated to be `4.54` with `l-95% CI` of `4.21`, and `u-95% CI` of `4.96`.

## Predictive posterior check 

We can check how well the model is predicting the $\delta^{15}N$ of the consumer
using `pp_check()` from `{bayesplot}`. We have to use `map()` from `{purrr}` 
to iterate over the list that has our model objects. 

```{r}
m1 %>% 
  map(~ .x %>% 
        pp_check()
  )
```

We can see that posteriors draws ($y_{rep}$; light lines) for both groups are
are effectively modeling $\delta^{15}N$ of the consumer ($y$; dark line). 

## Extract posterior draws - multiple groups

We use functions from [{tidybayes}](http://mjskay.github.io/tidybayes/) to do this work. First we look at the the names of the variables we want to extract using `get_variables()`. Considering we have multiple models in `m1` that all have the same structure, we can just look at the names of the first model object in `m1`.

```{r}
get_variables(m1[[1]])
```

You will notice that `"b_tp_Intercept"` is the name of the variable that we are wanting to extract. Next we extract posterior draws using `gather_draws()`, and rename `"b_tp_Intercept"` to `tp`.

Again, considering we have multiple models in `m1` we need to use `map()` to iterate over `m1` to get the posterior draws. Once we have iterated over `m1` to extract draws we can combine the results using `bind_rows()` from [{dplyr}](https://dplyr.tidyverse.org/). The variable `name` will have the name of the ecoregion and common name of the species pasted to together by a `_`. We need to separate this string into the two variables we want, being `ecoregion` and `common_name`. We can do this by using `separate_wider_delim()` from [{tidyr}](https://tidyr.tidyverse.org/). When using this function it will separate the columns and keep them as `characters`, hence why the last step is to convert `ecoregion` into a `factor`.

For your data you will likely have category names other than `ecoregion` and `common_name`. Please replace with the columns that fit your data structure.

```{r}
post_draws_mg <- m1 %>%
  map(~ .x %>% 
        gather_draws(b_tp_Intercept) %>% 
        mutate(
          .variable = "tp"
        ) %>% 
        ungroup()
  ) %>% 
  bind_rows(.id = "name") %>% 
  separate_wider_delim(name, names = c("ecoregion", "common_name"), 
                       delim = "_", cols_remove = FALSE) %>% 
  mutate(
    ecoregion = factor(ecoregion, 
                       levels = c("Anthropogenic", "Embayment")), 
  )
```

Let's view the `post_draws_mg`

```{r}
post_draws_mg
```

We can see that the posterior draws data frame consists of seven variables:

1.  `ecoregion`
2.  `common_name`
3.  `.chain`
4.  `.iteration` (number of samples after burn-in)
5.  `.draw` (number of samples from `iter`)
6.  `.variable` (this will have different variables depending on what is supplied to `gather_draws()`)
7.  `.value` (estimated value)

Note - the names of and items in the first two columns will vary depending on the names you split your data into.

## Extracting credible intervals - multiple groups

Considering we are likely using this information for a paper or presentation, it is nice to be able to report the median and credible intervals (e.g., equal-tailed intervals; ETI). We can extract and export these values using `spread_draws()` and `median_qi` from [{tidybayes}](http://mjskay.github.io/tidybayes/).

Again, because `m1` is a `list` of our model objects, we need to `map()` over the list to calculate these values. Then we do the same procedures we have done before to combine and restructure the outputs. Lastly, we use `mutate_if()` to round all columns that are numeric to two decimal points.

```{r, message=FALSE}
post_medians_ci <- m1 %>%
  map(~ .x %>% 
        spread_draws(b_tp_Intercept) %>%
        median_qi() %>% 
        rename(
          tp = b_tp_Intercept
        ) 
  ) %>% 
  bind_rows(.id = "name") %>% 
  separate_wider_delim(name, names = c("ecoregion", "common_name"), 
                       delim = "_", cols_remove = FALSE) %>% 
  mutate(
    ecoregion = factor(ecoregion, 
                       levels = c("Anthropogenic", "Embayment")), 
  ) %>% 
  mutate_if(is.numeric, round, digits = 2)
```

Let's view the output.

```{r}
post_medians_ci
```

I like to use [{openxlsx}](https://ycphs.github.io/openxlsx/index.html) to export these values into a table that I can use for presentations and papers. For the vignette I am not going to demonstrate how to do this but please check out `{openxlsx}`.

## Plotting posterior distributions - multiple groups

Now that we have our posterior draws extracted we can plot them. For comparing trophic position among species or groups, I like using either violin plots, interval points, or slab plots for posteriors. We can access violins through [{ggplot2}](https://ggplot2.tidyverse.org/) with the later being available in [{ggdist}](https://mjskay.github.io/ggdist/).

### Violin plot

Let's first look at the violin plot.

```{r}
ggplot(data = post_draws_mg, aes(x = common_name, 
                                 y = .value, 
                                 fill = ecoregion)) + 
  geom_violin() + 
  stat_summary(fun = median, geom = "point", 
               size = 3, 
               position = position_dodge(0.9)
  ) + 
  scale_fill_viridis_d(name = "Ecoregion", 
                       option = "G", 
                       begin = 0.35, 
                       end = 0.75, alpha = 0.65) + 
  theme_bw(base_size = 15) + 
  theme(
    panel.grid = element_blank(),
    legend.position = "inside", 
    legend.position.inside = c(0.85, 0.86)
  ) +
  labs(
    x = "Species", 
    y = "P(Trophic Position | X)"
  )
```

### Point interval plot

Next, we'll look at the point interval plot – but first we need to create our colour palette.

```{r}
viridis_colours <- viridis(2,
                           option = "G",
                           begin = 0.35, 
                           end = 0.75, 
                           alpha = 0.65)
```

Now let's plot the point intervals.

```{r}
ggplot(data = post_draws_mg, aes(x = common_name, 
                                 y = .value, 
                                 group = ecoregion)) + 
  stat_pointinterval(
    aes(point_fill = ecoregion),
    point_size = 4,
    interval_colour = "grey60",
    position = position_dodge(0.4),
    shape = 21,
  ) + 
  scale_fill_manual(aesthetics = "point_fill",
                    values = viridis_colours,
                    name = "Ecoregion") +
  theme_bw(base_size = 15) + 
  theme(
    panel.grid = element_blank(),
    legend.position = "inside", 
    legend.position.inside = c(0.85, 0.86)
  ) +
  labs(
    x = "Species", 
    y = "P(Trophic Position | X)"
  )
```

Congratulations we have successfully run a Bayesian one source trophic position model for one species in two ecoregions of Lake Ontario!