--- title: "Estimate Trophic Position - One Source Model - Multiple Groups" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Estimate Trophic Position - One Source Model - Multiple Groups} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## Our Objectives The purpose of this vignette is to learn how to estimate trophic position for multiple species or groups using stable isotope data ($\delta^{13}C$ and $\delta^{15}N$). We can estimate trophic position using a one source model based on equations from [Post (2002)](ttps://esajournals.onlinelibrary.wiley.com/doi/abs/10.1890/0012-9658%282002%29083%5B0703%3AUSITET%5D2.0.CO%3B2). ## Trophic Position Model The equation for a one source model consists of the following: $$ \text{Trophic Position} = \lambda + \frac{(\delta^{15}N_c - \delta^{15}N_b)}{\Delta N} $$ Where $\lambda$ is the trophic position of the baseline (e.g., `2`), $\delta^{15}N_c$ is the $\delta^{15}N$ of the consumer, $\delta^{15}N_b$ is the mean $\delta^{15}N$ of the baseline, and $\Delta N$ is the trophic enrichment factor (e.g., 3.4). To use this model with a Bayesian framework, we need to rearrange this equation to the following: $$ \delta^{15}N_c = \delta^{15}N_b + \Delta N \times (\text{Trophic Position} - \lambda) $$ The function `one_source_model()` uses this rearranged equation. ## Vignette structure First we need to organize the data prior to running the model. To do this work we will use [{dplyr}](https://dplyr.tidyverse.org/) and [{tidyr}](https://tidyr.tidyverse.org/) but we could also use [{data.table}](https://rdatatable.gitlab.io/data.table/). When running the model we will use [{trps}](https://benjaminhlina.github.io/trps/) and [{brms}](https://paulbuerkner.com/brms/) and iterative processes provided by [{purrr}](https://purrr.tidyverse.org/). Once we have run the model we will use [{bayesplot}](https://mc-stan.org/bayesplot/) to assess models and then extract posterior draws using [{tidybayes}](http://mjskay.github.io/tidybayes/). Posterior distributions will be plotted using [{ggplot2}](https://ggplot2.tidyverse.org/) and [{ggdist}](https://mjskay.github.io/ggdist/) with colours provided by [{viridis}](https://sjmgarnier.github.io/viridis/). ## Load packages First we load all the packages needed to carry out the analysis. ```{r setup, message=FALSE} { library(bayesplot) library(brms) library(dplyr) library(ggplot2) library(ggdist) library(grid) library(purrr) library(tidybayes) library(tidyr) library(trps) library(viridis) } ``` ## Working with multiple groups In [{trps}](https://benjaminhlina.github.io/trps/) we have a data set that has consumer and baseline data already joined for two ecoregions (`combined_iso`) using the same methods in [getting started with trps](trps.html). Let's look at this data frame. ### Organize data - multiple groups ```{r} combined_iso ``` We can see that this data frame has isotope data for a second baseline (dreissenids; `d13c_b2` and `d15n_b2`) as well as the mean values for both baselines (`c1`-`n2`). These columns for the second baseline are useful when estimating trophic position using a two source model but we do not need them for this analysis and they can be removed. We can also confirm that this data set has one species, lake trout. ```{r} unique(combined_iso$common_name) ``` collected from two ecoregions in Lake Ontario. ```{r} unique(combined_iso$ecoregion) ``` Let's remove the columns we don't need, `d13c_b2`, `d15n_b2`, `c2`, `n2`, and add $\lambda$ to the data frame (`l1`). To do so we make a `name` column that will be the two groups we have, `common_name` and `ecoregion` pasted together. We are doing this to make the iterative processes easier. ```{r} combined_iso_update <- combined_iso %>% dplyr::select(-c(d13c_b2, d15n_b2, c2, n2)) %>% mutate( l1 = 2, name = paste(ecoregion, common_name, sep = "_") ) %>% dplyr::select(id, common_name, ecoregion, name, d13c:l1) ``` Let's view our completed data set. ```{r} combined_iso_update ``` This example data is now ready to be analyzed. ## Estimate trophic position - multiple groups We will use similar structure used in [getting started with trps](trps.html) to model trophic position, however, we first `split()` the data into a list for all groups and then use `map()` from [{purrr}](https://purrr.tidyverse.org/) to run the model for each group. You will notice that the `brm()` call is exactly the same as when we ran the model for one group. The only difference here is when using `map()`, the `data` argument in `brm()` needs to be replaced with `.x` to tell `brm()` where to get the data. ### Model - multiple groups Let's run the model! ```{r} m1 <- combined_iso_update %>% split(.$name) %>% map( ~ brm( formula = one_source_model(), prior = one_source_priors(), stanvars = one_source_priors_params(), data = .x, family = gaussian(), chains = 2, iter = 4000, warmup = 1000, cores = 4, seed = 4, control = list(adapt_delta = 0.95) ), .progress = TRUE ) ``` ### Model output - multiple groups Let's look at the summary of both models. ```{r} m1 ``` We can see that $\hat R$ is 1, meaning that the variance among and within chains are equal (see [{rstan} docmentation on $\hat R$](https://mc-stan.org/rstan/reference/Rhat.html)) and that ESS is quite large for both groups. Overall, this means that both models are converging and fitting accordingly. ### Trace plots - multiple groups Let's look at the trace plots and distributions. We use `iwalk()` instead of `map()`, as `iwalk()` invisibly returns `.x` which is handy when you want to call a function (e.g., `plot()`) for its side effects rather than its returned value. I have also added `grid.text()` from `{grid}` to add the group names to each plot. ```{r} m1 %>% iwalk(~ { plot(.x) grid.text(.y, x = 0.50, y = 0.98) }) ``` We can see that the trace plots look "grassy" meaning the model is converging! ## Posterior draws - multiple groups Let's again look at the summary output from the model. ```{r} m1 ``` We can see that, for lake trout from the `Anthropogenic` ecoregion, $\Delta N$ is estimated to be `3.38` with `l-95% CI` of `2.88`, and `u-95% CI` of `3.87`. If we move down to trophic position (`tp`) we see trophic position is estimated to be `4.82` with `l-95% CI` of `4.44`, and `u-95% CI` of `5.29`. We can see that, for lake trout from the `Embayment` ecoregion, $\Delta N$ is estimated to be `3.37` with `l-95% CI` of `2.89`, and `u-95% CI` of `3.86`. If we move down to trophic position (`tp`) we see trophic position is estimated to be `4.54` with `l-95% CI` of `4.21`, and `u-95% CI` of `4.96`. ## Predictive posterior check We can check how well the model is predicting the $\delta^{15}N$ of the consumer using `pp_check()` from `{bayesplot}`. We have to use `map()` from `{purrr}` to iterate over the list that has our model objects. ```{r} m1 %>% map(~ .x %>% pp_check() ) ``` We can see that posteriors draws ($y_{rep}$; light lines) for both groups are are effectively modeling $\delta^{15}N$ of the consumer ($y$; dark line). ## Extract posterior draws - multiple groups We use functions from [{tidybayes}](http://mjskay.github.io/tidybayes/) to do this work. First we look at the the names of the variables we want to extract using `get_variables()`. Considering we have multiple models in `m1` that all have the same structure, we can just look at the names of the first model object in `m1`. ```{r} get_variables(m1[[1]]) ``` You will notice that `"b_tp_Intercept"` is the name of the variable that we are wanting to extract. Next we extract posterior draws using `gather_draws()`, and rename `"b_tp_Intercept"` to `tp`. Again, considering we have multiple models in `m1` we need to use `map()` to iterate over `m1` to get the posterior draws. Once we have iterated over `m1` to extract draws we can combine the results using `bind_rows()` from [{dplyr}](https://dplyr.tidyverse.org/). The variable `name` will have the name of the ecoregion and common name of the species pasted to together by a `_`. We need to separate this string into the two variables we want, being `ecoregion` and `common_name`. We can do this by using `separate_wider_delim()` from [{tidyr}](https://tidyr.tidyverse.org/). When using this function it will separate the columns and keep them as `characters`, hence why the last step is to convert `ecoregion` into a `factor`. For your data you will likely have category names other than `ecoregion` and `common_name`. Please replace with the columns that fit your data structure. ```{r} post_draws_mg <- m1 %>% map(~ .x %>% gather_draws(b_tp_Intercept) %>% mutate( .variable = "tp" ) %>% ungroup() ) %>% bind_rows(.id = "name") %>% separate_wider_delim(name, names = c("ecoregion", "common_name"), delim = "_", cols_remove = FALSE) %>% mutate( ecoregion = factor(ecoregion, levels = c("Anthropogenic", "Embayment")), ) ``` Let's view the `post_draws_mg` ```{r} post_draws_mg ``` We can see that the posterior draws data frame consists of seven variables: 1. `ecoregion` 2. `common_name` 3. `.chain` 4. `.iteration` (number of samples after burn-in) 5. `.draw` (number of samples from `iter`) 6. `.variable` (this will have different variables depending on what is supplied to `gather_draws()`) 7. `.value` (estimated value) Note - the names of and items in the first two columns will vary depending on the names you split your data into. ## Extracting credible intervals - multiple groups Considering we are likely using this information for a paper or presentation, it is nice to be able to report the median and credible intervals (e.g., equal-tailed intervals; ETI). We can extract and export these values using `spread_draws()` and `median_qi` from [{tidybayes}](http://mjskay.github.io/tidybayes/). Again, because `m1` is a `list` of our model objects, we need to `map()` over the list to calculate these values. Then we do the same procedures we have done before to combine and restructure the outputs. Lastly, we use `mutate_if()` to round all columns that are numeric to two decimal points. ```{r, message=FALSE} post_medians_ci <- m1 %>% map(~ .x %>% spread_draws(b_tp_Intercept) %>% median_qi() %>% rename( tp = b_tp_Intercept ) ) %>% bind_rows(.id = "name") %>% separate_wider_delim(name, names = c("ecoregion", "common_name"), delim = "_", cols_remove = FALSE) %>% mutate( ecoregion = factor(ecoregion, levels = c("Anthropogenic", "Embayment")), ) %>% mutate_if(is.numeric, round, digits = 2) ``` Let's view the output. ```{r} post_medians_ci ``` I like to use [{openxlsx}](https://ycphs.github.io/openxlsx/index.html) to export these values into a table that I can use for presentations and papers. For the vignette I am not going to demonstrate how to do this but please check out `{openxlsx}`. ## Plotting posterior distributions - multiple groups Now that we have our posterior draws extracted we can plot them. For comparing trophic position among species or groups, I like using either violin plots, interval points, or slab plots for posteriors. We can access violins through [{ggplot2}](https://ggplot2.tidyverse.org/) with the later being available in [{ggdist}](https://mjskay.github.io/ggdist/). ### Violin plot Let's first look at the violin plot. ```{r} ggplot(data = post_draws_mg, aes(x = common_name, y = .value, fill = ecoregion)) + geom_violin() + stat_summary(fun = median, geom = "point", size = 3, position = position_dodge(0.9) ) + scale_fill_viridis_d(name = "Ecoregion", option = "G", begin = 0.35, end = 0.75, alpha = 0.65) + theme_bw(base_size = 15) + theme( panel.grid = element_blank(), legend.position = "inside", legend.position.inside = c(0.85, 0.86) ) + labs( x = "Species", y = "P(Trophic Position | X)" ) ``` ### Point interval plot Next, we'll look at the point interval plot – but first we need to create our colour palette. ```{r} viridis_colours <- viridis(2, option = "G", begin = 0.35, end = 0.75, alpha = 0.65) ``` Now let's plot the point intervals. ```{r} ggplot(data = post_draws_mg, aes(x = common_name, y = .value, group = ecoregion)) + stat_pointinterval( aes(point_fill = ecoregion), point_size = 4, interval_colour = "grey60", position = position_dodge(0.4), shape = 21, ) + scale_fill_manual(aesthetics = "point_fill", values = viridis_colours, name = "Ecoregion") + theme_bw(base_size = 15) + theme( panel.grid = element_blank(), legend.position = "inside", legend.position.inside = c(0.85, 0.86) ) + labs( x = "Species", y = "P(Trophic Position | X)" ) ``` Congratulations we have successfully run a Bayesian one source trophic position model for one species in two ecoregions of Lake Ontario!