|A practical guide for combining data to model species distributions
|Year of Publication
|Fletcher, RJ, Hefley, TJ, Robertson, EP, Zuckerberg, B, McCleery, RA, Dorazio, RM
Understanding and accurately modeling species distributions lies at the heart of many problems in ecology, evolution, and conservation. Multiple sources of data are increasingly available for modeling species distributions, such as data from citizen science programs, atlases, museums, and planned surveys. Yet reliably combining data sources can be challenging because data sources can vary considerably in their design, gradients covered, and potential sampling biases. We review, synthesize, and illustrate recent developments in combining multiple sources of data for species distribution modeling. We identify five ways in which multiple sources of data are typically combined for modeling species distributions. These approaches vary in their ability to accommodate sampling design, bias, and uncertainty when quantifying environmental relationships in species distribution models. Many of the challenges for combining data are solved through the prudent use of integrated species distribution models: models that simultaneously combine different data sources on species locations to quantify environmental relationships for explaining species distribution. We illustrate these approaches using planned survey data on 24 species of birds coupled with opportunistically collected eBird data in the southeastern United States. This example illustrates some of the benefits of data integration, such as increased precision in environmental relationships, greater predictive accuracy, and accounting for sample bias. Yet it also illustrates challenges of combining data sources with vastly different sampling methodologies and amounts of data. We provide one solution to this challenge through the use of weighted joint likelihoods. Weighted joint likelihoods provide a means to emphasize data sources based on different criteria (e.g., sample size), and we find that weighting improves predictions for all species considered. We conclude by providing practical guidance on combining multiple sources of data for modeling species distributions.