Species Richness 1 – observed species richness
PART II. Estimated Richness – DIVA-GIS and EstimateS/Eco-Tools
Please read pages 35-38 in the DIVA-GIS manual. Now, using the same shapefile for Australian mammals that you generated in step 2 above, and making sure it is the active layer, go to Analysis -> Point to Grid and choose Estimators of Richness. Try both the ACE and Chao2 estimators. ACE is the abundance-based coverage estimator; it typically performs poorly if the number of species, and the number of singletons and doubletons, varies widely among grid cells. It basically assumes that species are randomly distributed. Chao2 is an incidence-based estimator and performs better under moderate “patchiness”, where species are clumped in certain areas and absent in others. Be aware of how DIVA calculates estimated richness for each grid cell. For the incidence-based approaches, it basically divides each grid cell into 4 or 9 subcells (2x2 or 3x3) and then determines incidence on that very coarse grid. One might question this approach. Note: you may need to manually fiddle with the legend and colors to get meaningful views of richness. I set mine to 0-0=white, 1-20, 20-40, 40-60, 60-80, 80-100, 100-200, 200-300, 300-500, 500-700, 700-900, 900-1100 and 1100-1270, all scaled from green->red.
Subtract the ACE grid from the Chao2 grid (or vice versa) to see how much the two estimators differ. You may again need to fiddle with the legend to get meaningful views.
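To make the subgrid idea concrete, here is a minimal Python sketch of the textbook Chao2 formula applied to one grid cell split into 2x2 subcells. The species lists are made up, the function name chao2 is my own, and this is not necessarily the exact calculation DIVA-GIS performs.

```python
from collections import Counter

def chao2(samples):
    """Chao2 richness estimate from a list of per-sample species sets.

    Classic form: S_obs + Q1^2 / (2 * Q2), where Q1 is the number of
    species found in exactly one sample and Q2 in exactly two samples.
    When Q2 is zero, the bias-corrected form is used instead (a common
    fallback; an assumption here, not taken from the DIVA-GIS manual).
    """
    m = len(samples)
    # Incidence: in how many samples does each species occur?
    incidence = Counter(sp for sample in samples for sp in set(sample))
    s_obs = len(incidence)
    q1 = sum(1 for n in incidence.values() if n == 1)
    q2 = sum(1 for n in incidence.values() if n == 2)
    if q2 > 0:
        return s_obs + q1 * q1 / (2 * q2)
    return s_obs + (m - 1) / m * q1 * (q1 - 1) / (2 * (q2 + 1))

# Hypothetical grid cell divided into 2x2 = 4 subcells.
subcells = [{"A", "B", "C"}, {"A", "B"}, {"A", "D"}, {"A"}]
estimate = chao2(subcells)  # 4 observed species, Q1 = 2, Q2 = 1
print(estimate)
```

With only 4 or 9 subcells, Q1 and Q2 are based on very few samples, which is one reason to be cautious about the per-cell estimates.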
Now we are going to move away from DIVA-GIS for a bit and turn to EstimateS and Eco-Tools. First, download EstimateS 8.0 (http://viceroy.eeb.uconn.edu/estimates) and the seed bank dataset that is on the class homepage (also distributed with EstimateS 8.0). This is a “classic” dataset for species richness estimation. A discussion of how this dataset was collected is provided here: http://links.jstor.org/sici?sici=0006-3606%28199806%2930%3A2%3C214%3ASRSVAA%3E2.0.CO%3B2-S&size=LARGE&origin=JSTOR-enlargePage (Butler and Chazdon, 1998, Biotropica). The seed bank data has 34 rows and 121 columns, representing 34 species recorded across 121 samples. For each species (row), the number of individuals collected in each sample plot (column) is recorded. For example, for species 1 in sample plot 1, 2 individuals were found. For species 1 in sample plot 2, no individuals were found. This kind of file format can be loaded directly into EstimateS.
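If the species-by-sample layout is unfamiliar, the following short Python sketch may help. It uses a tiny made-up matrix (not the real seed bank data) in the same orientation, with rows as species and columns as samples, and computes the basic quantities you will see later in the results. All variable names are my own.

```python
# Toy abundance matrix (hypothetical values): rows = species, columns = samples.
abundance = [
    [2, 0, 1, 0],   # species 1: 2 individuals in sample 1, none in sample 2
    [0, 0, 3, 0],   # species 2
    [0, 1, 0, 0],   # species 3
]

n_species = len(abundance)
n_samples = len(abundance[0])

# Individuals and species found in each sample (column sums and column counts).
individuals_per_sample = [sum(row[j] for row in abundance) for j in range(n_samples)]
richness_per_sample = [sum(1 for row in abundance if row[j] > 0) for j in range(n_samples)]

# Presence/absence (incidence) version of the same matrix.
incidence = [[1 if v > 0 else 0 for v in row] for row in abundance]

total_individuals = sum(individuals_per_sample)
observed_richness = sum(1 for row in abundance if sum(row) > 0)

print(individuals_per_sample, richness_per_sample)
print(total_individuals, observed_richness)
```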
Start EstimateS. If a file navigation window appears asking you to select a "Data File," choose the file called Statistics.4DD (Windows) or Statistics.data (Mac OS). You should see a “welcome” screen. Hit “ok” and you should get a set of menu options. Load the seed bank data and make sure the “format 1” radio button is selected. Note there are a number of different possible formats for an EstimateS file. The file formats differ but the information content in any one file format is exactly the same. The big difference in file formats is that some file formats don’t bother showing those samples where there are zero individuals for a species.
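The point about file formats carrying the same information can be illustrated with a small Python sketch. It converts a made-up full matrix into a list of (species, sample, count) records that skips the zeros, and then rebuilds the matrix. This is only an illustration of the idea; the record layout is my own assumption and is not the exact layout of any EstimateS format.

```python
# Full matrix (hypothetical values): rows = species, columns = samples.
dense = [
    [2, 0, 1, 0],
    [0, 0, 3, 0],
    [0, 1, 0, 0],
]

# Compact version: one (species, sample, count) record per non-zero entry,
# numbering species and samples from 1.
triplets = [
    (i + 1, j + 1, v)
    for i, row in enumerate(dense)
    for j, v in enumerate(row)
    if v > 0
]

# Rebuild the full matrix. The dimensions must be known separately,
# because an all-zero sample (like sample 4) leaves no record behind.
rebuilt = [[0] * len(dense[0]) for _ in dense]
for species, sample, count in triplets:
    rebuilt[species - 1][sample - 1] = count

print(triplets)
print(rebuilt == dense)
```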
Now go to Diversity Settings and look over the options. Defaults should be fine. Hit compute. You should get a results table. Interpreting the results table is important. Note that the number of rows in the output should equal the number of samples you input into the program, so there should be 121 rows, one for each number of samples accumulated. The number of individuals and the observed species richness are calculated directly and, once all the samples are included, must equal the actual overall abundance and number of species observed. The estimator values will in nearly all cases be higher than the observed species richness because of the correction factors for rare or infrequent species in a sample (remember this from Clint and Liesl’s presentation last week). Before getting too worried about that interpretation, it is very useful to step away from EstimateS for a second and try something else.
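To see why the last row of the table has to match the observed totals, here is a minimal Python sketch of a sample-based accumulation curve: samples are added in random order, the number of species seen so far is recorded, and the result is averaged over many random orders. The toy matrix, the function name and the number of runs are my own choices, and EstimateS may differ in its details.

```python
import random

def accumulation_curve(abundance, runs=100, seed=0):
    """Mean number of species seen after 1, 2, ..., n samples,
    averaged over random orderings of the samples."""
    rng = random.Random(seed)
    n_samples = len(abundance[0])
    totals = [0.0] * n_samples
    for _ in range(runs):
        order = list(range(n_samples))
        rng.shuffle(order)
        seen = set()
        for k, j in enumerate(order):
            # Add every species present in sample j to the running set.
            seen.update(i for i, row in enumerate(abundance) if row[j] > 0)
            totals[k] += len(seen)
    return [t / runs for t in totals]

# Toy matrix (hypothetical values): rows = species, columns = samples.
abundance = [
    [2, 0, 1, 0],
    [0, 0, 3, 0],
    [0, 1, 0, 0],
]
curve = accumulation_curve(abundance)
print(curve)  # one value per number of samples accumulated
```

Whatever the order, once every sample has been added all observed species have been seen, so the final value equals the observed richness.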
Go to: http://eco-tools.njit.edu/webMathematica/EcoTools/index.html, and choose the “input” link under species diversity/species richness. Click the radio button for “Analyse the seed bank dataset used in Colwell and Coddington (1994)”. This is the exact same dataset you just used in EstimateS 8.0. All other defaults should be fine. Hit submit. The nice thing about the Eco-Tools output is that it is “summary rich”, while the EstimateS 8.0 outputs are not at all. The Eco-Tools output should return a lot of very useful information. The first table summarizes the data, showing the number of rare and infrequent species and the number of singletons and doubletons. This information is surprisingly useful, and from it you can start to understand how the estimators differ. Next are some summary outputs for observed and estimated richness. After that are some estimator curves. These curves would be identical to the curves you would generate from the EstimateS 8.0 outputs if you plotted the output data (with the x-axis being the number of samples from 1-121 and the y-axis being estimated richness for a given estimator as you add samples). Finally, you see the sample-based rarefaction curve from the dataset and true richness as analytically determined.
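The summary counts in that first table are easy to reproduce by hand. Here is a minimal Python sketch on a made-up matrix. I am assuming the usual definitions: singletons and doubletons are species with exactly 1 or 2 individuals overall, uniques and duplicates are species found in exactly 1 or 2 samples, and “rare” and “infrequent” use a cut-off of 10 individuals or 10 samples (check the cut-off your software actually uses). The Chao1 line is the classic textbook formula.

```python
# Toy abundance matrix (hypothetical values): rows = species, columns = samples.
abundance = [
    [5, 4, 6, 3],
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 2, 0],
    [0, 1, 0, 0],
]
cutoff = 10  # assumed threshold for "rare" and "infrequent"

totals = [sum(row) for row in abundance]                           # individuals per species
occupancy = [sum(1 for v in row if v > 0) for row in abundance]    # samples per species

s_obs = sum(1 for t in totals if t > 0)
singletons = sum(1 for t in totals if t == 1)
doubletons = sum(1 for t in totals if t == 2)
uniques = sum(1 for n in occupancy if n == 1)
duplicates = sum(1 for n in occupancy if n == 2)
rare = sum(1 for t in totals if 0 < t <= cutoff)
infrequent = sum(1 for n in occupancy if 0 < n <= cutoff)

# Classic Chao1: S_obs + F1^2 / (2 * F2), defined here only when F2 > 0.
chao1 = s_obs + singletons ** 2 / (2 * doubletons) if doubletons > 0 else None

print(singletons, doubletons, uniques, duplicates, rare, infrequent, chao1)
```

Notice that the abundance-based counts (singletons, doubletons) and the incidence-based counts (uniques, duplicates) can differ for the same data, which is part of why the estimators disagree.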
CHALLENGE: How would you prepare the Australia Mammal dataset for analysis in EstimateS 8.0 or Eco-Tools? Hint: You would need to decide on a “post-hoc” grid size for the dataset. Then you would need to count the individuals of each species in each of the gridded samples.
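One possible way to approach the challenge, sketched in Python: assign each point record to a grid cell by rounding its coordinates down to the cell size, then count records per species per cell. The records, species names, cell size and function name below are all hypothetical, and counting records is only a stand-in for counting individuals.

```python
import math
from collections import defaultdict

def grid_records(records, cell_size):
    """Turn (species, lon, lat) records into a species x cell count matrix."""
    counts = defaultdict(int)
    for species, lon, lat in records:
        # Cell index = coordinates rounded down to a multiple of cell_size.
        cell = (math.floor(lon / cell_size), math.floor(lat / cell_size))
        counts[(species, cell)] += 1
    species_list = sorted({s for s, _ in counts})
    cell_list = sorted({c for _, c in counts})
    matrix = [[counts.get((s, c), 0) for c in cell_list] for s in species_list]
    return species_list, cell_list, matrix

# Hypothetical point records: (species, longitude, latitude).
records = [
    ("species_A", 133.2, -25.1),
    ("species_A", 133.7, -25.9),
    ("species_A", 140.4, -30.2),
    ("species_B", 147.1, -36.5),
    ("species_B", 140.9, -30.8),
]
species_list, cell_list, matrix = grid_records(records, cell_size=1.0)
print(species_list)
print(cell_list)
print(matrix)  # rows = species, columns = grid cells ("samples")
```

The resulting matrix has the same rows-as-species, columns-as-samples layout as the seed bank data. Only cells containing at least one record appear as columns here, so you would still need to decide whether empty cells should count as samples.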
PART III. Testing Explanatory Factors in Species Richness.
1. You already started doing this in Part 1, running regressions with abiotic factors as independent variables and species richness as the dependent variable, and assessing the linear relationship between them. This week, we are going to be doing a lot more of that, using a software package called Spatial Analysis in Macroecology, or SAM.
2. SAM is available free from here (please download it): http://www.ecoevol.ufg.br/sam/
Also note that the paper in Global Ecology and Biogeography discussing SAM is available on the website: http://www.ecoevol.ufg.br/sam/rangel2006.pdf
That paper is a very nice, very readable introduction to SAM and why it is such a valuable tool. The great value of SAM is that it deals with an important issue in any geographical analysis. That issue is spatial autocorrelation. Much of the lab today is meant to help you understand why spatial autocorrelation is an essential topic in macroecology.
3. Spatial autocorrelation is “the property of random variables taking values, at pairs of locations a certain distance apart, that are more similar (positive autocorrelation) or less similar (negative autocorrelation) than expected for randomly associated pairs of observations”. Because of the shape of the globe and the way the sun angle hits the earth, there is almost always spatial autocorrelation in climatic variables. How do we deal with spatial autocorrelation in macroecological analyses?
4. Before we begin using SAM, please download three example datasets available from the SAM folks. Take a close look at input files in Excel format. Note that you could generate similar files from DIVA-GIS outputs – these are just grid file format outputs with size of grid, values for each cell, for different grids.
a. http://www.ecoevol.ufg.br/sam/sa_sam.zip (South America domain, divided among 374 equal-area grid cells, arbitrary planar coordinate system. Dataset includes species richness for all birds, passeriformes, non-passeriformes and snakes. The geographic distribution of each species was redrawn (by hand!) across South America, and then the species presence/absence matrices were typed into a worksheet)
b. http://www.ecoevol.ufg.br/sam/wh_sam.zip. A large dataset on the Western Hemisphere. This domain is divided among 4220 equal-distanced cells, geographic projection, and the coordinate system is measured in decimal degrees of lat/long, which allows the calculation of geodesic surface distances. Includes species richness data of birds and mammals all over the domain, coming from GIS databases.
c. http://www.ecoevol.ufg.br/sam/bc_sam.zip. Brazilian Cerrado (Savannah). The domain was divided among 181 equal-distanced grid cells. The coordinate system is measured in decimal degrees of lat/long, allowing calculation of geodesic surface distances in kilometers. This dataset is a subsample of the Western Hemisphere dataset, but can be very useful for testing SAM routines when computational constraints keep you from running a large dataset. Includes species richness of birds and mammals all over the domain, and also several environmental variables. (NOTE: SAM IS REALLY SLOW SOMETIMES, SO THIS IS A VERY GOOD DATASET TO RUN!!)
5. Load the Brazilian Cerrado dataset into SAM. The first thing to do is play with the Basic Statistics and Mapping functions. Basic statistics will output typical statistics for all the variables included in the input files. Under Data->Graphs and Maps -> Map Data Matrix, you can select Longitude as the X-axis and Latitude as the Y-axis and a mapping variable like bird species richness, or mean temperature, and visualize a map of those variables (much like what you’d see in DIVA, actually). You can also perform a principal components analysis to reduce the dimensionality of the environmental variables in your analysis. I don’t want to dwell on PCAs for a bunch of reasons, but if you are already familiar with this kind of analysis, you can quickly get a visual view of the results.
6. The next thing to do is perform a Moran’s I calculation on a variable of interest. Go to Structure -> Spatial Autocorrelation -> Moran’s I. Then make sure the “Compute Geographic Distances” radio button is clicked. Next make sure the Long and Lat coordinates are set to Long and Lat. Finally, choose a variable of interest (e.g. Mean Temp) and hit compute. Moran’s I is a very commonly used measure of spatial autocorrelation. The output plot shows Moran’s I on the y-axis (ranging from 1 to -1, with 1 being perfect positive autocorrelation and -1 being perfect negative autocorrelation) and distance classes on the x-axis. For most environmental parameters, spatial autocorrelation is positive at close distance classes, near zero at intermediate distance classes, and negative at larger distance classes. Does it make sense why this would be? This output, to be clear, is still “descriptive”.
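If you want to see what is being computed for each distance class, here is a minimal Python sketch of Moran’s I with simple 0/1 weights: a pair of cells gets weight 1 if the distance between them falls inside the class and 0 otherwise. It assumes planar coordinates and straight-line distances (SAM can use geodesic distances instead), the data are a made-up gradient along a transect, and the function name is my own.

```python
import math

def morans_i(coords, values, d_min, d_max):
    """Moran's I for pairs of points whose distance d satisfies d_min < d <= d_max."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0   # sum of products of deviations over qualifying pairs
    n_pairs = 0   # sum of the 0/1 weights
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.dist(coords[i], coords[j])
            if d_min < d <= d_max:
                cross += dev[i] * dev[j]
                n_pairs += 1
    if n_pairs == 0:
        return None  # no pairs in this distance class
    return (n / n_pairs) * cross / sum(x * x for x in dev)

# Hypothetical transect of 6 cells with a smooth gradient (e.g. temperature).
coords = [(float(k), 0.0) for k in range(6)]
values = [10.0 + 2.0 * k for k in range(6)]

near = morans_i(coords, values, 0.0, 1.5)  # neighbouring cells
far = morans_i(coords, values, 3.5, 5.5)   # cells at opposite ends
print(near, far)
```

On a smooth gradient, neighbouring cells sit on the same side of the mean, so their products are mostly positive, while cells at opposite ends sit on opposite sides of the mean, so their products are negative. That is the pattern described above.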
7. There are a number of next steps in an analysis. I am going to discuss two of the many analyses you can run in SAM. The first is a Spatial Correlation analysis and the second is a Linear Regression Analysis. In both cases, what you’d like to know is the importance of a correlated variable or an independent variable, like PET or mean temperature, after removing the effect of spatial autocorrelation. For example, let’s say you wanted to know how well two niche model outputs correlated with each other, trying to remove the effect of spatial autocorrelation. In this case, you would use a Spatial Correlation analysis. If you wanted to know how much variation in a dependent variable an independent variable explains after removing the effect of simply “space”, you would use a Linear Regression Analysis. We will try both.
a. Try running a spatial correlation between bird or mammal species richness and an abiotic factor. To do so, choose the two variables you want to test and make sure that Lat and Lon are set correctly. A spatial correlation is like a regular correlation, except that the degrees of freedom are reduced to take into account the spatial autocorrelation in both variables. You should get outputs that show the correlation values, and then uncorrected and corrected degrees of freedom and uncorrected and corrected significance values. As you know, reducing degrees of freedom due to spatial autocorrelation will reduce the power of the test. So the corrected p-value may be non-significant even if the uncorrected one is significant.
b. I am not as familiar with the OLS (linear regression) analysis process. But first choose a response variable (like species richness) and a predictor variable (like PET, AET, mean temperature). You can select multiple predictor variables. Check “spatial partial regression” and select third-order (this is the part I don’t quite understand, but the manual says that 3rd-order is often sufficient). This basically seems to determine which powers and interaction terms of the coordinates are included in the model to represent “space”. Make sure long and lat are set correctly. Now hit compute. You get a typical regression result with the overall regression fit given the predictor variables, and the significance of those predictors. You can look at predicted versus actual richness given the regression, and residuals, under the “regression graphs” tab. You can also examine the Moran’s I outputs for the response variable, as well as maps of actual, predicted and residual values (cool!). Finally, and most importantly, you can examine the partial regression outputs, which give how much variance is explained by the predictor variable and by space, separately and together. Very cool.
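To get a feel for what “explained by the predictor and by space separately and together” means, here is a rough Python sketch of variance partitioning. It fits three ordinary least squares models (environment only, space only, both) and splits R^2 into four fractions. I am assuming that “space” is represented by polynomial terms of longitude and latitude up to third order; the data are simulated, all names are my own, and this is not necessarily how SAM does the calculation internally.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def r_squared(y, predictors):
    """R^2 of an OLS fit of y on the predictor columns plus an intercept."""
    n = len(y)
    cols = [[1.0] * n] + predictors
    k = len(cols)
    xtx = [[sum(cols[i][t] * cols[j][t] for t in range(n)) for j in range(k)]
           for i in range(k)]
    xty = [sum(cols[i][t] * y[t] for t in range(n)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(beta[i] * cols[i][t] for i in range(k)) for t in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((y[t] - fitted[t]) ** 2 for t in range(n))
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - ss_res / ss_tot

def trend_surface(x, y, order):
    """Polynomial terms x^i * y^j with 1 <= i + j <= order."""
    return [[(a ** i) * (b ** j) for a, b in zip(x, y)]
            for i in range(order + 1) for j in range(order + 1)
            if 1 <= i + j <= order]

# Simulated 5 x 5 grid with rescaled coordinates (hypothetical data).
rng = random.Random(1)
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
lon = [x for x in grid for _ in grid]
lat = [y for _ in grid for y in grid]
temp = [20.0 - 5.0 * b + rng.gauss(0, 1) for b in lat]
richness = [50.0 + 2.0 * t + 3.0 * a + rng.gauss(0, 1) for t, a in zip(temp, lon)]

space = trend_surface(lon, lat, 3)  # 9 third-order terms
r2_env = r_squared(richness, [temp])
r2_space = r_squared(richness, space)
r2_both = r_squared(richness, [temp] + space)

env_only = r2_both - r2_space            # explained by temperature alone
shared = r2_env + r2_space - r2_both     # explained jointly (cannot be separated)
space_only = r2_both - r2_env            # explained by space alone
unexplained = 1.0 - r2_both
print(env_only, shared, space_only, unexplained)
```

Because temperature is itself strongly spatially structured in this simulation, much of what it explains falls into the shared fraction, which is exactly the situation that makes spatial autocorrelation a problem in macroecology.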
8. CONTINUE TO EXPERIMENT WITH SPATIAL ANALYSIS IN MACROECOLOGY; there are a lot more options and analyses.