Species Richness 1 – observed species richness

 

  1. Collect data for a higher-level taxon from GBIF for a region of interest.  There are many ways to do this in GBIF.  One is to search on a country and then choose a secondary data filter such as “Class = Mammalia” (or whatever taxon you want).  Download only those records with geospatial coordinates; GBIF has a filter for this (Coordinate Status is “includes coordinates”).  I will email you all such a dataset for mammals of Australia from GBIF, but you should also try collecting your own records for a region of interest.  If you want a region smaller than a country, you can collect records for the country, import them into DIVA-GIS, and cut the layer to the desired extent.
    1. Some important points:  your dataset should contain only valid binomial names, so remove all records that lack a binomial or are of the form “genusname sp.” or “genusname indet.”  These records are likely “incomplete” but will still show up in your spreadsheet.  Check the dataset you accumulate to make sure they are weeded out.  You should also validate the records to make sure there aren’t other obvious mistakes (you can do this in DIVA-GIS).
  2. Load the spreadsheet into DIVA-GIS using Data -> Import Points to Shapefile
  3. Go to Analysis -> Point to Grid and choose Richness.  Click on the parameters tab and select “Scientific Name” or whatever you have called your binomial column.  Now go back to the Main tab and make sure that “Define Grid” says “Create a New Grid”.  Click on options, and keep cell size set to 1 for this first try.  Keep everything else at the defaults, but click “output” to set an output grid file name.  Hit apply.  You should see an observed species richness grid for the input data.  Try this again with a smaller cell size, which you set under the options tab for the “Define Grid” box.
  4. Now select Analysis -> Summarize Points.  Under the “select field” tab, again select “Scientific Name” or whatever you have called your binomial column as the field.  Then hit apply.  You should get a summary output for observations, richness, and some commonly used species richness estimators (Chao1 and Chao2, Jackknife 1 and 2, and ACE, the abundance-based coverage estimator).
  5. Go to Data -> Climate -> Map and extract a map of mean annual temperature using the richness grid layer’s extent.  You can do this by selecting “read from layer” and making sure the active layer in the side window is the richness layer.  Now you have a richness layer and an annual mean temperature layer at the same extent.  You can regress richness on annual mean temperature using Analysis -> Regression, but before you do, the grid cell sizes need to match.  The cell size for the climate data I am using was 0.083, while the cell size of the richness layer is 1.  To figure out how much to aggregate the climate grid, divide 1 by 0.083, which gives about 12.  So go to Grid -> Aggregate and aggregate by mean with a factor of 12.  The new climate grid should now have the same cell size and extent as the richness layer, and you can run the regression.  Note that this regression is “for fun”, just to see if there is a relationship between climate and richness.  We need better tools to run such analyses, and we will use one such tool, Spatial Analysis in Macroecology, in the following weeks.
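If you would rather screen names with a script than by eye (step 1, above), here is a minimal sketch in Python.  It is only an illustration:  the pattern, the placeholder list and the function name is_valid_binomial are my own assumptions, not anything GBIF or DIVA-GIS provides, and it assumes genus names are capitalized as they usually are in GBIF downloads.

```python
import re

# Assumed pattern: capitalized genus, lowercase epithet, optional trailing
# text (e.g. a subspecies or authorship). Hypothetical, adjust to your data.
BINOMIAL = re.compile(r"^[A-Z][a-z]+ [a-z][a-z-]+( .*)?$")

# Assumed list of epithet placeholders that mark incomplete identifications.
PLACEHOLDERS = {"sp", "spp", "indet", "cf", "aff"}


def is_valid_binomial(name):
    """Return True if name looks like 'Genus species' and is not a placeholder."""
    name = " ".join(name.split())  # collapse stray whitespace
    if not BINOMIAL.match(name):
        return False  # also rejects 'Genus sp.' because of the trailing dot
    epithet = name.split(" ")[1]
    return epithet not in PLACEHOLDERS


records = [
    "Macropus rufus",
    "Macropus sp.",
    "Petaurus indet.",
    "Vombatus",
    "Tachyglossus aculeatus",
]
kept = [name for name in records if is_valid_binomial(name)]
print(kept)
```

A check like this only catches malformed names.  It will not catch misspellings or synonyms, so you still need to look over the records.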

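To see what a Point to Grid richness calculation amounts to (step 3, above), here is a rough sketch:  assign each point to a grid cell and count the distinct names in each cell.  It assumes the grid is aligned to whole multiples of the cell size starting from 0, which may not be how DIVA-GIS lays out its grid, and the function name richness_grid and the example coordinates are made up for illustration.

```python
import math
from collections import defaultdict


def richness_grid(points, cell_size=1.0):
    """Map each (column, row) cell index to its number of distinct species.

    points is an iterable of (species_name, longitude, latitude) tuples.
    """
    species_in_cell = defaultdict(set)
    for species, lon, lat in points:
        # Assumption: cells are aligned to multiples of cell_size from 0.
        cell = (math.floor(lon / cell_size), math.floor(lat / cell_size))
        species_in_cell[cell].add(species)
    return {cell: len(names) for cell, names in species_in_cell.items()}


# Made-up example points, not real GBIF records.
points = [
    ("Macropus rufus", 133.2, -25.4),
    ("Macropus rufus", 133.8, -25.9),
    ("Tachyglossus aculeatus", 133.5, -25.1),
    ("Vombatus ursinus", 147.3, -36.2),
]

coarse = richness_grid(points, cell_size=1.0)
fine = richness_grid(points, cell_size=0.5)
print(coarse)
print(fine)
```

With a cell size of 1, the first three points fall in one cell with two species.  With a cell size of 0.5 they spread over three cells with one species each, which is why observed richness per cell drops as you shrink the cells.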
 
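The aggregation arithmetic in step 5 can be sketched as follows.  The factor is the ratio of the two cell sizes rounded to a whole number, and aggregating by mean averages each factor-by-factor block of fine cells into one coarse cell.  This is a sketch under my own assumptions (None marks no-data cells, and they are simply skipped); I have not checked how DIVA-GIS treats no-data cells, and both function names are hypothetical.

```python
def aggregation_factor(target_cell_size, source_cell_size):
    """Number of fine cells per coarse cell along one side, rounded."""
    return round(target_cell_size / source_cell_size)


def aggregate_mean(grid, factor):
    """Average factor-by-factor blocks of a grid given as a list of rows.

    Cells holding None are treated as no data and left out of the mean
    (an assumption; other software may handle them differently).
    """
    n_rows, n_cols = len(grid), len(grid[0])
    out = []
    for r in range(0, n_rows, factor):
        out_row = []
        for c in range(0, n_cols, factor):
            block = [
                grid[i][j]
                for i in range(r, min(r + factor, n_rows))
                for j in range(c, min(c + factor, n_cols))
                if grid[i][j] is not None
            ]
            out_row.append(sum(block) / len(block) if block else None)
        out.append(out_row)
    return out


factor = aggregation_factor(1, 0.083)  # 1 / 0.083 is about 12.05

# Small made-up temperature grid, aggregated with a factor of 2.
temperature = [
    [1, 3, 5, 7],
    [1, 3, 5, 7],
    [2, 2, None, 8],
    [2, 2, None, None],
]
coarse_temperature = aggregate_mean(temperature, 2)
print(factor, coarse_temperature)
```

Because 1 / 0.083 is not exactly 12, the aggregated cells will only approximately line up with the richness cells unless the true climate cell size is 1/12 of a degree.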

PART II.  Estimated Richness – DIVA-GIS and EstimateS/Eco-Tools

 

1.      Please read pages 35-38 in the DIVA-GIS manual.  Now, using the same shapefile for Australian mammals that you generated in step 2 above, and making sure it is the active layer, go to Analysis -> Point to Grid and choose Estimators of Richness.  Try both the ACE and Chao2 estimators.  ACE is the abundance-based coverage estimator; it typically performs poorly if the number of species, and the number of singletons and doubletons, varies widely among grid cells.  It basically assumes that species are randomly distributed.  Chao2 is an incidence-based measure and performs better with moderately “patchy” data, where species are clumped in certain areas and absent in others.  Be aware of how DIVA-GIS calculates estimated richness for each grid cell.  For the incidence-based approaches, it basically divides each grid cell into a subgrid of 4 or 9 cells (2x2 or 3x3) and then determines incidence on that very coarse grid.  One might question this approach.  Note:  you may need to manually fiddle with the legend and colors to get meaningful views of richness.  I set mine to 0-0=white, 1-20, 20-40, 40-60, 60-80, 80-100, 100-200, 200-300, 300-500, 500-700, 700-900, 900-1100 and 1100-1270, all scaled from green to red.

2.      Subtract one estimator grid from the other (ACE and Chao2) to see how much the two estimators differ.  You may need to fiddle with the legend to get meaningful views.

 

 

3.       Now we are going to move away from DIVA-GIS for a bit and on to EstimateS and Eco-Tools.  First, download EstimateS 8.0 (http://viceroy.eeb.uconn.edu/estimates) and the seed bank dataset that is on the class homepage (also distributed with EstimateS 8.0).  This is a “classic” dataset for species richness estimation.  Discussion of how this dataset was collected is provided here: http://links.jstor.org/sici?sici=0006-3606%28199806%2930%3A2%3C214%3ASRSVAA%3E2.0.CO%3B2-S&size=LARGE&origin=JSTOR-enlargePage (Butler and Chazdon, 1998, Biotropica).  The seed bank data has 34 rows and 121 columns, representing 34 species across 121 samples.  For each species (row), the number of individuals collected in each sample plot (column) is recorded.  For example, for species 1 in sample plot 1, 2 individuals were found.  For species 1 in sample plot 2, no individuals were found.  This kind of file format can be loaded directly into EstimateS.

4.      Start EstimateS.  If a file navigation window appears asking you to select a "Data File," choose the file called Statistics.4DD (Windows) or Statistics.data (Mac OS).  You should see a “welcome” screen.  Hit “ok” and you should get a set of menu options.  Load the seed bank data and make sure the “format 1” radio button is selected.  Note that there are a number of different possible formats for an EstimateS file.  The layouts differ, but the information content is exactly the same in every format.  The big difference is that some formats leave out the entries for samples where a species has zero individuals.

5.      Now go to Diversity Settings and look over the options.  Defaults should be fine.  Hit compute.  You should get a results table.  Interpreting the results table is important.  Note that the number of rows in the output should equal the number of samples you input into the program, so there should be 121 samples arranged as rows.  The number of individuals and the observed species richness are calculated directly, and once all the samples are included they must equal the actual overall abundance and number of species observed.  The estimator values will in nearly all cases be higher than the observed species richness because of the correction factors for rare or infrequent species in a sample (remember this is from Clint and Liesl’s presentation last week).  Before getting too worried about that interpretation, it is very useful to step away from EstimateS for a second and try something else.

6.      Go to: http://eco-tools.njit.edu/webMathematica/EcoTools/index.html, and choose the “input” link under species diversity/species richness.  Click the radio button for “Analyse the seed bank dataset used in Colwell and Coddington (1994)”.  This is the exact same dataset you just used in EstimateS 8.0.  All other defaults should be fine.  Hit submit.  The nice thing about the Eco-Tools output is that it is “summary rich”, while the EstimateS 8.0 outputs are not at all.  The Eco-Tools output should show a full return of very useful information.  The first table just summarizes the data, showing the number of rare and infrequent species and the number of singletons and doubletons.  This information is surprisingly useful, and from it you can start to understand how the estimators differ.  Next are some summary outputs for observed and estimated richness.  After that are some estimator curves.  These curves would be identical to the curves you would generate from the EstimateS 8.0 outputs if you plotted the output data (with the x-axis being samples arranged from 1-121 and the y-axis being estimated richness for a given estimator as you add samples).  Finally, you see the sample-based rarefaction curve from the dataset and true richness as analytically determined.
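To see why the singleton and doubleton counts matter, here is a minimal sketch of how three of the estimators can be computed from a species-by-sample matrix laid out like the seed bank data (rows are species, columns are samples).  It uses the classic textbook forms of Chao1, Chao2 and the first-order jackknife.  EstimateS and Eco-Tools may apply bias corrections or other variants, so do not expect these numbers to match their output exactly.  The function names and the tiny matrix are made up for illustration.

```python
def chao(s_obs, ones, twos):
    """Classic Chao form: S_obs + ones^2 / (2 * twos).

    Falls back to the bias-corrected form when twos is 0, which avoids
    dividing by zero.
    """
    if twos > 0:
        return s_obs + ones ** 2 / (2 * twos)
    return s_obs + ones * (ones - 1) / 2


def richness_summary(matrix):
    """Summarize a matrix with rows = species, columns = samples."""
    n_samples = len(matrix[0])
    totals = [sum(row) for row in matrix]  # individuals per species
    incidences = [sum(1 for n in row if n > 0) for row in matrix]  # samples occupied
    s_obs = sum(1 for t in totals if t > 0)
    singletons, doubletons = totals.count(1), totals.count(2)
    uniques, duplicates = incidences.count(1), incidences.count(2)
    return {
        "s_obs": s_obs,
        "singletons": singletons,
        "doubletons": doubletons,
        "uniques": uniques,
        "duplicates": duplicates,
        "chao1": chao(s_obs, singletons, doubletons),  # abundance based
        "chao2": chao(s_obs, uniques, duplicates),  # incidence based
        "jack1": s_obs + uniques * (n_samples - 1) / n_samples,
    }


# Made-up matrix: 4 species (rows) by 3 samples (columns).
example = [
    [2, 0, 1],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 1],
]
summary = richness_summary(example)
print(summary)
```

In this toy matrix there are 4 observed species, 2 singletons and 1 doubleton, so Chao1 adds 2 unseen species to the observed count, while Chao2 (2 uniques, 2 duplicates) adds only 1.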

 

CHALLENGE:  How would you prepare the Australia Mammal dataset for analysis in EstimateS 8.0 or Eco-Tools?  Hint:  You would need to decide on a “post-hoc” gridding size for the dataset.  Then you would need to count individuals for each species across the gridded samples.
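One possible way to approach the hint, offered as a sketch rather than the answer:  treat each grid cell as a sample and count records of each species in each cell, giving a species-by-sample matrix like the seed bank data.  Note the assumptions.  Each GBIF record is counted as one individual, which is only a rough proxy for abundance; the grid is aligned to multiples of the cell size; and the function name species_by_sample and the example points are made up.  The sketch also stops at the matrix and does not write the header lines that an EstimateS input file needs.

```python
import math


def species_by_sample(points, cell_size=1.0):
    """Build a species-by-cell count matrix from (name, lon, lat) points.

    Returns (species, cells, matrix), where matrix[i][j] is the number of
    records of species[i] that fall in cells[j].
    """
    counts = {}
    for name, lon, lat in points:
        # Assumption: cells are aligned to multiples of cell_size from 0.
        cell = (math.floor(lon / cell_size), math.floor(lat / cell_size))
        counts[(name, cell)] = counts.get((name, cell), 0) + 1
    species = sorted({name for name, _ in counts})
    cells = sorted({cell for _, cell in counts})
    matrix = [[counts.get((name, cell), 0) for cell in cells] for name in species]
    return species, cells, matrix


# Made-up example points, not real GBIF records.
points = [
    ("Macropus rufus", 133.2, -25.4),
    ("Macropus rufus", 133.8, -25.9),
    ("Tachyglossus aculeatus", 133.5, -25.1),
    ("Vombatus ursinus", 147.3, -36.2),
]
species, cells, matrix = species_by_sample(points, cell_size=1.0)
for name, row in zip(species, matrix):
    print(name, row)
```

Think about how the choice of cell size changes the number of samples and the number of singletons, and therefore the estimates.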

 

FOR NEXT WEEK:  Spatial Analysis in Macroecology.  Testing relationships between species richness and abiotic variables across latitude and elevation.