The recently released ALA4R package allows data from the Atlas of Living Australia to be used directly within R. It provides functions that support many common conservation-management and ecological-research tasks, but it also opens up opportunities for less conventional data analyses. In this example, we'll combine species occurrences with text analysis to try to reconstruct spatial patterns in the heights of eucalypts across Australia.

Finding height information

The ALA4R species_info function returns an information profile about a species or taxon, including free-form descriptive text. For plants, this may include information about the height, using phrasing such as "tree up to 90 metres", "grows to 15 metres", "may reach 30 - 40 metres in height", or "growing to a maximum height of 4 metres".

Although this is free-form text, these particular phrases are consistent enough that we can write some fairly simple pattern-matching routines to find and extract them.
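As a rough illustration of what such a pattern-matching routine might look like, here is a minimal sketch in base R. It is not the code used for this post: the function name extract_height, the list of lead-in phrases, and the choice to keep the upper bound of a range such as "30 - 40 metres" are all assumptions made for the example.

```r
# Hypothetical helper (not part of ALA4R): pull a height in metres out of
# free-form descriptive text, returning NA where no known phrase is found.
extract_height <- function(txt) {
  num  <- "[0-9]+(?:\\.[0-9]+)?"                       # integer or decimal
  lead <- "(?:up to|grows to|growing to|may reach|maximum height of)"
  # Optional "<number> - " or "<number> to " prefix, so that for a range
  # the captured value is the upper bound.
  pattern <- paste0(lead, "\\s+(?:", num, "\\s*(?:-|to)\\s*)?(", num,
                    ")\\s*(?:metres?|m\\b)")
  m <- regmatches(txt, regexec(pattern, txt, ignore.case = TRUE, perl = TRUE))
  # Element 1 of each match is the full match, element 2 the captured number
  vapply(m, function(x) if (length(x) >= 2) as.numeric(x[2]) else NA_real_,
         numeric(1), USE.NAMES = FALSE)
}

phrases <- c("tree up to 90 metres",
             "grows to 15 metres",
             "may reach 30 - 40 metres in height",
             "growing to a maximum height of 4 metres",
             "a shrub with white flowers")
heights <- extract_height(phrases)
```

Real profile text will be messier than these examples (other units, heights of trunks or leaves rather than whole plants), so a routine like this would need checking against a sample of actual descriptions.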


The full R code can be found here. Briefly:

First, we download a gridded matrix of eucalypt data across Australia, using the sites_by_species function. This returns a data frame where each row is a site (grid cell) and each column corresponds to a particular eucalypt species (or sub-species, or variety):

library(ALA4R)
ss <- sites_by_species("genus:Eucalyptus", wkt = "POLYGON((110 -45,155 -45,155 -10,110 -10,110 -45))", gridsize = 0.5)

A fragment of this data frame might look something like:

longitude latitude eucalyptusSocialis eucalyptusGlobulus eucalyptusCamaldulensis
   134.75   -33.75                 16                  0                       0
   135.25   -33.75                 28                  1                       0
   135.75   -33.75                 47                  0                       0
   136.25   -33.75                 49                  0                       3
   136.75   -33.75                 63                  1                       0

Now for each species, we can pass its identifier ("guid") to the species_info function, and then extract height information from each species profile using our set of text searches. It's then a straightforward matter to check each grid cell of our sites-by-species matrix and calculate the mean height of the species present in each cell.
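The per-cell calculation can be sketched as follows, using a small hand-made table in place of the real download. The counts and the heights here are invented purely for illustration, and treating a species as "present" when its count is above zero, with every present species weighted equally, is our assumption about how the mean is taken.

```r
# Toy sites-by-species table (illustrative counts, not real ALA data)
ss <- data.frame(
  longitude = c(134.75, 135.25, 135.75),
  latitude  = c(-33.75, -33.75, -33.75),
  eucalyptusSocialis      = c(16, 28, 0),
  eucalyptusGlobulus      = c(0, 1, 0),
  eucalyptusCamaldulensis = c(0, 0, 3)
)

# Made-up heights in metres; NA marks a taxon with no height found in its profile
heights <- c(eucalyptusSocialis = 8, eucalyptusGlobulus = 55,
             eucalyptusCamaldulensis = NA)

species_cols <- setdiff(names(ss), c("longitude", "latitude"))
present <- (as.matrix(ss[, species_cols]) > 0) * 1   # 1 = recorded in cell
h <- heights[species_cols]
known <- !is.na(h)

# Mean height over the species present in each cell that have a known height.
# Cells containing no such species come out as NaN (0 / 0).
pres_known <- present[, known, drop = FALSE]
ss$mean_height <- as.vector(pres_known %*% h[known]) / rowSums(pres_known)
```

The third cell contains only a species without height information, so its mean is undefined. Cells like this are the gaps in the resulting map.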


We find that only 13% of our eucalypts have height information (142 out of a total of 1080 taxa). However, many of the taxa that lack this information are rarely recorded: 38% of all eucalypt occurrence records (228689 of 604902) are associated with a taxon for which we have height information. This isn't great (perhaps we could improve it by searching other sources of height information), but it will serve for our purposes.
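For reference, the two coverage figures are simple proportions of the counts quoted above. The helper name pct is just a convenience for this sketch.

```r
# Percentage of a total, rounded to the nearest whole percent
pct <- function(part, total) round(100 * part / total)

taxa_pct   <- pct(142, 1080)       # taxa with height information
record_pct <- pct(228689, 604902)  # occurrence records belonging to those taxa
```

The record-level figure is much higher than the taxon-level one because the taxa with height information tend to be the more frequently recorded ones.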

The map of mean eucalypt height across Australia shows that tall eucalypts (20 m or more) are typically found on much of the east coast, southern Victoria, Tasmania, south-western Western Australia, and other parts of New South Wales and Queensland.


In the next post, we'll explore some environmental drivers that might be related to these patterns in height, and use these to fill in the gaps in the map.