# Statistics

We are not primarily a statistical consultancy, but we work a great deal with statistics: even with the hundreds of spatial-analytical techniques that GIS provides, it is still important to understand the other dimensions of the data.

GIS can supplement statistical analyses with important additional data, such as satellite imagery. Mapping the outputs from statistical analysis can often transform difficult statistical information such as correlation coefficients and p-values into intuitive pictures that let you really understand what the data shows.

The table below gives a brief overview of the statistical techniques in which we are trained. It is not exhaustive, and we are always willing to look at new methods.

| Common Acronym | Approach | What is it for? | Software we use |
| --- | --- | --- | --- |
| MLR | Multiple Linear Regression | Any data that can be indexed to a continuous scale, e.g. “What is the relationship between the Total_Sales_Value of a shop, and the Population_Size and Average_Income in the local area?” | R, Excel, MS Access, SPSS |
| LR | Logistic Regression (also Binomial Confidence Estimation, REML) | Yes/No type data and data in qualitative classes, e.g. “How frequently might one find a certain species given various habitat characteristics?” or “What demographic and economic characteristics predict the likelihood that a customer will make a purchase in a given price bracket?” | R, Excel, MS Access, SPSS |
| GLM / GLMM | Generalised Linear Modelling and Mixed Modelling | Allows for more complex linear regression models, including the use of various types of data, e.g. “How well can one predict Likely_Income given Age (Continuous), Sex (Binary), Education Level (Ordinal) and Industry of Occupation (Categorical)?” | R |
| GAM / GAMM | Generalised Additive Modelling | Regression for data whose trend is non-linear and does not match parametric trends, e.g. “How does temperature predict ice cream sales rates?” | R |
| Spatial MM | Spatial GLM or GAM | Uses a method known as “random effects” to identify and handle the effect of multiple different sampling sites, e.g. “How does the Sale_Value to local area income relationship vary between individual shops?” | R |
| ZIR | Zero-Inflated Regression | A way of handling data with a large proportion of zero values, for example survey data where not all respondents complete all questions, or ecological monitoring of rare species. This method can help reduce the distorting effect of “false” zeros due to sampling error, and may also explain why those zeros occur. | R |
| MCMC | Markov Chain Monte Carlo | A method of simulating distributions to allow use of one of the above regression analyses when the datasets are too complex for a standard technique. | R (JAGS) |
| PCA & RDA | Principal Component Analysis and Redundancy Analysis | PCA allows the visualisation of linear relationships between data sets in a very intuitive way. RDA provides a similar visual map to PCA but also provides correlation statistics. | R, SPSS |
| S-MLR | Spatial MLR | Analyses and maps how one data set changes over space with respect to one or more other data sets, e.g. “Are there health outcomes in some areas which are not explained by income or other lifestyle factors, and so might suggest some hidden environmental risk?” | R, GIS |
| GWR | Geographically Weighted Regression | Allows the nature of a relationship to change continuously over an area and provides maps for each regression component. Particularly useful if spatial autocorrelation is a problem for other kinds of analysis, e.g. “What predicts future vegetation growth at particular locations, given terrain, soil type, climate data and existing vegetation?” | R, ArcGIS |
| EVT | Extreme Value Theory | Predicts the expected frequency of very rare events, given historical data, e.g. “How often is this particular location likely to be flooded?” | R (extRemes) + GIS |
| KS | Kolmogorov-Smirnov | Test for comparing two distributions, particularly useful for sampling design (see OISIN project). | R |
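To make one of the listed techniques concrete, the two-sample Kolmogorov-Smirnov statistic can be sketched in a few lines. This is only a minimal illustration, not how we run the test in practice (the table lists R for that). It is written in plain Python, the function name `ks_statistic` is our own, and the two "site" samples are made-up numbers. The statistic is the largest vertical gap between the empirical cumulative distribution functions of the two samples: 0 means the samples are distributed identically, 1 means they do not overlap at all.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")

    largest_gap = 0.0
    # The empirical CDFs only change at observed values,
    # so it is enough to compare them at those points.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # share of sample A that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # share of sample B that is <= x
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical measurements from two sampling sites (illustrative values only).
site_1 = [1.2, 1.9, 2.4, 3.1, 3.8]
site_2 = [2.2, 2.9, 3.5, 4.4, 5.0]
print(ks_statistic(site_1, site_2))
```

The sketch returns only the test statistic. Turning it into a p-value requires the Kolmogorov distribution, which a statistics package provides.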