CC-BY-SA, Edzer Pebesma 2015.
The packages used in this course are as follows:
pkgs <- c(
  "sp", # foundational spatial package
  "spacetime", # core spacetime classes and methods
  "stpp", # space-time point pattern simulation
  "foreign", # for loading data
  "plm", # time series for panel data
  "zoo", # for time-series analysis
  "xts", # extensible time series analysis
  "geonames", # query the geonames api
  "geosphere", # for plotting on geographic coordinates
  "gstat" # geostatistics
)
It is likely you will need to install some of these on your computer.
To find out which ones are already installed, we can ask R, using require().
# Which packages are already installed?
reqs <- vapply(pkgs, require, character.only = TRUE, FUN.VALUE = logical(1))
reqs # print which packages you need
##        sp spacetime      stpp   foreign       plm       zoo       xts 
##      TRUE      TRUE      TRUE      TRUE      TRUE      TRUE      TRUE 
##  geonames geosphere     gstat 
##      TRUE      TRUE      TRUE
# Install the ones that are needed
if(!all(reqs)){
  install.packages(pkgs[!reqs])
}
Data in R are often organized in vectors,
a = c(15, 20, 30)
a
## [1] 15 20 30
b = c("blue", "brown", "blue")
b
## [1] "blue"  "brown" "blue"
in matrices
m = matrix(1:4, 2, 2)
m
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
or in data.frames
d = data.frame(a, b = c("blue", "brown", "brown"), c = c(65, 77, 69.6))
d
##    a     b    c
## 1 15  blue 65.0
## 2 20 brown 77.0
## 3 30 brown 69.6
Such data has little meaning, because it is unclear what the variables refer to, and what the records refer to. A more descriptive way would be
Patients = data.frame(PatientID = a, EyeColor = b, Weight_kg = c(65, 77, 69.6))
Patients
##   PatientID EyeColor Weight_kg
## 1        15     blue      65.0
## 2        20    brown      77.0
## 3        30     blue      69.6
where another table would be needed for the personal details related
to PatientID.
Note here that
- records (rows of the data.frame) correspond to objects (persons),
- Weight_kg has a measurement unit encoded in its variable name.

The weight numbers
Patients$Weight_kg
## [1] 65.0 77.0 69.6
carry no information about their measurement unit, allowing for meaningless computations such as
with(Patients, Weight_kg + PatientID)
## [1] 80.0 97.0 99.6
The answer is correct, though, as + requires two numeric arguments. If we try to add
eye color to body weight, we get a warning if EyeColor is a factor (by default,
data.frame converts character vectors into factors!),
with(Patients, EyeColor + Weight_kg)
## Warning in Ops.factor(EyeColor, Weight_kg): '+' not meaningful for factors
## [1] NA NA NA
or in case EyeColor is character, we get an error:
Patients$EyeColor = as.character(Patients$EyeColor)
print(try(with(Patients, EyeColor + Weight_kg)))
## [1] "Error in EyeColor + Weight_kg : non-numeric argument to binary operator\n"
## attr(,"class")
## [1] "try-error"
## attr(,"condition")
## <simpleError in EyeColor + Weight_kg: non-numeric argument to binary operator>
Take home messages:
- data.frame may change your information (convert character into factor), which is useful for some purposes (categorical variables, models), but not for others (DNA sequences).
- records (rows of a data.frame) often refer to objects, variables (columns) often to properties of these objects.

Exercises
- How can the data.frame command above be modified such that the character variable is not converted into a factor? (?options)
- How is a data.frame related to a list?
- Explain what each of the following selections returns: d[1,], d[,1], d[,1,drop=FALSE], d[1,1], d[1,1,drop=FALSE], d[1], d["a"], d[["a"]], d$a, d[,"a"], d[,c("a","b")]. As you will see, the logic behind selection for spatial, temporal, and spatiotemporal objects follows this very closely.
- matrix and data.frame both represent two-dimensional organisations of data; why don't we use matrix for everything?

People communicate time by using words, which can be written down as character information. Phrases like Sunday, 16 Aug 2015 and 18:36 are widely understood (although language dependent) to indicate date and time. Combined, we sometimes see
if(Sys.info()["sysname"] == "Linux"){
  system("date > date.file") # works on unix systems!
  readLines("date.file")     # read & print back in R
}
## [1] "Tue Aug 25 12:34:56 CEST 2015"
There are actually a lot of different ways in which we can represent time information, and still understand it. For computers, this is less easy:
data(wind, package = "gstat")
head(wind[,1:7])
##   year month day   RPT   VAL   ROS   KIL
## 1   61     1   1 15.04 14.96 13.17  9.29
## 2   61     1   2 14.71 16.88 10.83  6.50
## 3   61     1   3 18.50 16.88 12.33 10.13
## 4   61     1   4 10.58  6.63 11.75  4.58
## 5   61     1   5 13.33 13.25 11.42  6.17
## 6   61     1   6 13.21  8.12  9.96  6.67
data(Produc, package = "plm")
head(Produc[,1:6])
##     state year     pcap     hwy   water    util
## 1 ALABAMA 1970 15032.67 7325.80 1655.68 6051.20
## 2 ALABAMA 1971 15501.94 7525.94 1721.02 6254.98
## 3 ALABAMA 1972 15972.41 7765.42 1764.75 6442.23
## 4 ALABAMA 1973 16406.26 7907.66 1742.41 6756.19
## 5 ALABAMA 1974 16762.67 8025.52 1734.85 7002.29
## 6 ALABAMA 1975 17316.26 8158.23 1752.27 7405.76
library(foreign)
read.dbf(system.file("shapes/sids.dbf", package="maptools"))[1:5,c(5,9:14)]
##          NAME BIR74 SID74 NWBIR74 BIR79 SID79 NWBIR79
## 1        Ashe  1091     1      10  1364     0      19
## 2   Alleghany   487     0      10   542     3      12
## 3       Surry  3188     5     208  3616     6     260
## 4   Currituck   508     1     123   830     2     145
## 5 Northampton  1421     9    1066  1606     3    1197
ISO 8601 - Data elements
and interchange formats - Information interchange - Representation
of dates and times tries to create some order in this mess. It uses
e.g. 2015-08-07 for Aug 7 2015, and 2015-08-07T18:30:27+00:00
to specify a given time in a particular time zone on that date.
ISO 8601 implicitly defines time stamps to refer to time intervals: 2015-08-07 refers to the full day of Aug 7, i.e. from 2015-08-07 00:00 up to but not including 2015-08-08 00:00. The time stamp 2015-08-07 00:00 has a duration of one minute. 2015-08 refers to a full month.
Coordinated Universal Time (UTC) "is, within about 1 second, mean solar time at 0 degrees longitude; it does not observe daylight saving time" (formerly: GMT). It keeps track of both the Earth's rotation and International Atomic Time (TAI), by introducing leap seconds now and then. Real-time approximations to UTC are broadcast wirelessly by GPS and radio time signals; computers usually use the network time protocol to synchronize clocks.
Does R know about leap seconds? Kind of; for instance
as.Date(.leap.seconds)
##  [1] "1972-07-01" "1973-01-01" "1974-01-01" "1975-01-01" "1976-01-01"
##  [6] "1977-01-01" "1978-01-01" "1979-01-01" "1980-01-01" "1981-07-01"
## [11] "1982-07-01" "1983-07-01" "1985-07-01" "1988-01-01" "1990-01-01"
## [16] "1991-01-01" "1992-07-01" "1993-07-01" "1994-07-01" "1996-01-01"
## [21] "1997-07-01" "1999-01-01" "2006-01-01" "2009-01-01" "2012-07-01"
## [26] "2015-07-01"
To find out what time it is, R of course relies on the system clock. R's date/time handling is based on the code that operating systems use to handle date/time, and the R source code is updated every time a new leap second is announced.
Although one could argue that time is represented by a real number (\(\mathbb{R}\)), for the sequence
c(55, 60, 65, 70)
## [1] 55 60 65 70
it is not clear to which points in time we refer: are these years since 1900, or seconds since the start of today? To fix this, one needs to set an offset (when is zero?) and a unit (how long does a unit increase take?).
R has two built-in classes: Date for dates, and POSIXt for times:
(d = Sys.Date())
## [1] "2015-08-25"
class(d)
## [1] "Date"
print(as.numeric(d), digits = 15)
## [1] 16672
(t = Sys.time())
## [1] "2015-08-25 12:34:56 CEST"
class(t)
## [1] "POSIXct" "POSIXt"
print(as.numeric(t), digits = 15)
## [1] 1440498896.6631
Date is stored as the (integer) number of days since 1970-01-01 (in the local time zone); time is stored as the (fractional) number of seconds since 1970-01-01 00:00 UTC. We can see that time is printed in the current time zone.
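A quick check of these conventions (a minimal sketch; tz is given explicitly so the result does not depend on your local time zone):
as.numeric(as.Date("1970-01-02"))                     # 1: one day after the origin
as.POSIXct(0, origin = "1970-01-01", tz = "UTC")      # the origin itself
as.POSIXct(86400, origin = "1970-01-01", tz = "UTC")  # one day (86400 seconds) later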
R also "understands" time differences, e.g.
d - (d+1)
## Time difference of -1 days
d - (d+1.1)
## Time difference of -1.1 days
t - (t+24*3600)
## Time difference of -1 days
t - t+24*3600
## Time difference of 86400 secs
Exercise
POSIXct uses a single double to represent a time instant; POSIXlt represents time as a list (one list per time stamp), and is convenient for isolating time
components:
t
## [1] "2015-08-25 12:34:56 CEST"
(lt = as.POSIXlt(t))
## [1] "2015-08-25 12:34:56 CEST"
unlist(lt)
##               sec               min              hour              mday 
## "56.663104057312"              "34"              "12"              "25" 
##               mon              year              wday              yday 
##               "7"             "115"               "2"             "236" 
##             isdst              zone            gmtoff 
##               "1"            "CEST"            "7200"
sapply(lt, class)
##         sec         min        hour        mday         mon        year 
##   "numeric"   "integer"   "integer"   "integer"   "integer"   "integer" 
##        wday        yday       isdst        zone      gmtoff 
##   "integer"   "integer"   "integer" "character"   "integer"
lt$sec
## [1] 56.6631
Exercise
Time zones make it easy to understand whether it is day or night without asking where you are, and they take care of daylight saving time. There is a time zone database and a Wikipedia entry on it; R uses this database either directly or through the GNU C library.
Sys.setenv(TZ="CET")
as.POSIXct("2015-03-28") - as.POSIXct("2015-03-27") # regular
## Time difference of 1 days
as.POSIXct("2015-03-30") - as.POSIXct("2015-03-29") # 23 hrs: DST starts
## Time difference of 23 hours
as.POSIXct("2015-10-26") - as.POSIXct("2015-10-25") # 25 hours: DST ends
## Time difference of 1.041667 days
whereas in UTC,
Sys.setenv(TZ="UTC")
as.POSIXct("2015-03-30") - as.POSIXct("2015-03-29")
## Time difference of 1 days
as.POSIXct("2015-10-26") - as.POSIXct("2015-10-25")
## Time difference of 1 days
Sys.setenv(TZ="CET")
Having irregular intervals (day lengths) makes it complicated to compute daily means from e.g. hourly values; working with UTC may solve this problem. Climate scientists often model climate for years with 360 days, to get rid of the complexity of handling unequal month lengths and leap years. Aligning such data with POSIX time would be fun!
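A minimal sketch of how this could be done with xts, using a made-up hourly series with a UTC index, so that every day contains exactly 24 values even across the DST change:
library(xts)
h = xts(rnorm(72), as.POSIXct("2015-03-28 00:00", tz = "UTC") + 3600 * (0:71))
apply.daily(h, mean)  # three daily means, each over 24 hourly values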
Date and time are understood from character strings (or factors) when entered like this:
as.Date("2015-08-17")
## [1] "2015-08-17"
as.POSIXct("2015-08-17")
## [1] "2015-08-17 CEST"
as.POSIXct("2015-08-17 12:25")
## [1] "2015-08-17 12:25:00 CEST"
but not like this
as.POSIXct("2015-08-07T18:30:27Z")
## [1] "2015-08-07 CEST"
where everything after the T is ignored, without warning!
Arbitrary time formats can be specified using strptime, e.g.
strptime("2015-08-07T18:30:27Z", "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
## [1] "2015-08-07 18:30:27 UTC"
The (implicit) interval information of ISO 8601 time expressions is lost when read as POSIXt objects, as
identical(as.POSIXct("2015-08-17"), as.POSIXct("2015-08-17 00:00"))
## [1] TRUE
To convert numerical values into dates or time, the offset needs to be specified:
print(try(as.Date(365)))
## [1] "1971-01-01"
as.Date(365, origin = "1970-01-01")
## [1] "1971-01-01"
as.POSIXct(365 * 24 * 3600, origin = "1970-01-01")
## [1] "1971-01-01 01:00:00 CET"
although package zoo takes a somewhat different approach:
library(zoo)
as.Date(365)
## [1] "1971-01-01"
The CRAN Time Series Analysis Task View describes all packages dealing with time, time series data, and its analysis. Several packages provide other ways of representing time; package lubridate for instance provides explicit representation of time intervals.
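As a minimal sketch of what such an explicit interval looks like (assuming lubridate is installed; it is not in the package list above):
library(lubridate)
i = interval(ymd("2015-08-07"), ymd("2015-08-08"))  # an explicit time interval
int_length(i)                                       # its length, in seconds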
The time series task view is rather complete; here I will only mention two important packages: zoo and xts.
Package zoo is for ordered observations, where the ordering does not have to be by time:
library(zoo)
zoo(rnorm(10), 1:10)
##           1           2           3           4           5           6 
##  0.27072595 -2.25979086 -1.11897038  0.35686920 -0.25076671 -0.12688160 
##           7           8           9          10 
## -0.05619057  1.49861427 -1.47440469  0.63996567
It provides a lot of convenient functions, including aggregate, na.fill, and moving average filters. See for instance ?aggregate.zoo for many examples of aggregating time series to daily, monthly, quarterly or yearly values.
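For example (a minimal sketch with a made-up daily series):
library(zoo)
z = zoo(rnorm(90), as.Date("2015-01-01") + 0:89)  # 90 daily values
aggregate(z, as.yearmon, mean)                    # aggregate to monthly means
head(rollmean(z, k = 7))                          # 7-day moving average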
Package xts builds on top of zoo (xts is a subclass of zoo), and requires that data are ordered by time. It allows for different time classes (e.g. Date and POSIXct) but works with POSIXct under the hood. xts adds very convenient ISO 8601 style time interval selection:
library(xts)
x = xts(1:365, as.Date("2015-01-01")+0:364)
nrow(x)
## [1] 365
nrow(x["2015-05"]) # all days in May
## [1] 31
nrow(x["2015-05/06"]) # all days in May and June
## [1] 61
Geonames is a gazetteer (geographical dictionary or directory) that has a collection of place names, place types and geographical coordinates in WGS84, along with several other things.
You can register yourself at geonames.
options(geonamesUsername = "myUserName") # register, then use your own name
library(geonames)
pts = rbind(
 LA = GNsearch(name = "Lancaster University", adminCode1 = "ENG"),
 MS = GNsearch(name = "Münster")[1,]
)[c("lng", "lat")]
pts
##         lng      lat
## LA -2.78679  54.0082
## MS  7.62571 51.96236
It turns out that pts is still filled with character information, 
class(pts$lng)
## [1] "character"
so we need to transform it into numeric:
pts = sapply(pts, as.numeric) # this drops rownames, so:
rownames(pts) = c("LA", "MS") # Lancaster, Muenster
pts
##         lng      lat
## LA -2.78679 54.00820
## MS  7.62571 51.96236
R has a function called dist, which computes (Euclidean) distances in \(\mathbb{R}^n\):
dist(pts)
##          LA
## MS 10.61158
which returns 10.6, but 10.6 what? Degrees? Nautical miles? Oranges? No: plane nonsense. dist is only useful for geographical coordinates (long/lat) in small areas close to the equator; our data are not in a planar space, so we need to be more clever.
Package sp provides classes for points, lines, polygons and grids that register their coordinate reference system (CRS), so that appropriate distance functions can be chosen:
library(sp)
pts = SpatialPoints(pts, CRS("+proj=longlat +datum=WGS84"))
spDists(pts) # km, great circle distance over the WGS84 ellipsoid
##          [,1]     [,2]
## [1,]   0.0000 734.5819
## [2,] 734.5819   0.0000
Package geosphere provides several alternative spherical/ellipsoidal distance measures:
library(geosphere)
distHaversine(pts[1],pts[2])
## [1] 733229.8
# distGeo(pts[1],pts[2]) # different from spDist: # commented out - bug in geosphere?
spDists(pts)[2,1]*1000 - distHaversine(pts[1],pts[2])  # difference in m
## [1] 1352.072
As we have seen, a SpatialPoints object can be subsetted by its index, but
also by its row name; pts[1] and pts[1,] yield the same result: we think of these geometry-only objects as having only records:
pts[1] # selects record 1, but looks ambiguous
## SpatialPoints:
##         lng     lat
## LA -2.78679 54.0082
## Coordinate Reference System (CRS) arguments: +proj=longlat
## +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
pts[1,] # looks less ambiguous
## SpatialPoints:
##         lng     lat
## LA -2.78679 54.0082
## Coordinate Reference System (CRS) arguments: +proj=longlat
## +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
pts["MS",]
## SpatialPoints:
##        lng      lat
## MS 7.62571 51.96236
## Coordinate Reference System (CRS) arguments: +proj=longlat
## +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
If we add attributes to pts, e.g. by
pts$pop = c(299.708, 45.952) # according to Wikipedia
pts$area = c(302.9, 19.2) # I asked google
pts
##           coordinates     pop  area
## 1 (-2.78679, 54.0082) 299.708 302.9
## 2 (7.62571, 51.96236)  45.952  19.2
we see that the class of pts now has changed from SpatialPoints to SpatialPointsDataFrame, the reason being that in addition to geometry (locations), the object now has attributes (in a data.frame slot). We can subset or manipulate these attributes in the same fashion as we can with plain data.frames:
pts[1] # now selects a variable!
##           coordinates     pop
## 1 (-2.78679, 54.0082) 299.708
## 2 (7.62571, 51.96236)  45.952
pts[,1] # does the same, less ambiguous
##           coordinates     pop
## 1 (-2.78679, 54.0082) 299.708
## 2 (7.62571, 51.96236)  45.952
pts["pop"] # does the same, by name
##           coordinates     pop
## 1 (-2.78679, 54.0082) 299.708
## 2 (7.62571, 51.96236)  45.952
pts[,"pop"] # does the same
##           coordinates     pop
## 1 (-2.78679, 54.0082) 299.708
## 2 (7.62571, 51.96236)  45.952
pts[["pop"]] # extract vector
## [1] 299.708  45.952
pts$pop      # synonymous
## [1] 299.708  45.952
pts[1,"pop"] # select 1 geometry, 1 variable
##           coordinates     pop
## 1 (-2.78679, 54.0082) 299.708
pts[1,]$pop  # select 1 geometry, extract variable
## [1] 299.708
(pts$popDensity = pts$pop / pts$area) # create & add new variable
## [1] 0.9894619 2.3933333
We can even select using the spatial predicate intersect, as in
library(spacetime)
data(air) # loads the boundary of Germany, called DE_NUTS1
proj4string(DE_NUTS1) = proj4string(pts) # semantically identical, but syntactically different
## Warning in ReplProj4string(obj, CRS(value)): A new CRS was assigned to an object with an existing CRS:
## +init=epsg:4326 +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0
## without reprojecting.
## For reprojection, use function spTransform in package rgdal
pts[DE_NUTS1,] # selects the point(s) in pts intersecting with polygon DE_NUTS1
##           coordinates    pop area popDensity
## 2 (7.62571, 51.96236) 45.952 19.2   2.393333
plot(DE_NUTS1, axes = TRUE)
points(pts[DE_NUTS1,], col = 'red', pch = 16)
 
Note that
- spatial selection works directly between two spatial objects (here, pts and DE_NUTS1),
- sp objects register coordinate reference systems, and warn/err in case of a mismatch.

Exercises
The SpatioTemporal Task View describes the packages available on CRAN that represent and/or analyze spatiotemporal data.
Essentially, spacetime distinguishes between three types of spatiotemporal data:
- irregular data: events that occur at arbitrary locations and times (e.g. earthquakes, disease cases),
- regular (lattice) data: recurrent observations for a fixed set of spatial entities (e.g. monitoring stations),
- trajectories: the paths of objects moving through space.

Support for trajectories in package spacetime is rudimentary, and is more developed in package trajectories (see below).
Irregular spacetime data are stored in STI objects, or STIDF if they have attributes (magnitude of an earthquake, cause of a disease):
showClass("STI")
## Class "STI" [package "spacetime"]
## 
## Slots:
##                               
## Name:       sp    time endTime
## Class: Spatial     xts POSIXct
## 
## Extends: "ST"
## 
## Known Subclasses: "STIDF"
The sp and time slots need to have the same number of records, and are simply matched by order to identify the location and time of a particular event. The
data slot of an STIDF has the same number of records and is matched accordingly.
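A minimal sketch of constructing such an object (the event locations, times and magnitudes below are made up for illustration):
library(sp)
library(spacetime)
events = SpatialPoints(cbind(x = c(7.1, 7.5, 7.9), y = c(51.2, 51.6, 52.0)))
when = as.POSIXct("2015-08-01 00:00", tz = "UTC") + c(0, 3600, 7200)
STIDF(events, when, data.frame(magnitude = c(4.1, 3.8, 5.0)))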
Regular spacetime data have recurrent observations for fixed spatial entities. The STF, space-time-full, data structure represents this by
showClass("STF")
## Class "STF" [package "spacetime"]
## 
## Slots:
##                               
## Name:       sp    time endTime
## Class: Spatial     xts POSIXct
## 
## Extends: "ST"
## 
## Known Subclasses: "STFDF"
Although the class structure seems identical, sp and time may have a different number of records; the assumption here is that every geometry record has a time series of data of length nrow(time). The data slot of an STFDF has length(sp) * nrow(time) records, space cycling fastest.
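A minimal sketch of this layout, with two made-up stations and three days:
library(sp)
library(spacetime)
locs = SpatialPoints(cbind(x = c(7.1, 7.9), y = c(51.2, 52.0)))
days = as.Date("2015-01-01") + 0:2
# values ordered with space cycling fastest: station 1 day 1, station 2 day 1, station 1 day 2, ...
stfdf = STFDF(locs, days, data.frame(PM10 = c(21, 25, 23, 27, 20, 24)))
stfdf[1, ]  # select the first station
stfdf[, 1]  # select the first time step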
A third type, STS (space-time-sparse), defined as
showClass("STS")
## Class "STS" [package "spacetime"]
## 
## Slots:
##                                       
## Name:    index      sp    time endTime
## Class:  matrix Spatial     xts POSIXct
## 
## Extends: "ST"
## 
## Known Subclasses: "STSDF"
mimics STF but does not store all cell (space x time) values; it keeps an index (a two-column matrix) pointing to the cells that are filled, in order to be efficient for very sparse regular space-time layouts.
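A minimal sketch, coercing a small full grid (with made-up data) to the sparse representation:
library(sp); library(spacetime)
full = STFDF(SpatialPoints(cbind(x = c(7.1, 7.9), y = c(51.2, 52.0))),
             as.Date("2015-01-01") + 0:2, data.frame(v = 1:6))
sparse = as(full, "STSDF")
sparse@index  # one row per stored cell: spatial index, temporal index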
Note that
- the sp slot can contain any type of spatial data (points, lines, polygons, grid),
- endTime can be used to define intervals; by default, STI objects have zero interval width (indicating time instances), and STF objects have interval lengths equal to the time to the next observation (consecutive intervals).

Quite often, regular or irregular data are aggregated to new spatial and/or temporal units. Package spacetime tries hard to accommodate all possibilities.
Package spacetime contains a dataset called air, which contains
daily PM\(_{10}\) values for rural air quality stations over Germany,
from 1998 to 2009.
library(sp)
library(spacetime)
data(air)
# rural = STFDF(stations, dates, data.frame(PM10 = as.vector(air))) # stations not there!
dim(rural)
## Error in eval(expr, envir, enclos): object 'rural' not found
# stbox(rural) # commented for now: Error: object 'stbox' not found
We now aggregate the 2008 daily values to NUTS-1 region-average daily values by
x = as(rural[,"2008"], "xts")
## Error in .class1(object): object 'rural' not found
apply(x, 1, mean, na.rm=TRUE)[1:5]
## 2015-01-01 2015-01-02 2015-01-03 2015-01-04 2015-01-05 
##          1          2          3          4          5
dim(rural[,"2008"])
## Error in eval(expr, envir, enclos): object 'rural' not found
x = aggregate(rural[,"2008"], DE_NUTS1, mean, na.rm=TRUE)
## Error in aggregate(rural[, "2008"], DE_NUTS1, mean, na.rm = TRUE): error in evaluating the argument 'x' in selecting a method for function 'aggregate': Error: object 'rural' not found
dim(x)
## [1] 365   1
stplot(x, mode = "tp", par.strip.text = list(cex=.6))
## Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'stplot' for signature '"xts"'
We can aggregate to the complete region, by which the object becomes an xts object (unless we specify simplify=FALSE in the call to aggregate):
x = aggregate(rural[,"2008"], DE_NUTS1, mean, na.rm=TRUE)
## Error in aggregate(rural[, "2008"], DE_NUTS1, mean, na.rm = TRUE): error in evaluating the argument 'x' in selecting a method for function 'aggregate': Error: object 'rural' not found
class(x)
## [1] "xts" "zoo"
plot(x[,"PM10"])
## Error in plot(x[, "PM10"]): error in evaluating the argument 'x' in selecting a method for function 'plot': Error in `[.xts`(x, , "PM10") : subscript out of bounds
## Calls: [ -> [.xts
Some stations contain only missing values for certain time periods; we can de-select those, and aggregate to monthly means:
x = as(rural[,"2008"], "xts")
## Error in .class1(object): object 'rural' not found
apply(x, 2, mean, na.rm=TRUE)[1:5]
## [1] 183  NA  NA  NA  NA
sel = which(!apply(as(rural[,"2008"], "xts"), 2, function(x) all(is.na(x))))
## Error in .class1(object): object 'rural' not found
x = aggregate(rural[sel, "2008"], "month", mean, na.rm=TRUE)
## Error in aggregate(rural[sel, "2008"], "month", mean, na.rm = TRUE): error in evaluating the argument 'x' in selecting a method for function 'aggregate': Error: object 'rural' not found
stplot(x, mode = "tp", par.strip.text = list(cex=.6))
## Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'stplot' for signature '"xts"'
Next, we use zoo::as.yearqtr to aggregate to quarterly values:
library(zoo)
x = aggregate(rural[sel,"2005::2011"], as.yearqtr, median, na.rm=TRUE)
## Error in aggregate(rural[sel, "2005::2011"], as.yearqtr, median, na.rm = TRUE): error in evaluating the argument 'x' in selecting a method for function 'aggregate': Error: object 'rural' not found
stplot(x, mode = "tp", par.strip.text = list(cex=.6))
## Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'stplot' for signature '"xts"'
Finally, we compute yearly mean values for 2008 and 2009, for the whole country:
DE.years = STF(DE_NUTS1, as.Date(c("2008-01-01", "2009-01-01")))
x_subset <- aggregate(rural[,"2008::2009"], DE.years, mean, na.rm=TRUE)
## Error in aggregate(rural[, "2008::2009"], DE.years, mean, na.rm = TRUE): error in evaluating the argument 'x' in selecting a method for function 'aggregate': Error: object 'rural' not found
We can do some space-time geostatistics on these data, e.g. by
rr = rural[,"2005::2010"]
## Error in eval(expr, envir, enclos): object 'rural' not found
unsel = which(apply(as(rr, "xts"), 2, function(x) all(is.na(x))))
## Error in .class1(object): object 'rr' not found
r5to10 = rr[-unsel,] # remove series that are empty for this time period
## Error in eval(expr, envir, enclos): object 'rr' not found
summary(r5to10)
## Error in summary(r5to10): error in evaluating the argument 'object' in selecting a method for function 'summary': Error: object 'r5to10' not found
dim(r5to10)
## Error in eval(expr, envir, enclos): object 'r5to10' not found
rn = row.names(r5to10@sp)[4:7]
## Error in row.names(r5to10@sp): object 'r5to10' not found
rn
## Error in eval(expr, envir, enclos): object 'rn' not found
To keep computation time in bounds, we select 100 random time instances, rbind the Spatial objects, and compute a pooled (pure spatial) variogram:
rs = sample(dim(r5to10)[2], 100)
## Error in sample(dim(r5to10)[2], 100): object 'r5to10' not found
lst = lapply(rs, function(i) { x = r5to10[,i]; x$ti = i; x} )
## Error in lapply(rs, function(i) {: object 'rs' not found
pts = do.call(rbind, lst)
## Error in do.call(rbind, lst): object 'lst' not found
library(gstat)
v = variogram(PM10~ti, pts[!is.na(pts$PM10),], dX=0)
## Error in eval(expr, envir, enclos): object 'PM10' not found
vmod = fit.variogram(v, vgm(1, "Exp", 200, 1))
## Error in inherits(object, "gstatVariogram"): object 'v' not found
plot(v, vmod)
## Error in plot(v, vmod): error in evaluating the argument 'x' in selecting a method for function 'plot': Error: object 'v' not found
vmod
## Error in eval(expr, envir, enclos): object 'vmod' not found
The full space-time sample variogram would be computed by
rr = rural[,"2005::2010"]
unsel = which(apply(as(rr, "xts"), 2, function(x) all(is.na(x))))
r5to10 = rr[-unsel,]
vv = variogram(PM10~1, r5to10, width=20, cutoff = 200, tlags=0:5)
but we will load pre-computed values from package gstat:
data(vv, package = "gstat")
vv <- vv[c("np", "dist", "gamma", "id", "timelag", "spacelag")]
The following two graphs first show a variogram map in space (x) and time (y), and then a set of spatial variograms where color denotes the time lag:
print(plot(vv), split = c(1,1,1,2), more = TRUE)
print(plot(vv, map = FALSE), split = c(1,2,1,2))
 
We see that the lag (spatial lag = 0, time lag = 0) is always missing, which is typical for regular data without replicates.
We can fit a metric variogram model to these data by
metricVgm <- vgmST("metric",
                   joint=vgm(50,"Exp",100,0),
                   stAni=50)
metricVgm <- fit.StVariogram(vv, metricVgm)
attr(metricVgm, "optim")$value
## [1] 60.60854
plot(vv, metricVgm)
 
or a separable model by
# commented - get error "Error in switch(model$stModel,...: EXPR must be a length 1 vector ...)"
# sepVgm <- vgmST("separable",
#                 space=vgm(0.9,"Exp", 123, 0.1),
#                 time =vgm(0.9,"Exp", 2.9, 0.1),
#                 sill=100)
# sepVgm <- fit.StVariogram(vv, sepVgm, method = "L-BFGS-B",
#                           lower = c(10,0,0.01,0,1),
#                           upper = c(500,1,20,1,200))
# attr(sepVgm, "optim")$value
# plot(vv, list(sepVgm, metricVgm))
These models can then be used for spatiotemporal kriging; the
interested reader is referred to the vignettes of package gstat, and the
examples and demo scripts found there that use the function krigeST.
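A rough sketch of what such a call looks like is given below (not run; newgrid is a hypothetical prediction object, e.g. an STF over a grid of locations and times to predict at):
# library(gstat)
# pred = krigeST(PM10 ~ 1, data = r5to10, newdata = newgrid, modelList = metricVgm)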
We'll analyse the events from the foot-and-mouth (fmd) disease data that come with the stpp package:
library(stpp)
data("fmd")
data("northcumbria")
head(fmd)
##           X      Y ReportedDay
## [1,] 350850 557590          28
## [2,] 337320 569500          28
## [3,] 341730 572090          28
## [4,] 350410 527040          28
## [5,] 347800 541850          29
## [6,] 356900 541120          29
head(northcumbria)
##             x        y
## [1,] 346618.7 583056.7
## [2,] 340011.7 575941.6
## [3,] 338487.0 573908.6
## [4,] 332896.5 574162.8
## [5,] 332896.5 573400.4
## [6,] 333658.8 571621.6
To make the northcumbria polygon more useful, we'll convert it into a SpatialPolygons object by
nc = rbind(northcumbria, northcumbria[1,]) # close polygon
library(sp)
nc = SpatialPolygons(list(Polygons(list(Polygon(nc)), "NC")))
Next, we'll create a SpatialPoints object for the locations
pts = SpatialPoints(fmd[,1:2])     # CRS unknown!
plot(nc, axes = TRUE)
plot(pts, cex = 0.5, add = TRUE, col = "#88888888")
 
We can do some simple cell counts by creating a grid first
grd = SpatialPixels(SpatialPoints(makegrid(nc, n = 100)))
plot(grd)
plot(nc, add = TRUE)
 
and then assigning a dummy (1) attribute, and aggregating this using sum:
pts$id = rep(1, length(pts))
image(aggregate(pts, grd, sum))
plot(nc, add = TRUE)
points(pts, cex = .3, col = "#88888888")
title("cell counts")
 
and get an idea of the temporal development by computing average event time per cell
day = as.Date("2001-01-01") + fmd[,3] # also unknown!
pts$day = day
image(aggregate(pts["day"], grd, mean))
plot(nc, add = TRUE)
points(pts, cex = .3, col = "#88888888")
title("cell mean event time")
 
We can create an irregular space-time object from this without attributes (an STI rather than an STIDF) by
library(spacetime)
sti = STI(pts, day)
plot(sti) # space-time layout
 
This plot shows the space-time (time=x, space=y) layout of the
individual events, where points are simply numbered.  Note that STI objects
are always time ordered.
We can create a plot that shows attributes if we add an attribute; here we add a simple index, normalized to \([0,1]\), to indicate temporal distribution.
stidf = STIDF(pts, day, data.frame(one = (0:(length(pts)-1))/(length(pts)-1)))
stplot(stidf, sp.layout = nc, key.space = "right", cex = .5,
    xlim = bbox(nc)[1,], ylim = bbox(nc)[2,],
    main = "colour indicates quintile")
 
An overview of the number of observations per grid cell per time interval
is obtained by aggregating stidf over a regular space-time grid; we create
this grid by
grd = SpatialPixels(SpatialPoints(makegrid(nc, n = 25)))
n = 6 # number of time classes
tcuts = seq(min(index(sti)), max(index(sti)), length.out=n)
grd = STF(grd, tcuts[1:(n-1)])
plot(grd)
 
By default, STF creates the end times of the time intervals using delta:
Sys.setenv(TZ = "UTC")
delta(tcuts[1:(n-1)])
## [1] "2001-03-04 UTC" "2001-04-07 UTC" "2001-05-11 UTC" "2001-06-14 UTC"
## [5] "2001-07-18 UTC"
The number of events per space-time block is then computed by
aggregate, and plotted with stplot:
# commented due to error flagged: "Error in over(ax(x, "STI") ..."
# a = aggregate(stidf, grd, sum)
# summary(a)
# stplot(a, xlim = bbox(nc)[1,], ylim = bbox(nc)[2,], 
#   col.regions=gray((45:1)/51), sp.layout = list(nc, first=FALSE),
#   main = "number of observations")
A third type of spatiotemporal data concerns trajectories, describing the paths of objects moving through space. The data consist of sequences of fixes: time-stamped location measurements. Methods for data analysis have been developed by very different communities, ranging from the transportation and logistics sector to animal ecology. Problems addressed include
Package trajectories provides classes and some methods for handling and analysing such data, building upon the work in packages sp and spacetime.
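A minimal sketch of how a single trajectory can be represented (assuming package trajectories is installed; the three fixes below are made up for illustration):
library(sp); library(spacetime); library(trajectories)
fixes = SpatialPoints(cbind(x = c(0, 100, 250), y = c(0, 50, 80)))
times = as.POSIXct("2015-08-25 12:00", tz = "UTC") + c(0, 60, 120)
tr = Track(STIDF(fixes, times, data.frame(elev = c(12.1, 12.4, 12.0))))
tr  # a Track object: the fixes, plus the segments connecting them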