CC-BY-SA, Edzer Pebesma 2015.
The packages used in this course are as follows:
pkgs <- c(
"sp", # foundational spatial package
"spacetime", # core spacetime classes and methods
"stpp", # space-time point pattern simulation
"foreign", # for loading data
"plm", # time series for panel data
"zoo", # for time-series analysis
"xts", # extensible time series analysis
"geonames", # query the geonames api
"geosphere", # for plotting on geographic coordinates
"gstat" # geostatistics
)
It is likely you will need to install some of these on your computer.
To find out which ones still need to be installed, we can ask R, using require():
# Which packages are already installed?
reqs <- vapply(pkgs, require, character.only = TRUE, FUN.VALUE = logical(1))
reqs # TRUE for packages that could be loaded
## sp spacetime stpp foreign plm zoo xts
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## geonames geosphere gstat
## TRUE TRUE TRUE
# Install the ones that are needed
if(!all(reqs)){
install.packages(pkgs[!reqs])
}
Data in R are often organized in vectors,
a = c(15, 20, 30)
a
## [1] 15 20 30
b = c("blue", "brown", "blue")
b
## [1] "blue" "brown" "blue"
in matrices
m = matrix(1:4, 2, 2)
m
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
or in data.frames
d = data.frame(a, b = c("blue", "brown", "brown"), c = c(65, 77, 69.6))
d
## a b c
## 1 15 blue 65.0
## 2 20 brown 77.0
## 3 30 brown 69.6
Such data has little meaning, because it is unclear what the variables refer to, and what the records refer to. A more descriptive way would be
Patients = data.frame(PatientID = a, EyeColor = b, Weight_kg = c(65, 77, 69.6))
Patients
## PatientID EyeColor Weight_kg
## 1 15 blue 65.0
## 2 20 brown 77.0
## 3 30 blue 69.6
where another table would be needed for the personal details related to PatientID.
Note here that records (rows of the data.frame) correspond to objects (persons), and that Weight_kg has a measurement unit encoded in its variable name. The weight numbers
Patients$Weight_kg
## [1] 65.0 77.0 69.6
carry no information about their measurement unit, allowing for meaningless computations such as
with(Patients, Weight_kg + PatientID)
## [1] 80.0 97.0 99.6
The computation is carried out, though, as + simply requires two numeric arguments. If we try to add eye color to body weight, we get a warning if EyeColor is a factor (by default, data.frame converts character vectors into factors!),
with(Patients, EyeColor + Weight_kg)
## Warning in Ops.factor(EyeColor, Weight_kg): '+' not meaningful for factors
## [1] NA NA NA
or, in case EyeColor is character, we get an error:
Patients$EyeColor = as.character(Patients$EyeColor)
print(try(with(Patients, EyeColor + Weight_kg)))
## [1] "Error in EyeColor + Weight_kg : non-numeric argument to binary operator\n"
## attr(,"class")
## [1] "try-error"
## attr(,"condition")
## <simpleError in EyeColor + Weight_kg: non-numeric argument to binary operator>
Take home messages:
- data.frame may change your information (convert character into factor), which is useful for some purposes (categorical variables, models), but not for others (DNA sequences).
- records (rows of a data.frame) often refer to objects, variables (columns) often to properties of these objects.
Exercises
1. How can the data.frame command above be modified such that the character variable is not converted into a factor? (see ?options)
2. How is a data.frame related to a list?
3. Try out, and explain the differences between, the following selections:
d[1,]
d[,1]
d[,1,drop=FALSE]
d[1,1]
d[1,1,drop=FALSE]
d[1]
d["a"]
d[["a"]]
d$a
d[,"a"]
d[,c("a","b")]
As you will see, the logic behind selection for spatial, temporal, and spatiotemporal objects follows this very closely.
4. matrix and data.frame both represent two-dimensional organisations of data; why don't we use matrix for everything?
People communicate time by using words, which can be written down as character information. Phrases like Sunday, 16 Aug 2015 and 18:36 are widely understood (although language dependent) to indicate date and time. Combined, we sometimes see
if(Sys.info()["sysname"] == "Linux"){
  system("date > date.file") # works on unix systems!
  readLines("date.file")     # read & print back in R
}
## [1] "Tue Aug 25 12:34:56 CEST 2015"
There are actually a lot of different ways in which we can represent time information, and still understand it. For computers, this is less easy:
data(wind, package = "gstat")
head(wind[,1:7])
## year month day RPT VAL ROS KIL
## 1 61 1 1 15.04 14.96 13.17 9.29
## 2 61 1 2 14.71 16.88 10.83 6.50
## 3 61 1 3 18.50 16.88 12.33 10.13
## 4 61 1 4 10.58 6.63 11.75 4.58
## 5 61 1 5 13.33 13.25 11.42 6.17
## 6 61 1 6 13.21 8.12 9.96 6.67
data(Produc, package = "plm")
head(Produc[,1:6])
## state year pcap hwy water util
## 1 ALABAMA 1970 15032.67 7325.80 1655.68 6051.20
## 2 ALABAMA 1971 15501.94 7525.94 1721.02 6254.98
## 3 ALABAMA 1972 15972.41 7765.42 1764.75 6442.23
## 4 ALABAMA 1973 16406.26 7907.66 1742.41 6756.19
## 5 ALABAMA 1974 16762.67 8025.52 1734.85 7002.29
## 6 ALABAMA 1975 17316.26 8158.23 1752.27 7405.76
library(foreign)
read.dbf(system.file("shapes/sids.dbf", package="maptools"))[1:5,c(5,9:14)]
## NAME BIR74 SID74 NWBIR74 BIR79 SID79 NWBIR79
## 1 Ashe 1091 1 10 1364 0 19
## 2 Alleghany 487 0 10 542 3 12
## 3 Surry 3188 5 208 3616 6 260
## 4 Currituck 508 1 123 830 2 145
## 5 Northampton 1421 9 1066 1606 3 1197
The standard ISO 8601 (Data elements and interchange formats - Information interchange - Representation of dates and times) tries to create some order in this mess. It uses e.g. 2015-08-07 for Aug 7, 2015, and 2015-08-07T18:30:27+00:00 to specify a given time in a particular time zone on that date.
ISO 8601 implicitly defines time stamps to refer to time intervals: 2015-08-07 refers to the full day of Aug 7, i.e. from 2015-08-07 00:00 up to but not including 2015-08-08 00:00; the time stamp 2015-08-07 00:00 refers to an interval of one minute, and 2015-08 refers to a full month.
Coordinated Universal Time (UTC) "is, within about 1 second, mean solar time at 0 degrees longitude; it does not observe daylight saving time" (formerly: GMT). It keeps track of the Earth's rotation and International Atomic Time (TAI) by introducing leap seconds now and then. Real-time approximations to UTC are broadcast wirelessly by GPS and radio time signals; computers usually use the network time protocol (NTP) to synchronize clocks.
R is aware of these leap seconds - kind of; for instance
as.Date(.leap.seconds)
## [1] "1972-07-01" "1973-01-01" "1974-01-01" "1975-01-01" "1976-01-01"
## [6] "1977-01-01" "1978-01-01" "1979-01-01" "1980-01-01" "1981-07-01"
## [11] "1982-07-01" "1983-07-01" "1985-07-01" "1988-01-01" "1990-01-01"
## [16] "1991-01-01" "1992-07-01" "1993-07-01" "1994-07-01" "1996-01-01"
## [21] "1997-07-01" "1999-01-01" "2006-01-01" "2009-01-01" "2012-07-01"
## [26] "2015-07-01"
To find out what time it is, R of course relies on the system clock. The leap second table in the R sources is updated every time a new leap second is announced; R's date/time handling is otherwise based on the code that operating systems use to handle date/time.
Although one could argue that time is represented by a real number (\(\mathbb{R}\)), for the sequence
c(55, 60, 65, 70)
## [1] 55 60 65 70
it is not clear to which points in time we refer: are these years since 1900, or seconds since the start of today? To fix this, one needs to set an offset (when is zero?) and a unit (how long does a unit increase take?).
R has two built-in formats: for dates it has Date, for times POSIXt:
(d = Sys.Date())
## [1] "2015-08-25"
class(d)
## [1] "Date"
print(as.numeric(d), digits = 15)
## [1] 16672
(t = Sys.time())
## [1] "2015-08-25 12:34:56 CEST"
class(t)
## [1] "POSIXct" "POSIXt"
print(as.numeric(t), digits = 15)
## [1] 1440498896.6631
Date is stored as the (integer) number of days since 1970-01-01 (in the local time zone); time is stored as the (fractional) number of seconds since 1970-01-01 00:00 UTC. We can see that time is printed in the current time zone.
R also "understands" time differences, e.g.
d - (d+1)
## Time difference of -1 days
d - (d+1.1)
## Time difference of -1.1 days
t - (t+24*3600)
## Time difference of -1 days
t - t+24*3600
## Time difference of 86400 secs
Exercise: explain the difference between the last two time differences computed above.
POSIXct uses a single double to represent a time instance; POSIXlt represents time as a list (one list per time stamp), which is convenient for isolating time components:
t
## [1] "2015-08-25 12:34:56 CEST"
(lt = as.POSIXlt(t))
## [1] "2015-08-25 12:34:56 CEST"
unlist(lt)
## sec min hour mday
## "56.663104057312" "34" "12" "25"
## mon year wday yday
## "7" "115" "2" "236"
## isdst zone gmtoff
## "1" "CEST" "7200"
sapply(lt, class)
## sec min hour mday mon year
## "numeric" "integer" "integer" "integer" "integer" "integer"
## wday yday isdst zone gmtoff
## "integer" "integer" "integer" "character" "integer"
lt$sec
## [1] 56.6631
Exercise
Time zones make it easy to understand whether it is day or night without asking where you are, and take care of daylight saving time. There is a time zone database (and a Wikipedia entry about it), which R uses either directly or through the GNU C library.
Sys.setenv(TZ="CET")
as.POSIXct("2015-03-28") - as.POSIXct("2015-03-27") # regular
## Time difference of 1 days
as.POSIXct("2015-03-30") - as.POSIXct("2015-03-29") # 23 hrs: DST starts
## Time difference of 23 hours
as.POSIXct("2015-10-26") - as.POSIXct("2015-10-25") # 25 hours: DST ends
## Time difference of 1.041667 days
whereas in UTC,
Sys.setenv(TZ="UTC")
as.POSIXct("2015-03-30") - as.POSIXct("2015-03-29")
## Time difference of 1 days
as.POSIXct("2015-10-26") - as.POSIXct("2015-10-25")
## Time difference of 1 days
Sys.setenv(TZ="CET")
Having irregular intervals (day lengths) makes it complicated to compute daily means from e.g. hourly values; working with UTC may solve this problem. Climate scientists often model climate for years with 360 days, to get rid of the complexity of handling unequal month lengths and leap years. Aligning such data with POSIX time would be fun!
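As a small illustration (with made-up, hypothetical hourly values), daily means are a straightforward grouped average when every day has exactly 24 hours, as in UTC:
Sys.setenv(TZ = "UTC")
h = seq(as.POSIXct("2015-03-28 00:00", tz = "UTC"), by = "1 hour", length.out = 96) # 4 full days
vals = rnorm(length(h)) # hypothetical hourly measurements
tapply(vals, as.Date(h), mean) # one mean per day, each based on exactly 24 values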
Date and time are understood from character strings (or factor) when entered like this:
as.Date("2015-08-17")
## [1] "2015-08-17"
as.POSIXct("2015-08-17")
## [1] "2015-08-17 CEST"
as.POSIXct("2015-08-17 12:25")
## [1] "2015-08-17 12:25:00 CEST"
but not like this
as.POSIXct("2015-08-07T18:30:27Z")
## [1] "2015-08-07 CEST"
where everything after the T is ignored, without warning!
Arbitrary time formats can be specified using strptime, e.g.
strptime("2015-08-07T18:30:27Z", "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
## [1] "2015-08-07 18:30:27 UTC"
The (implicit) interval information of ISO 8601 time expressions is lost when read as POSIXt objects, as
identical(as.POSIXct("2015-08-17"), as.POSIXct("2015-08-17 00:00"))
## [1] TRUE
To convert numerical values into dates or time, the offset needs to be specified:
print(try(as.Date(365)))
## [1] "1971-01-01"
as.Date(365, origin = "1970-01-01")
## [1] "1971-01-01"
as.POSIXct(365 * 24 * 3600, origin = "1970-01-01")
## [1] "1971-01-01 01:00:00 CET"
although package zoo handles this differently: its as.Date method for numeric values uses 1970-01-01 as the default origin:
library(zoo)
as.Date(365)
## [1] "1971-01-01"
The CRAN Time Series Analysis Task View describes all packages dealing with time, time series data, and their analysis. Several packages provide other ways of representing time; package lubridate for instance provides explicit representation of time intervals.
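As a brief illustration (assuming lubridate is installed; it is not in the package list above), an explicit time interval can be created and queried like this:
library(lubridate)
i = interval(ymd("2015-08-07"), ymd("2015-08-08")) # an explicit one-day interval
i
int_length(i) # interval length in seconds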
The time series task view is fairly complete; here, I will only mention two important packages: zoo and xts.
Package zoo is for ordered observations, where the ordering does not have to be temporal:
library(zoo)
zoo(rnorm(10), 1:10)
## 1 2 3 4 5 6
## 0.27072595 -2.25979086 -1.11897038 0.35686920 -0.25076671 -0.12688160
## 7 8 9 10
## -0.05619057 1.49861427 -1.47440469 0.63996567
It provides a lot of convenient functions, including aggregate, na.fill, and moving average filters. See for instance ?aggregate.zoo for many examples of aggregating time series to daily, monthly, quarterly or yearly values.
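For instance, a (made-up) daily zoo series can be aggregated to monthly means along the lines of the examples in ?aggregate.zoo:
library(zoo)
z = zoo(rnorm(90), as.Date("2015-01-01") + 0:89) # hypothetical daily series
aggregate(z, as.yearmon, mean) # one value per month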
Package xts builds on top of zoo (xts is a subclass of zoo), and requires that data are ordered by time. It allows for different time classes (e.g. Date and POSIXct) but works with POSIXct under the hood. xts adds very convenient ISO 8601-style time interval selection:
library(xts)
x = xts(1:365, as.Date("2015-01-01")+0:364)
nrow(x)
## [1] 365
nrow(x["2015-05"]) # all days in May
## [1] 31
nrow(x["2015-05/06"]) # all days in May and June
## [1] 61
Geonames is a gazetteer (geographical dictionary or directory) that has a collection of place names, place types and geographical coordinates in WGS84, along with several other things.
You can register yourself at geonames.org.
options(geonamesUsername = "myUserName") # register, then use your own name
library(geonames)
pts = rbind(
LA = GNsearch(name = "Lancaster University", adminCode1 = "ENG"),
MS = GNsearch(name = "Münster")[1,]
)[c("lng", "lat")]
pts
## lng lat
## LA -2.78679 54.0082
## MS 7.62571 51.96236
It turns out that pts is still filled with character information,
class(pts$lng)
## [1] "character"
so we need to transform it into numeric:
pts = sapply(pts, as.numeric) # this drops rownames, so:
rownames(pts) = c("LA", "MS") # Lancaster, Muenster
pts
## lng lat
## LA -2.78679 54.00820
## MS 7.62571 51.96236
R has a function called dist, which computes (Euclidean) distances in \(\mathbb{R}^n\):
dist(pts)
## LA
## MS 10.61158
which returns 10.6, but 10.6 what? Degrees? Nautical miles? Oranges? No: plain nonsense. dist computes distances in the plane, which is only approximately acceptable for geographical coordinates (long/lat) over small areas close to the equator; our data are not in a planar space, so we need to be more clever.
Package sp provides classes for points, lines, polygons and grids that register their coordinate reference system (CRS), and it chooses appropriate distance functions:
library(sp)
pts = SpatialPoints(pts, CRS("+proj=longlat +datum=WGS84"))
spDists(pts) # km, great circle distance over the WGS84 ellipsoid
## [,1] [,2]
## [1,] 0.0000 734.5819
## [2,] 734.5819 0.0000
Package geosphere provides several alternative spherical/ellipsoidal distance measures:
library(geosphere)
distHaversine(pts[1],pts[2])
## [1] 733229.8
# distGeo(pts[1],pts[2]) # different from spDist: # commented out - bug in geosphere?
spDists(pts)[2,1]*1000 - distHaversine(pts[1],pts[2]) # difference in m
## [1] 1352.072
As we have seen, a SpatialPoints object can be subsetted by its index, but also by its row name; pts[1] and pts[1,] yield the same result: we think of these geometry-only objects as having only records:
pts[1] # selects record 1, but looks ambiguous
## SpatialPoints:
## lng lat
## LA -2.78679 54.0082
## Coordinate Reference System (CRS) arguments: +proj=longlat
## +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
pts[1,] # looks less ambiguous
## SpatialPoints:
## lng lat
## LA -2.78679 54.0082
## Coordinate Reference System (CRS) arguments: +proj=longlat
## +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
pts["MS",]
## SpatialPoints:
## lng lat
## MS 7.62571 51.96236
## Coordinate Reference System (CRS) arguments: +proj=longlat
## +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
If we add attributes to pts, e.g. by
pts$pop = c(299.708, 45.952) # according to Wikipedia
pts$area = c(302.9, 19.2) # I asked google
pts
## coordinates pop area
## 1 (-2.78679, 54.0082) 299.708 302.9
## 2 (7.62571, 51.96236) 45.952 19.2
we see that the class of pts has now changed from SpatialPoints to SpatialPointsDataFrame, the reason being that in addition to geometry (locations), the object now has attributes (in a data.frame slot). We can subset or manipulate these attributes in the same fashion as we can with plain data.frames:
pts[1] # now selects a variable!
## coordinates pop
## 1 (-2.78679, 54.0082) 299.708
## 2 (7.62571, 51.96236) 45.952
pts[,1] # does the same, less ambiguous
## coordinates pop
## 1 (-2.78679, 54.0082) 299.708
## 2 (7.62571, 51.96236) 45.952
pts["pop"] # does the same, by name
## coordinates pop
## 1 (-2.78679, 54.0082) 299.708
## 2 (7.62571, 51.96236) 45.952
pts[,"pop"] # does the same
## coordinates pop
## 1 (-2.78679, 54.0082) 299.708
## 2 (7.62571, 51.96236) 45.952
pts[["pop"]] # extract vector
## [1] 299.708 45.952
pts$pop # synonymous
## [1] 299.708 45.952
pts[1,"pop"] # select 1 geometry, 1 variable
## coordinates pop
## 1 (-2.78679, 54.0082) 299.708
pts[1,]$pop # select 1 geometry, extract variable
## [1] 299.708
(pts$popDensity = pts$pop / pts$area) # create & add new variable
## [1] 0.9894619 2.3933333
We can even select using the spatial predicate intersect, as in
library(spacetime)
data(air) # loads the boundary of Germany, called DE_NUTS1
proj4string(DE_NUTS1) = proj4string(pts) # semantically identical, but syntactically different
## Warning in ReplProj4string(obj, CRS(value)): A new CRS was assigned to an object with an existing CRS:
## +init=epsg:4326 +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0
## without reprojecting.
## For reprojection, use function spTransform in package rgdal
pts[DE_NUTS1,] # selects the point(s) in pts intersecting with polygon DE_NUTS1
## coordinates pop area popDensity
## 2 (7.62571, 51.96236) 45.952 19.2 2.393333
plot(DE_NUTS1, axes = TRUE)
points(pts[DE_NUTS1,], col = 'red', pch = 16)
Note that
- the coordinate reference systems of pts and DE_NUTS1 had to be matched before this selection;
- sp objects register coordinate reference systems, and warn or raise an error in case of a mismatch.
Exercises
The SpatioTemporal Task View describes the packages available on CRAN that represent and/or analyze spatiotemporal data.
Essentially, spacetime distinguishes between three types of spatiotemporal data: irregular (event-like) data, regular (lattice-like) data where a fixed set of spatial entities is observed at a fixed set of times, and trajectories. Support for trajectories in package spacetime is rudimentary, and is more developed in package trajectories (see below).
Irregular spacetime data are stored in STI objects, or STIDF if they have attributes (the magnitude of an earthquake, the cause of a disease):
showClass("STI")
## Class "STI" [package "spacetime"]
##
## Slots:
##
## Name: sp time endTime
## Class: Spatial xts POSIXct
##
## Extends: "ST"
##
## Known Subclasses: "STIDF"
The sp and time slots need to have the same number of records, and are simply matched by order to identify the location and time of a particular event. The data slot of an STIDF has the same number of records and is matched accordingly.
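For instance, a minimal STIDF with three made-up events (locations, times and the attribute are all hypothetical) could be built as:
library(sp)
library(spacetime)
ev_pts = SpatialPoints(cbind(x = c(1, 2, 3), y = c(1, 1, 2))) # event locations
ev_time = as.POSIXct("2015-08-01 00:00", tz = "UTC") + c(0, 3600, 7200) # event times
STIDF(ev_pts, ev_time, data.frame(magnitude = c(2.1, 3.4, 2.8))) # records matched by order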
Regular spacetime data have recurrent observations for fixed spatial entities. The STF (space-time full) data structure represents this by
showClass("STF")
## Class "STF" [package "spacetime"]
##
## Slots:
##
## Name: sp time endTime
## Class: Spatial xts POSIXct
##
## Extends: "ST"
##
## Known Subclasses: "STFDF"
Although the class structure seems identical, sp and time may have different numbers of records; the assumption here is that every geometry record has a time series of data of length nrow(time). The data slot of an STFDF has length(sp) * nrow(time) records, with space cycling fastest.
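A minimal sketch (made-up values) showing this layout, with 2 locations and 3 days giving 2 x 3 = 6 data records, space cycling fastest:
library(sp)
library(spacetime)
loc = SpatialPoints(cbind(x = c(0, 1), y = c(0, 0))) # two stations
days = as.Date("2015-08-01") + 0:2 # three days
stf = STFDF(loc, days, data.frame(PM10 = c(21, 25, 18, 30, 22, 27))) # day 1: st.1, st.2; then day 2; then day 3
stf[1, ] # the time series at station 1
stf[ , 1] # all stations on the first day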
A third type, STS (space-time sparse), is defined as
showClass("STS")
## Class "STS" [package "spacetime"]
##
## Slots:
##
## Name: index sp time endTime
## Class: matrix Spatial xts POSIXct
##
## Extends: "ST"
##
## Known Subclasses: "STSDF"
It mimics STF but does not store all cell (space x time) values; it keeps an index vector to those cells that are filled, in order to be efficient for very sparse regular space-time layouts.
Note that
- the sp slot can contain any type of spatial data (points, lines, polygons, grid);
- endTime can be used to define intervals: by default, STI objects have zero interval width (indicating time instances), while STF objects have interval lengths equal to the time to the next observation (consecutive intervals).
Quite often, regular or irregular data are aggregated to new spatial and/or temporal units. Package spacetime tries hard to accommodate all possibilities.
Package spacetime contains a dataset called air, with daily PM\(_{10}\) values for rural air quality stations over Germany, for 1998-2009.
library(sp)
library(spacetime)
data(air) # also loads, among others, stations, dates and DE_NUTS1
if (!exists("rural")) # rebuild the STFDF in case this spacetime version does not ship it
  rural = STFDF(stations, dates, data.frame(PM10 = as.vector(air)))
dim(rural)
# stbox(rural) # space-time bounding box
We now aggregate the 2008 daily values to NUTS-1 region-average daily values by
x = as(rural[,"2008"], "xts") # station values for 2008, as an xts matrix
apply(x, 1, mean, na.rm=TRUE)[1:5] # country-wide daily means, first 5 days
dim(rural[,"2008"])
x = aggregate(rural[,"2008"], DE_NUTS1, mean, na.rm=TRUE) # daily means per NUTS-1 region
dim(x)
stplot(x, mode = "tp", par.strip.text = list(cex=.6))
We can also aggregate to a single region covering the whole of Germany, in which case the object simplifies to an xts object (unless we specify simplify=FALSE in the call to aggregate). A single polygon can be obtained, for instance, by dissolving the NUTS-1 regions (this uses rgeos, not in the package list above):
library(rgeos)
DE = gUnaryUnion(DE_NUTS1) # dissolve the 16 NUTS-1 regions into one polygon
x = aggregate(rural[,"2008"], DE, mean, na.rm=TRUE)
class(x)
plot(x[,"PM10"])
Some stations contain only missing values for certain time periods; we can de-select those, and aggregate to monthly means:
x = as(rural[,"2008"], "xts")
apply(x, 2, mean, na.rm=TRUE)[1:5] # mean per station, first 5 stations
sel = which(!apply(as(rural[,"2008"], "xts"), 2, function(x) all(is.na(x))))
x = aggregate(rural[sel, "2008"], "month", mean, na.rm=TRUE)
stplot(x, mode = "tp", par.strip.text = list(cex=.6))
Next, we use zoo::as.yearqtr to aggregate to quarterly values:
library(zoo)
x = aggregate(rural[sel,"2005::2011"], as.yearqtr, median, na.rm=TRUE)
stplot(x, mode = "tp", par.strip.text = list(cex=.6))
Finally, we compute yearly mean values for 2008 and 2009, for each of the NUTS-1 regions:
DE.years = STF(DE_NUTS1, as.Date(c("2008-01-01", "2009-01-01")))
x_subset <- aggregate(rural[,"2008::2009"], DE.years, mean, na.rm=TRUE)
We can do some space-time geostatistics on these data, e.g. by
rr = rural[,"2005::2010"]
unsel = which(apply(as(rr, "xts"), 2, function(x) all(is.na(x))))
r5to10 = rr[-unsel,] # remove series that are empty for this time period
summary(r5to10)
dim(r5to10)
rn = row.names(r5to10@sp)[4:7] # names of some of the remaining stations
rn
To keep computation time in bounds, we select 100 random time instances, rbind the resulting Spatial objects, and compute a pooled (purely spatial) variogram:
rs = sample(dim(r5to10)[2], 100) # 100 random time instances
lst = lapply(rs, function(i) { x = r5to10[,i]; x$ti = i; x} ) # one Spatial object per instance
pts = do.call(rbind, lst)
library(gstat)
v = variogram(PM10~ti, pts[!is.na(pts$PM10),], dX=0)
vmod = fit.variogram(v, vgm(1, "Exp", 200, 1))
plot(v, vmod)
vmod
The full space-time variogram could be computed by
rr = rural[,"2005::2010"]
unsel = which(apply(as(rr, "xts"), 2, function(x) all(is.na(x))))
r5to10 = rr[-unsel,]
vv = variogram(PM10~1, r5to10, width=20, cutoff = 200, tlags=0:5)
but we will load pre-computed values from package gstat:
data(vv, package = "gstat")
vv <- vv[c("np", "dist", "gamma", "id", "timelag", "spacelag")]
The following two graphs first show a variogram map in space (x) and time (y), and then a set of spatial variograms where color denotes the time lag:
print(plot(vv), split = c(1,1,1,2), more = TRUE)
print(plot(vv, map = FALSE), split = c(1,2,1,2))
We see that the lag (spatial lag = 0, time lag = 0) is always missing, which is typical for regular data without replicate measurements.
We can fit a metric variogram model to these data by
metricVgm <- vgmST("metric",
joint=vgm(50,"Exp",100,0),
stAni=50)
metricVgm <- fit.StVariogram(vv, metricVgm)
attr(metricVgm, "optim")$value
## [1] 60.60854
plot(vv, metricVgm)
or a separable model by
# commented - get error "Error in switch(model$stModel,...: EXPR must be a length 1 vector ...)"
# sepVgm <- vgmST("separable",
# space=vgm(0.9,"Exp", 123, 0.1),
# time =vgm(0.9,"Exp", 2.9, 0.1),
# sill=100)
# sepVgm <- fit.StVariogram(vv, sepVgm, method = "L-BFGS-B",
# lower = c(10,0,0.01,0,1),
# upper = c(500,1,20,1,200))
# attr(sepVgm, "optim")$value
# plot(vv, list(sepVgm, metricVgm))
These models can then be used for spatiotemporal kriging; the interested reader is referred to the vignettes of package gstat, and to the examples and demo scripts using the function krigeST found there.
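As a rough, not-run sketch (the prediction locations and times below are made up, and it assumes r5to10 and metricVgm from above), spatiotemporal kriging with krigeST looks like this:
library(gstat)
pred_pts = spsample(DE_NUTS1, 100, type = "regular") # hypothetical prediction locations
pred_days = as.Date("2008-06-01") + seq(0, 28, by = 7) # a few hypothetical prediction days
pred_grid = STF(pred_pts, pred_days) # space-time prediction grid
# pred = krigeST(PM10 ~ 1, data = r5to10, newdata = pred_grid, modelList = metricVgm)
# stplot(pred)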
We'll analyse the events from the foot-and-mouth disease (fmd) data that come with the stpp package:
library(stpp)
data("fmd")
data("northcumbria")
head(fmd)
## X Y ReportedDay
## [1,] 350850 557590 28
## [2,] 337320 569500 28
## [3,] 341730 572090 28
## [4,] 350410 527040 28
## [5,] 347800 541850 29
## [6,] 356900 541120 29
head(northcumbria)
## x y
## [1,] 346618.7 583056.7
## [2,] 340011.7 575941.6
## [3,] 338487.0 573908.6
## [4,] 332896.5 574162.8
## [5,] 332896.5 573400.4
## [6,] 333658.8 571621.6
To make the northcumbria polygon more useful, we'll convert it into a SpatialPolygons object by
nc = rbind(northcumbria, northcumbria[1,]) # close polygon
library(sp)
nc = SpatialPolygons(list(Polygons(list(Polygon(nc)), "NC")))
Next, we'll create a SpatialPoints object for the locations:
pts = SpatialPoints(fmd[,1:2]) # CRS unknown!
plot(nc, axes = TRUE)
plot(pts, cex = 0.5, add = TRUE, col = "#88888888")
We can do some simple cell counts by creating a grid first
grd = SpatialPixels(SpatialPoints(makegrid(nc, n = 100)))
plot(grd)
plot(nc, add = TRUE)
and then assigning a dummy (1) attribute, and aggregating this using sum:
pts$id = rep(1, length(pts))
image(aggregate(pts, grd, sum))
plot(nc, add = TRUE)
points(pts, cex = .3, col = "#88888888")
title("cell counts")
and get an idea of the temporal development by computing the average event time per cell:
day = as.Date("2001-01-01") + fmd[,3] # origin date also unknown!
pts$day = day
image(aggregate(pts["day"], grd, mean))
plot(nc, add = TRUE)
points(pts, cex = .3, col = "#88888888")
title("cell mean event time")
We can create an irregular space-time object (STI) from this, without attributes (DF), by
library(spacetime)
sti = STI(pts, day)
plot(sti) # space-time layout
This plot shows the space-time (time = x, space = y) layout of the individual events, where points are simply numbered. Note that STI objects are always time-ordered.
We can create a plot that shows attributes if we add an attribute; here we add a simple index, normalized to \([0,1]\), to indicate temporal distribution.
stidf = STIDF(pts, day, data.frame(one = (0:(length(pts)-1))/(length(pts)-1)))
stplot(stidf, sp.layout = nc, key.space = "right", cex = .5,
xlim = bbox(nc)[1,], ylim = bbox(nc)[2,],
main = "colour indicates quintile")
A summary of the number of observations per grid cell per time interval is obtained by aggregating stidf over a regular space-time grid; we create this grid by
grd = SpatialPixels(SpatialPoints(makegrid(nc, n = 25)))
n = 6 # number of time classes
tcuts = seq(min(index(sti)), max(index(sti)), length.out=n)
grd = STF(grd, tcuts[1:(n-1)])
plot(grd)
By default, STF creates the end times of the time intervals as given by
Sys.setenv(TZ = "UTC")
delta(tcuts[1:(n-1)])
## [1] "2001-03-04 UTC" "2001-04-07 UTC" "2001-05-11 UTC" "2001-06-14 UTC"
## [5] "2001-07-18 UTC"
The number of events per space-time block is then computed by aggregate, and plotted with stplot:
# commented due to error flagged: "Error in over(ax(x, "STI") ..."
# a = aggregate(stidf, grd, sum)
# summary(a)
# stplot(a, xlim = bbox(nc)[1,], ylim = bbox(nc)[2,],
# col.regions=gray((45:1)/51), sp.layout = list(nc, first=FALSE),
# main = "number of observations")
A third type of spatiotemporal data concerns trajectories, describing the paths of objects moving through space. The data consist of sequences of fixes: time-stamped location measurements. Methods for data analysis have been developed by very different communities, ranging from the transportation and logistics sector to animal ecology, each addressing their own sets of problems.
Package trajectories provides classes and some methods for handling and analysing such data, building upon the work in packages sp and spacetime.
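As a minimal sketch (with made-up fixes, and assuming package trajectories, not in the list above, is installed), a single track can be built from an STIDF:
library(sp)
library(spacetime)
library(trajectories)
fixes = SpatialPoints(cbind(lon = c(7.60, 7.61, 7.63), lat = c(51.95, 51.96, 51.98)),
    CRS("+proj=longlat +datum=WGS84")) # hypothetical GPS fixes
times = as.POSIXct("2015-08-25 12:00", tz = "UTC") + c(0, 60, 120)
tr = Track(STIDF(fixes, times, data.frame(fix = 1:3)))
tr
plot(tr)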