Chapter 3 Time series and function definitions

Last updated: 2020-08-12 00:35:49

Aims

Our aims in this chapter are:

  • Learn to work with data that represent time (dates)
  • Learn how to visualize our data with graphical functions
  • Learn to define custom functions

3.1 Dates

3.1.1 Date and time classes in R

In R, there are several special classes for representing times and time-series (time+data). For example:

  • Times
    • Date
    • POSIXct
    • POSIXlt
  • Time series
    • ts
    • zoo (package zoo)
    • xts (package xts)

In this book, we will only be working with the Date class, which is used to represent dates.

3.1.2 Working with Date objects

3.1.2.1 Today’s date

The simplest data structure for representing times is Date, used to represent dates (without time of day). For example, we can get the current date with Sys.Date:

Using class (Section 1.3.11) reveals this is indeed an object of class Date:
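
A minimal sketch of these two steps (the variable name x is chosen here for illustration, and the printed date depends on the day the code is run):

```r
# Get the current date from the system clock
x = Sys.Date()
x

# Check the class of the object
class(x)
```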

3.1.2.2 Converting character to Date

We can also convert character values to Date, using as.Date. That way, we can create a Date object representing not just today’s date, but any date we want:
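
For instance, assuming we want to represent October 20, 2014 (a date picked only for illustration):

```r
# The string is in the standard YYYY-MM-DD format, so no format is needed
d = as.Date("2014-10-20")
d

# The result is a Date, not a character value
class(d)
```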

When the character values are in the standard date format (YYYY-MM-DD), such as in the above example, the as.Date function works without any additional arguments. However, when the character values are in a non-standard format, we need to specify the format definition with format, using the various component symbols. Table 3.1 lists the most commonly used symbols for specifying date formats in R. The full list of symbols can be found in ?strptime.

Table 3.1: Common Date format components
Symbol  Meaning
%d      Day ("15")
%m      Month, numeric ("08")
%b      Month, 3-letter ("Aug")
%B      Month, full ("August")
%y      Year, 2-digit ("14")
%Y      Year, 4-digit ("2014")

Before going into examples of date formatting, it is useful to set the standard "C" locale in R. That way we make sure that month or weekday names are interpreted in English:
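
One way to do this is sketched below. It assumes that only the LC_TIME category, which controls how month and weekday names are parsed and printed, needs to be changed:

```r
# Switch date/time names to the "C" (English) locale for this session
Sys.setlocale("LC_TIME", "C")

# Confirm the current setting
Sys.getlocale("LC_TIME")
```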

For example, converting the following character date—which is in a non-standard format—to Date fails when format is not specified:

Specifying the right format, which is "%d/%b/%y" in this case, leads to a successful conversion:
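
The failing and the successful conversion can be sketched as follows. The string "07/Aug/12" is an illustrative value chosen to match the "%d/%b/%y" format, and try is used only so that the script keeps running after the expected error:

```r
# Make sure "Aug" is interpreted as an English month name
Sys.setlocale("LC_TIME", "C")

s = "07/Aug/12"

# Without format, as.Date cannot interpret the string and raises an error
res = try(as.Date(s), silent = TRUE)
class(res)

# With the matching format: day / abbreviated month name / 2-digit year
d = as.Date(s, format = "%d/%b/%y")
d
```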

Here is another example with a different non-standard format ("%Y-%B-%d"):
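
A sketch of such a conversion, using the illustrative string "2012-August-07":

```r
# Make sure "August" is interpreted as an English month name
Sys.setlocale("LC_TIME", "C")

# 4-digit year - full month name - day
d = as.Date("2012-August-07", format = "%Y-%B-%d")
d
```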

3.1.2.3 Converting Date to character

A Date can always be converted back to character using as.character:

Note that both the Date and the character objects are printed the same way, so we have to use class to figure out which class we are dealing with.
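
A short sketch of the round trip, using an illustrative date:

```r
d = as.Date("2014-10-20")
s = as.character(d)

# Both objects print the same way...
d
s

# ...but their classes differ
class(d)
class(s)
```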

The as.character function, by default, returns a text string with all date components in the standard YYYY-MM-DD format. Using the format argument, however, lets us compose different date formats or extract individual date components out of a Date object:

Note that as.character consistently returns a character, even when the result contains nothing but numbers, as in %Y. We can always convert from character to numeric with as.numeric if necessary:
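
The following sketch extracts date components from an illustrative date. It uses the format function rather than as.character; this is a substitution made here for portability, since format accepts the same symbols and some recent R versions are reported to warn when a format argument is passed to as.character:

```r
d = as.Date("2014-10-20")

# Compose a custom date format (month/year)
format(d, "%m/%Y")

# Extract a single component; the result is still a character value
y = format(d, "%Y")
class(y)

# Convert the year to a number when needed
as.numeric(y)
```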

3.1.2.4 Arithmetic operations with dates

At this point, you may ask yourself why we even bother to create Date objects and deal with date formats, rather than just keep working with character values. The reason is that representing dates as Date makes it possible to do extremely useful operations, such as date arithmetic.

Date arithmetic means that Date objects act like numeric vectors with respect to certain operations that make sense for dates, such as:

  • Logical operators—Comparing which date is earlier/later
  • Subtraction—Calculating time differences
  • Creating sequences with seq—Finding consecutive dates

For example, the following expression checks whether today’s date is after 2013-01-01:
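
A sketch of such a comparison:

```r
# Compare today's date with a fixed date; the result is a logical value
is_after = Sys.Date() > as.Date("2013-01-01")
is_after
```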

The subtraction operator (-) can be used to calculate the time difference between two dates. The result is an object of class difftime, which can be converted to numeric using as.numeric along with the unit we are interested in:
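
For example, with two illustrative dates that are two weeks apart:

```r
d1 = as.Date("2014-01-01")
d2 = as.Date("2014-01-15")

# Subtracting two dates gives a difftime object
td = d2 - d1
td
class(td)

# Convert to a plain number, in the units we are interested in
as.numeric(td, units = "days")   # 14
as.numeric(td, units = "weeks")  # 2
```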

Using seq, we can create a sequence of consecutive dates within a given date range and with a particular time step (such as every 7 days):
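
A sketch with an illustrative date range, where a numeric by is interpreted as a number of days:

```r
# Every 7 days between January 1 and January 29, 2014
s = seq(as.Date("2014-01-01"), as.Date("2014-01-29"), by = 7)
s
length(s)  # 5
```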

3.1.3 Time series

In this book, we will not be working with specialized time series classes, such as ts. Instead, we will use the straightforward “manual” approach of treating a sequence of measurements and a corresponding sequence of times when those measurements were taken as a time series. For example, let’s define two numeric vectors holding the water level in Lake Kinneret, in May and in November, in each year during 1991-2011:

And the corresponding vector of measurement times, in this case—numeric values representing years:

What was the average water level in May? in November?

Was the water level ever below -213 (the “red line”) in May? in November? We can find out using the any function (Section 2.4.1; see note 1 at the end of the chapter):

How can we find out in which year(s) the water level was below -213 in May? in November? We can use the logical vector nov < -213 to subset (Section 2.3.10.2) the year vector:
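
The questions above can be explored with a short sketch. The values below are made up for illustration and cover five years only; they are not the actual Lake Kinneret measurements, so the results differ from those of the full 1991-2011 series:

```r
# Made-up water levels (NOT the real measurements)
year = 2001:2005
may = c(-209.5, -212.8, -210.1, -213.4, -211.0)
nov = c(-211.2, -214.1, -212.9, -213.5, -211.0)

# Average level in each month
mean(may)
mean(nov)

# Was the level ever below the "red line"?
any(may < -213)
any(nov < -213)

# In which years? Note the space between < and -
year[may < -213]
year[nov < -213]
```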

A table is more natural for representing a collection of corresponding vectors, such as the times and measurements that comprise a time series:
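
As a brief preview, such a table can be built with data.frame. The three rows below use made-up values, not the actual measurements:

```r
# Made-up values (NOT the real measurements)
year = 2001:2003
may = c(-209.5, -212.8, -210.1)
nov = c(-211.2, -214.1, -212.9)

# One row per year, one column per vector
dat = data.frame(year = year, may = may, nov = nov)
dat
```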

We will learn about tables in Chapter 4.

3.2 Graphics

3.2.1 Generic functions

Some of the functions we learned about are generic functions. Generic functions are functions that can accept arguments of different classes. What the function does depends on the class of its argument, according to the method defined for that class. The advantages of generic functions are that there are fewer function names to remember, and that the same code can run on different types of objects.

For example, print is a generic function. When the print function gets a vector, it prints the values, but when it gets a raster (a stars object), it prints a summary of its properties (Section 1.1.5). Similarly, the graphical function plot (below) displays different graphical output depending on the type of input(s).
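
As a small illustration, the base R function mean is also generic: it has one method for numbers and another for dates. The values below are chosen only for demonstration:

```r
# Numeric input: the result is a number
x = c(1, 3)
mean(x)

# Date input: the same function name, but the result is a Date
d = as.Date(c("2014-01-01", "2014-01-03"))
m = mean(d)
m
class(m)
```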

3.2.2 Graphical functions

The graphical function plot, given a numeric vector, displays its values in a two dimensional plot where:

  • Vector indices are on the x-axis
  • Vector values are on the y-axis

For example (Figure 3.1):

Figure 3.1: Plot of the nov vector

The type="b" argument means draw both points and lines. Other useful options for type include:

  • type="p" for points (the default)
  • type="l" for lines
  • type="o" for overplotted lines and points
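
A sketch of plotting a single vector with different type settings. The values are made up for illustration and are not the actual nov measurements:

```r
# Made-up water levels (NOT the real measurements)
nov = c(-211.2, -214.1, -212.9, -213.5, -211.0)

# Indices 1..5 on the x-axis, values on the y-axis
plot(nov, type = "b")  # both points and lines
plot(nov, type = "l")  # lines only
```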

If we pass two vectors to plot, the values of the first vector appear on the x-axis, while the values of the second vector appear on the y-axis. For example, we can put the years of water level measurement on the x-axis, as follows (Figure 3.2):

Figure 3.2: nov as function of year

We can add a horizontal line displaying the Kinneret “red line” using abline with the h parameter. The h parameter determines the y-axis value for the horizontal line. Note that abline draws an additional “layer” in an existing graphical device, which was initiated with plot (Figure 3.3):
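
A sketch of the two layers, again with made-up values rather than the actual measurements:

```r
# Made-up values (NOT the real measurements)
year = 2001:2005
nov = c(-211.2, -214.1, -212.9, -213.5, -211.0)

# First layer: years on the x-axis, water level on the y-axis
plot(year, nov, type = "b")

# Second layer: horizontal line at the "red line" level
abline(h = -213)
```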

Figure 3.3: Adding a horizontal line with abline

Other additional “layers” can be added to an existing plot using the functions points and lines. For example, the following code section draws both the nov and may time series in the same plot. We are using the graphical parameter col to specify a different line color. In addition, we are setting the y-axis range with ylim to make sure both time series fit inside the displayed range. The ylim argument needs to be a vector of length two, the minimum and the maximum (Figure 3.4):

Figure 3.4: Adding a second series with lines

Finally, we can set the axis labels using the xlab and ylab parameters of the plot function (Figure 3.5):
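
The pieces described in this section can be combined as in the sketch below. The data are made up, and the color, the axis label text, and the use of range to compute ylim are illustrative choices:

```r
# Made-up values (NOT the real measurements)
year = 2001:2005
may = c(-209.5, -212.8, -210.1, -213.4, -211.0)
nov = c(-211.2, -214.1, -212.9, -213.5, -211.0)

# y-axis limits covering both series: a vector of length two (min, max)
y_range = range(c(may, nov))

# First series, with axis labels and y-axis range
plot(year, nov, type = "b", ylim = y_range, xlab = "Year", ylab = "Water level")

# Second series, added as a layer in a different color
lines(year, may, type = "b", col = "red")

# The "red line" level
abline(h = -213)
```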

Figure 3.5: Setting axis labels

3.2.3 Consecutive differences

The diff function can be used to create a vector of differences between consecutive elements:
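
A sketch of one way to do this, using made-up values instead of the actual nov measurements (the name d_nov is chosen here for illustration):

```r
# Made-up values (NOT the real measurements)
nov = c(-211.2, -214.1, -212.9, -213.5, -211.0)

# diff returns one element fewer than its input
diff(nov)

# Prepend NA so that the result has the same length as nov
d_nov = c(NA, diff(nov))
d_nov
```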

Why do you think we added NA at the beginning of the vector?

Now we can find out which year had the biggest water level increase or decrease:

These results are visualized in Figure 3.6.

Figure 3.6: Years of biggest increase (2003) and decrease (2008) in the nov time series

Note that which.min and which.max ignore NA values.
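
The lookup of the years with the biggest change can be sketched as follows. The values are made up, so the years returned here are not those shown in Figure 3.6:

```r
# Made-up values (NOT the real measurements)
year = 2001:2005
nov = c(-211.2, -214.1, -212.9, -213.5, -211.0)
d_nov = c(NA, diff(nov))

# Index of the largest and smallest difference; the leading NA is skipped
which.max(d_nov)
which.min(d_nov)

# Use the indices to look up the corresponding years
year[which.max(d_nov)]  # biggest increase
year[which.min(d_nov)]  # biggest decrease
```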

3.3 Defining custom functions

3.3.1 Function definition components

In Section 1.3.6, we learned that a function call is an instruction to execute a certain function, as in:

The function itself is actually an object containing code, which is loaded into memory and can be executed with specific parameters. So far, we have met functions defined in the default R packages (e.g., mean, seq, length, etc.). Later on, we will also use functions from external packages (e.g., st_read). In this section, we learn how to define our own custom functions.

Here is the structure of a function definition expression in R:

The expression is composed of:

  • A function name (add_five)
  • The assignment operator (=)
  • The function keyword (function)
  • Parameter(s) ((x))
  • Opening brace ({)
  • Code (x_plus_five = x + 5)
  • Returned value (return(x_plus_five))
  • Closing brace (})
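
Put together, the components listed above form the following definition:

```r
# name = function(parameters) { code; returned value }
add_five = function(x) {
  x_plus_five = x + 5
  return(x_plus_five)
}
```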

3.3.2 Function definition vs. function call

The idea is that the code inside the function gets executed each time the function is called. For example, the function we just defined, add_five, can be used to calculate the sum of various numbers and five:
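
A sketch of one definition followed by two calls (the argument values are illustrative):

```r
# The definition is executed once...
add_five = function(x) {
  x_plus_five = x + 5
  return(x_plus_five)
}

# ...and the function can then be called many times
add_five(5)   # 10
add_five(77)  # 82
```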

3.3.3 Local variables

When we make a function call, the values we pass as function arguments are assigned to local variables which the function code can use. The local variables are not accessible in the global environment. For example, even though we just executed two function calls of add_five, in which x_plus_five was defined, x_plus_five is unavailable in the global environment:
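
This can be checked with a sketch like the following, which uses exists to test for the variable instead of triggering an error. It assumes a fresh session in which no global x_plus_five was defined:

```r
add_five = function(x) {
  x_plus_five = x + 5  # local variable, lives only during the call
  return(x_plus_five)
}

add_five(5)

# The local variable did not leak into the global environment
exists("x_plus_five")  # FALSE
```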

3.3.4 Returned value

Every function returns a value. We can assign the returned value to a variable to keep it in memory for later use:

A return expression, such as the one we used in add_five, is optional and can be omitted:

If the return expression is omitted, the returned value is the result of the last expression in the function body. The following alternative definition of add_five, where the assignment and the return expressions were omitted, is therefore equivalent:

We can also omit the { and } braces in case the code consists of a single expression. Therefore the add_five function can be defined with shorter code:
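
The progressively shorter definitions can be compared in a sketch like this one (the names r1, r2, and r3 are introduced here only to keep the results for comparison):

```r
# Full form: intermediate variable and explicit return
add_five = function(x) {
  x_plus_five = x + 5
  return(x_plus_five)
}
r1 = add_five(3)  # returned value assigned to a variable

# No assignment, no return: the last expression is the returned value
add_five = function(x) {
  x + 5
}
r2 = add_five(3)

# Single expression: the braces can be dropped as well
add_five = function(x) x + 5
r3 = add_five(3)

c(r1, r2, r3)  # 8 8 8
```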

3.3.5 Default arguments

Default arguments (Section 2.3.7) can be defined as part of the function definition. In case there is a default value, we can skip that parameter in function calls.

For example, the following definition of add_five does not specify a default value for x. Therefore, trying to call add_five without passing an argument for x gives an error:

The following alternative definition does specify the default value of 1 for x. The default value is used when calling the function without specifying x:
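
Both cases are sketched below. try is used only so that the script keeps running after the expected error:

```r
# No default value: calling without x fails
add_five = function(x) x + 5
res = try(add_five(), silent = TRUE)
class(res)  # "try-error"

# Default value of 1 for x
add_five = function(x = 1) x + 5
add_five()    # 6, the default is used
add_five(10)  # 15, the default is overridden
```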

3.3.6 Argument types

There are no restrictions for the classes and dimensions of arguments a function can accept, as long as we did not set such restrictions ourselves, e.g., using conditionals (Section 4.2.2). However, we get an error if one of the expressions in the function code is illegal given the arguments.

For example, the add_five function accepts vectors of length >1, adding five to each element:

However, passing a character value gives an error, because the internal expression x+5 cannot be executed:
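
A sketch of both situations, with illustrative argument values. try is used only so that the script keeps running after the expected error:

```r
add_five = function(x) x + 5

# A vector of length 3: five is added to each element
add_five(1:3)  # 6 7 8

# A character value: x + 5 cannot be evaluated, so the call fails
res = try(add_five("one"), silent = TRUE)
class(res)  # "try-error"
```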

3.3.7 More examples

As another example, let’s define a function named first_last which accepts a vector and returns the difference between the last and the first elements:

Here are three different function calls to demonstrate that our function indeed works as expected:
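
One possible definition is sketched below, together with three illustrative calls. It assumes x has at least one element:

```r
# Difference between the last and the first elements of a vector
first_last = function(x) {
  x[length(x)] - x[1]
}

first_last(c(1, 2, 10))  # 9
first_last(5:1)          # -4
first_last(7)            # 0, the single element is both first and last
```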

Define a function named modify that accepts three arguments:

  • x
  • index
  • value

The function assigns value into the element at the index position of vector x. The function returns the modified vector x, as shown below.


  1. When typing nov < -213, make sure there is a space between < and -. Otherwise the combination is interpreted as an assignment operator <-!