Python basics

Last updated: 2022-06-25 15:52:35

Introduction

Now that our working environment is set up, and we know how to edit and execute Python code through Jupyter Notebook (see Setting up the environment), we move on to the Python language itself. In this chapter, we introduce the basic concepts, operators, and data types in Python.

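Throughout most of the chapter, we are going to cover data types, in terms of their properties and their behavior. These include the elementary “atomic” data types, namely:

  • Numbers (int, float)

  • Boolean values (bool)

  • None (None)

as well as the more complex “collection” data types, namely:

  • Strings (str)

  • Lists (list)

  • Tuples (tuple)

  • Dictionaries (dict)

  • Sets (set)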

We are going to place more emphasis, and cover more methods, when discussing those data structures which are most useful for our purposes later on in the book, such as list (see Lists (list)).

In addition to data types, we are going to introduce the basic concepts of the Python language, such as Variables and assignment, Functions, and Mutability and copies.

Variables and assignment

The most basic concepts in Python, just like in any other programming language, are the concepts of variables and assignment. We assign values to variables so that we can keep intermediate results in computer memory, and keep processing them incrementally throughout our script. We will see that variables can hold values of any complexity level, ranging from simple numbers or strings to arrays, tables, vector layers, or rasters.

Assignment in Python is done using the assignment operator =:

  • To the left of the = operator we specify the variable name of our choice

  • To the right of the = operator we specify the value to be assigned

For example, the following expression assigns the numeric value of 3 to a variable named x:

x = 3

Variable names can be composed of lowercase letters ([a-z]), uppercase letters ([A-Z]), digits ([0-9]), and the underscore (_), but they cannot begin with a digit. Also note that variable names are case-sensitive, e.g., g and G are two different variables.

We can now access the value assigned to x in any subsequent expression in our script:

x
3

Note that assigning another value to a pre-defined variable “replaces” its contents. We are going to elaborate on the meaning of this later on (see Mutability and copies):

x = 5
x
5

Trying to access an undefined variable is a commonly encountered error. For example, if we have not defined a variable named z anywhere in our script, then the expression z raises an error:

# z  ## Raises error

Note

The above expression, as well as other expressions that raise errors, is “commented out” (see Code comments), so that the notebook can be run uninterrupted. To see the error message, remove the # symbol at the beginning of the line (in your own copy of the notebook) and then run the cell.
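An alternative to commenting out a failing expression is to catch the error, so that the rest of the code keeps running. The following is only a minimal sketch: the try/except construct is not covered in this chapter, and undefined_variable is a hypothetical name that is deliberately left undefined.

```python
# Accessing a name that was never assigned raises a NameError.
# 'undefined_variable' is a hypothetical name, deliberately left undefined.
try:
    undefined_variable
except NameError as error:
    # Keep the error message as a string instead of stopping the script
    message = str(error)

print(message)
```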

Functions

Functions are named pieces of code that perform a particular job. We will often be executing:

  • Built-in functions

  • Functions from the standard library

  • Functions from third-party packages (see Loading packages)

Functions in Python are executed by specifying their name, followed by parentheses. Inside the parentheses, there can be zero or more arguments (i.e., function inputs), separated by commas, depending on the function. For example, the built-in function abs accepts a number and returns its absolute value:

abs(-7)
7
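To illustrate the "zero or more arguments" rule, here is a short sketch using built-in functions only. The choice of max, pow, and float is ours, for illustration; the variable names are arbitrary.

```python
# One argument
absolute = abs(-7)      # 7

# Two arguments, separated by a comma
largest = max(3, 9)     # 9
power = pow(2, 5)       # 32, the same as 2 ** 5

# Zero arguments: calling float with no input returns 0.0
zero = float()

print(absolute, largest, power, zero)
```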

Later on, we will learn how to define our own functions (see Defining functions).

Data types

Data types—overview

Numeric values, such as 3 or 5 shown above (see Variables and assignment), are just one of the built-in data types in Python. The most commonly used built-in data types are summarized in Table 8.

Table 8 Python data types

Data type   Meaning      Divisibility   Mutability   Example
int         Integer      atomic         immutable    7
float       Float        atomic         immutable    3.2
bool        Boolean      atomic         immutable    True
None        None         atomic         immutable    None
str         String       collection     immutable    "Hello!"
list        List         collection     mutable      [1,2,3]
tuple       Tuple        collection     immutable    (1,2)
dict        Dictionary   collection     mutable      {"a":2,"b":7}
set         Set          collection     mutable      {"a","b"}

These data types are the basic building blocks of Python code. Later on, we are going to learn about other, more complex, data structures, defined in third-party packages. For example, we will learn about:

  • a data structure called ndarray, which is used to represent arrays (see Creating arrays), and

  • a data structure called GeoDataFrame, which is used to represent vector layers (see Creating a GeoDataFrame).

Note the distinction between “atomic” and “collection” data types:

  • “atomic” data types, which represent an indivisible value: int, float, bool, and None

  • “collection” data types, which represent a collection of elements, where each element is itself an “internal” data structure that may be either atomic or a collection: str, list, tuple, dict, and set

Instances of the data types, namely the values, can be expressed as literal values or as variables. For example, in the expression:

x = 5

5 is a literal value, while x is a variable.

Checking with type

The type function can be used to identify the data type. Let us see how the various data type names listed in Table 8 appear in the console:

type(7)
int
type(3.2)
float
type(True)
bool
type(None)
NoneType
type("Hello!")
str
type([1, 2, 3])
list
type((1, 2))
tuple
type({"a": 2, "b": 7})
dict
type({"a", "b"})
set

In the following sections (see Numbers (int, float) through Sets (set)) we go over the most important properties and methods for each of these data types.

Note

To check if a given object belongs to the specified type programmatically, you can use the isinstance function. For example, isinstance(1,int) returns True (because 1 is an int), while isinstance(1.1,int) returns False (because 1.1 is a float).
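Two details of isinstance are worth a quick sketch: it also accepts a tuple of types (meaning "any one of these"), and, since bool is a subclass of int in Python, True passes an int check. The variable names below are arbitrary.

```python
# isinstance accepts a tuple of types, meaning "any one of these"
number_check = isinstance(3.2, (int, float))
string_check = isinstance("Hello!", (int, float))

# bool is a subclass of int, so True passes an int check as well
bool_check = isinstance(True, int)
exact_check = type(True) == int   # compares the exact type, which is bool

print(number_check, string_check, bool_check, exact_check)
```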

Numbers (int, float)

Integers and floats

An int (integer) represents a numeric value without a decimal point. The value is negative if it starts with -. For example, here are two int values, 3 and -78:

3
3
-78
-78

A float represents a numeric value, whether positive or negative, with a decimal point. Here are two float values, -3.2 and 3.0:

-3.2
-3.2
3.0
3.0

Note that the presence of a decimal point in a literal number automatically creates a float. Otherwise, we create an int.

int and float can be distinguished based on the way they are printed (with a decimal point, or without it). More systematically, they can be distinguished using the type function (see Checking with type):

type(3)
int
type(3.0)
float

Arithmetic operators

The ordinary arithmetic operators in Python are given in Table 9.

Table 9 Arithmetic operators in Python

Operator   Meaning
+          Addition
-          Subtraction
*          Multiplication
/          Division
**         Exponent
//         Floor division
%          Modulus

The arithmetic operators can be used with both int and float values. Here are a few examples:

1 + 5
6
7 - 3.5
3.5
5.2 * 5
26.0
1 / 2
0.5

Note that the exponent operator in Python is **:

10 ** 3
1000

Note

Confusingly, Python has a ^ operator for something completely different than exponent (such as defined in R, or in plain language), namely the Bitwise XOR operator, which is beyond the scope of this book. For example, 10^3 returns 9.

Python operators are associated with precedence rules, which agree with the order of operations in mathematics. For example, as expected, * has precedence over +, therefore:

1 + 2 * 3
7

Parentheses can be used to indicate precedence:

(1 + 2) * 3
9

In fact, to make our code clearer, the recommendation is to use parentheses even when they are not required:

1 + (2 * 3)
7

Arithmetic operations return int or float, as necessary. For example, addition of two int values always returns an int, because the result is guaranteed to be a whole number:

2 + 3
5

However, division of two int values always returns a float, because the result of a division is not guaranteed to be a whole number and thus cannot be always represented using an int:

2 / 2
1.0
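The same rule can be checked with type. The short sketch below also shows a case not listed above: mixing an int with a float gives a float. The variable names are arbitrary.

```python
sum_int = 2 + 3        # int + int
sum_mixed = 2 + 3.0    # int + float
quotient = 6 / 3       # division of two int values

print(sum_int, type(sum_int))
print(sum_mixed, type(sum_mixed))
print(quotient, type(quotient))
```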

Calculations can be assigned to variables (see Variables and assignment) to keep the intermediate result in memory, in case our calculation requires several steps. Using the assignment operator and arithmetic operators, we already know how to write Python code comprising several expressions. For example:

x = 55
y = 30
z = x - y
z = z * 10
z
250

Exercise 02-a

  • How many seconds are there in a day? Write an arithmetic expression in Python to find out.

Note

Floor division (//) and modulus (%) are less useful for the purposes of this book, and only given in Table 9 for completeness. As an exercise, search online for “python floor division” and “python modulus” to check out what they do, then try them out in the Python command line or notebook.
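For a quick check of your findings, here is a minimal sketch that splits a number of minutes into whole hours and remaining minutes. The value 135 and the variable names are arbitrary.

```python
total_minutes = 135

hours = total_minutes // 60     # whole hours: 2
minutes = total_minutes % 60    # remaining minutes: 15

print(hours, minutes)

# The two results can be combined to recover the original value
print(hours * 60 + minutes == total_minutes)
```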

Increment assignment

Another commonly used Python operator is the increment assignment operator +=. The increment assignment is a shortcut to addition combined with assignment, i.e., x+=y is a shorter way to express x=x+y. For example:

x = 10
x += 5
x
15

A common use of increment assignment is to advance a “counter” variable inside a for loop (see for loops).

Note

Other than increment assignment (+=), Python also has decrement assignment (-=), multiply assignment (*=), and division assignment (/=) operators.
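These operators follow the same pattern as +=. A short sketch, with arbitrary values:

```python
x = 10
x -= 4      # same as x = x - 4, so x is now 6
x *= 3      # same as x = x * 3, so x is now 18
x /= 4      # same as x = x / 4, so x is now 4.5 (division returns a float)

print(x)
```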

int and float conversions

We can convert a number to int or float, using functions of the same name:

  • int → float: a decimal point followed by zero, i.e., .0, is added

  • float → int: anything after the decimal point is discarded

For example:

float(1)
1.0
int(11.8)
11

We can get the nearest integer using round:

round(11.8)
12
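The difference between int and round is easiest to see with negative numbers and with values that end in .5. The sketch below relies on two behaviors of built-in Python: int truncates toward zero, and round sends ties to the nearest even integer. The optional second argument of round (number of decimal places) is also shown.

```python
truncated_pos = int(11.8)        # 11
truncated_neg = int(-11.8)       # -11, truncation is toward zero
rounded_neg = round(-11.8)       # -12

# Ties (values ending in .5) are rounded to the nearest even integer
tie_down = round(2.5)            # 2
tie_up = round(3.5)              # 4

# Optional second argument: number of decimal places to keep
two_decimals = round(3.14159, 2) # 3.14

print(truncated_pos, truncated_neg, rounded_neg, tie_down, tie_up, two_decimals)
```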

Boolean values (bool)

What are Boolean values?

Boolean values represent one of two states, “true” or “false”. Accordingly, the boolean data type in Python can have just one of two possible values, True and False. Boolean values can be created by literally typing True and False:

True
True
False
False

However, typically boolean values are created as a result of conditional expressions (see Conditions).

Negation

Boolean values can be reversed (“negated”) using the not operator, followed by a boolean value (or an expression that creates a boolean value). The not operator is considered one of the logical operators, along with and and or (Table 10) which will be introduced next (see Conditions).

Table 10 Logical operators in Python

Operator   Meaning
and        And
or         Or
not        Not

For example:

not True
False
not False
True
not 1 == 1
False

Negation is useful when writing conditionals (see Conditionals).

Conditions

Most often, boolean values arise as a result of a condition, such as:

3 > 2
True

Conditions involve conditional operators, such as > (greater than) in the above example. The conditional operators in Python are summarized in Table 11.

Table 11 Conditional operators in Python

Operator   Meaning
==         Equal
!=         Not equal
<          Less than
<=         Less than or equal
>          Greater than
>=         Greater than or equal

Here are some more examples of conditional operators:

x = 11
x > 10
True
x <= 10
False
x != 11
False

Keep in mind the distinction between the assignment operator = (see Variables and assignment) and the equality conditional operator ==!

Two or more conditional expressions can be combined into one expression, using the logical operators and or or (Table 10):

  • When using and, the expression is True if both sides are True; otherwise the expression is False.

  • When using or, the expression is True if at least one side is True; otherwise the expression is False.

For example:

1 == 1 and 2 == 3
False
1 == 1 or 2 == 3 
True
1 == 1 and not 2 == 3
True
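A typical practical use of and and or is checking whether a value falls inside or outside a range. The following sketch uses an arbitrary value and an arbitrary range of 0 to 10:

```python
x = 11

# True only if both conditions hold
in_range = x >= 0 and x <= 10

# True if at least one condition holds
out_of_range = x < 0 or x > 10

print(in_range, out_of_range, not in_range)
```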

Boolean to number

Boolean values can be converted to integers using int, in which case False becomes 0 and True becomes 1:

int(False)
0
int(True)
1

In fact, the conversion takes place automatically when mixing True and False values with numbers, as part of arithmetic expressions or conditions. For example:

False + 9
9
True / 2
0.5
True == 1
True
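One practical consequence is that adding up conditions counts how many of them are True. A minimal sketch, with hypothetical values a, b, and c and an arbitrary threshold of 20:

```python
# Hypothetical values, for illustration only
a = 12
b = 25
c = 31

# Each condition is True (1) or False (0), so the sum is a count
count_above_20 = (a > 20) + (b > 20) + (c > 20)

print(count_above_20)
```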

None (None)

Note that Python has a special value of None:

None

The special value None has its own class, named NoneType:

type(None)
NoneType

None is used to denote the absence of a value. For example, None can be used to mark missing data in a list (see Lists (list)). The None data type is not very relevant for our purposes, so we are not going to encounter it very often later on, but you should be aware it exists.
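The conventional way to test for None is the is operator (and is not for the opposite check). The sketch below uses a hypothetical variable named elevation to stand for a missing value; note that None is not equal to zero.

```python
# Hypothetical variable holding a missing value
elevation = None

missing = elevation is None          # True
present = elevation is not None      # False
equals_zero = elevation == 0         # False, None is not the number zero

print(missing, present, equals_zero)
```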

Strings (str)

Creating strings

Strings (str) are sequences of characters, including digits, letters, punctuation, whitespace, and directives such as “newline”.

Strings can be created using either single (') or double (") quotes. For example:

'Hello!'
'Hello!'
"Hello!"
'Hello!'

Strings created using single quotes are identical to those created with double quotes. What matters is just the contents inside the quotes. Note that strings are printed with single quotes, but this is just an inconsequential convention.

Note

The main reason for having two types of quote characters is to be able to create strings that contain internal quotes. For example 'He said: "Hi!"' is a string that contains internal double quotes ("), which is possible thanks to the fact it is defined using single quotes (').

Note that a string can be empty:

""
''

Note

A string (str) is considered a “collection” data type (Table 8) because a string is actually a collection of characters, rather than an atomic value. For example, we can subset a string using slicing, as in x="Hello"; x[:2] (see list slicing), as if the string was a list of characters. Although for our purposes in this book we are not going to split strings to “parts”, therefore practically treating them as atomic values, we still classified them as a “collection” data type in Table 8 for the sake of accuracy.

String length

The len function can be used to count the number of characters in a string. For example:

len('Hello')
5

Conversion to string

Other data types, such as int, float, and bool, can be converted to string using the str function:

str(12)
'12'
str(-5.7)
'-5.7'
str(False)
'False'

String concatenation

Strings can be concatenated using the + operator. That way, the contents of variables can be combined with literal strings to create a new string. For example:

x = 'Hello'
y = 'World'
x + ' ' + y
'Hello World'

Note that when trying to concatenate strings with other data types, the latter are not automatically transformed to a string, resulting in an error:

# 'band_' + 1 + '.tif'  ## Raises error!

For the concatenation to work, we must transform all components to strings:

'band_' + str(1) + '.tif'  ## This works
'band_1.tif'

Note

Other than simply using the + operator, Python has at least three other, more advanced, methods, to concatenate strings with (numeric) values. Here they are, ordered from oldest to newest:

  • “Old” style string formatting, e.g., 'band_%s.tif' % 1

  • “New” style string formatting, e.g., 'band_{}.tif'.format(1)

  • “f-strings”, e.g., x=1; f'band_{x}.tif'

The + operator is perfectly sufficient for the purposes of this book, so we will not elaborate on these methods. However, if you are going to get into text processing using Python, make sure to check them out!
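For reference, the following sketch shows that all four approaches produce the same string. The variable name band and its value are arbitrary.

```python
band = 1

with_plus = 'band_' + str(band) + '.tif'     # + operator with str
old_style = 'band_%s.tif' % band             # "old" style formatting
new_style = 'band_{}.tif'.format(band)       # "new" style formatting
f_string = f'band_{band}.tif'                # f-string

print(with_plus)
print(with_plus == old_style == new_style == f_string)
```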

String to number

Strings can be converted to int or float using functions of the same name. However, for the conversion to be successful, the string must represent a valid number. Namely, the string must contain only + or - (or nothing) at the beginning, followed by digits. When converting to float (not int!), the number may also contain a decimal point. For example:

int('-99')
-99
float('-99.32')
-99.32
float('1')
1.0
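When the string does not represent a valid number of the requested type, the conversion raises an error. The sketch below shows this for int applied to a string with a decimal point, using try/except (not covered in this chapter) to capture the error message. It also shows one possible workaround, converting to float first.

```python
# int cannot parse a string that contains a decimal point
try:
    int('99.32')
except ValueError as error:
    message = str(error)

print(message)

# One possible workaround: convert to float first, then to int
value = int(float('99.32'))
print(value)
```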

Note

Working with strings is less relevant for our purposes in this book. Nevertheless, here is a list of useful string methods to get an impression of the built-in methods for strings in Python:

  • .strip—Remove spaces from start and end

  • .lower—Convert to lowercase

  • .upper—Convert to uppercase

  • .title—Convert to titlecase

  • .startswith(pattern)—Check if string starts with pattern

  • .endswith(pattern)—Check if string ends with pattern

  • .find(pattern)—Find the index of pattern within the string

  • sep.join([str1, str2, ...])—Join strings str1, str2, etc., using the sep string as separator
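The methods listed above can be tried out as follows. This is just a sketch; the example strings are arbitrary. Note that .find returns -1 when the pattern does not occur in the string.

```python
padded = '  Hello World  '
filename = 'band_1.tif'

stripped = padded.strip()                # 'Hello World'
lowered = stripped.lower()               # 'hello world'
uppered = filename.upper()               # 'BAND_1.TIF'
titled = lowered.title()                 # 'Hello World'

starts = filename.startswith('band')     # True
ends = filename.endswith('.tif')         # True
position = filename.find('_')            # 4
not_found = filename.find('x')           # -1

joined = '_'.join(['band', '1'])         # 'band_1'

print(stripped, lowered, uppered, titled)
print(starts, ends, position, not_found, joined)
```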

Lists (list)

Creating lists

Lists—as well as tuples (see Tuples (tuple)), dictionaries (see Dictionaries (dict)) and sets (see Sets (set)) which we cover next—are data structures that contain collections of items, known as elements. There is no homogeneity restriction, namely a list can contain any mixture of elements of any type. Each element may be any data type, including both the “atomic” data types (int, float, bool) and “collection” data types (for example, we can have a list of lists). A list is an ordered collection, meaning that the order of elements matters and that we can access individual elements using numeric indices (see Accessing list elements and list slicing).

Lists can be created using square brackets ([ and ]), with elements separated by commas. For example, here is how we can create an empty list:

x = []
x
[]

And here is how we can create a list with seven elements of type string:

days = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]
days
['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']

An important property of lists is their length, i.e., the number of elements. List length can be calculated using the len function:

len(days)
7

Again, keep in mind that there is no restriction on the type of list element. We can mix different types in the same list, although such “heterogeneous” lists are less useful in practice:

[1, "A", True, [55,56,57]]
[1, 'A', True, [55, 56, 57]]

Accessing list elements

List elements can be extracted using an index inside square brackets ([ and ]). Importantly, indexing in Python starts at zero. For example, days[0] returns the first element of days:

days[0]
'Sun'

days[1] returns the second element of days:

days[1]
'Mon'

days[2] returns the third element of days:

days[2]
'Tue'

and so on. We can think of the Python index as an offset; the first element has an offset of zero, the second element has an offset of one, and so on.

Note

Trying to access a list item beyond list length raises an error. Try executing an expression such as days[10] to see this behavior for yourself.

Assignment to list

We can modify a list element by assigning a new value into it. For example:

days[2] = "ABC"
days
['Sun', 'Mon', 'ABC', 'Wed', 'Thu', 'Fri', 'Sat']

Let us do another assignment to get back the original days list:

days[2] = "Tue"
days
['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']

Note that updating an element (or any other subset) in an existing data structure is considered an “in place” modification, and as such applicable only to mutable data types such as a list. We elaborate on the meaning and implications of “in place” operations later on (see Mutability and copies).

list slicing

We can get a subset of a list, containing just some of the elements, using a notation known as slicing. Slicing uses an index of the form start:stop:step. The meaning of the three components is as follows:

  • start—where to start, default is 0

  • stop—where to stop, default is at the end of the list

  • step—step size, default is 1

The resulting subset goes from start (inclusive) to stop (exclusive), and progresses in steps of size step:

  • When start is omitted (e.g., :stop), the subset starts from the beginning

  • When stop is omitted (e.g., start:), the subset goes all the way to the last element

  • When step is omitted (e.g., start:stop), the default step of size 1 is used

Note

The rationale behind stop being exclusive is that a combination (see list operators) of complementary slices returns the complete list, e.g., days[:3]+days[3:] is equal to days.

For example, days[0:3] means start at the element with index 0, end before the element with index 3 (i.e., end at index 2 inclusive), using step size 1. Therefore we get the first three elements—0, 1, and 2—from days:

days[0:3]
['Sun', 'Mon', 'Tue']

The default value of start is 0. Therefore it can be omitted, to get the same result using days[:3]:

days[:3]
['Sun', 'Mon', 'Tue']

However, if we need to get one or more elements from the middle of the list, we need to use both start and stop, as follows:

days[1:3]
['Mon', 'Tue']

When stop is omitted, the subset includes all elements from start till the end of the list:

days[1:]
['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']

Note that a subset obtained using the slice notation is always a list, even when it contains just one element, in which case we get a list of length one:

days[:1]
['Sun']

while individual elements (see Accessing list elements) are returned as standalone objects:

days[0]
'Sun'
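The step component was not used in the examples above. The following sketch demonstrates it, and also checks that complementary slices add up to the complete list. The list days is re-defined here so that the example is self-contained.

```python
days = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]

# Every second element, from the beginning to the end
every_other = days[::2]

# Indices 1, 3, and 5 (stop at index 6, exclusive)
middle = days[1:6:2]

# Complementary slices combine back into the complete list
complete = days[:3] + days[3:] == days

print(every_other)
print(middle)
print(complete)
```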

list operators

The + and * arithmetic operators are defined for lists, too, but their meaning is different than with numbers (see Arithmetic operators):

  • + appends two or more lists together

  • * replicates a list

For example, here we use + to combine the first two days of the week with another list of length 3, to get a list of length 5:

days[:2] + [1,2,3]
['Sun', 'Mon', 1, 2, 3]

The * operator replicates a list. The right-hand value needs to be an int, specifying the number of repetitions. For example, here we replicate the first two days of the week three times:

days[:2] * 3
['Sun', 'Mon', 'Sun', 'Mon', 'Sun', 'Mon']

What are methods?

In the next section, we introduce the concept of methods (see list methods). A method is similar to a function (see Functions), but it is part of a data type (and, more generally, of a class) definition, unlike a function which is a standalone object. A method is invoked using an object name, followed by a dot (.), then the method name.

For example, as we have already seen, a function named do_something, with an argument named z, is invoked with:

do_something(z)

A method named do_something, however, would be invoked with:

x.do_something(z)

where x is an object of a class that has a do_something method.

list methods

Some of the most useful methods for modifying lists are .append, .pop, .reverse, and .sort. Let us see what they do through examples.

The .append method appends the specified new element to a list, thus increasing its length by one. The new element is appended at the end of the list. For example, here is how we append the string "New day" at the end of days:

days.append("New day")
days
['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'New day']

Note that list methods such as .append, .pop, .reverse, and .sort (see below) modify the list itself, also known as “in place” (see Mutability and copies). The .append, .reverse, and .sort methods return None, while .pop returns the removed element. There is no need to assign the result back to the original variable. For example, doing something like days=days.append("New day") is incorrect, as this will just assign None to days and we will lose the information in the list.

The .pop method does the opposite of .append. Namely, .pop removes the last element of the given list, and returns the removed element. For example, here is how we can remove the last element ("New day") in days, thus returning to the original list:

days.pop()
days
['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']
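The return values of these two methods can be inspected by assigning them to variables. In the sketch below, the list days is re-defined so that the example is self-contained, and the names result and removed are arbitrary.

```python
days = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]

# .append modifies the list "in place" and returns None
result = days.append("New day")

# .pop also modifies the list "in place", and returns the removed element
removed = days.pop()

print(result)
print(removed)
print(len(days))
```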

The .reverse method, as can be expected, reverses the list:

days.reverse()
days
['Sat', 'Fri', 'Thu', 'Wed', 'Tue', 'Mon', 'Sun']

Another useful method is .sort(), which sorts the given list. In case the list contents are strings, then they are sorted in alphabetical order:

days.sort()
days
['Fri', 'Mon', 'Sat', 'Sun', 'Thu', 'Tue', 'Wed']

Before moving on, let us re-define days to get the original order of week days:

days = ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']
days
['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']

The in operator

The in operator helps us check whether a given value appears in a particular list. For example, suppose we have a list with vowels, named vowels:

vowels = ['a', 'e', 'i', 'o', 'u']
vowels
['a', 'e', 'i', 'o', 'u']

How can we check whether a particular string, such as "e", is a vowel? The straightforward approach is to compare the given value to each letter and combine the results with or (see Conditions):

'e' == vowels[0] or \
'e' == vowels[1] or \
'e' == vowels[2] or \
'e' == vowels[3] or \
'e' == vowels[4]
True

Note that we use the \ character to denote that the expression continues on the next line. Otherwise Python detects that the expression ends with an or (and nothing after it), which raises an error.

The above expression is straightforward but rather verbose, and specific to the list length. Instead, we can use the in operator, which is both shorter and more general:

'e' in vowels
True

Here is another example, where the result is False since the value "b" does not occur in the vowels list:

'b' in vowels
False

Exercise 02-b

  • What will be the result of 'sun' in days, and why? Run the expression to check your answer.

Tuples (tuple)

Creating tuples

Tuples are ordered collections of values of any type, just like lists (see Lists (list)). The difference between lists and tuples is that tuples are immutable, while lists are mutable. In other words, the tuple data type may be considered the immutable version of list. We elaborate on the concept of mutability later on (see Mutability and copies). In short, mutable data types can be modified after creation, e.g., using assignment to subsets (see Assignment to list) or “in place” methods (see list methods), while immutable data types cannot be modified.

Tuples can be created using ordinary parentheses, ( and ), with elements separated by commas:

t = ('one', 'two', 'three')
t
('one', 'two', 'three')

Here is a little inconsistency that is important to be aware of. In case we want to create a tuple that contains just one element, we still must include a comma after it:

u = ('one',)
u
('one',)

Otherwise, the parentheses are ignored and the result is the element itself (such as a string), rather than a tuple containing it, which was our intention:

u = ('one')
type(u)
str

Note

In fact, parentheses are not required to create a tuple; commas are sufficient. For example, 1, or 2,4 are tuples too. (Execute these expressions to see for yourself!)

Tuple methods

Tuples can be indexed (see Accessing list elements), sliced (see list slicing), duplicated or combined (see list operators), or evaluated using in (see The in operator), just like lists. For example:

t[0]
'one'
'two' in t
True

Since tuples are immutable, however, we cannot assign into an existing tuple:

# t[0] = 'ten'  ## Raises error!

For the same reason, we also cannot modify a tuple “in place” using list methods such as .append, .pop, .reverse, or .sort (see list methods).
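The following sketch demonstrates both restrictions. It uses try/except (not covered in this chapter) to capture the error message, and the built-in hasattr function to check whether a method exists. It also shows that combining tuples with + is still possible, since it creates a new tuple rather than modifying an existing one.

```python
t = ('one', 'two', 'three')

# Assignment into a tuple raises a TypeError
try:
    t[0] = 'ten'
except TypeError as error:
    message = str(error)

print(message)

# Tuples have no .append method
print(hasattr(t, 'append'))

# The + operator creates a new tuple; t itself is unchanged
t2 = t + ('four',)
print(t)
print(t2)
```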

Conversion to and from list

Tuples can be converted to and from lists, using the list and tuple functions, respectively. For example:

list(t)
['one', 'two', 'three']
tuple(list(t))
('one', 'two', 'three')

Dictionaries (dict)

Creating a dict

A dictionary (dict) is a collection of key:value pairs, where the keys and values can be of any Python data type, as long as the dict keys are immutable. Typically, the keys are strings. Another important property of the keys is that they must be unique, because they are used to access the dict values.

A dictionary can be created using curly brackets, encompassing key:value pairs, separated by commas. For example, the following expression creates a dictionary named person, containing four key:value pairs:

person = {'firstname': 'John', 'lastname': 'Smith', 'age': 50, 'eyecolor': 'blue'}
person
{'firstname': 'John', 'lastname': 'Smith', 'age': 50, 'eyecolor': 'blue'}

Note that three of the values are strings (str) and one is an integer (int), while all keys are strings.

Accessing dict values

Unlike a list (see Lists (list)) or a tuple (see Tuples (tuple)), where elements are accessible through numeric indices, dictionary entries cannot be accessed using a numeric index. (Since Python 3.7, a dict does preserve the order in which its entries were inserted, but it still has no positional indexing.) Instead, dictionary values are only accessible through the keys. In that sense, a Python dict is analogous to a real-life dictionary, since both associate, or translate, one set of values (keys) with another (values). For example, an English-French dictionary associates English words (keys) with a French translation (values).

Dictionary values are accessed using square brackets ([) in an expression such as d[key], where d is a dict object and key is the key. Again, keep in mind that dictionary keys are typically strings, but in general they can be any other immutable data type. For example, here is how we can access each of the four values in person:

person['firstname']
'John'
person['lastname']
'Smith'
person['age']
50
person['eyecolor']
'blue'

Assignment to dict

Dictionary values can be modified, by assignment and using the respective key, similarly to the way that list values can be modified by assignment using a numeric index (see Assignment to list):

person['firstname'] = 'James'
person
{'firstname': 'James', 'lastname': 'Smith', 'age': 50, 'eyecolor': 'blue'}

We can also create new key:value pairs, by assignment to a non-existing key:

person['owns_car'] = True
person
{'firstname': 'James',
 'lastname': 'Smith',
 'age': 50,
 'eyecolor': 'blue',
 'owns_car': True}

Detecting dict keys

The in operator, which we used to check if a list contains a given element (see The in operator), applies to dict keys and can be used to check if a dictionary contains a given key:

'firstname' in person
True
'address' in person
False
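Checking for a key matters because accessing a missing key with square brackets raises an error. As a sketch of one alternative, the .get method of dict (not covered in this chapter) returns None, or a default value of our choice, when the key is missing. The dictionary is re-defined here so that the example is self-contained, and the keys 'address' and 'country' are hypothetical.

```python
person = {'firstname': 'John', 'lastname': 'Smith', 'age': 50}

# Check for the key before trying to access it
has_address = 'address' in person

# .get returns None for a missing key, instead of raising an error
address = person.get('address')

# .get with a second argument returns that default for a missing key
age = person.get('age', 0)
country = person.get('country', 'unknown')

print(has_address, address, age, country)
```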

Exercise 02-c

  • Python data types can be combined into more complex data structures. For example, we can create a list of tuples, a dictionary of lists, and so on.

  • Create a dictionary with two keys, "a" and "b", and two values which are lists, [1,2] and [3,4], respectively.

  • Which expression can be used to access the value 4?

Sets (set)

A set is a collection of values which are guaranteed to be unique. A set is used to indicate whether a particular value is part of a group or not, without any additional information about that value. In other words, a set can be thought of as a dict with just the keys (without the values).

A set can be created from scratch, using curly brackets ({ and }), with elements separated by commas:

x = {'John', 'James', 'Bob'}
x
{'Bob', 'James', 'John'}
type(x)
set

A set can also be created from a dictionary, using the set function. In that case, the values are discarded:

set(person)
{'age', 'eyecolor', 'firstname', 'lastname', 'owns_car'}

Finally, a set can be created from a list, in which case duplicated values are discarded:

set([1, 7, 9, 7])
{1, 7, 9}
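Two common uses of this behavior are counting unique values and checking membership with in. The sketch below also uses the built-in sorted function (not covered in this chapter) to get the unique values back as a sorted list. The variable names are arbitrary.

```python
values = [1, 7, 9, 7]
unique = set(values)

n_values = len(values)          # 4
n_unique = len(unique)          # 3

has_seven = 7 in unique         # True
unique_list = sorted(unique)    # a sorted list of the unique values

print(n_values, n_unique, has_seven, unique_list)
```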

We are not going to use sets later on in this book. However, it is important to be aware that this basic data structure exists, in case you encounter it when working with Python.

Mutability and copies

Overview

As mentioned above, data types in Python can be divided into two groups based on the ability to modify them after creation (Table 8):

  • Immutable types, which cannot be changed after creation

  • Mutable types, which can be changed “in place” after creation

In this section, we elaborate on “in place” modification of mutable variables, and demonstrate the implications we need to be aware of when using it. Before that, we need to know a little more, at least conceptually, about how variables and data structures are stored in computer memory.

It is helpful to think of a data structure, whether mutable or immutable, as a specific location in computer memory that contains particular information. For example, suppose that we define a variable named a with a value, such as 2 or [1,2]. Now, the label a refers to a memory location which stores that particular value. We can schematically illustrate this as follows, where a is a label we place on a “box”, a memory location holding the information, marked as ×:

a → ⊠

When re-assigning a new value into an existing variable, we can think of the label “switching” to point at a new memory location, with new information +.

a ↘ ⊠
    ⊞

Additionally, and only with mutable values, we may modify the value “in place”. For example, in this chapter we learned about five methods to modify list values “in place”:

  • Assignment to a subset (see Assignment to list)

  • .append

  • .pop

  • .reverse

  • .sort

When modifying a value “in place”, the label still points to the same memory location. It is just that the information in that memory location has changed, e.g., from × to +:

a → ⊞
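One way to observe the difference between the two situations is the built-in id function, which returns an integer identifying the object a label currently points at. This is only a sketch: id is not otherwise used in this chapter, and the variable names are chosen for illustration.

```python
a = [1, 2]
before = id(a)          # identity of the "box" that a points at

# "In place" modification: same box, new contents
a.append(3)
same_box = id(a) == before
print(same_box)         # True

# Re-assignment: the label a now points at a different box
a = [1, 2, 3]
new_box = id(a) != before
print(new_box)          # True
```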

Why does this matter? When writing code, what is the practical difference between modifying a memory location “in place”, and switching to a new memory location when re-assigning a new value? The answer is, it matters in situations where we have more than one copy of the same variable.

It is important to understand that when creating a copy of a variable, as in b=a, we are creating a copy of the “label”, which points to the same memory location as a. As a result, we now have two labels pointing at the same memory location:

a → ⊠ 
b ↗  

What happens if we modify one of the variables a or b “in place”? The answer is that the change is going to be reflected in the other variable too!

a → ⊞ 
b ↗  

We can create “real” independent copies, using the .copy method, as in b=a.copy() instead of b=a. In this case, a and b point at different memory locations, so that any modification of one does not affect the other:

a → ⊠ 
b → ⊠ 

The next two sections demonstrate the ideas described here in practice.

Immutable values

As discussed above, creating a variable named a with the immutable value 2 means that we now have a “label” a, which “points” at a fixed value of 2:

a = 2
a
2

Assigning a into another variable b makes both labels a and b point to the same immutable value 2:

b = a
b
2

Making a copy of the “label” does not create another copy of the data, just another pointer to the same data. Programmatically, the fact that a and b are pointers to the same memory location can be detected using the is operator:

a is b
True

Variables pointing at immutable values, such as int, are basically labels to values that cannot be changed. There are no methods to modify an int “in place”. We can only re-assign a new value. For example, assigning a new value to a, such as 55, makes the respective label a point to another memory location with a different immutable value:

a = 55
a
55

The second label b is unaffected, still pointing to the same immutable value 2:

b
2

Consequently, a and b are no longer labels for the same memory location:

a is b
False

We can illustrate the old and new situations as follows:

(1) a → 2    (2) a → 55 
    b ↗          b → 2  
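The same reasoning applies to other immutable types, such as str. As a sketch (the variable names s and t are arbitrary), a string method such as .upper returns a new value rather than changing the existing one, so a second label keeps pointing at the original value:

```python
s = 'hello'
t = s               # both labels point at the same immutable string

s = s.upper()       # .upper returns a new string; s is re-assigned to it
print(s)            # HELLO
print(t)            # hello
print(s is t)       # False
```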

The above may seem obvious, but hang on. Mutable data types are the ones associated with tricky behavior when modifying them “in place”, as shown next.

Mutable values

Variables pointing at mutable data structures, such as a list, behave exactly the same way as immutable ones when assigning a new value, as in a=[5]. However, mutable values behave differently (and unexpectedly, in case you are unaware of this behavior) when modifying them “in place”.

For example, suppose we have a list a. In other words, we have the label a, which points to a (mutable) data structure:

a = [1, 2]
a
[1, 2]

Next, we create a copy of a, named b. The label b now points at the same memory location, holding the same information, as a:

b = a
b
[1, 2]

Now, let us modify a through assignment to a subset (see Assignment to list). Importantly, this operation modifies a “in place”. We do not make the label a point at a new “box”, elsewhere in computer memory (as in a=[5]). Instead, we make a modification inside the existing “box”:

a[0] = 500
a
[500, 2]

Perhaps surprisingly, the modification of a is also reflected in b:

b
[500, 2]

We can illustrate the old and new situations as follows:

(1) a → [1, 2]    (2) a → [500, 2] 
    b ↗               b ↗  

What has happened? Recall that a and b are labels pointing to the same memory location:

a is b
True

An “in place” modification of a, such as assignment to a subset (as shown here), or .append, .pop, .reverse, and .sort, modifies the information where it is, without switching the label a to a new memory location. Therefore, the change is going to be reflected in b, or in any other label pointing towards the same memory location. Only operations where we “point” a to a new, different data structure, such as a=[5], are not reflected in b.
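The following sketch contrasts the two kinds of operations using the list methods mentioned above. The specific values are arbitrary and serve only to illustrate the behavior:

```python
a = [3, 1, 2]
b = a               # b is a second label for the same list

a.append(0)         # "in place": also visible through b
a.sort()            # "in place": also visible through b
print(b)            # [0, 1, 2, 3]
print(a is b)       # True

a = [5]             # re-assignment: a now points at a new list
print(b)            # [0, 1, 2, 3]
print(a is b)       # False
```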

If we want to create an explicit copy of the data, that is, to have the same information in a new independent memory location, we need to use the .copy method when creating the copy:

a = [1, 2]
b = a.copy()
b
[1, 2]

Using the is operator, we can demonstrate that a and b are pointers to distinct memory locations:

a is b
False

Now, modifying a does not affect b, since a and b are independent copies:

a[0] = 500
a
[500, 2]
b
[1, 2]

Here is an illustration of the old and new situations when b is created using b=a.copy():

(1) a → [1, 2]    (2) a → [500, 2] 
    b → [1, 2]        b → [1, 2]
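One caveat worth keeping in mind, which goes slightly beyond the examples above: the .copy method of a list creates a “shallow” copy. The new list is an independent “box”, but any mutable elements inside it, such as nested lists, are still shared between the original and the copy. The following sketch illustrates this, using the copy.deepcopy function from the standard library as one possible way to obtain a fully independent copy. The variable names are arbitrary.

```python
import copy

a = [[1, 2], [3, 4]]
b = a.copy()            # shallow copy: new outer list, same inner lists
c = copy.deepcopy(a)    # deep copy: the inner lists are copied too

a[0][0] = 500           # modify an inner list "in place"
print(a)                # [[500, 2], [3, 4]]
print(b)                # [[500, 2], [3, 4]]
print(c)                # [[1, 2], [3, 4]]

print(a is b)           # False
print(a[0] is b[0])     # True
```

For “flat” lists of immutable values, such as [1, 2], this distinction does not arise, and .copy is sufficient.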

More exercises

Exercise 02-d

Exercise 02-e

  • Create a dictionary named person, as shown below.

  • Write expressions that return:

    • John’s eye color.

    • John’s last name.

    • A string combining John’s first and last name, separated by a space.

    • A boolean value indicating whether one of John’s hobbies is "Drawing".

person = {
    "name": {"first": "John", "last": "Smith"}, 
    "age": 50, 
    "eyecolor": "blue", 
    "hobbies": ["Fishing", "Golf", "Python programming"]
}