# Create an empty list
= []
my_list print(my_list)
= list()
my_list print(my_list)
[]
[]
List is a build-in array-like data structure in Python. Unlike “traditional” arrays, Python lists are untyped. This makes their usage simpler but less efficient and sometimes “dangerous”. More efficient ‘standard’ array implementation can be found in the NumPy package, which we will discuss in a later lecture.
Creating empty lists can be done in two ways:
# Create an empty list
= []
my_list print(my_list)
= list()
my_list print(my_list)
[]
[]
Using the square brackets []
is the preferred way in Python. The function list()
can also be used to to case any iterable to a list.
print(list("hello"))
['h', 'e', 'l', 'l', 'o']
# Create a list with some values
= [4, 5, 6, 7, 8]
my_list print(my_list)
[4, 5, 6, 7, 8]
# create a list of 10 copies of the same element
= [7] * 10
my_list print(my_list)
# useful for "initialization"
= [0] * 10
my_list print(my_list)
[7, 7, 7, 7, 7, 7, 7, 7, 7, 7]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Indexing is done using square brackets []
. The index is the position (zero-based) of the element in the list. Negative indices count from the end of the list.
= [4, 5, 6, 7, 8]
my_list print(my_list[0]) # first element
print(my_list[-1]) # last element
4
8
Slicing allows to take a ‘subset’ of the list. The syntax is my_list[start:end]
or my_list[start:end:step]
. The result will include elements at indices starting from the start
up to but not including the end
. If the step is specified, this will be the increment (or decrement if negative) between indices (by default, the step is 1). If the slice should start from the beginning, the start
value can be omitted. Likewise, if the slice should end at the end, the end
value can be omitted.
= [4, 5, 6, 7, 8]
my_list print(my_list[0:2]) # first two elements
print(my_list[:2]) # first two elements, same as [0:2]
print(my_list[2:5]) # elements from index 2 to end (index 4)
print(my_list[2:]) # elements from index 2 to end (index 4), same as [2:5]
[4, 5]
[4, 5]
[6, 7, 8]
[6, 7, 8]
print(my_list[:-2]) # from beginning till the second to last index (excluding)
print(my_list[-2:]) # last two elements - from the second to last index till end
[4, 5, 6]
[7, 8]
All of these are the same:
print(my_list[0:5:2]) # every second element from the beginning to the end
print(my_list[0::2]) # every second element from the beginning to the end
print(my_list[::2]) # every second element from the beginning to the end
[4, 6, 8]
[4, 6, 8]
[4, 6, 8]
Negative step:
print(my_list[::-1]) # reverse the list
[8, 7, 6, 5, 4]
…and any other combination you can imagine.
There are several ways of how to remove elements from a list: splicing, ‘popping’ and deleting.
Splice (combine two sub-lists, excluding the removed element):
= [4, 5, 6, 7, 8]
my_list = my_list[:2] + my_list[3:] # remove the element at index 2
my_list print(my_list)
[4, 5, 7, 8]
Using the pop
method:
= [4, 5, 6, 7, 8]
my_list print(my_list.pop()) # remove the last element and return it
print(my_list)
print(my_list.pop(2)) # remove the element at index 2 and return it
print(my_list)
8
[4, 5, 6, 7]
6
[4, 5, 7]
Using the del
keyword:
= [4, 5, 6, 7, 8]
my_list del my_list[2] # delete the element at index 2
print(my_list)
[4, 5, 7, 8]
There is also the remove
method, which removes the first occurrence of the specified value.
= [4, 5, 6, 7, 8]
my_list 5)
my_list.remove(print(my_list)
2]) # remove the element at index 2 my_list.remove(my_list[
[4, 6, 7, 8]
Deleting from the end of the array is typically faster than deleting from ‘within’ the array, which is still faster than deleting from the beginning of the array.
Using the append
method:
= [4, 5, 6, 7, 8]
my_list 9)
my_list.append(print(my_list)
[4, 5, 6, 7, 8, 9]
Using the insert
method:
= [4, 5, 7, 8]
my_list 2, 6) # insert 6 at index 2
my_list.insert(print(my_list)
[4, 5, 6, 7, 8]
Inserting at the beginning of the array is typically faster than inserting in the middle or at the end of the array.
Extending an array with extend
:
= [4, 5, 7, 8]
my_list 6, 9])
my_list.extend([print(my_list)
[4, 5, 7, 8, 6, 9]
Or adding two lists with +
:
= [4, 5, 7, 8]
my_list = my_list + [6, 9]
my_list print(my_list)
[4, 5, 7, 8, 6, 9]
Length of a list:
= [4, 5, 7, 8]
my_list print(len(my_list))
4
Remember the zero-based indexing:
= len(my_list)
n print(my_list[n - 1])
8
Comparison of lists (not very useful, usually you compare elements in a loop):
= [1, 2, 3]
list_a = [4, 5, 6]
list_b
# comparison
print("\nComparison:")
print(list_a == list_b) # equality
print(list_a != list_b) # inequality
print("Equality is not strict but 'ordered'")
print("[1, 2, 3.0] == [1, 2, 3]: ", [1, 2, 3.0] == [1, 2, 3]) # True
print("[1, 2, 3] == [3, 2, 1]: ", [1, 2, 3] == [3, 2, 1]) # False
# Be (ALWAYS) aware of precision/numerical issues
print("[1, 2, 3.0000000000000000001] == [1, 2, 3]: ", [1, 2, 3.0000000000000000001] == [1, 2, 3]) # True, depends on the precision
Comparison:
False
True
Equality is not strict but 'ordered'
[1, 2, 3.0] == [1, 2, 3]: True
[1, 2, 3] == [3, 2, 1]: False
[1, 2, 3.0000000000000000001] == [1, 2, 3]: True
print("3 == 3.00000001", 3 == 3.00000001) # False
print("3 == 2.99999999", 3 == 2.99999999) # False
print("1/3 == 0.33333333", 1/3 == 0.33333333) # False
print("3.00000000000000001 == 3: ", 3.00000000000000001 == 3) # True
print("2.99999999999999999 == 3: ", 2.99999999999999999 == 3) # True
print("1/3 == 0.333333333333333333333: ", 1/3 == 0.333333333333333333333) # True
3 == 3.00000001 False
3 == 2.99999999 False
1/3 == 0.33333333 False
3.00000000000000001 == 3: True
2.99999999999999999 == 3: True
1/3 == 0.333333333333333333333: True
End of side-quest.
Inequality of lists:
print("\nOrdering:")
print(list_a < list_b) # Compares maximum value
print(list_a > list_b)
print(list_a <= list_b)
print(list_a >= list_b)
Ordering:
True
False
True
False
Membership checking:
print("\nMembership:")
print(1 in list_a) # membership
print(7 not in list_a)
Membership:
True
True
index
- find the index of the first occurrence of the specified value
= [4, 5, 6, 7, 8]
my_list print(f"Index of 5: {my_list.index(5)}")
Index of 5: 1
count
- count the number of occurrences of the specified value
= [4, 5, 6, 5, 7, 7, 8, 7]
my_list print(f"Count of 5: {my_list.count(5)}")
Count of 5: 2
sort
- sort the list
= [6, 4, 7, 5, 3]
my_list
my_list.sort()print(my_list)
=True)
my_list.sort(reverseprint(my_list)
[3, 4, 5, 6, 7]
[7, 6, 5, 4, 3]
reverse
- reverse the list
= [4, 5, 6, 7, 8]
my_list
my_list.reverse()print(my_list)
[8, 7, 6, 5, 4]
copy
- create a copy of the list
= [4, 5, 6, 7, 8]
my_list = my_list # not a copy, just a reference
not_a_copy 9)
not_a_copy.append(print(my_list) # 9 was added to the original list
= my_list.copy() # make a copy
my_list_copy 10)
my_list_copy.append(print(my_list_copy) # 10 was added to the copy
print(my_list) # the original list is not affected
[4, 5, 6, 7, 8, 9]
[4, 5, 6, 7, 8, 9, 10]
[4, 5, 6, 7, 8, 9]
Buuuut, copy
is not a deep copy:
= {"a": 1, "b": 2}
my_dict = [[4, 5, 6], my_dict]
my_list = my_list.copy()
my_list_copy 1]["a"] = 3
my_list_copy[print(my_list)
[[4, 5, 6], {'a': 3, 'b': 2}]
Although, you shouldn’t be using multi-typed lists anyway.
It is best to avoid using heterogeneous lists (though sometimes, they are handy).
# Jagged nested list
= [[1, 2, 3], [4, 5], [6]]
my_list print(my_list)
[[1, 2, 3], [4, 5], [6]]
# Multi-typed
= [1, 2, 3, "hello", [4, 5, 6]]
my_list print(my_list)
[1, 2, 3, 'hello', [4, 5, 6]]
The range
function creates a “range” object containing a sequence of integers from 0 to some number, excluding the last number. This can be converted to a list using the list()
function.
# Create a list of the first 5 integers
print(list(range(5)))
[0, 1, 2, 3, 4]
The range
function allows you to specify the starting number and the step size. The general syntax is range(start, stop, step)
.
# Numbers from 5 to 10
print(list(range(5, 10)))
[5, 6, 7, 8, 9]
# Create a list of even numbers from 0 to 10
print(list(range(0, 10, 2)))
[0, 2, 4, 6, 8]
List comprehensions are a concise way to create lists. However, they are technically equivalent to a for loop.
= [x * 2 for x in range(5)]
my_list print(my_list)
[0, 2, 4, 6, 8]
Which is equivalent to:
= []
my_list for x in range(5):
* 2)
my_list.append(x print(my_list)
[0, 2, 4, 6, 8]
The syntax for list comprehensions is:
<expression> for <item> in <iterable>] [
Comprehension, in general, are ‘one-liner’ creators of collections of objects. They can in some cases make the code more readable and in some cases much less readable. Basic Python datatypes support comprehensions.
Comprehension with conditions:
<expression> for <item> in <iterable> if <condition>] [
<expression>
will be added to the list only if <condition>
is True
.
= [x for x in range(10) if x % 2 == 1]
odd_numbers print(odd_numbers)
[1, 3, 5, 7, 9]
Comprehension with conditions and if-else:
<expression_one> if <condition> else <expression_two> for <item> in <iterable>] [
<expression_one>
will be added to the list if <condition>
is True
. Otherwise, <expression_two>
will be added to the list (<condition>
is False
).
= [x if x % 2 == 1 else -x for x in range(10)]
even_or_negative_numbers print(even_or_negative_numbers)
[0, 1, -2, 3, -4, 5, -6, 7, -8, 9]
Dictionaries are like lookup tables. They are implemented using a hash table, which means accessing an element is very fast.
Empty dictionarys can be created using the dict()
function or with {}
(preferred way).
= dict()
empty_dict = {} also_empty_dict
Initialization:
= {
my_dict "a": 1,
"b": 2,
}print(my_dict)
{'a': 1, 'b': 2}
Insert
= {}
my_dict "a"] = 1
my_dict["b"] = 2
my_dict[print(my_dict)
{'a': 1, 'b': 2}
Delete
= {"a": 1, "b": 2}
my_dict del my_dict["a"]
print(my_dict)
{'b': 2}
= {"a": 1, "b": 2}
my_dict print(my_dict["a"])
print(my_dict.get("a")) # safe retrieval, return None if "a" not in dictionary
print(my_dict.get("c")) # safe retrieval, return None if "a" not in dictionary
print(my_dict.get("a", -1)) # default value
print(my_dict.get("c", -1)) # default value
1
1
None
1
-1
Membership
= {"a": 1, "b": 2}
my_dict print("a" in my_dict)
print("c" in my_dict)
True
False
Dictionary comprehension:
**2 for x in range(5)} {x: x
The hash()
function returns the hash value of an object - seemingly arbitrary but for the same input consistent value (headache warning: the consistency holds only for the current “session”!).
print(hash("hello"))
print(hash("hello") == hash("hello"))
-518558418373800793
True
Array access, given an index is fast (constant time), however, finding a specific value is slow. Hashmaps solve this by encoding the value into an index.
def hash_into_index(value, total_items):
return hash(value) % total_items
print(hash_into_index("hello", 10))
7
class MyHashmap:
def __init__(self, total_items):
self.total_items = total_items
self.keys = [None] * total_items
self.values = [None] * total_items
def __setitem__(self, key, value): # magic to allow indexed assignment
= hash_into_index(key, self.total_items)
index self.keys[index] = key
self.values[index] = value
def __getitem__(self, key): # magic to allow indexing
= hash_into_index(key, self.total_items)
index return self.values[index]
def __contains__(self, key): # magic to allow the "in" keyword
= hash_into_index(key, self.total_items)
index return self.keys[index] == key
def __iter__(self): # magic to allow the for-each loop
for key, value in zip(self.keys, self.values):
if key is not None:
yield key, value
def print(self):
for key, value in self:
print(f"{key}: {value}")
= MyHashmap(10)
hashmap "hello"] = "world"
hashmap[print(hashmap["hello"])
print("hello" in hashmap)
print("Contents of our hashmap:")
"water"] = "world"
hashmap["hellno"] = "word"
hashmap["pi"] = 3.14
hashmap["e"] = 2.718
hashmap[9] = "number hashes are the same as the number itself"
hashmap[print() hashmap.
world
True
Contents of our hashmap:
e: 2.718
pi: 3.14
water: world
9: number hashes are the same as the number itself
Obviously, this is not a very good implementation of a hashmap: 1) Collisions (multiple keys hash to the same index) 2) Memory usage (None values for empty slots)
Besides lists, Python also has tuples and sets. These can also store multiple values but offer different functionalities.
A tuple is an immutable list. Tuples are created using the parentheses ()
. Similarly, the tuple()
function can be used to convert an iterable to a tuple.
Empty tuple:
= tuple() # kinda pointless - tuples cannot be changed
empty_tuple = () # kinda pointless - tuples cannot be changed empty_tuple
Single element tuple:
= (1,) # mind the comma! (1) is just 1 in brackets
single_element_tuple print((2) * 10) # just `2 * 10` => 20
print((2,) * 10) # tuple of 10 2s
20
(2, 2, 2, 2, 2, 2, 2, 2, 2, 2)
Creating tuples:
= (1, 2, 3)
my_tuple print(my_tuple)
# Create a tuple from a list
= [4, 5, 6]
my_list = tuple(my_list)
my_tuple print(my_tuple)
(1, 2, 3)
(4, 5, 6)
Tuple comprehension:
= (x * 2 for x in range(5))
my_tuple print(my_tuple)
<generator object <genexpr> at 0x7f6e5d984c10>
Oops, that’s not what we wanted! Actual tuple comprehension (we might talk about generators later):
= tuple(x * 2 for x in range(5))
my_tuple print(my_tuple)
(0, 2, 4, 6, 8)
A set is an unordered collection of unique elements. Sets are created using the {<iterable>}
. Similarly, the set()
function can be used to convert an iterable to a set.
Empty set:
= set() empty_set
Careful, {}
will not create an empty set but an empty dictionary! (see below)
Creating sets:
# Create a set
= {1, 2, 3}
my_set print(my_set)
# Create a set from a list
= [4, 5, 6]
my_list = set(my_list)
my_set print(my_set)
7) # Add an element to the set
my_set.add(print(my_set)
{1, 2, 3}
{4, 5, 6}
{4, 5, 6, 7}
Adding elements to a set is an idempotent operation - adding the same element twice does not change the set.
= {1, 2, 3}
my_set print("my_set: ", my_set)
4)
my_set.add(print("my_set after adding 4: ", my_set)
for i in range(100):
4)
my_set.add(print("my_set after adding 4 one hundred times: ", my_set)
my_set: {1, 2, 3}
my_set after adding 4: {1, 2, 3, 4}
my_set after adding 4 one hundred times: {1, 2, 3, 4}
Set comprehension:
= {int(x / 2) for x in range(15)}
my_set print(my_set)
{0, 1, 2, 3, 4, 5, 6, 7}
Sets are not immutable but they do not support assignment. Rather, you can add or remove elements.
10) # The set can be changed.
my_set.add(0] = 10 # This will raise an error!!! my_set[
Main difference is that tuples are immutable. They cannot be changed once created. Some operations are faster than on lists.
= (1, 2, 3)
my_tuple 1] = 4 # This will raise an error!!! my_tuple[
Tuples are used as the return type in case of functions returning multiple values.
def my_function():
= 5
n = [i for i in range(n)]
l return l, n
print(my_function())
([0, 1, 2, 3, 4], 5)
Tuples are “safe” to pass around because of their immutability.
def my_function(iterable_input):
= []
result for i in iterable_input:
* 2)
result.append(i # sneakily change the input:
0] = 10
iterable_input[return result
= [1, 2, 3] # precious data we don't want to change
my_list print(f"My list before running my_function: {my_list}")
print(my_function(my_list))
print(f"My list after running my_function: {my_list}")
= (1, 2, 3) # precious data we don't want to change
my_tuple try:
print(my_function(my_tuple))
except TypeError: # this will catch the error raised whe the function tries to change the tuple
print("Aha! The function tried to change my tuple!")
My list before running my_function: [1, 2, 3]
[2, 4, 6]
My list after running my_function: [10, 2, 3]
Aha! The function tried to change my tuple!
Sets are unordered and do not allow duplicates. They are implemented as ‘hash set’ and thus certain operations are very efficient (e.g. membership checking, union, intersection, etc.).
= [1, 2, 2, 3, 3, 4, 5, 5]
my_list = set(my_list)
my_set print(my_set)
print(3 in my_set) # True
print(6 in my_set) # False
5) # Added element already in list, nothing happens
my_set.add(print(my_set)
{1, 2, 3, 4, 5}
True
False
{1, 2, 3, 4, 5}
Set specific operations:
print({1, 2, 3} | {4, 5, 6}) # Union
print({1, 2, 3} & {3, 4, 5}) # Intersection
print({1, 2, 3, 4} - {1, 4, 5}) # Difference
print({1, 4, 5} - {1, 2, 3, 4}) # Difference (reversed set order)
print({1, 4, 5} ^ {1, 2, 3, 4}) # Symmetric Difference (XOR)
{1, 2, 3, 4, 5, 6}
{3}
{2, 3}
{5}
{2, 3, 5}
Since tuple are immutable, unlike lists or sets, they can be used as keys in a dictionary.
= {
my_tuple_dict 1, 2): 3,
(4, 5): 6
(
}
print(my_tuple_dict[(1, 2)])
3
This would throw an error:
= {
my_list_dict 1, 2]: 3,
[4, 5]: 6
[ }
= list('alphabet')
my_list = len(my_list)
n for i in range(n):
= my_list[i]
value print(f"Value at index {i}: {value}")
Value at index 0: a
Value at index 1: l
Value at index 2: p
Value at index 3: h
Value at index 4: a
Value at index 5: b
Value at index 6: e
Value at index 7: t
for value in my_list:
print(f"Value: {value}")
Value: a
Value: l
Value: p
Value: h
Value: a
Value: b
Value: e
Value: t
The enumerate()
function returns a tuple of the index and the value at that index. This is equivalent to the indexed loop but ‘nicer’ (no need to explicitly extract the value and index).
for index, value in enumerate(my_list):
print(f"Value at index {index}: {value}")
Value at index 0: a
Value at index 1: l
Value at index 2: p
Value at index 3: h
Value at index 4: a
Value at index 5: b
Value at index 6: e
Value at index 7: t
The while
loop is useful in some special cases (e.g., growing lists - although, this can be dangerous).
from random import random
= list(range(10))
my_list = len(my_list)
n = 0
i while i < len(my_list):
= my_list[i]
value if random() < 1 / (i - value + 1.5):
my_list.append(value)+= 1
i
print(my_list)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 5]
More typical is to use infinite loops with while True
and break
statements.
The same looping “techniques” work for sets and tuples.
= {1, 2, 3}
my_set for i, value in enumerate(my_set):
print(f"Value: {value}")
Value: 1
Value: 2
Value: 3
= (1, 2, 3)
my_tuple for value in my_tuple:
print(f"Value: {value}")
Value: 1
Value: 2
Value: 3
Most common way to iterate over a dictionary is to use the items()
method, which will return (key, value) pairs.
import string
= {k: v for k, v in zip(string.ascii_letters, range(26)) if v < 12 and v % 2 == 0}
alphabet_dict for key, value in alphabet_dict.items():
print(f"The letter {key} is at position {value + 1} in the alphabet.")
The letter a is at position 1 in the alphabet.
The letter c is at position 3 in the alphabet.
The letter e is at position 5 in the alphabet.
The letter g is at position 7 in the alphabet.
The letter i is at position 9 in the alphabet.
The letter k is at position 11 in the alphabet.
There are also keys()
and values()
methods that return the keys and values respectively.
for key in alphabet_dict.keys():
print(f"The letter {key} is at position {alphabet_dict[key] + 1} in the alphabet.")
The letter a is at position 1 in the alphabet.
The letter c is at position 3 in the alphabet.
The letter e is at position 5 in the alphabet.
The letter g is at position 7 in the alphabet.
The letter i is at position 9 in the alphabet.
The letter k is at position 11 in the alphabet.
= []
positions for value in alphabet_dict.values():
+ 1)
positions.append(value print(f"The dictionary contains letters at these positions: {positions}")
The dictionary contains letters at these positions: [1, 3, 5, 7, 9, 11]
Simply “stack” lists inside of a list (nested list).
= [
matrix 1, 2, 3],
[4, 5, 6],
[7, 8, 9]
[
]print(matrix)
= [[j + i * 3 for j in range(3)] for i in range(3)]
also_matrix print(also_matrix)
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
[[0, 1, 2], [3, 4, 5], [6, 7, 8]]
matrix[row][column]
:
print(matrix[0][1]) # 0th row, 1st column
print(matrix[1][0]) # 1st row, 0th column
2
4
For each row in the matrix, loop through each element in the row.
for row in matrix:
for value in row:
print(value)
1
2
3
4
5
6
7
8
9
Enumerate the rows and columns.
for row_index, row in enumerate(matrix):
for column_index, value in enumerate(row):
print(f"Value at row {row_index} and column {column_index}: {value}")
Value at row 0 and column 0: 1
Value at row 0 and column 1: 2
Value at row 0 and column 2: 3
Value at row 1 and column 0: 4
Value at row 1 and column 1: 5
Value at row 1 and column 2: 6
Value at row 2 and column 0: 7
Value at row 2 and column 1: 8
Value at row 2 and column 2: 9
Assignment to a matrix:
for row_index, row in enumerate(matrix):
for column_index, value in enumerate(row):
= value**2
matrix[row_index][column_index] print(matrix)
[[1, 4, 9], [16, 25, 36], [49, 64, 81]]
Checking types of variables is very useful, especially in Python that allows dynamic typing. Many operations are defined for different data types. For example, 1 * 2
is fine, and so is [1] * 2
and "1" * 2
but they result in different outcomes. It is often important to assert that the variables are of the expected type (use if
or assert
).
The is
operator can be used to check if two objects are the same object.
= [1, 2, 3]
a = [1, 2, 3]
b = a
c = a[:]
d = a.copy()
e print(a is b) # the elements have the same value but `a` is not the same object as `b`
print(a is c) # `a` and `c` reference the same object
print(a is d) # `d` got the values from `a` but it is still a different object (different memory location)
print(a is e) # copy makes a new object with the same values (essentially, similar process to defining `d`)
False
True
False
False
It is not?
print(a is not b) # True
print(a is not c) # False
print(a is not d) # True
print(a is not e) # True
True
False
True
True
Some ‘objects’ are the same but it makes no sense to compare them using is
. Although, is
will get you a strict equality.
# this will get you a syntax warning
print(1 is 1.0) # False, because float != int
print(1.0 is 1.0) # True
print((1, 2, 3) is (1, 2, 3)) # Unexpectedly, True but also not advised; just use equality `==`
False
True
True
<>:2: SyntaxWarning:
"is" with a literal. Did you mean "=="?
<>:3: SyntaxWarning:
"is" with a literal. Did you mean "=="?
<>:4: SyntaxWarning:
"is" with a literal. Did you mean "=="?
<>:2: SyntaxWarning:
"is" with a literal. Did you mean "=="?
<>:3: SyntaxWarning:
"is" with a literal. Did you mean "=="?
<>:4: SyntaxWarning:
"is" with a literal. Did you mean "=="?
/tmp/ipykernel_2611439/951537762.py:2: SyntaxWarning:
"is" with a literal. Did you mean "=="?
/tmp/ipykernel_2611439/951537762.py:3: SyntaxWarning:
"is" with a literal. Did you mean "=="?
/tmp/ipykernel_2611439/951537762.py:4: SyntaxWarning:
"is" with a literal. Did you mean "=="?
For number type check, see below.
Use the type(<object>)
build-in function to get the type of an object.
print(type(1)) # <class 'int'>
print(type(1.0)) # <class 'float'>
= [1, 2, 3]
my_list print(type(my_list)) # <class 'list'>
print(type(tuple(my_list))) # <class 'list'>
<class 'int'>
<class 'float'>
<class 'list'>
<class 'tuple'>
The isinstance(<object>, <type>)
build-in function checks whether the object is an instance of the type.
print(isinstance([1, 2, 3], list)) # True
print(isinstance([1, 2, 3], tuple)) # False
print(isinstance(1, int)) # True
print(isinstance(1, float)) # False
True
False
True
False
This will become handy much later, when you will be working with classes. However, there is some use with basic datatypes.
= 3
my_integer print(issubclass(type(my_integer), int)) # True
print(issubclass(type(my_integer), float)) # False
print(issubclass(type(my_integer), (int, float))) # True
True
False
True
Potentially unsafe code:
def multiply_two_numbers_unchecked(a, b):
return a * b
print(multiply_two_numbers_unchecked(3, 2)) # 6 is fine
print(multiply_two_numbers_unchecked(3.0, 2)) # 6.0 is fine
print(multiply_two_numbers_unchecked("3", 2)) # 33 !?
6
6.0
33
Code with run-time type checking and assertions:
def multiply_two_numbers(a, b):
assert isinstance(a, (int, float)), f"'a' must be a number! But it was: {type(a)}"
assert isinstance(b, (int, float)), f"'b' must be a number! But it was: {type(b)}"
# Alternatively, this will also work:
assert issubclass(type(my_integer), (int, float)), f"'a' must be a number! But it was: {type(a)}"
return a * b
print(multiply_two_numbers(3, 2))
print(multiply_two_numbers(3.0, 2))
try:
print(multiply_two_numbers("3", 2)) # this will cause an error
except AssertionError as e:
print(e)
6
6.0
'a' must be a number! But it was: <class 'str'>
More info about basic types and comparison: https://docs.python.org/3/library/stdtypes.html
assert
is fine for testing and debugging, but not for production code! There are better, safer ways of handling incorrect inputs (try...except
+ warn
/log.error
and handle the erroneous state “gracefully”). Essentially, production code should never halt!!!
from typing import Union, Optional
def nice_multiply_two_numbers(a: Union[int, float], b: Union[int, float], raise_error: bool = True) -> Optional[Union[int, float]]:
if not isinstance(a, (int, float)):
if raise_error:
raise TypeError(f"'a' must be a number! But it was: {type(a)}")
else:
return None # does not have to be explicit
if not isinstance(b, (int, float)):
if raise_error:
raise TypeError(f"'b' must be a number! But it was: {type(b)}")
else:
return None # does not have to be explicit
return a * b
print("Using type error & try...except:")
try:
print(nice_multiply_two_numbers(3, 2))
print(nice_multiply_two_numbers(3.0, 2))
print(nice_multiply_two_numbers("3", 2)) # this will cause an error
except TypeError as e:
print(f"One of the inputs had an incorrect type. See the error: {e}")
print("\nUsing None return type:")
= nice_multiply_two_numbers("3", 2, raise_error=False)
result if result is None:
print("One of the inputs had an incorrect type.")
else:
print(result)
Using type error & try...except:
6
6.0
One of the inputs had an incorrect type. See the error: 'a' must be a number! But it was: <class 'str'>
Using None return type:
One of the inputs had an incorrect type.
Efficiency of algorithms matter.
Runtime of an algorithm typically depends on the size of the input. For example, time to loop once through a list will linearly increase with the size of the list. Thus, if accessing an element of a list takes 1ns, looping through a list of N elements will take N * 1ns. However, on a different computer, the element access time might be 10ns. Still, the loop will take N * 10ns. Therefore, we simply say the runtime ‘scales’ with N.
Searching for a specific item in a list requires looping through the list. However, the item might be at the beginning of the list, thus the search time will be 1 (x access time). If the idem is at the end of the list, the search time will be N. On average, we can expect the search time to be N/2. We are, however, typically interested in the worst-case scenario, and for simplicity, ignore any constants. Therefore, we simply say that the lookup time ‘scales’ with N, just as with the ‘full’ loop. (There is typically some overhead and randomness, so we are not interested in precise numbers.)
Asymptotic complexity is a way to describe the runtime of an algorithm as a function of the input size. We can use this to compare the performance of different algorithms - the efficiency of their implementation. For example, we can compare different sorting algorithms.
We can use the timeit
module to profile the runtime of an algorithm. Usage:
import timeit
def my_function():
= 0
s for i in range(10000):
+= i
s return s
= 10
iterations = timeit.Timer(stmt=my_function)
timer = timer.timeit(number=iterations) / iterations
avg_time print(f"Average run time: {avg_time}")
Average run time: 0.0017293294000410241
In Jupyter notebooks or IPython, we can use the %timeit
magic command instead:
%timeit my_function()
541 µs ± 39.6 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
This will run the function and time it.
There are also better, more advanced tools for profiling (e.g. cProfile
) but we will manage with timeit for now.
Pre-allocation of arrays is (slightly) faster than iterative appending. Although, in Python, both are relatively slow. Depending on the task, list comprehension may be more efficient. In general, if the appending overhead is insignificant, it will not have significant impact on runtime whether pre-allocation is used. However, with large loops it might cause memory issues and pre-allocation will be important with more efficient array implementations (e.g., NumPy arrays).
import numpy as np
def my_simple_function(x):
return x
def my_complex_function(x):
return np.sqrt(np.log(x + 1)**min(1e2, np.exp(np.log(x + 1e-10))**0.33))
def list_append(n, func):
= []
a for i in range(n):
a.append(func(i))return a
def list_preallocate(n, func):
= [0] * n
a for i in range(n):
= func(i)
a[i] return a
def list_comprehension(n, func):
return [func(i) for i in range(n)]
= 100
N print("Evaluating my_simple_function")
%timeit list_append(N, my_simple_function)
%timeit list_preallocate(N, my_simple_function)
%timeit list_comprehension(N, my_simple_function)
print("Evaluating my_complex_function")
%timeit list_append(N, my_complex_function)
%timeit list_preallocate(N, my_complex_function)
%timeit list_comprehension(N, my_complex_function)
Evaluating my_simple_function
12.4 µs ± 1.83 µs per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
9.06 µs ± 726 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
8.27 µs ± 513 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Evaluating my_complex_function
384 µs ± 19.9 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
375 µs ± 19.3 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
385 µs ± 20 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Python lists are fairly universal. However, they come with some overhead and thus are not always the best structure. In some cases, it is even worth to cast lists to sets or dictionaries.
= 100
N # Generate list of randomly shuffled squares of numbers
= (np.random.permutation(N)**2).tolist()
l
# Create a dictionary from the list
= {x: x**2 for x in l}
d
# Create a set from the list
= set(l)
s
# We want to find the index of a square of a number in the list
= int(N / 2)**2
number_of_interest print("Lookup time for list")
%timeit l.index(number_of_interest)
print("Lookup time for dictionary")
%timeit d.get(number_of_interest)
print("Lookup time for set")
%timeit s.intersection({number_of_interest})
print("---")
# Now we want to simply see if the number is in the list
print("Membership check time for list")
%timeit number_of_interest in l
print("Membership check time for dictionary")
%timeit number_of_interest in d
print("Membership check time for set")
%timeit number_of_interest in s
print("---")
print("'Fair' lookup time for dictionary from list")
%timeit {x: x**2 for x in l}.get(number_of_interest)
print(f"'Fair' membership check time for dictionary from list")
%timeit number_of_interest in {x: x**2 for x in l}
Lookup time for list
410 ns ± 40.9 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Lookup time for dictionary
92.8 ns ± 4.17 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
Lookup time for set
220 ns ± 20 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
---
Membership check time for list
190 ns ± 7.96 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
Membership check time for dictionary
67.2 ns ± 2.71 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
Membership check time for set
66.6 ns ± 4.58 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
---
'Fair' lookup time for dictionary from list
25.5 µs ± 605 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
'Fair' membership check time for dictionary from list
27.8 µs ± 1.05 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)