Added lesson 2

2686ac85 · Škoviera, Radoslav, Mgr., Ph.D. · 393da30d · 2686ac85
Commit 2686ac85 authored 1 month ago by Škoviera, Radoslav, Mgr., Ph.D.
--- a/src/pge_lectures/lecture_02/l2_matrices_processing.qmd
+++ b/src/pge_lectures/lecture_02/l2_matrices_processing.qmd
+# 2D arrays (matrices), array processing (search, cumulative sum)
+---
+title: "Lecture 2"
+format:
+  html:
+    code-fold: false
+jupyter: python3
+---
+
+## Matrices
+
+Matrices are 2D arrays. They have two dimensions: rows and columns. The number of rows is often called the height and the number of columns is often called the width.
+This is similar to image, although, images have typically flipped dimensions: x-axis is width (columns) and y-axis is height (rows).
+
+```{python}
+m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
+print(m)
+also_m = [
+    [1, 2, 3],
+    [4, 5, 6],
+    [7, 8, 9]
+]
+print(also_m)
+m_as_well = [[j + i * 3 for j in range(1, 4)] for i in range(3)]
+print(m_as_well)
+```
+
+Creating a matrix in a loop (the following two functions are equivalent):
+```{python}
+def create_counting_matrix(height, width):
+    return [[j + i * width for j in range(1, width + 1)] for i in range(height)]
+
+def create_counting_matrix_unpacked(height, width):
+    mat = []
+    for i in range(height):
+        row = []
+        for j in range(1, width + 1):
+            row.append(j + i * width)
+        mat.append(row)
+    return mat
+
+print(create_counting_matrix(3, 4))
+```
+
+Matrix pre-allocation:
+```{python}
+M, N = 3, 4
+mat = [[0] * N for _ in range(M)]
+print(mat)
+```
+
+### Filling matrices with values
+
+
+
+### Matrix operations
+
+#### Addition
+
+#### Multiplication
+
+#### Transposition
+
+### Miscellaneous operations on matrices
+
+#### Comparison
+
+#### All different
+
+### "Graphical" matrices (images)
+
+Raw images are basically matrices, where each element represents an intensity value of the corresponding pixel.
+Color images are 3D matrices, where each element is a 3-tuple of intensity values of the color channels but we will not go there.
+
+However, we can have a little fun with gray-scale images.
+```{python}
+img = [
+    [100, 100, 100, 100, 100, 100, 100, 100, 100, 100],
+    [100,   0 ,  0 ,  0,   0 ,  0 ,  0,   0,   0, 100],
+    [  0,   0, 128, 128,   0,   0, 128, 128,   0,   0],
+    [  0,   0, 255, 255,   0,   0, 255, 255,   0,   0],
+    [  0,   0, 255, 255,   0,   0, 255, 255,   0,   0],
+    [  0,   0, 128, 128,   0,   0, 128, 128,   0,   0],
+    [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
+    [  0,   0,   0,   0,   0, 200,   0,   0,   0,   0],
+    [  0,   0,   0,   0,   0, 200,   0,   0,   0,   0],
+    [  0,   0,   0,   0, 200, 200,   0,   0,   0,   0],
+    [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
+    [  0, 255,   0,   0,   0,   0,   0,   0, 255,   0],
+    [  0, 255, 255, 255, 255, 255, 255, 255, 255,   0],
+    [  0,   0, 255, 128, 128, 128, 128, 255,   0,   0],
+    [  0,   0,   0, 255, 255, 255, 255,   0,   0,   0],
+    [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
+    [  0,   0,   0,   0, 100, 100,   0,   0,   0,   0],
+]
+
+print(img)
+
+from matplotlib import pyplot as plt
+
+plt.imshow(img, cmap='gray')
+plt.axis('off')
+plt.show()
+```
+
+
+## Array & matrix processing
+
+### Visualizing (printing) arrays
+
+For debugging purposes, it is useful to see what is in an array. Smaller arrays are easy to print "whole":
+
+```{python}
+import numpy as np
+a = np.random.permutation(10)
+print(a)
+```
+
+You can use the `join` method of strings to make the output nicer. More on that later, when we get to strings.
+
+```{python}
+def pretty_print_array(a):
+    print('[' + ', '.join([str(x) for x in a]) + ']')
+
+pretty_print_array(a)
+```
+
+For larger arrays, you need to be creative - the solution will depend on what you need. For example, it might be enough to print the first or last few items.
+Print the array as rows (Python kinda does that but "uncontrollably"):
+```{python}
+def print_as_rows(a, row_length=10):
+    for i in range(0, len(a), row_length):
+        print(a[i:i+row_length])
+
+a = np.random.permutation(100)
+print_as_rows(a)
+```
+
+You can even improve it to have nicer output. Again, more on that next time, when we get to strings.
+```{python}
+def pretty_print_as_rows(a, row_length=10):
+    print('[')
+    for i in range(0, len(a), row_length):
+        print(', '.join([str(x) for x in a[i:i+row_length]]), end=',\n')
+    print(']')
+
+a = np.random.permutation(100).tolist()
+pretty_print_as_rows(a)
+```
+
+#### Printing matrices:
+
+(remember to define/run cell with the `create_counting_matrix` and `pretty_print` functions above)
+
+```{python}
+def print_matrix(m):
+    print('[', end='')
+    n_rows = len(m)
+    for ri, row in enumerate(m):
+        print(row, end=',\n' if ri < n_rows - 1 else '')
+    print(']')
+
+def formatted_print_matrix(m):
+    print('[', end='')
+    n_rows = len(m)
+    max_digits =  int(np.ceil(np.log10(max([max(row) for row in m]))))
+    for ri, row in enumerate(m):
+        row_prefix = ('' if ri == 0 else ' ') + '['
+        row_numbers = ', '.join([f"{x:{max_digits}d}" for x in row])
+        row_end=',\n' if ri < n_rows - 1 else ']'
+        print(row_prefix + row_numbers, end=row_end)
+    print(']')
+
+mat = create_counting_matrix(11, 17)
+print("Row-by-row print:")
+print_matrix(mat)
+print()
+print("Formatted print:")
+formatted_print_matrix(mat)
+```
+
+
+### Search
+
+#### Linear search
+
+Requires iterating over the array, asymptotic runtime is $O(n)$.
+
+```{python}
+import numpy as np
+N = 100
+a = np.random.permutation(N)
+searched_item = 50
+
+def linear_search(a, x):
+    for i in range(len(a)):
+        if a[i] == x:
+            return i
+    return None
+
+print("Linear search run time:")
+%timeit linear_search(a, searched_item)
+```
+
+##### Computational complexity side-quest
+
+Even though the 'asymptotic' time of linear search is $O(n)$,
+the 'actual' time depends on the 'properties' of the array - where the item is located in the array.
+
+```{python}
+N = 100
+a = np.random.permutation(N)
+searched_item = a[N // 2]
+print("Searching for an item in the middle of the array:")
+%timeit linear_search(a, searched_item)
+
+searched_item = a[0]
+print("Searching for an item at the beginning of the array:")
+%timeit linear_search(a, searched_item)
+
+searched_item = a[-1]
+print("Searching for an item at the end of the array:")
+%timeit linear_search(a, searched_item)
+```
+
+#### Binary search
+
+Requires pre-sorted array but asymptotic runtime is $O(\log n)$ - we only look which "half" of the array might contain the item.
+
+```{python}
+def binary_search_loop(a, x):
+    low = 0  # lower-bound index for search
+    high = len(a) - 1  # upper-bound index for search
+    loop_count = 1
+    while low <= high:  # while there are indices to search
+        mid = (low + high) // 2  # compute the middle index
+        if a[mid] < x:  # if the middle point is less than the item
+            low = mid + 1  # "discard" the left half (search from the middle + 1)
+        elif a[mid] > x:  # if the middle point is greater than the item
+            high = mid - 1  # "discard" the right half (search from the middle - 1)
+        else:  # otherwise, the middle point is the item
+            return mid, loop_count
+        loop_count += 1
+    return None, loop_count  # we did not find the item
+
+def binary_search_recursion(a, x):
+    if len(a) == 0:  # we got an empty array
+        return None  # the item is for sure not in it
+    mid_idx = len(a) // 2  # compute the middle point index
+    if a[mid_idx] == x:  # if the middle point is the item
+        return mid_idx  # we found it, job done
+    elif a[mid_idx] < x:  # if the middle point is less than the item
+        # we look at the right half of the array
+        return binary_search_recursion(a[mid_idx + 1:], x)
+    else:
+        # otherwise, we look at the left half of the array
+        return binary_search_recursion(a[:mid_idx], x)
+
+# recursion is slower in this case, as we are copying the whole array, instead of just using the index.
+sa = np.sort(a)
+print("Binary search run time:")
+%timeit binary_search_loop(sa, 5)
+%timeit binary_search_recursion(sa, 5)
+
+print("Buuuuuut...")
+print("'Fair' binary search run time (includes sorting):")
+%timeit binary_search_loop(np.sort(a), 5)
+%timeit binary_search_recursion(np.sort(a), 5)
+```
+
+The advantage of binary search is relatively constant search time. As you can see from the following,
+the search time is shorter than "average" search time for linear search: $O(\log n) < O(n)$.
+This includes even sorting the array.
+```{python}
+print("Searching for an item in the middle of the array, using binary search:")
+searched_item = sa[N // 2]
+%timeit binary_search_loop(np.sort(a), searched_item)
+
+print("Searching for an item at the beginning of the array, using binary search:")
+searched_item = sa[0]
+%timeit binary_search_loop(np.sort(a), searched_item)
+
+print("Searching for an item at the end of the array, using binary search:")
+searched_item = sa[-1]
+%timeit binary_search_loop(np.sort(a), searched_item)
+```
+
+#### Interpolation search
+
+Even more restrictive requirements than binary search - the array needs to be 'uniformly' sorted.
+
+```{python}
+def interpolation_search(a, x):
+    low = 0  # lower-bound index for search
+    high = len(a) - 1  # upper-bound index for search
+    loop_count = 1
+    while low <= high and a[low] <= x <= a[high]:  # while there are indices to search
+        mid = low + int(((x - a[low]) * (high - low)) / (a[high] - a[low]))
+        if a[mid] < x:  # if the middle point is less than the item
+            low = mid + 1  # "discard" the left half (search from the middle + 1)
+        elif a[mid] > x:  # if the middle point is greater than the item
+            high = mid - 1  # "discard" the right half (search from the middle - 1)
+        else:  # otherwise, the middle point is the item
+            return mid, loop_count
+        loop_count += 1
+    return None, loop_count  # we did not find the item
+
+print("Searching for an item in the middle of the array, using interpolation search:")
+searched_item = sa[N // 2]
+%timeit interpolation_search(np.sort(a), searched_item)
+
+print("Searching for an item at the beginning of the array, using interpolation search:")
+searched_item = sa[0]
+%timeit interpolation_search(np.sort(a), searched_item)
+
+print("Searching for an item at the end of the array, using interpolation search:")
+searched_item = sa[-1]
+%timeit interpolation_search(np.sort(a), searched_item)
+```
+
+The awesome speed of interpolation search:
+
+```{python}
+a = np.random.permutation(N)
+sa = np.sort(a)
+searched_item = sa[N // 2 + 1]
+print("Interpolation search run time:")
+%timeit interpolation_search(sa, searched_item)
+v, i_loop_count = interpolation_search(sa, searched_item)
+print(f"Interpolation search 'step' count: {i_loop_count}")
+
+print("Binary search run time:")
+%timeit binary_search_loop(sa, searched_item)
+b, b_loop_count = binary_search_loop(sa, searched_item)
+print(f"Binary search 'step' count: {b_loop_count}")
+```
+
+Unfortunately, the "single step" search only works if the array is completely uniform.
+
+```{python}
+a = np.random.choice(N**3, size=N, replace=False) / 100  # possibly non-uniform array
+sa = np.sort(a)
+searched_item = sa[N // 2 + 1]
+
+print("Interpolation search run time:")
+%timeit interpolation_search(sa, searched_item)
+v, i_loop_count = interpolation_search(sa, searched_item)
+print(f"Interpolation search 'step' count: {i_loop_count}")
+print("Binary search run time:")
+%timeit binary_search_loop(sa, searched_item)
+v, b_loop_count = binary_search_loop(sa, searched_item)
+print(f"Binary search 'step' count: {b_loop_count}")
+```
+
+#### Find all
+
+The search finds the first occurrence of the searched item. Which one will depend on the search method. Sometimes, it is required to find all occurrences of the searched item in the array.
+
+```{python}
+import numpy as np
+def find_all(a, x):
+    indices = []
+    for i in range(len(a)):
+        if a[i] == x:
+            indices.append(i)
+    return indices
+
+N = 100
+a = np.random.choice(N**3, size=N, replace=True)
+searched_item = a[N // 2]
+occurrences = find_all(a, searched_item)
+print("The value", searched_item, "occurs", len(occurrences), "times in the array at indices", occurrences)
+```
+
+
+### 'Statistical' computations on arrays
+
+#### Simple operations
+
+##### Minimum, maximum
+
+Python has built-in `min()` and `max()` functions. Here, we wil see simple implementation of these functions,
+simply to practice working with arrays. In practice, you can use the built-in functions.
+
+```{python}
+import numpy as np
+my_list = np.random.permutation(10).tolist()
+print("Input array:", my_list)
+
+def minimum(a):
+    min_value = a[0]
+    for i in range(1, len(a)):
+        if a[i] < min_value:
+            min_value = a[i]
+        # Alternatively:
+        # min_value = min(min_value, a[i])
+    return min_value
+
+def maximum(a):
+    max_value = a[0]
+    for i in range(1, len(a)):
+        if a[i] > max_value:
+            max_value = a[i]
+        # Alternatively:
+        # max_value = max(max_value, a[i])
+    return max_value
+
+print("Minimum: ", minimum(my_list))
+print("Maximum: ", maximum(my_list))
+```
+
+##### Mean
+
+First, we need to compute the sum:
+
+```{python}
+import numpy as np
+my_list = np.random.permutation(10).tolist()
+print("Input array:", my_list)
+
+def sum(a):
+    s = 0
+    for i in range(len(a)):
+        s += a[i]
+    return s
+
+print("Sum: ", sum(my_list))
+```
+
+Mean is then simply the sum divided by the number of items:
+
+```{python}
+def mean(a):
+    return sum(a) / len(a)
+
+print("Mean: ", mean(my_list))
+```
+
+
+#### Cumulative sum (prefix sum)
+
+Summing items between two indices is useful for many tasks.
+
+```{python}
+import numpy as np
+temp_measurements = (np.random.rand(200) * 10).tolist()
+print("Input array: " + ', '.join([f"{x:.2f}" for x in temp_measurements[:10]]) + ", ...")
+
+def range_sum(a, start, end):
+    s = 0
+    for i in range(start, end):
+        s += a[i]
+    return s
+
+print("Sum between indices 5 and 15:", range_sum(temp_measurements, 5, 15))
+```
+
+This is fine, if we need to to this once, but what if we need to do it many times?
+
+```{python}
+print("Summing once run time:")
+%timeit range_sum(temp_measurements, 5, 15)
+print("Summing 100 times run time:")
+%timeit [range_sum(temp_measurements, 5, 15) for _ in range(100)]
+```
+
+Is there a better way? Why yes! We will use **cumulative sum** (also known as prefix sum).
+It is done bone adding all the previous values to the current value in the array.
+
+```{python}
+def cumulative_sum(a):
+    for i in range(1, len(a)):
+        a[i] += a[i - 1]
+    return a
+my_list = list(range(10))
+print("Cumulative sum of ordered numbers from 0 to 9: ")
+print("Input array:   [" + ', '.join([f"{x:2d}" for x in my_list]) + "]")
+print("Cumulative sum: " + str(cumulative_sum(my_list)))
+print()
+my_list = np.random.permutation(10).tolist()
+print("Cumulative sum of randomly shuffled numbers: ")
+print("Input array:   [" + ', '.join([f"{x:2d}" for x in my_list]) + "]")
+print("Cumulative sum: " + str(cumulative_sum(my_list)))
+```
+
+### Integral image (summed-area table, cumulative sum in 2D)
\ No newline at end of file