Let’s calculate the Fibonacci sequence, for the step of the Fibonacci sequence called it is calculated as
This can be done recursively as follows:
def fibonacci_1(n: int) -> int: if n <= 0: return 0 if n == 1: return 1 return fibonacci(n-1) + fibonacci(n-2)
If we say is the run time of this algorithm on input , then for the base cases. From this we can inductively prove that
giving that , where the golden ratio. In other other words, this algorithm is exponential in .
This inefficiency is due to computing the same Fibonacci number multiple times.
There is another method to get around the inefficiencies of recursion called Memoization. This is where you still use recursion but you use a hash table to store previously computed values.
For the LIS problem, we calculate iteratively by using the following formula
i.e. find the longest increasing subsequence that started with a number before your current term such that the end number is less than your current term.
Back to the example: Longest increasing subsequence
We do this iteratively
5
7
4
-3
9
1
10
4
5
8
9
3
1
2
1
1
3
2
4
3
4
5
6
2
we got from
-
1
-
-
2
4
5
6
8
9
10
6
Therefore we work out that the LIS for this sequence is 6 as that is the maximum value of and the LIS must end with some number!
So lets write this as an algorithm.
def get_last_best_solution_less_than_current_value( past_sequence: list, past_solutions:list, value: int) -> int: feasible_solutions = [] for solution, past_value in zip(past_solutions, past_sequence): if past_value < value: feasible_solutions.append(solution) return max(feasible_solutions, default = 0)def length_of_the_longest_increasing_subsequence(sequence: list) -> int: lis_solutions = [] for index, value in enumerate(sequence): next_solution = 1 + get_last_best_solution_less_than_current_value( sequence[:index], lis_solutions, value ) lis_solutions.append(next_solution) return max(lis_solutions, default = 0)if __name__ == "__main__":
To calculate the Run time of solution lets break these sub-functions into parts $get\_last\_best\_solution\_less\_than\_current\_value$ goes through all past entries and run some checks, then it takes a max over that set of entries. This takes at most $O(2k)$ operations so,
$$O(get\_last\_best\_solution\_less\_than\_current\_value(k)) = O(k)$$
Now lets look at $length\_of\_the\_longest\_increasing\_subsequence(n)$ this loops through all $n$ elements running $get\_last\_best\_solution\_less\_than\_current\_value$ on a list of size $k$ where $k$ is the value of the element right now. Lastly it finds the max of a list of length $n$ which is $O(n)$, so:
$$\begin{align*}
O(length\_of\_the\_longest\_increasing\_subsequence(n)) & = \sum_{k=0}^{n-1} O(k) + O(n)\\
& = O((n-1)n/2) + O(n)\\
& = O(n^2)\end{align*}$$
giving the run time of this algorithm to be $O(n^2)$.
Largest common subsequence (LCS)
Given two sequences and the goals is to find the length of the longest sequence which is a subsequence of both and .
Largest common subsequence
Let
then the largest common subsequence is of length 4.
This is what the Unix command diff does to two files.
Design attempt for this problem (my solution)
Step 1: define subproblem in words
= the longest common substring with and using .
= the last position of the longest common substring with and at .
Step 2: define recursive relation
I think this would work but the lecturers solution is way better!
Design attempt for this problem (lecturers)
Step 1: define subproblem in words
Let = the longest common substring with and
Step 2: define recursive relation
Set for all .
Then set
Why does this work?
If you can always just add the / string to the longest common subproblem found in . Note that if a larger one existed for not using both and it must at least use one of them. However if this was the case we could switch to using both and and we have found a larger common substring for .
If then the largest common substring must use at most one of or . Therefore the solution can be found in or .
Then lets turn this into code.
from typing import List, Anydef _get_solution( first_value: int, second_value: int, left_solution: int, above_solution: int, diagonal_solution: int) -> int: """ Implements fill logic as described by L(i,j) above. """ if first_value == second_value: return 1 + diagonal_solution return max(left_solution, above_solution)def common_longest_subsequence( first_sequence: List[Any], second_sequence: List[Any]) -> int: # Set the (n+1, m+1) matrix to all 0's first. (Some strang) [ [0 for _ in range(len(second_sequence)+1)] for _ in range(len(first_sequence)+1) ] # This is used to track the max max_value = 0 for first_index in range(len(first_sequence)): for second_index in range(len(second_sequence)): solutions[first_index+1][second_index+1] = _get_solution( first_sequence[first_index], second_sequence[second_index], solutions[first_index][second_index+1], solutions[first_index+1][second_index], solutions[first_index][second_index] ) max_value = max( max_value, solutions[first_index+1][second_index+1] ) return max_valueif __name__ == "__main__": first_sequence = "BCDBCDA" second_sequence = "ABECBAB" solution = common_longest_subsequence(first_sequence, second_sequence) print( f"The length of longest common subsequence of {first_sequence} " f"and {second_sequence} is {solution}." )
The runtime of my algorithm can be broken down into 2 main steps. (For this I assume the sequences are of the same length ). Setting up the matrix which is operations and running the main for loop which has steps and uses operations in each step. Giving the run time to be .
Further practice problems
From Algorithms by S. Dasgupta, C. Papadimitriou, and U. Vazirani.
6.1 Contiguous Subsequence
A contiguous subsequence of a list is a subsequence made up of consecutive elements of .
For instance, if is
then 15, −30, 10 is a contiguous subsequence but 5, 15, 40 is not. Give a linear-time algorithm for the following task:
Input: A list of numbers, .
Output: The contiguous subsequence of maximum sum (a subsequence of length zero has sum zero).
For the preceding example, the answer would be 10, −5, 40, 10, with a sum of 55.
(Hint: For each , consider contiguous subsequence’s ending exactly at position .)
6.2 hotel stops
You are going on a long trip. You start on the road at mile post 0. Along the way there are hotels, at mile posts , where each is measured from the starting point. The only places you are allowed to stop are at these hotels, but you can choose which of the hotels you stop at. You must stop at the final hotel (at distance ), which is your destination.
You’d ideally like to travel 200 miles a day, but this may not be possible (depending on the spacing of the hotels). If you travel miles during a day, the penalty for that day is . You want to plan your trip so as to minimize the total penalty —that is, the sum, over all travel days, of the daily penalties.
Give an efficient algorithm that determines the optimal sequence of hotels at which to stop.
6.3 Yuckdonald's
Yuckdonald’s is considering opening a series of restaurants along Quaint Valley Highway (QVH). The possible locations are along a straight line, and the distances of these locations from the start of QVH are, in miles and in increasing order, . The constraints are as follows:
At each location, Yuckdonald’s may open at most one restaurant. The expected profit from opening a restaurant at location is , where and .
Any two restaurants should be at least miles apart, where is a positive integer.
Give an efficient algorithm to compute the maximum expected total profit subject to the given constraints.
6.4 String of words
You are given a string of characters , which you believe to be a corrupted text document in which all punctuation has vanished (so that it looks something like “itwasthebestoftimes…�). You wish to reconstruct the document using a dictionary, which is available in the form of a Boolean function for any string ,
Give a dynamic programming algorithm that determines whether the string can be reconstituted as a sequence of valid words. The running time should be at most , assuming calls to take unit time.
In the event that the string is valid, make your algorithm output the corresponding sequence of words.
6.11 Longest common substring) - altered
Given two strings and , we wish to find the length of their longest common substring, that is, the largest for which there are integers and with . Show how to do this, what is the time complexity of this?