
# 6.0002 Intro to Computational Thinking and Data Science

## 1 – Optimization Problems

3 kinds of models:
→ Optimization models – start with an objective function to be minimized or maximized, usually subject to a set of constraints.
→ Statistical models
→ Simulation models

The knapsack problem has 2 versions: the discrete version (0/1 version), where you take an item or you don’t, and the continuous version, where you can take fractions of items. The continuous version is boring: use a greedy algorithm to take gold until the gold is gone or you run out of space, then repeat for silver, bronze, etc.
The discrete version is more complicated. A brute force algorithm (try every subset of items) doesn’t work in practice because its running time is exponential in the number of items.
The greedy algorithm asks what the “best” remaining item is, adds it to the knapsack, and repeats until it runs out of space. This program’s main computational cost is the sort. Python uses a variant of merge sort (Timsort), so that happens in n log n time (log-linear time). Very efficient compared to brute force.
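The greedy scheme just described can be sketched as follows (the item names, values, and weights are made up for illustration; the lecture’s version uses an Item class and tries several key functions):

```python
def greedy(items, max_weight, key_function):
    """items: list of (name, value, weight) tuples.
    key_function maps an item to a number; higher means "better"."""
    # The sort dominates the running time: O(n log n).
    items_copy = sorted(items, key=key_function, reverse=True)
    result, total_value, total_weight = [], 0.0, 0.0
    for name, value, weight in items_copy:
        if total_weight + weight <= max_weight:  # item fits
            result.append(name)
            total_value += value
            total_weight += weight
    return result, total_value

items = [('gold', 100, 4), ('silver', 60, 3), ('bronze', 30, 2)]
# Greedy by value: takes gold (weight 4), then nothing else fits.
taken, val = greedy(items, 5, key_function=lambda item: item[1])
print(taken, val)
```

Trying different key functions (value, weight, density) gives different, possibly non-optimal, answers.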

Lambda is used to create an anonymous function (it has no name): it binds a sequence of identifiers (the parameters) to an expression, yielding a function of those n arguments. Professor Guttag maintains that lambda expressions should only be used if they fit on a single line; otherwise write a named function, because named functions are easier to debug.
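For example (a toy comparison, not from the lecture), these two sorts are equivalent:

```python
pairs = [(10, 5), (9, 3), (4, 4)]  # (value, weight) tuples

# One-line lambda: fine by Guttag's rule of thumb.
by_density = sorted(pairs, key=lambda p: p[0] / p[1], reverse=True)

# Named function: easier to debug if the logic grows.
def density(p):
    return p[0] / p[1]

assert by_density == sorted(pairs, key=density, reverse=True)
print(by_density)  # highest value-to-weight ratio first
```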

Greedy algorithms optimize one choice at a time, so while each step is locally the best answer (local meaning for that choice), the result is rarely globally the best answer. Professor Guttag drew two hills for a hypothetical hill-climbing algorithm, one clearly taller than the other. If the greedy algorithm never goes down, it can get stuck on the lower hill. This optimizes locally (a local maximum) but not globally (the global maximum, which is what was sought).
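A toy numeric version of the two-hills picture (the landscape and helper function are invented for illustration):

```python
# Two hills; the second (starting around index 7) is taller.
heights = [0, 1, 2, 3, 2, 1, 0, 2, 4, 6, 4, 2]

def hill_climb(heights, start):
    """Greedy climber: move to a strictly higher neighbor until none exists."""
    pos = start
    while True:
        neighbors = [p for p in (pos - 1, pos + 1)
                     if 0 <= p < len(heights) and heights[p] > heights[pos]]
        if not neighbors:
            return pos  # stuck at a local maximum
        pos = max(neighbors, key=lambda p: heights[p])

print(heights[hill_climb(heights, 1)])  # starts on the short hill, stops at height 3
print(heights[hill_climb(heights, 8)])  # starts on the tall hill, reaches height 6
```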

## 2 – Optimization Problems (cont.)

Pros of greedy algorithms:
→ easy to implement
→ computationally efficient (n log n time)
Cons of greedy algorithms:
→ does not always yield the best solution
→ you don’t know how good the approximation is (how close to optimal?)

Dynamic programming is a meaningless name for a useful methodology: store and recall answers that will be used more than once, a process known as memoization (like creating a memo to remind yourself of the answer).

Memoization applied to the recursive computation of the Fibonacci sequence. Note the use of a dictionary instead of a list.

```python
def fastFib(n, memo={}):
    """Assumes n is an int >= 0, memo used only by recursive calls
    Returns Fibonacci of n"""
    if n == 0 or n == 1:
        return 1
    try:
        return memo[n]
    except KeyError:
        result = fastFib(n-1, memo) + fastFib(n-2, memo)
        memo[n] = result
        return result
```

Dynamic programming works if the problem has two properties:
→ Optimal substructure: a globally optimal solution can be found by combining optimal solutions to local subproblems.
→ Overlapping subproblems: finding an optimal solution involves solving the same problem multiple times.

Modifying maxVal (a program that attempts to get the most calories under a specific limit, a version of the knapsack problem) to use a memo:
→ add memo as a third argument
↑ def fastMaxVal(toConsider, avail, memo = {}):
→ key of memo is a tuple
↑ (items left to be considered, available weight)
↑ items left to be considered represented by len(toConsider)
→ first thing the body of function does is check whether the optimal choice of items given the available weight is already in the memo
→ last thing the body of the function does is update the memo
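Putting those modifications together, a minimal sketch (using (name, value, weight) tuples instead of the lecture’s Item class):

```python
def fastMaxVal(toConsider, avail, memo={}):
    """toConsider: list of (name, value, weight) tuples.
    Returns (total value, tuple of items taken) of the best choice.
    Note: the mutable default memo persists across top-level calls,
    as in the lecture's version; fine for a single use."""
    if (len(toConsider), avail) in memo:
        # Already solved this (items left, available weight) subproblem.
        return memo[(len(toConsider), avail)]
    if toConsider == [] or avail == 0:
        result = (0, ())
    elif toConsider[0][2] > avail:
        # First item doesn't fit; only the "don't take" branch exists.
        result = fastMaxVal(toConsider[1:], avail, memo)
    else:
        nextItem = toConsider[0]
        # Branch 1: take nextItem.
        withVal, withToTake = fastMaxVal(toConsider[1:],
                                         avail - nextItem[2], memo)
        withVal += nextItem[1]
        # Branch 2: skip nextItem.
        withoutVal, withoutToTake = fastMaxVal(toConsider[1:], avail, memo)
        if withVal > withoutVal:
            result = (withVal, withToTake + (nextItem,))
        else:
            result = (withoutVal, withoutToTake)
    memo[(len(toConsider), avail)] = result  # update the memo last
    return result

val, taken = fastMaxVal([('a', 6, 3), ('b', 7, 3), ('c', 8, 2), ('d', 9, 5)], 5)
print(val)  # 15, by taking b and c
```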

Comparing the running times of the two maxVal programs (the second using dynamic programming), the first looks exponential and the second polynomial. If the 0/1 knapsack problem is inherently exponential, how can we get polynomial running times? Because the memoized version’s running time depends on the number of distinct (items left, available weight) pairs actually encountered, not on the number of subsets. Computational complexity can be subtle.
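Counting recursive calls makes the contrast concrete (a rough sketch; the counting wrappers are not from the lecture, and call counts stand in for running time):

```python
def fib(n, count):
    """Naive recursive Fibonacci; count[0] tallies the calls."""
    count[0] += 1
    if n == 0 or n == 1:
        return 1
    return fib(n - 1, count) + fib(n - 2, count)

def fastFibCounted(n, count, memo=None):
    """Memoized Fibonacci with the same call counter."""
    if memo is None:
        memo = {}
    count[0] += 1
    if n == 0 or n == 1:
        return 1
    if n in memo:
        return memo[n]
    memo[n] = fastFibCounted(n - 1, count, memo) + \
              fastFibCounted(n - 2, count, memo)
    return memo[n]

slow, fast = [0], [0]
fib(20, slow)
fastFibCounted(20, fast)
print(slow[0], fast[0])  # tens of thousands of calls vs. a few dozen
```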

Many problems of practical importance can be formulated as optimization problems. Greedy algorithms often provide adequate (maybe not optimal) solutions. Finding an optimal solution is usually exponentially hard. Dynamic programming often yields good performance for a subclass of optimization problems (those with optimal substructure and overlapping subproblems), where the solution is always correct and the algorithm runs fast under the right circumstances.

## 3 – Graph-theoretic Models

This lecture is given by Dr. Eric Grimson instead of Dr. John Guttag.

Graphs in computation have nodes (vertices) and edges (arcs) connecting those nodes. Edges can be directed or undirected (directed edges are arrows) and weighted or unweighted (think Markov chains). A digraph is a directed graph: its edges go in only one direction.

Trees are a special case of a graph in which any pair of nodes is connected by a single path.

Common graph problems:
→ Is there a path from A to B? (A sequence of edges and nodes.)
→ What is the least expensive path?
→ Graph Partition Problem – how do you separate a graph into disjoint connected components?
→ Min-cut Max-flow Problem – is there an efficient way to identify the highly connected subgraphs (the elements that interact a lot) and separate them out from the rest of the graph?

Class Node:

```python
class Node(object):
    def __init__(self, name):
        """Assumes name is a string"""
        self.name = name
    def getName(self):
        return self.name
    def __str__(self):
        return self.name
```

Class Edge:

```python
class Edge(object):
    def __init__(self, src, dest):
        """Assumes src and dest are nodes"""
        self.src = src
        self.dest = dest
    def getSource(self):
        return self.src
    def getDestination(self):
        return self.dest
    def __str__(self):
        return self.src.getName() + '->' \
               + self.dest.getName()
```

A representation known as an adjacency list stores the graph by listing, for each node, the nodes reachable from it by a single edge.

Class Digraph (the addNode and addEdge methods were dropped from these notes; restored here from the bodies that remained):

```python
class Digraph(object):
    """edges is a dict mapping each node to a list of its children"""
    def __init__(self):
        self.edges = {}
    def addNode(self, node):
        if node in self.edges:
            raise ValueError('Duplicate node')
        else:
            self.edges[node] = []
    def addEdge(self, edge):
        src = edge.getSource()
        dest = edge.getDestination()
        if not (src in self.edges and dest in self.edges):
            raise ValueError('Node not in graph')
        self.edges[src].append(dest)
    def childrenOf(self, node):
        return self.edges[node]
    def hasNode(self, node):
        return node in self.edges
    def getNode(self, name):
        for n in self.edges:
            if n.getName() == name:
                return n
        raise NameError(name)
    def __str__(self):
        result = ''
        for src in self.edges:
            for dest in self.edges[src]:
                result = result + src.getName() + '->' \
                         + dest.getName() + '\n'
        return result[:-1]  # omit final newline
```

Class Graph: an undirected graph is a digraph in which every edge is also added in reverse.

```python
class Graph(Digraph):
    def addEdge(self, edge):
        Digraph.addEdge(self, edge)
        rev = Edge(edge.getDestination(), edge.getSource())
        Digraph.addEdge(self, rev)
```

Adjacency list of an example graph:
Boston: Providence, New York
Providence: Boston, New York
New York: Chicago
Chicago: Denver, Phoenix
Denver: Phoenix, New York
Los Angeles: Boston
Phoenix: [no destinations]

Dr. Grimson proceeds to put that graph into code:

```python
def buildCityGraph(graphType):
    g = graphType()
    for name in ('Boston', 'Providence', 'New York', 'Chicago',
                 'Denver', 'Phoenix', 'Los Angeles'):  # Create 7 nodes
        g.addNode(Node(name))
    for src, dest in (('Boston', 'Providence'), ('Boston', 'New York'),
                      ('Providence', 'Boston'), ('Providence', 'New York'),
                      ('New York', 'Chicago'), ('Chicago', 'Denver'),
                      ('Chicago', 'Phoenix'), ('Denver', 'Phoenix'),
                      ('Denver', 'New York'), ('Los Angeles', 'Boston')):
        g.addEdge(Edge(g.getNode(src), g.getNode(dest)))
    return g
```

Depth-first search.
Unlike binary trees, graphs can contain loops. To avoid following a cycle endlessly, the algorithm keeps track of its path and never revisits a node already on it.
Finding a path can be broken down into finding various intermediate waypoints. Thus this can be solved recursively.
→ Look at origin node.
→ Look at all edges leaving the origin node, in some order.
→ Follow the first edge. Check to see if you’re at the destination, if you are, return path because you’re done.
→→ If not, repeat the process from new node.
→ Continue until either find goal node, or run out of options.
→→ When out of options, backtrack to the previous node and try the next edge, repeating this process.

```python
def DFS(graph, start, end, path, shortest, toPrint=False):
    path = path + [start]
    if toPrint:
        print('Current DFS path:', printPath(path))
    if start == end:
        return path
    for node in graph.childrenOf(start):
        if node not in path:  # avoid cycles
            if shortest == None or len(path) < len(shortest):
                newPath = DFS(graph, node, end, path, shortest, toPrint)
                if newPath != None:
                    shortest = newPath
        elif toPrint:
            print('Already visited', node)
    return shortest
```

```python
def shortestPath(graph, start, end, toPrint=False):
    return DFS(graph, start, end, [], None, toPrint)
```

The second function here is a wrapper function. It gets the recursion started properly (passing an empty path and no shortest path found yet), and it provides appropriate abstraction. To test these:

```python
def testSP(source, destination):
    g = buildCityGraph(Digraph)
    sp = shortestPath(g, g.getNode(source), g.getNode(destination),
                      toPrint=True)
    if sp != None:
        print('Shortest path from', source, 'to', destination, 'is',
              printPath(sp))
    else:
        print('There is no path from', source, 'to', destination)

testSP('Boston', 'Chicago')
```

Breadth-first search.
It’s like depth-first search, except sideways instead of down: explore all paths of length 1, then all paths of length 2, and so on. You can stop as soon as you find a path to the destination, because the first path found is always a shortest path (fewest edges).

```python
def BFS(graph, start, end, toPrint=False):
    initPath = [start]
    pathQueue = [initPath]
    while len(pathQueue) != 0:
        # Get and remove oldest element in pathQueue
        tmpPath = pathQueue.pop(0)
        if toPrint:
            print('Current BFS path:', printPath(tmpPath))
        lastNode = tmpPath[-1]
        if lastNode == end:
            return tmpPath
        for nextNode in graph.childrenOf(lastNode):
            if nextNode not in tmpPath:
                newPath = tmpPath + [nextNode]
                pathQueue.append(newPath)
    return None
```

Weighted Shortest Path.
This form of search assumes a cost for each leg of the journey.
Breadth-first search doesn’t lend itself to modification for weighted shortest path, because the first path found (fewest edges) need not be the cheapest: a path with more, cheaper legs can beat one with fewer, expensive legs, so all possibilities must be considered. If you want weighted shortest path, modify depth-first search instead.
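A sketch of such a modification, tracking accumulated cost instead of path length (this uses a plain dict of weighted edges rather than the Digraph class, and assumes nonnegative weights):

```python
def weightedDFS(graph, start, end, path=(), cost=0, best=None):
    """graph: dict mapping node -> list of (child, weight) pairs.
    Returns (cost, path) of the cheapest path found, or None.
    Assumes nonnegative weights (otherwise pruning is unsafe)."""
    path = path + (start,)
    if start == end:
        return (cost, path)
    for child, weight in graph.get(start, []):
        if child not in path:  # avoid cycles
            # Unlike BFS, keep exploring even if this path is already long,
            # but prune branches that can't beat the best cost so far.
            if best is None or cost + weight < best[0]:
                candidate = weightedDFS(graph, child, end, path,
                                        cost + weight, best)
                if candidate is not None:
                    best = candidate
    return best

g = {'A': [('B', 10), ('C', 1)], 'B': [('D', 1)], 'C': [('D', 5)]}
print(weightedDFS(g, 'A', 'D'))  # (6, ('A', 'C', 'D')): cheaper via C
```

Note that A→B→D has fewer edges (what BFS would return first) but costs 11, while A→C→D costs 6.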

Summary
Graphs are a collection of entities and connections between those entities. Searches are a means of finding a path between entities.