6.0002 Intro to Computational Thinking and Data Science

1 – Optimization Problems

3 kinds of models:
Optimization models – start with an objective function to be minimized or maximized, usually subject to a set of constraints.
Statistical models
Simulation models

The knapsack problem has two versions: the discrete (0/1) version, where you either take an item or you don't, and the continuous version, where you can take fractions of items. The continuous version is boring: a greedy algorithm takes gold until the gold is gone or you run out of space, then repeats for silver, bronze, etc.
The discrete version is more complicated. A brute-force algorithm (enumerate every subset of the items and keep the best one that fits) is correct but impractical, because its running time grows exponentially with the number of items.
The greedy algorithm asks what the "best" remaining item is, adds it to the knapsack, and repeats until it runs out of space. The dominant computational cost is the sort. Python uses a variant of merge sort (Timsort), so that happens in n log n time (log-linear time), which is very efficient compared to brute force.
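A minimal sketch of that greedy loop, using plain (value, weight) tuples rather than the lecture's Item class (the numbers are made up for illustration):

def density(item):
    """Value per unit of weight; bigger is better."""
    return item[0] / item[1]

def greedy(items, maxWeight, keyFunction):
    """Assumes items is a list of (value, weight) pairs and maxWeight >= 0;
       keyFunction maps an item to a number.
       One pass after an n log n sort; no backtracking."""
    itemsCopy = sorted(items, key=keyFunction, reverse=True)
    taken, totalValue, totalWeight = [], 0, 0
    for value, weight in itemsCopy:
        if totalWeight + weight <= maxWeight:
            taken.append((value, weight))
            totalWeight += weight
            totalValue += value
    return taken, totalValue

items = [(60, 10), (100, 20), (120, 30)]
print(greedy(items, 50, density))  # ([(60, 10), (100, 20)], 160)
# the optimal 0/1 answer here is 220, so greedy is not always best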

Lambda is used to create an anonymous function (it has no name). It takes a sequence of identifiers and an expression and builds a function of n arguments that evaluates that expression. Professor Guttag maintains that lambda expressions should only be used if they fit on a single line; otherwise write a named function, because named functions are easier to debug.
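For instance, the density key above could be written inline as a one-line lambda (a trivial illustration; same behavior, just no name):

density = lambda item: item[0] / item[1]  # one argument
add = lambda x, y: x + y                  # two arguments
print(add(2, 3))  # prints 5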

Greedy algorithms make the locally best choice at each step, and the locally best sequence of choices is rarely the globally best answer. Professor Guttag drew two hills for a hypothetical hill-climbing algorithm, one clearly taller than the other. If the greedy algorithm never goes down, it can get stuck on the lower hill: it finds a local maximum, not the global maximum that was sought.
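A toy sketch of that failure mode, with a made-up one-dimensional landscape; a climber that only moves to a strictly higher neighbor stops on the first peak it reaches:

def hillClimb(heights, start):
    """Assumes heights is a list of numbers and start a valid index.
       Repeatedly moves to the higher neighbor; returns the index
       where neither neighbor is higher (a local maximum)."""
    i = start
    while True:
        higher = [j for j in (i - 1, i + 1)
                  if 0 <= j < len(heights) and heights[j] > heights[i]]
        if not higher:
            return i
        i = max(higher, key=lambda j: heights[j])

landscape = [0, 2, 4, 3, 1, 5, 9, 6]  # a low hill, then a taller one
print(hillClimb(landscape, 0))  # 2: stuck at height 4, missing height 9 at index 6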

2 – Optimization Problems (cont.)

Pros of greedy algorithms:
→ easy to implement
→ computationally efficient (n log n time)
Cons of greedy algorithms:
→ does not always yield the best solution
→ you don’t know how good the approximation is (how close to optimal?)

Dynamic programming is a meaningless name for a useful methodology: store and recall answers that will be used more than once, a process known as memoization (think of writing yourself a memo to remember the answer).

Memoization applied to the recursive computation of the Fibonacci sequence. Note the use of a dictionary instead of a list.
def fastFib(n, memo = {}):
    """Assumes n is an int >= 0, memo used only by recursive calls
       Returns Fibonacci of n"""
    if n == 0 or n == 1:
        return 1
    try:
        return memo[n]
    except KeyError:
        result = fastFib(n-1, memo) + fastFib(n-2, memo)
        memo[n] = result
        return result
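Two details worth noting: the memo is threaded through the recursive calls explicitly, and the default argument memo = {} is evaluated only once, so the cache also persists across separate top-level calls. Here that is harmless (the cached values are always correct), but mutable default arguments are a Python idiom to use with care. With the memo, fastFib(120) returns essentially instantly, where the naive recursion would require an astronomical number of calls.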

Memoization (and dynamic programming generally) works if the problem has two properties:
→ Optimal substructure: a globally optimal solution can be found by combining optimal solutions to local subproblems.
→ Overlapping subproblems: finding an optimal solution involves solving the same subproblem multiple times.

Modifying maxVal (a program that attempts to get the most calories under a specific limit, a version of the knapsack problem) to use a memo (see the sketch after this list):
→ add memo as a third argument
↑ def fastMaxVal(toConsider, avail, memo = {}):
→ key of memo is a tuple
↑ (items left to be considered, available weight)
↑ items left to be considered represented by len(toConsider)
→ the first thing the body of the function does is check whether the optimal choice of items, given the available weight, is already in the memo
→ last thing the body of the function does is update the memo
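Putting those pieces together, a sketch of fastMaxVal along the lines just described (assuming, as in the original maxVal, that items have getWeight and getValue methods and that the function returns a (total value, items taken) tuple):

def fastMaxVal(toConsider, avail, memo = {}):
    """Assumes toConsider a list of items, avail a weight,
       memo supplied by recursive calls.
       Returns a (total value, tuple of items) solution to the
       0/1 knapsack problem."""
    if (len(toConsider), avail) in memo:
        # already solved this subproblem: recall the answer
        result = memo[(len(toConsider), avail)]
    elif toConsider == [] or avail == 0:
        result = (0, ())
    elif toConsider[0].getWeight() > avail:
        # first item doesn't fit: explore the right branch only
        result = fastMaxVal(toConsider[1:], avail, memo)
    else:
        nextItem = toConsider[0]
        # left branch: take nextItem
        withVal, withToTake = fastMaxVal(toConsider[1:],
                                         avail - nextItem.getWeight(),
                                         memo)
        withVal += nextItem.getValue()
        # right branch: skip nextItem
        withoutVal, withoutToTake = fastMaxVal(toConsider[1:], avail, memo)
        # choose the better branch
        if withVal > withoutVal:
            result = (withVal, withToTake + (nextItem,))
        else:
            result = (withoutVal, withoutToTake)
    memo[(len(toConsider), avail)] = result  # update the memo
    return result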

Comparing the running times of the two maxVal programs (the second with dynamic programming) makes the first look exponential and the second polynomial. If the problem is inherently exponential, how can we get polynomial running times? Because the memoized version's running time is governed by the number of distinct subproblems, i.e., the possible (items left to consider, available weight) pairs, which grows far more slowly than the number of subsets of items. Computational complexity can be subtle.

Many problems of practical importance can be formulated as optimization problems. Greedy algorithms often provide adequate (though not necessarily optimal) solutions. Finding an optimal solution is usually exponentially hard, but dynamic programming often yields good performance for the subclass of optimization problems with optimal substructure and overlapping subproblems; the solution it finds is always correct, and the algorithm is fast in the right circumstances.

3 – Graph-theoretic Models

This lecture is given by Dr. Eric Grimson instead of Dr. John Guttag.

Graphs in computation have nodes (vertices) and edges (arcs) connecting those nodes. Edges can be directed or undirected (directed edges are drawn as arrows), and they can be weighted or unweighted (think Markov chains). A digraph is a directed graph: its edges pass in only one direction.

Trees are a special case of graphs in which any pair of nodes is connected by a single path.

Common graph problems:
→ Is there a path from A to B? (A sequence of edges and nodes.)
→ What is the least expensive path?
→ Graph Partition Problem – how do you separate a graph into connected components that are disconnected from one another?
→ Min-cut Max-flow Problem – is there an efficient way to separate out the highly connected elements (the things that interact a lot) and determine how many of those kinds of subgraphs are inside the graph?

Class Node:
class Node(object):
    def __init__(self, name):
        """Assumes name is a string"""
        self.name = name
    def getName(self):
        return self.name
    def __str__(self):
        return self.name

Class Edge:
class Edge(object):
    def __init__(self, src, dest):
        """Assumes src and dest are nodes"""
        self.src = src
        self.dest = dest
    def getSource(self):
        return self.src
    def getDestination(self):
        return self.dest
    def __str__(self):
        return self.src.getName() + '->' \
               + self.dest.getName()

An adjacency list stores the graph by recording, for each node, the nodes that can be reached from it via a single edge.

Class Digraph
class Digraph(object):
    """edges is a dict mapping each node to a list of its children"""
    def __init__(self):
        self.edges = {}
    def addNode(self, node):
        if node in self.edges:
            raise ValueError('Duplicate node')
        else:
            self.edges[node] = []
    def addEdge(self, edge):
        src = edge.getSource()
        dest = edge.getDestination()
        if not (src in self.edges and dest in self.edges):
            raise ValueError('Node not in graph')
        self.edges[src].append(dest)
    def childrenOf(self, node):
        return self.edges[node]
    def hasNode(self, node):
        return node in self.edges
    def getNode(self, name):
        for n in self.edges:
            if n.getName() == name:
                return n
        raise NameError(name)
    def __str__(self):
        result = ''
        for src in self.edges:
            for dest in self.edges[src]:
                result = result + src.getName() + '->' \
                         + dest.getName() + '\n'
        return result[:-1]  # omit final newline

Class Graph
class Graph(Digraph):
    def addEdge(self, edge):
        Digraph.addEdge(self, edge)
        rev = Edge(edge.getDestination(), edge.getSource())
        Digraph.addEdge(self, rev)

Adjacency list of an example graph:
Boston: Providence, New York
Providence: Boston, New York
New York: Chicago
Chicago: Denver, Phoenix
Denver: Phoenix, New York
Los Angeles: Boston
Phoenix: (no destinations)
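A short usage sketch (the helper name buildCityGraph is an invention for illustration) that assembles this graph from the classes above; printing the result reproduces the adjacency list:

def buildCityGraph(graphType):
    g = graphType()
    for name in ('Boston', 'Providence', 'New York', 'Chicago',
                 'Denver', 'Phoenix', 'Los Angeles'):  # create 7 nodes
        g.addNode(Node(name))
    g.addEdge(Edge(g.getNode('Boston'), g.getNode('Providence')))
    g.addEdge(Edge(g.getNode('Boston'), g.getNode('New York')))
    g.addEdge(Edge(g.getNode('Providence'), g.getNode('Boston')))
    g.addEdge(Edge(g.getNode('Providence'), g.getNode('New York')))
    g.addEdge(Edge(g.getNode('New York'), g.getNode('Chicago')))
    g.addEdge(Edge(g.getNode('Chicago'), g.getNode('Denver')))
    g.addEdge(Edge(g.getNode('Chicago'), g.getNode('Phoenix')))
    g.addEdge(Edge(g.getNode('Denver'), g.getNode('Phoenix')))
    g.addEdge(Edge(g.getNode('Denver'), g.getNode('New York')))
    g.addEdge(Edge(g.getNode('Los Angeles'), g.getNode('Boston')))
    return g

print(buildCityGraph(Digraph))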
