⇡
We have seen that procedures are, in effect, abstractions that describe compound operations on numbers independent of the particular numbers. For example, when we
def cube(x):
return x * x * x
we are not talking about the cube of a particular number, but rather about a method for obtaining the cube of any number. Of course we could get along without ever defining this procedure, by always writing expressions such as
>>> 3 * 3 * 3
>>> x * x * x
>>> y * y * y
and never mentioning cube explicitly. This would place us at a serious
disadvantage, forcing us to work always at the level of the particular
operations that happen to be primitives in the language (multiplication, in
this case) rather than in terms of higher-level operations. Our programs would
be able to compute cubes, but our language would lack the ability to express
the concept of cubing. One of the things we should demand from a powerful
programming language is the ability to build abstractions by assigning names to
common patterns and then to work in terms of the abstractions directly.
Procedures provide this ability. This is why all but the most primitive
programming languages include mechanisms for defining procedures.
Yet even in numerical processing we will be severely limited in our ability to create abstractions if we are restricted to procedures whose parameters must be numbers. Often the same programming pattern will be used with a number of different procedures. To express such patterns as concepts, we will need to construct procedures that can accept procedures as arguments or return procedures as values. Procedures that manipulate procedures are called
higher-order procedures. This section shows how higher-order procedures can serve as powerful abstraction mechanisms, vastly increasing the expressive power of our language.
Consider the following three procedures. The first computes the sum of the
integers from a through b:
def sum_integers(a, b):
if a > b:
return 0
else:
return a + sum_integers(a + 1, b)
The second computes the sum of the cubes of the integers in the given range:
def sum_cubes(a, b):
if a > b:
return 0
else:
return cube(a) + sum_cubes(a + 1, b)
The third computes the sum of a sequence of terms in the series
$$\frac{1}{1 ⋅ 3} + \frac{1}{5 ⋅ 7} + \frac{1}{9 ⋅ 11} + … ,$$
which converges to $\pi/8$ (very slowly):
def pi_sum(a, b):
if a > b:
return 0.0
return 1.0 / (a * (a + 2)) + pi_sum(a + 4, b)
These three procedures clearly share a common underlying pattern. They are for
the most part identical, differing only in the name of the procedure, the
function of a used to compute the term to be added, and the function
that provides the next value of a. We could generate each of the
procedures by filling in slots in the same template:
def name(a, b):
if a > b:
return 0
return term(a) + name(next(a), b)
The presence of such a common pattern is strong evidence that there is a useful abstraction waiting to be brought to the surface. Indeed, mathematicians long ago identified the abstraction of summation of a series and invented “sigma notation,” for example
$$∑ n = abf(n)\; = \;f(a) + \cdots + f(b),$$
to express this concept. The power of sigma notation is that it allows mathematicians to deal with the concept of summation itself rather than only with particular sums—for example, to formulate general results about sums that are independent of the particular series being summed.
Similarly, as program designers, we would like our language to be powerful enough so that we can write a procedure that expresses the concept of summation itself rather than only procedures that compute particular sums. We can do so readily in our procedural language by taking the common template shown above and transforming the “slots” into formal parameters:
def sum(term, a, next, b):
if a > b:
return 0
else:
return term(a) + sum(term, next(a), next, b)
Notice that sum takes as its arguments the lower and upper bounds
a and b together with the procedures term and next.
We can use sum just as we would any procedure. For example, we can use
it (along with a procedure inc that increments its argument by 1) to
define sum-cubes:
def inc(n):
return n + 1
def sum_cubes(a, b):
return sum(cube, a, inc, b)
Using this, we can compute the sum of the cubes of the integers from 1 to 10:
def sum_cubes(a, b):
# sum of cubes from a to b inclusive
def cube(x):
return x * x * x
def iter(a_, result):
if a_ > b:
return result
return iter(a_ + 1, result + cube(a_))
return iter(a, 0)
>>> sum_cubes(1, 10)
3025
With the aid of an identity procedure to compute the term, we can define
sum-integers in terms of sum:
def identity(x):
return x
def sum_integers(a, b):
return sum(identity, a, inc, b)
Then we can add up the integers from 1 to 10:
def sum_integers(a, b):
if a > b:
return 0
else:
return a + sum_integers(a + 1, b)
>>> sum_integers(1, 10)
55
We can also define pi-sum in the same way:
def pi_sum(a, b):
def pi_term(x):
return 1.0 / (x * (x + 2))
def pi_next(x):
return x + 4
return sum(pi_term, a, pi_next, b)
Using these procedures, we can compute an approximation to $\pi$:
Once we have sum, we can use it as a building block in formulating
further concepts. For instance, the definite integral of a function $f$
between the limits $a$ and $b$ can be approximated numerically using the
formula
$$∫ _{a}^{b}\;f\; = \;[\;f(a + \frac{dx}{2})\; + \;f(a + dx + \frac{dx}{2})\; + \;f(a + 2dx + \frac{dx}{2})\; + \; … \;]dx$$
for small values of $dx$. We can express this directly as a procedure:
def cube(x):
return x**3
def sum(term, a, next, b):
# Recursive sum: sum of term(a) + term(next(a)) + ... up to b
if a > b:
return 0
else:
return term(a) + sum(term, next(a), next, b)
def integral(f, a, b, dx):
def add_dx(x):
return x + dx
return sum(f, a + dx / 2.0, add_dx, b) * dx
>>> integral(cube, 0, 1, 0.01)
0.24998750000000042
>>> integral(cube, 0, 1, 0.001)
0.249999875000001
(The exact value of the integral of cube between 0 and 1 is 1/4.)
Exercise 1.29: Simpson’s Rule is a more accurate method of numerical integration than the method illustrated above. Using Simpson’s Rule, the integral of a function $f$ between $a$ and $b$ is approximated as
$$\frac{h}{3}(y_{0} + 4y_{1} + 2y_{2} + 4y_{3} + 2y_{4} + \cdots + 2y_{n - 2} + 4y_{n - 1} + y_{n}),$$
where $h = (b - a)/n$, for some even integer $n$, and $y_{k} = f(a + kh)$. (Increasing $n$ increases the accuracy of the approximation.) Define a procedure that takes as arguments $f$, $a$, $b$, and $n$ and returns the value of the integral, computed using Simpson’s Rule. Use your procedure to integrate
cubebetween 0 and 1 (with $n = 100$ and $n = 1000$), and compare the results to those of theintegralprocedure shown above.Exercise 1.30: The
sumprocedure above generates a linear recursion. The procedure can be rewritten so that the sum is performed iteratively. Show how to do this by filling in the missing expressions in the following definition:
def sum(term, a, next, b): def iter(a, result): if a > b: return result else: return iter(next(a), result + term(a)) return iter(a, 0)Exercise 1.31: 1. The
sumprocedure is only the simplest of a vast number of similar abstractions that can be captured as higher-order procedures. Write an analogous procedure calledproductthat returns the product of the values of a function at points over a given range. Show how to definefactorialin terms ofproduct. Also useproductto compute approximations to $\pi$ using the formula$$\frac{\pi}{4}\; = \;\frac{2 ⋅ 4 ⋅ 4 ⋅ 6 ⋅ 6 ⋅ 8 ⋅ \cdots}{3 ⋅ 3 ⋅ 5 ⋅ 5 ⋅ 7 ⋅ 7 ⋅ \cdots}.$$2. If your
productprocedure generates a recursive process, write one that generates an iterative process. If it generates an iterative process, write one that generates a recursive process.Exercise 1.32: 1. Show that
sumandproduct(Exercise 1.31) are both special cases of a still more general notion calledaccumulatethat combines a collection of terms, using some general accumulation function:(accumulate combiner null-value term a next b)
Accumulatetakes as arguments the same term and range specifications assumandproduct, together with acombinerprocedure (of two arguments) that specifies how the current term is to be combined with the accumulation of the preceding terms and anull-valuethat specifies what base value to use when the terms run out. Writeaccumulateand show howsumandproductcan both be defined as simple calls toaccumulate.2. If youraccumulateprocedure generates a recursive process, write one that generates an iterative process. If it generates an iterative process, write one that generates a recursive process.Exercise 1.33: You can obtain an even more general version of
accumulate(Exercise 1.32) by introducing the notion of a filter on the terms to be combined. That is, combine only those terms derived from values in the range that satisfy a specified condition. The resultingfiltered-accumulateabstraction takes the same arguments as accumulate, together with an additional predicate of one argument that specifies the filter. Writefiltered-accumulateas a procedure. Show how to express the following usingfiltered-accumulate: 1. the sum of the squares of the prime numbers in the interval $a$ to $b$ (assuming that you have aprime?predicate already written)2. the product of all the positive integers less than $n$ that are relatively prime to $n$ (i.e., all positive integers $i < n$ such that $\text{GCD}(i,n) = 1$).1.3.2Constructing Procedures UsingLambda
In using
sumas in 1.3.1, it seems terribly awkward to have to define trivial procedures such aspi-termandpi-nextjust so we can use them as arguments to our higher-order procedure. Rather than definepi-nextandpi-term, it would be more convenient to have a way to directly specify “the procedure that returns its input incremented by 4” and “the procedure that returns the reciprocal of its input times its input plus 2.” We can do this by introducing the special formlambda, which creates procedures. Usinglambdawe can describe what we want aslambda x: x + 4and
>>> (lambda x: 1.0 / (x * (x + 2)))Then our
pi-sumprocedure can be expressed without defining any auxiliary procedures asdef pi_sum(a, b): return sum(lambda x: 1.0 / (x * (x + 2)), a, lambda x: x + 4, b)Again using
lambda, we can write theintegralprocedure without having to define the auxiliary procedureadd-dx:def integral(f, a, b, dx): return sum(f, a + dx / 2.0, lambda x: x + dx, b) * dxIn general,
lambdais used to create procedures in the same way asdefine, except that no name is specified for the procedure:# (lambda (⟨formal-parameters⟩) ⟨body⟩) lambda *args: ...The resulting procedure is just as much a procedure as one that is created using
define. The only difference is that it has not been associated with any name in the environment. In fact,def plus4(x): return x + 4is equivalent to
def plus4(x): return x + 4We can read a
lambdaexpression as follows:>>> (lambda x: x + 4) # the procedure of an argument x that adds x and 4Like any expression that has a procedure as its value, a
lambdaexpression can be used as the operator in a combination such asdef square(x): return x * x >>> (lambda x, y, z: x + y + square(z))(1, 2, 3) 12or, more generally, in any context where we would normally use a procedure name.
Usingletto create local variables
Another use of
lambdais in creating local variables. We often need local variables in our procedures other than those that have been bound as formal parameters. For example, suppose we wish to compute the function$$f(x,y)\; = \;x(1 + xy)^{2} + y(1 - y) + (1 + xy)(1 - y),$$
which we could also express as
$$\begin{array}{lll} a & = & 1 + xy, \\ \phantom{(x,y)}b & = & 1 - y, \\ f(x,y) & = & xa^{2} + yb + ab. \end{array}$$
In writing a procedure to compute $f$, we would like to include as local variables not only $x$ and $y$ but also the names of intermediate quantities like $a$ and $b$. One way to accomplish this is to use an auxiliary procedure to bind the local variables:
def f(x, y): def f_helper(a, b): return x * (a * a) + y * b + a * b return f_helper(1 + x * y, 1 - y)Of course, we could use a
lambdaexpression to specify an anonymous procedure for binding our local variables. The body offthen becomes a single call to that procedure:def square(x): return x * x def f(x, y): a = 1 + (x * y) b = 1 - y return x * square(a) + y * b + a * bThis construct is so useful that there is a special form called
letto make its use more convenient. Usinglet, thefprocedure could be written asdef square(x): return x * x def f(x, y): a = 1 + x * y b = 1 - y return x * square(a) + y * b + a * bThe general form of a
letexpression is# Scheme: # (let ((var1 exp1) # (var2 exp2) # ... # (varn expn)) # body) def _let(): var1 = ... # exp1 var2 = ... # exp2 # ... return ... # bodywhich can be thought of as saying
# let ⟨var₁⟩ have the value ⟨exp₁⟩ and ⟨var₂⟩ have the value ⟨exp₂⟩ … in ⟨body⟩ # translates to assigning the variables and then evaluating the body in Python def _let(): var1 = ... # exp1 var2 = ... # exp2 # ... varn = ... # expn # body that uses var1, var2, ..., varn return ... # result of body # use the let-expression result = _let() resultThe first part of the
letexpression is a list of name-expression pairs. When theletis evaluated, each name is associated with the value of the corresponding expression. The body of theletis evaluated with these names bound as local variables. The way this happens is that theletexpression is interpreted as an alternate syntax for>>> (lambda *vars: ...)(...)No new mechanism is required in the interpreter in order to provide local variables. A
letexpression is simply syntactic sugar for the underlyinglambdaapplication.We can see from this equivalence that the scope of a variable specified by a
letexpression is the body of thelet. This implies that: -Letallows one to bind variables as locally as possible to where they are to be used. For example, if the value ofxis 5, the value of the expression(+ (let ((x 3)) (+ x (* x 10))) x) is 38. Here, the
xin the body of theletis 3, so the value of theletexpression is 33. On the other hand, thexthat is the second argument to the outermost+is still 5.- The variables’ values are computed outside thelet. This matters when the expressions that provide the values for the local variables depend upon variables having the same names as the local variables themselves. For example, if the value ofxis 2, the expression(let ((x 3) (y (+ x 2))) (* x y)) will have the value 12 because, inside the body of the
let,xwill be 3 andywill be 4 (which is the outerxplus 2). Sometimes we can use internal definitions to get the same effect as withlet. For example, we could have defined the procedurefabove asdef square(x): return x * x def f(x, y): a = 1 + (x * y) b = 1 - y return x * square(a) + y * b + a * bWe prefer, however, to use
letin situations like this and to use internaldefineonly for internal procedures.Exercise 1.34: Suppose we define the procedure
>>> def f(g): ... return g(2)Then we have
>>> def f(g): ... return g(2) >>> def square(x): ... return x * x >>> f(square) 4 >>> f(lambda z: z * (z + 1)) 6What happens if we (perversely) ask the interpreter to evaluate the combination
(f f)? Explain.1.3.3Procedures as General Methods
We introduced compound procedures in 1.1.4 as a mechanism for abstracting patterns of numerical operations so as to make them independent of the particular numbers involved. With higher-order procedures, such as the
integralprocedure of 1.3.1, we began to see a more powerful kind of abstraction: procedures used to express general methods of computation, independent of the particular functions involved. In this section we discuss two more elaborate examples—general methods for finding zeros and fixed points of functions—and show how these methods can be expressed directly as procedures.Finding roots of equations by the half-interval method
The half-interval method is a simple but powerful technique for finding roots of an equation $f(x) = 0$, where $f$ is a continuous function. The idea is that, if we are given points $a$ and $b$ such that $f(a) < 0 < f(b)$, then $f$ must have at least one zero between $a$ and $b$. To locate a zero, let $x$ be the average of $a$ and $b$, and compute $f(x)$. If $f(x) > 0$, then $f$ must have a zero between $a$ and $x$. If $f(x) < 0$, then $f$ must have a zero between $x$ and $b$. Continuing in this way, we can identify smaller and smaller intervals on which $f$ must have a zero. When we reach a point where the interval is small enough, the process stops. Since the interval of uncertainty is reduced by half at each step of the process, the number of steps required grows as $Θ(\text{log} (L\;/\;T))$, where $L$ is the length of the original interval and $T$ is the error tolerance (that is, the size of the interval we will consider “small enough”). Here is a procedure that implements this strategy:
def search(f, neg_point, pos_point): midpoint = average(neg_point, pos_point) if close_enough(neg_point, pos_point): return midpoint test_value = f(midpoint) if positive(test_value): return search(f, neg_point, midpoint) elif negative(test_value): return search(f, midpoint, pos_point) else: return midpointWe assume that we are initially given the function $f$ together with points at which its values are negative and positive. We first compute the midpoint of the two given points. Next we check to see if the given interval is small enough, and if so we simply return the midpoint as our answer. Otherwise, we compute as a test value the value of $f$ at the midpoint. If the test value is positive, then we continue the process with a new interval running from the original negative point to the midpoint. If the test value is negative, we continue with the interval from the midpoint to the positive point. Finally, there is the possibility that the test value is 0, in which case the midpoint is itself the root we are searching for.
To test whether the endpoints are “close enough” we can use a procedure similar to the one used in 1.1.7 for computing square roots:
def close_enough(x, y): return abs(x - y) < 0.001
Searchis awkward to use directly, because we can accidentally give it points at which $f$’s values do not have the required sign, in which case we get a wrong answer. Instead we will usesearchvia the following procedure, which checks to see which of the endpoints has a negative function value and which has a positive value, and calls thesearchprocedure accordingly. If the function has the same sign on the two given points, the half-interval method cannot be used, in which case the procedure signals an error.def half_interval_method(f, a, b): a_value = f(a) b_value = f(b) if a_value < 0 and b_value > 0: return search(f, a, b) elif b_value < 0 and a_value > 0: return search(f, b, a) else: raise ValueError(f"Values are not of opposite sign: {a} {b}")The following example uses the half-interval method to approximate $\pi$ as the root between 2 and 4 of $\text{sin} x = 0$:
import math def half_interval_method(f, a, b): def average(x, y): return (x + y) / 2.0 def close_enough(x, y): return abs(x - y) < 0.001 def sign(x): if x > 0: return 1 elif x < 0: return -1 else: return 0 def same_sign(x, y): return sign(x) == sign(y) def search(a, b): mid = average(a, b) if close_enough(a, b): return mid f_mid = f(mid) if same_sign(f(a), f_mid): return search(mid, b) else: return search(a, mid) return search(a, b) >>> half_interval_method(math.sin, 2.0, 4.0) 3.14111328125Here is another example, using the half-interval method to search for a root of the equation $x^{3} - 2x - 3 = 0$ between 1 and 2:
def half_interval_method(f, a, b): def sign(x): if x > 0: return 1 elif x < 0: return -1 else: return 0 def iter(a, b): mid = (a + b) / 2 if a == b: return mid if sign(f(mid)) == 0: return mid if sign(f(a)) == sign(f(mid)): return iter(mid, b) else: return iter(a, mid) return iter(a, b) >>> half_interval_method(lambda x: x*x*x - 2*x - 3, 1.0, 2.0) 1.89306640625Finding fixed points of functions
A number $x$ is called a fixed point of a function $f$ if $x$ satisfies the equation $f(x) = x$. For some functions $f$ we can locate a fixed point by beginning with an initial guess and applying $f$ repeatedly,
$$f(x),\;f(f(x)),\;f(f(f(x))),\;… ,$$
until the value does not change very much. Using this idea, we can devise a procedure
fixed-pointthat takes as inputs a function and an initial guess and produces an approximation to a fixed point of the function. We apply the function repeatedly until we find two successive values whose difference is less than some prescribed tolerance:tolerance = 0.00001 def fixed_point(f, first_guess): def close_enough(v1, v2): return abs(v1 - v2) < tolerance def try_guess(guess): next_val = f(guess) if close_enough(guess, next_val): return next_val return try_guess(next_val) return try_guess(first_guess)For example, we can use this method to approximate the fixed point of the cosine function, starting with 1 as an initial approximation:
import math tolerance = 1e-5 def close_enough(v1, v2): return abs(v1 - v2) < tolerance def fixed_point(f, first_guess): def try_guess(guess): next_ = f(guess) if close_enough(guess, next_): return next_ else: return try_guess(next_) return try_guess(first_guess) >>> fixed_point(math.cos, 1.0) 0.7390822985224023Similarly, we can find a solution to the equation $y = \text{sin} y + \text{cos} y$:
>>> import math >>> def fixed_point(f, first_guess, tolerance=1e-5, max_iter=1000): ... guess = first_guess ... for _ in range(max_iter): ... next_guess = f(guess) ... if abs(next_guess - guess) < tolerance: ... return next_guess ... guess = next_guess ... return guess >>> fixed_point(lambda y: math.sin(y) + math.cos(y), 1.0) 1.2587315962971173The fixed-point process is reminiscent of the process we used for finding square roots in 1.1.7. Both are based on the idea of repeatedly improving a guess until the result satisfies some criterion. In fact, we can readily formulate the square-root computation as a fixed-point search. Computing the square root of some number $x$ requires finding a $y$ such that $y^{2} = x$. Putting this equation into the equivalent form $y = x/y$, we recognize that we are looking for a fixed point of the function $y ↦ x/y$, and we can therefore try to compute square roots as
def fixed_point(f, first_guess): def close_enough(v1, v2): return abs(v1 - v2) < 1e-5 def try_guess(guess): next_ = f(guess) if close_enough(guess, next_): return next_ else: return try_guess(next_) return try_guess(first_guess) def sqrt(x): return fixed_point(lambda y: x / y, 1.0)Unfortunately, this fixed-point search does not converge. Consider an initial guess $y_{1}$. The next guess is $y_{2} = x/y_{1}$ and the next guess is $y_{3} = x/y_{2} = x/(x/y_{1}) = y_{1}$. This results in an infinite loop in which the two guesses $y_{1}$ and $y_{2}$ repeat over and over, oscillating about the answer.
One way to control such oscillations is to prevent the guesses from changing so much. Since the answer is always between our guess $y$ and $x/y$, we can make a new guess that is not as far from $y$ as $x/y$ by averaging $y$ with $x/y$, so that the next guess after $y$ is $\frac{1}{2}(y + x/y)$ instead of $x/y$. The process of making such a sequence of guesses is simply the process of looking for a fixed point of $y ↦ \frac{1}{2}(y + x/y)$:
# fixed-point implementation (from SICP) def fixed_point(f, first_guess): tolerance = 1e-5 def close_enough(v1, v2): return abs(v1 - v2) < tolerance guess = first_guess while True: next_guess = f(guess) if close_enough(guess, next_guess): return next_guess guess = next_guess def average(x, y): return (x + y) / 2.0 def sqrt(x): return fixed_point(lambda y: average(y, x / y), 1.0)(Note that $y = \frac{1}{2}(y + x/y)$ is a simple transformation of the equation $y = x/y ;$ to derive it, add $y$ to both sides of the equation and divide by 2.)
With this modification, the square-root procedure works. In fact, if we unravel the definitions, we can see that the sequence of approximations to the square root generated here is precisely the same as the one generated by our original square-root procedure of 1.1.7. This approach of averaging successive approximations to a solution, a technique that we call
average damping, often aids the convergence of fixed-point searches.
Exercise 1.35: Show that the golden ratio $φ$ (1.2.2) is a fixed point of the transformation $x ↦ 1 + 1/x$, and use this fact to compute $φ$ by means of the
fixed-pointprocedure.Exercise 1.36: Modify
fixed-pointso that it prints the sequence of approximations it generates, using thenewlineanddisplayprimitives shown in Exercise 1.22. Then find a solution to $x^{x} = 1000$ by finding a fixed point of $x ↦ \text{log} (1000)/\text{log} (x)$. (Use Scheme’s primitivelogprocedure, which computes natural logarithms.) Compare the number of steps this takes with and without average damping. (Note that you cannot startfixed-pointwith a guess of 1, as this would cause division by $\text{log} (1) = 0$.)Exercise 1.37: 1. An infinite continued fraction is an expression of the form
$$f\; = \;\frac{N_{1}}{D_{1} + \frac{N_{2}}{D_{2} + \frac{N_{3}}{D_{3} + …}}}.$$
As an example, one can show that the infinite continued fraction expansion with the $N_{i}$ and the $D_{i}$ all equal to 1 produces $1/φ$, where $φ$ is the golden ratio (described in 1.2.2). One way to approximate an infinite continued fraction is to truncate the expansion after a given number of terms. Such a truncation—a so-called finite continued fraction k-term finite continued fraction—has the form
$$\frac{N_{1}}{D_{1} + \frac{N_{2}}{⋱ + \frac{N_{k}}{D_{k}}}}.$$
Suppose that
nanddare procedures of one argument (the term index $i$) that return the $N_{i}$ and $D_{i}$ of the terms of the continued fraction. Define a procedurecont-fracsuch that evaluating(cont-frac n d k)computes the value of the $k$-term finite continued fraction. Check your procedure by approximating $1/φ$ using(cont-frac (lambda (i) 1.0) (lambda (i) 1.0) k) for successive values of
k. How large must you makekin order to get an approximation that is accurate to 4 decimal places?2. If yourcont-fracprocedure generates a recursive process, write one that generates an iterative process. If it generates an iterative process, write one that generates a recursive process.Exercise 1.38: In 1737, the Swiss mathematician Leonhard Euler published a memoir De Fractionibus Continuis, which included a continued fraction expansion for $e - 2$, where $e$ is the base of the natural logarithms. In this fraction, the $N_{i}$ are all 1, and the $D_{i}$ are successively 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, 8, …. Write a program that uses your
cont-fracprocedure from Exercise 1.37 to approximate $e$, based on Euler’s expansion.Exercise 1.39: A continued fraction representation of the tangent function was published in 1770 by the German mathematician J.H. Lambert:
$$\text{tan} x\; = \;\frac{x}{1 - \frac{x^{2}}{3 - \frac{x^{2}}{5 - …}}}\;,$$
where $x$ is in radians. Define a procedure
(tan-cf x k)that computes an approximation to the tangent function based on Lambert’s formula.kspecifies the number of terms to compute, as in Exercise 1.37.1.3.4Procedures as Returned Values
The above examples demonstrate how the ability to pass procedures as arguments significantly enhances the expressive power of our programming language. We can achieve even more expressive power by creating procedures whose returned values are themselves procedures.
We can illustrate this idea by looking again at the fixed-point example described at the end of 1.3.3. We formulated a new version of the square-root procedure as a fixed-point search, starting with the observation that $\sqrt{x}$ is a fixed-point of the function $y ↦ x/y$. Then we used average damping to make the approximations converge. Average damping is a useful general technique in itself. Namely, given a function $f$, we consider the function whose value at $x$ is equal to the average of $x$ and $f(x)$.
We can express the idea of average damping by means of the following procedure:
def average_damp(f): return lambda x: average(x, f(x))
Average-dampis a procedure that takes as its argument a procedurefand returns as its value a procedure (produced by thelambda) that, when applied to a numberx, produces the average ofxand(f x). For example, applyingaverage-dampto thesquareprocedure produces a procedure whose value at some number $x$ is the average of $x$ and $x^{2}$. Applying this resulting procedure to 10 returns the average of 10 and 100, or 55:>>> def square(x): ... return x * x >>> def average_damp(f): ... def _avg(x): ... return (x + f(x)) // 2 ... return _avg >>> (average_damp(square))(10) 55Using
average-damp, we can reformulate the square-root procedure as follows:def sqrt(x): return fixed_point(average_damp(lambda y: x / y), 1.0)Notice how this formulation makes explicit the three ideas in the method: fixed-point search, average damping, and the function $y ↦ x/y$. It is instructive to compare this formulation of the square-root method with the original version given in 1.1.7. Bear in mind that these procedures express the same process, and notice how much clearer the idea becomes when we express the process in terms of these abstractions. In general, there are many ways to formulate a process as a procedure. Experienced programmers know how to choose procedural formulations that are particularly perspicuous, and where useful elements of the process are exposed as separate entities that can be reused in other applications. As a simple example of reuse, notice that the cube root of $x$ is a fixed point of the function $y ↦ x/y^{2}$, so we can immediately generalize our square-root procedure to one that extracts cube roots:
def square(x): return x * x def average(a, b): return (a + b) / 2.0 def average_damp(f): return lambda x: average(x, f(x)) def fixed_point(f, first_guess, tolerance=1e-5, max_iter=1000): def close_enough(v1, v2): return abs(v1 - v2) < tolerance guess = first_guess for _ in range(max_iter): next_guess = f(guess) if close_enough(guess, next_guess): return next_guess guess = next_guess return guess def cube_root(x): return fixed_point( average_damp(lambda y: x / (square(y))), 1.0 )Newton’s method
When we first introduced the square-root procedure, in 1.1.7, we mentioned that this was a special case of Newton’s method.
If $x ↦ g(x)$ is a differentiable function, then a solution of the equation $g(x) = 0$ is a fixed point of the function $x ↦ f(x)$ where$$f(x)\; = \;x - \frac{g(x)}{Dg(x)}$$
and $Dg(x)$ is the derivative of $g$ evaluated at $x$. Newton’s method is the use of the fixed-point method we saw above to approximate a solution of the equation by finding a fixed point of the function $f$.
For many functions $g$ and for sufficiently good initial guesses for $x$, Newton’s method converges very rapidly to a solution of $g(x) = 0$.
In order to implement Newton’s method as a procedure, we must first express the idea of derivative. Note that “derivative,” like average damping, is something that transforms a function into another function. For instance, the derivative of the function $x ↦ x^{3}$ is the function $x ↦ 3x^{2}$. In general, if $g$ is a function and $dx$ is a small number, then the derivative $Dg$ of $g$ is the function whose value at any number $x$ is given (in the limit of small $dx$) by
$$Dg(x)\; = \;\frac{g(x + dx) - g(x)}{dx}.$$
Thus, we can express the idea of derivative (taking $dx$ to be, say, 0.00001) as the procedure
def deriv(g): def derivative(x): return (g(x + dx) - g(x)) / dx return derivativealong with the definition
dx = 0.00001Like
average-damp,derivis a procedure that takes a procedure as argument and returns a procedure as value. For example, to approximate the derivative of $x ↦ x^{3}$ at 5 (whose exact value is 75) we can evaluate>>> def cube(x): ... return x * x * x ... >>> def deriv(g): ... dx = 1e-5 ... return lambda x: (g(x + dx) - g(x)) / dx ... >>> deriv(cube)(5) 75.00014999664018With the aid of
deriv, we can express Newton’s method as a fixed-point process:def newton_transform(g): # Returns the Newton transform of g: a function that maps x to x - g(x)/g'(x) derivative_fn = deriv(g) def transform(x): return x - g(x) / derivative_fn(x) return transform def newtons_method(g, guess): return fixed_point(newton_transform(g), guess)The
newton-transformprocedure expresses the formula at the beginning of this section, andnewtons-methodis readily defined in terms of this. It takes as arguments a procedure that computes the function for which we want to find a zero, together with an initial guess. For instance, to find the square root of $x$, we can use Newton’s method to find a zero of the function $y ↦ y^{2} - x$ starting with an initial guess of 1.This provides yet another form of the square-root procedure:
def sqrt(x): return newtons_method(lambda y: square(y) - x, 1.0)Abstractions and first-class procedures
We’ve seen two ways to express the square-root computation as an instance of a more general method, once as a fixed-point search and once using Newton’s method. Since Newton’s method was itself expressed as a fixed-point process, we actually saw two ways to compute square roots as fixed points. Each method begins with a function and finds a fixed point of some transformation of the function. We can express this general idea itself as a procedure:
def fixed_point_of_transform(g, transform, guess): return fixed_point(transform(g), guess)This very general procedure takes as its arguments a procedure
gthat computes some function, a procedure that transformsg, and an initial guess. The returned result is a fixed point of the transformed function.Using this abstraction, we can recast the first square-root computation from this section (where we look for a fixed point of the average-damped version of $y ↦ x/y$) as an instance of this general method:
def sqrt(x): return fixed_point_of_transform(lambda y: x / y, average_damp, 1.0)Similarly, we can express the second square-root computation from this section (an instance of Newton’s method that finds a fixed point of the Newton transform of $y ↦ y^{2} - x$) as
>>> def sqrt(x): ... return fixed_point_of_transform(lambda y: square(y) - x, newton_transform, 1.0)We began section 1.3 with the observation that compound procedures are a crucial abstraction mechanism, because they permit us to express general methods of computing as explicit elements in our programming language. Now we’ve seen how higher-order procedures permit us to manipulate these general methods to create further abstractions.
As programmers, we should be alert to opportunities to identify the underlying abstractions in our programs and to build upon them and generalize them to create more powerful abstractions. This is not to say that one should always write programs in the most abstract way possible; expert programmers know how to choose the level of abstraction appropriate to their task. But it is important to be able to think in terms of these abstractions, so that we can be ready to apply them in new contexts. The significance of higher-order procedures is that they enable us to represent these abstractions explicitly as elements in our programming language, so that they can be handled just like other computational elements.
In general, programming languages impose restrictions on the ways in which computational elements can be manipulated. Elements with the fewest restrictions are said to have first-class status. Some of the “rights and privileges” of first-class elements are: - They may be named by variables.- They may be passed as arguments to procedures.- They may be returned as the results of procedures.- They may be included in data structures. Lisp, unlike other common programming languages, awards procedures full first-class status. This poses challenges for efficient implementation, but the resulting gain in expressive power is enormous.
Exercise 1.40: Define a procedure
cubicthat can be used together with thenewtons-methodprocedure in expressions of the form
def cubic(a, b, c): """Return a function that computes x^3 + a*x^2 + b*x + c.""" def cubic_fn(x): return x**3 + a * x**2 + b * x + c return cubic_fn def deriv(g, dx=1e-6): """Return a numerical derivative function of g using difference quotient.""" def der(x): return (g(x + dx) - g(x)) / dx return der def newton_transform(g): """Return the Newton transform of g: a function that improves a guess.""" dg = deriv(g) def transform(x): return x - g(x) / dg(x) return transform def newtons_method(g, guess, tol=1e-7): """Find a root of g starting from guess using Newton's method.""" improve = newton_transform(g) def close_enough(x, y): return abs(x - y) < tol next_guess = improve(guess)to approximate zeros of the cubic $x^{3} + ax^{2} + bx + c$.
Exercise 1.41: Define a procedure
doublethat takes a procedure of one argument as argument and returns a procedure that applies the original procedure twice. For example, ifincis a procedure that adds 1 to its argument, then(double inc)should be a procedure that adds 2. What value is returned by
Exercise 1.42: Let $f$ and $g$ be two one-argument functions. The composition $f$ after $g$ is defined to be the function $x ↦ f(g(x))$. Define a procedure
composethat implements composition. For example, ifincis a procedure that adds 1 to its argument,
def compose(f, g): return lambda x: f(g(x)) def square(x): return x * x def inc(x): return x + 1 >>> compose(square, inc)(6) 49Exercise 1.43: If $f$ is a numerical function and $n$ is a positive integer, then we can form the $n^{\text{th}}$ repeated application of $f$, which is defined to be the function whose value at $x$ is $f(f( … (f(x)) … ))$. For example, if $f$ is the function $x ↦ x + 1$, then the $n^{\text{th}}$ repeated application of $f$ is the function $x ↦ x + n$. If $f$ is the operation of squaring a number, then the $n^{\text{th}}$ repeated application of $f$ is the function that raises its argument to the $2^{n}\text{-th}$ power. Write a procedure that takes as inputs a procedure that computes $f$ and a positive integer $n$ and returns the procedure that computes the $n^{\text{th}}$ repeated application of $f$. Your procedure should be able to be used as follows:
def square(x): return x * x def repeated(f, n): # Return a function that applies f n times to its argument def repeated_f(x): result = x for _ in range(n): result = f(result) return result return repeated_f # REPL: >>> repeated(square, 2)(5) 625Hint: You may find it convenient to use
composefrom Exercise 1.42.Exercise 1.44: The idea of smoothing a function is an important concept in signal processing. If $f$ is a function and $dx$ is some small number, then the smoothed version of $f$ is the function whose value at a point $x$ is the average of $f(x - dx)$, $f(x)$, and $f(x + dx)$. Write a procedure
smooththat takes as input a procedure that computes $f$ and returns a procedure that computes the smoothed $f$. It is sometimes valuable to repeatedly smooth a function (that is, smooth the smoothed function, and so on) to obtain the n-fold smoothed function. Show how to generate the n-fold smoothed function of any given function usingsmoothandrepeatedfrom Exercise 1.43.Exercise 1.45: We saw in 1.3.3 that attempting to compute square roots by naively finding a fixed point of $y ↦ x/y$ does not converge, and that this can be fixed by average damping. The same method works for finding cube roots as fixed points of the average-damped $y ↦ x/y^{2}$. Unfortunately, the process does not work for fourth roots—a single average damp is not enough to make a fixed-point search for $y ↦ x/y^{3}$ converge. On the other hand, if we average damp twice (i.e., use the average damp of the average damp of $y ↦ x/y^{3}$) the fixed-point search does converge. Do some experiments to determine how many average damps are required to compute $n^{\text{th}}$ roots as a fixed-point search based upon repeated average damping of $y ↦ x/y^{\;n - 1}$.
Use this to implement a simple procedure for computing $n^{\text{th}}$ roots usingfixed-point,average-damp, and therepeatedprocedure of Exercise 1.43. Assume that any arithmetic operations you need are available as primitives.Exercise 1.46: Several of the numerical methods described in this chapter are instances of an extremely general computational strategy known as iterative improvement. Iterative improvement says that, to compute something, we start with an initial guess for the answer, test if the guess is good enough, and otherwise improve the guess and continue the process using the improved guess as the new guess. Write a procedure
iterative-improvethat takes two procedures as arguments: a method for telling whether a guess is good enough and a method for improving a guess.Iterative-improveshould return as its value a procedure that takes a guess as argument and keeps improving the guess until it is good enough. Rewrite thesqrtprocedure of 1.1.7 and thefixed-pointprocedure of 1.3.3 in terms ofiterative-improve.Adapted from Structure and Interpretation of Computer Programs by Harold Abelson and Gerald Jay Sussman (MIT Press, 1996). Original Scheme examples translated to Python.