7.4 Method of Least Squares

Regression Analysis - the process of fitting an elementary function to a set of points using the method of least squares.
Consider the set of points {(1,20), (2,14), (3,11), (4,3)} being fit to a line y = ax + b.
Create and minimize the function that is the sum of the squares of the residuals.
Residual - the difference between the y-value of a point and the y-value predicted by the equation of the line.
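For the four points above, the sum of squared residuals is a function of the unknown coefficients $a$ and $b$. Setting both partial derivatives to zero turns the minimization into two linear equations. This is a sketch of that calculus step with the arithmetic filled in:

$$
S(a,b) = (20 - a - b)^2 + (14 - 2a - b)^2 + (11 - 3a - b)^2 + (3 - 4a - b)^2
$$

$$
\begin{aligned}
\frac{\partial S}{\partial a} &= -2\,(93 - 30a - 10b) = 0 &&\Longrightarrow\; 30a + 10b = 93\\
\frac{\partial S}{\partial b} &= -2\,(48 - 10a - 4b) = 0 &&\Longrightarrow\; 10a + 4b = 48
\end{aligned}
$$

Solving the pair gives $a = -5.4$ and $b = 25.5$, so the least-squares line for these points is $y = -5.4x + 25.5$.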

Generalized formulae: see page 571.
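The formulas on page 571 are not reproduced in these notes. As a sketch, the standard general form (the text may write it in different notation) comes from the same two partial derivatives applied to $n$ points $(x_i, y_i)$ and the line $y = ax + b$:

$$
\begin{aligned}
a\sum x_i^2 + b\sum x_i &= \sum x_i y_i\\
a\sum x_i + nb &= \sum y_i
\end{aligned}
\qquad\Longrightarrow\qquad
a = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2},
\qquad
b = \frac{\sum y_i - a\sum x_i}{n}
$$

For the points {(1,20), (2,14), (3,11), (4,3)}, the sums are $n = 4$, $\sum x_i = 10$, $\sum y_i = 48$, $\sum x_i^2 = 30$, $\sum x_i y_i = 93$, so $a = (372 - 480)/(120 - 100) = -5.4$ and $b = (48 + 54)/4 = 25.5$.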
#20 For each of eight different years, the accompanying table gives the percentage of high school students who had used cocaine at least once in their lives up to that year:
| Year | 1991 | 1993 | 1995 | 1997 | 1999 | 2001 | 2003 | 2005 |
|---|---|---|---|---|---|---|---|---|
| % who had used cocaine at least once | 6.0 | 4.9 | 7.0 | 8.2 | 9.5 | 9.4 | 8.7 | 7.6 |
a) Plot these data on a graph, with the number of years after 1991 on the x-axis and the percentage of cocaine users on the y-axis.
b) Find the equation of the least-squares line for the data.
c) Use the least-squares line to predict the percentage of high school students who had used cocaine at least once by the year 2009.
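A minimal sketch for parts (b) and (c), assuming x is measured in years after 1991 as part (a) suggests. The helper `least_squares_line` is a hypothetical name; it applies the standard closed-form slope and intercept formulas to the table values.

```python
# Sketch for exercise #20, parts (b) and (c).
# Assumption: x = number of years after 1991, as in part (a).


def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y = a*x + b."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    b = (sum_y - a * sum_x) / n
    return a, b


years = [1991, 1993, 1995, 1997, 1999, 2001, 2003, 2005]
percent_used = [6.0, 4.9, 7.0, 8.2, 9.5, 9.4, 8.7, 7.6]

xs = [year - 1991 for year in years]       # years after 1991
a, b = least_squares_line(xs, percent_used)
prediction_2009 = a * (2009 - 1991) + b     # 2009 corresponds to x = 18

print(f"y = {a:.4f}x + {b:.4f}")
print(f"Predicted percentage by 2009: {prediction_2009:.1f}%")
```

Applied to the table, these formulas give a slope of about 0.2304 and an intercept of 6.05, so the line predicts roughly 10.2% for 2009. These figures are computed from the formulas, not taken from the text's answer key.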
#24 In a study of five industrial areas, a researcher obtained these data relating the average number of units of a certain pollutant in the air and the incidence (per 100,000 people) of a certain disease:
| Units of pollutant | 3.4 | 4.6 | 5.2 | 8.0 | 10.7 |
|---|---|---|---|---|---|
| Incidence of disease (per 100,000 people) | 48 | 52 | 58 | 76 | 96 |
a) Plot these data on a graph, using the units of pollutant as the x-variable.
b) Find the equation of the least-squares line for the data.
c) Use the least-squares line to estimate the incidence of the disease in an area with an average pollution level of 7.3 units.
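A minimal sketch for parts (b) and (c), using the units of pollutant as x and the incidence as y. The helper name `least_squares_line` is hypothetical; it uses the standard closed-form formulas for the slope and intercept.

```python
# Sketch for exercise #24, parts (b) and (c).


def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y = a*x + b."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    b = (sum_y - a * sum_x) / n
    return a, b


pollutant = [3.4, 4.6, 5.2, 8.0, 10.7]   # x: units of pollutant
incidence = [48, 52, 58, 76, 96]         # y: cases per 100,000 people

a, b = least_squares_line(pollutant, incidence)
estimate = a * 7.3 + b                   # part (c): pollution level of 7.3 units

print(f"y = {a:.4f}x + {b:.4f}")
print(f"Estimated incidence at 7.3 units: {estimate:.1f} per 100,000")
```

With these data the formulas give a slope of about 6.732 and an intercept of about 23.05, so the estimate at 7.3 units is roughly 72.2 cases per 100,000 people.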
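Justification for $y = be^{mx}$: taking the natural logarithm of both sides gives $\ln y = \ln b + mx$, which is linear in $x$, so fit a line to the points $(x_i, \ln y_i)$.

Then the linear best fit becomes $\ln y = mx + \ln b$.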

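Justification for $y = bx^m$: taking the natural logarithm of both sides gives $\ln y = \ln b + m\ln x$, which is linear in $\ln x$, so fit a line to the points $(\ln x_i, \ln y_i)$.

Then the linear best fit becomes $\ln y = m\ln x + \ln b$.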

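Justification for $y = A + B\ln x$: substituting $X = \ln x$ gives $y = A + BX$, which is linear in $X$, so fit a line to the points $(\ln x_i, y_i)$.

Then the line of best fit is $y = m\ln x + b$, with $m = B$ and $b = A$.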
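The three transformations above can be sketched as ordinary line fits on transformed data. The function names below are hypothetical, and the test data are synthetic values generated from known curves (not from the text), so each fit should recover the parameters that produced them. The logarithms require every transformed value to be positive.

```python
import math


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through (xs, ys)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    return slope, (sum_y - slope * sum_x) / n


def fit_exponential(xs, ys):
    """y = b*e^(m x): fit a line to (x, ln y). Requires every y > 0."""
    m, ln_b = least_squares_line(xs, [math.log(y) for y in ys])
    return m, math.exp(ln_b)


def fit_power(xs, ys):
    """y = b*x^m: fit a line to (ln x, ln y). Requires every x > 0 and y > 0."""
    m, ln_b = least_squares_line([math.log(x) for x in xs],
                                 [math.log(y) for y in ys])
    return m, math.exp(ln_b)


def fit_logarithmic(xs, ys):
    """y = m ln x + b: fit a line to (ln x, y). Requires every x > 0."""
    return least_squares_line([math.log(x) for x in xs], ys)


# Hypothetical data generated from known curves, used only as a check.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
m_exp, b_exp = fit_exponential(xs, [2.0 * math.exp(0.5 * x) for x in xs])
m_pow, b_pow = fit_power(xs, [3.0 * x ** 2 for x in xs])
m_log, b_log = fit_logarithmic(xs, [1.0 + 4.0 * math.log(x) for x in xs])

print(m_exp, b_exp)  # close to 0.5 and 2.0
print(m_pow, b_pow)  # close to 2.0 and 3.0
print(m_log, b_log)  # close to 4.0 and 1.0
```

One design note: the exponential and power fits minimize the squared residuals of $\ln y$, not of $y$ itself, so on noisy data they can differ slightly from a fit that minimizes residuals in the original units.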
A similar process of minimizing the sum of squared residuals could be used for a quadratic of the form $y = ax^2 + bx + c$, or for a cubic or quartic …, but the partial derivatives and the resulting equations become very messy.
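As a sketch of what "messy" means for the quadratic case, write the sum of squared residuals for $n$ points and set $\partial S/\partial a$, $\partial S/\partial b$, and $\partial S/\partial c$ to zero. This is the standard result; the text's notation may differ:

$$
S(a,b,c) = \sum_{i=1}^{n}\left(y_i - a x_i^2 - b x_i - c\right)^2
$$

$$
\begin{aligned}
a\sum x_i^4 + b\sum x_i^3 + c\sum x_i^2 &= \sum x_i^2 y_i\\
a\sum x_i^3 + b\sum x_i^2 + c\sum x_i &= \sum x_i y_i\\
a\sum x_i^2 + b\sum x_i + cn &= \sum y_i
\end{aligned}
$$

The system is still linear in $a$, $b$, $c$; it just needs five power sums of $x$ and three mixed sums instead of the two and two used for a line.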
Find the coefficients of the parabola that is the "best" fit for the points (-1,-2), (0,1), (1,2), and (2,0).
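For these four points the sums are $n = 4$, $\sum x_i = 2$, $\sum x_i^2 = 6$, $\sum x_i^3 = 8$, $\sum x_i^4 = 18$, $\sum y_i = 1$, $\sum x_i y_i = 4$, $\sum x_i^2 y_i = 0$, so the quadratic normal equations become $18a + 8b + 6c = 0$, $8a + 6b + 2c = 4$, $6a + 2b + 4c = 1$. The sketch below builds and solves that system. The function names are hypothetical, and plain Gaussian elimination is used so the example needs no third-party library.

```python
# Sketch: least-squares parabola y = a*x^2 + b*x + c via the normal equations.


def solve_linear_system(matrix, rhs):
    """Solve matrix @ sol = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Move the row with the largest pivot into place to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from the rows below the pivot.
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * sol[k] for k in range(r + 1, n))
        sol[r] = (aug[r][n] - tail) / aug[r][r]
    return sol


def least_squares_parabola(points):
    """Return (a, b, c) minimizing the sum of squared residuals."""
    def sum_x(p):
        return sum(x ** p for x, _ in points)

    def sum_xy(p):
        return sum(x ** p * y for x, y in points)

    matrix = [[sum_x(4), sum_x(3), sum_x(2)],
              [sum_x(3), sum_x(2), sum_x(1)],
              [sum_x(2), sum_x(1), len(points)]]
    rhs = [sum_xy(2), sum_xy(1), sum_xy(0)]
    a, b, c = solve_linear_system(matrix, rhs)
    return a, b, c


points = [(-1, -2), (0, 1), (1, 2), (2, 0)]
a, b, c = least_squares_parabola(points)
print(f"y = {a:.2f}x^2 + {b:.2f}x + {c:.2f}")
```

Solving the system (by hand or with this sketch) gives $a = -1.25$, $b = 1.95$, $c = 1.15$, that is, $y = -1.25x^2 + 1.95x + 1.15$. The residuals at the four points are $0.05, -0.15, 0.15, -0.05$, which satisfy all three normal equations.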