Polynomial Regression with Least Squares

1. Understanding Polynomial Regression

What is Polynomial Regression?

Polynomial regression extends linear regression to model nonlinear relationships by adding polynomial terms:

Linear Regression:

z = β₀ + β₁x + β₂y

Quadratic Regression:

z = β₀ + β₁x + β₂y + β₃x² + β₄xy + β₅y²

Cubic Regression:

z = β₀ + β₁x + β₂y + β₃x² + β₄xy + β₅y² + β₆x³ + β₇x²y + β₈xy² + β₉y³

2. Mathematical Foundation

Least Squares Method

The goal is to minimize the sum of squared residuals:

min Σ(z_i - ž_i)²

Where:

  • z_i = observed value at point i
  • ž_i = predicted value from polynomial

Matrix Formulation

A · β = Z

Where:

  • A = Design matrix of polynomial terms
  • β = Coefficient vector (what we solve for)
  • Z = Observed values vector

Solution using Normal Equations

β = (AᵀA)⁻¹AᵀZ

3. Step-by-Step Algorithm

Step 1: Data Preparation

# Input data: x, y coordinates and z values
x = [x₁, x₂, ..., xₙ]
y = [y₁, y₂, ..., yₙ]  
z = [z₁, z₂, ..., zₙ]

Step 2: Design Matrix Construction

For degree 2 polynomial:

# Each row: [1, x, y, x², xy, y²]
A = [
    [1, x₁, y₁, x₁², x₁y₁, y₁²],
    [1, x₂, y₂, x₂², x₂y₂, y₂²],
    ...
    [1, xₙ, yₙ, xₙ², xₙyₙ, yₙ²]
]

Step 3: Solve Least Squares

# Using numpy's least squares solver
coefficients = np.linalg.lstsq(A, z, rcond=None)[0]

Step 4: Surface Evaluation

def evaluate_polynomial(x, y, coeffs, degree):
    value = 0
    idx = 0
    for d in range(degree + 1):
        for j in range(d + 1):
            value += coeffs[idx] * (x ** (d - j)) * (y ** j)
            idx += 1
    return value

4. Practical Example

Sample Data:

PointXYZ (Elevation)
100100
2100110
3010105
41010115

Linear Surface (Degree 1):

Design Matrix A:
[[1, 0, 0],
 [1, 10, 0], 
 [1, 0, 10],
 [1, 10, 10]]
 
Solving: A·β = [100, 110, 105, 115]

Result:

z = 100 + 1.0*x + 0.5*y

5. Choosing Polynomial Degree

Degree 1 - Linear:

  • Good for simple trends
  • 3 coefficients
  • Planar surface

Degree 2 - Quadratic:

  • Captures curvature
  • 6 coefficients
  • Can model hills/valleys

Degree 3 - Cubic:

  • Complex surfaces
  • 10 coefficients
  • Risk of overfitting

6. Interpretation of Results

Residual Analysis:

residuals = z_observed - z_predicted
RMSE = np.sqrt(np.mean(residuals**2))

Goodness of Fit:

  • : Proportion of variance explained
  • RMSE: Average prediction error
  • Visual inspection: Check residual patterns

7. Applications in GIS

Topographic Analysis:

  • Regional slope detection
  • Geological trend identification
  • Climate pattern analysis

Environmental Modeling:

  • Pollution gradient mapping
  • Temperature surface generation
  • Precipitation trend analysis

8. Advantages and Limitations

Advantages:

  • Simple mathematical foundation
  • Computationally efficient
  • Easy to interpret coefficients
  • Flexible for different complexities

Limitations:

  • Sensitive to outliers
  • Assumes global trend (no local variations)
  • Can overfit with high degrees
  • Extrapolation risks

9. Validation Methods

Cross-Validation:

# Split data into training and testing
train_mask = np.random.choice([True, False], size=len(z), p=[0.7, 0.3])
train_A, test_A = A[train_mask], A[~train_mask]
train_z, test_z = z[train_mask], z[~train_mask]

Residual Mapping:

  • Create residual raster (observed - predicted)
  • Check for spatial patterns
  • Identify areas of poor fit

This tutorial provides both the theoretical foundation and practical implementation for polynomial trend surface analysis using least squares regression!