Ritz-Galerkin Method

You don't always need to solve a differential equation exactly. Pick a handful of "reasonable-looking" functions, demand that the equation's leftover error be orthogonal to each one, and a small linear system hands you a surprisingly good answer.

1. The problem

Consider the boundary value problem

$$ \frac{d}{dt}\left(t\frac{dx}{dt}\right) + t(x-1) = 0,\qquad x(0)\text{ finite},\quad x(1)=0 $$

This is a disguised Bessel equation, and it happens to have a known closed-form solution — useful, because it means we can grade an approximate method against the truth:

$$ x(t) = 1 - \frac{J_0(t)}{J_0(1)} $$
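The closed form can be sanity-checked numerically. The sketch below is an illustration, not part of the original notes: it evaluates $J_0$ from its power series (so no external library is assumed) and estimates the left side of the ODE with central differences. The helper names `j0`, `x_exact` and `ode_residual` and the step size `h` are choices made here.

```python
def j0(t, terms=30):
    """Bessel J_0 via its power series: sum_k (-1)^k (t^2/4)^k / (k!)^2."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= -(t * t / 4.0) / ((k + 1) ** 2)
    return total

def x_exact(t):
    """Closed-form solution x(t) = 1 - J_0(t) / J_0(1)."""
    return 1.0 - j0(t) / j0(1.0)

def ode_residual(t, h=1e-3):
    """Central-difference estimate of d/dt(t x') + t (x - 1) at t."""
    def flux(s):  # t * dx/dt, evaluated at s
        return s * (x_exact(s + h / 2) - x_exact(s - h / 2)) / h
    return (flux(t + h / 2) - flux(t - h / 2)) / h + t * (x_exact(t) - 1.0)

print(x_exact(1.0))   # boundary condition x(1) = 0
print(x_exact(0.0))   # finite value at t = 0
print(max(abs(ode_residual(k / 10)) for k in range(1, 10)))  # should be tiny
```

The residual is not exactly zero because the derivatives are approximated; it only needs to be small compared with the size of the individual terms.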

Most boundary value problems you'll meet don't have a closed form at all. The Ritz-Galerkin method gives you a way forward anyway.

2. Choosing trial functions

Pick a small set of trial functions $\varphi_i(t)$ and look for an approximate solution as their weighted sum:

$$ x(t) \approx \sum_{i=1}^N c_i\,\varphi_i(t) $$

Here, $\varphi_i(t) = t^{i-1}(1-t)$ for $i=1,\dots,N$. This isn't an arbitrary guess: the factor $(1-t)$ makes every trial function vanish at $t=1$ automatically, matching $x(1)=0$ for any choice of coefficients, and $t^{i-1}$ keeps each one finite at $t=0$. The boundary conditions are satisfied by construction, before a single coefficient is solved for; only the differential equation itself is left to satisfy approximately.
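A few lines of code make the "by construction" point concrete. This is only a sketch: the function names and the coefficient values are made up for illustration, and any other coefficients would give the same boundary behavior.

```python
def phi(i, t):
    """i-th trial function t^(i-1) * (1 - t), with i = 1, 2, ..."""
    return t ** (i - 1) * (1.0 - t)

def x_approx(coeffs, t):
    """Weighted sum of trial functions; coeffs[0] multiplies phi_1."""
    return sum(c * phi(i, t) for i, c in enumerate(coeffs, start=1))

# Arbitrary, made-up coefficients: the boundary behavior does not depend on them.
coeffs = [0.7, -1.3, 2.1]
print(x_approx(coeffs, 1.0))  # 0.0: every phi_i vanishes at t = 1
print(x_approx(coeffs, 0.0))  # 0.7: only phi_1 is nonzero at t = 0
```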

3. The residual, and Galerkin's condition

Substitute the approximation into the left side of the ODE. Since $t(x-1) = tx - t$, the equation reads $L[x] - t = 0$, where $L[\varphi] = \frac{d}{dt}(t\varphi') + t\varphi$ is the linear operator. Because the trial functions can't satisfy the equation exactly, you're left with a nonzero residual:

$$ R(t) = \sum_{i=1}^N c_i\,L[\varphi_i](t) - t $$

You can't force $R(t)=0$ at every point — that would take infinitely many trial functions. Galerkin's condition instead asks that $R(t)$ be orthogonal to every trial function used to build the approximation:

$$ \int_0^1 R(t)\,\varphi_i(t)\,dt = 0,\qquad i=1,\dots,N $$

That's exactly $N$ linear equations in the $N$ unknown coefficients $c_i$ — an ordinary matrix problem, solvable directly for small $N$ or by an iterative method like Gauss-Seidel for larger systems.
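Written out, substituting $R(t)$ into the orthogonality condition and pulling the constant coefficients out of the integrals gives the system in matrix form. The labels $A_{ij}$ and $b_i$ are shorthand introduced here for convenience:

$$ \sum_{j=1}^N A_{ij}\,c_j = b_i,\qquad A_{ij} = \int_0^1 L[\varphi_j](t)\,\varphi_i(t)\,dt,\qquad b_i = \int_0^1 t\,\varphi_i(t)\,dt,\qquad i=1,\dots,N $$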

Recipe

  1. Choose trial functions $\varphi_i(t)$ that already satisfy the boundary conditions.
  2. Form the residual $R(t)$ by substituting $\sum c_i\varphi_i$ into the differential operator.
  3. Set $\int R(t)\varphi_i(t)\,dt = 0$ for each $i=1,\dots,N$ — this is $N$ linear equations in $N$ unknowns.
  4. Solve the linear system for $c_1,\dots,c_N$.
  5. Assemble $x(t) \approx \sum c_i\varphi_i(t)$.
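The recipe can be sketched in plain Python for this problem. Because the trial functions are polynomials, every integral can be done exactly with rational arithmetic, so the sketch below stores polynomials as coefficient lists and uses `fractions.Fraction`. The representation, the helper names, and the use of Gauss-Jordan elimination are all choices made for this illustration.

```python
from fractions import Fraction

# A polynomial is a coefficient list: [a0, a1, a2, ...] means a0 + a1*t + a2*t^2 + ...

def trial(i):
    """Step 1: phi_i(t) = t^(i-1) * (1 - t) = t^(i-1) - t^i."""
    return [Fraction(0)] * (i - 1) + [Fraction(1), Fraction(-1)]

def shift(p):
    """Multiply a polynomial by t."""
    return [Fraction(0)] + p

def deriv(p):
    """Differentiate a polynomial."""
    return [k * a for k, a in enumerate(p)][1:] or [Fraction(0)]

def add(p, q):
    n = max(len(p), len(q))
    p, q = p + [Fraction(0)] * (n - len(p)), q + [Fraction(0)] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def mul(p, q):
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def integrate01(p):
    """Integral of a polynomial over [0, 1]."""
    return sum(a / (k + 1) for k, a in enumerate(p))

def apply_L(p):
    """Step 2: L[phi] = d/dt(t * phi') + t * phi."""
    return add(deriv(shift(deriv(p))), shift(p))

def solve(A, b):
    """Step 4: Gauss-Jordan elimination in exact rational arithmetic."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                M[r] = [vr - M[r][col] * vc for vr, vc in zip(M[r], M[col])]
    return [row[-1] for row in M]

def galerkin(N):
    """Return the coefficients c_1..c_N (step 5 would sum c_i * phi_i)."""
    phis = [trial(i) for i in range(1, N + 1)]
    # Step 3: A[i][j] = int L[phi_j] phi_i dt,  b[i] = int t phi_i dt
    A = [[integrate01(mul(apply_L(pj), pi)) for pj in phis] for pi in phis]
    b = [integrate01(shift(pi)) for pi in phis]
    return solve(A, b)

print(galerkin(1))  # [Fraction(-2, 5)]
print(galerkin(2))  # [Fraction(-50, 161), Fraction(-45, 161)]
```

Exact fractions are convenient here because the results can be compared digit for digit with a hand calculation; for trial functions that are not polynomials, numerical quadrature would be needed instead.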

4. Working it by hand: one and two terms

With $N=1$, $\varphi_1(t)=1-t$. The operator gives $L[\varphi_1](t) = -1+t-t^2$, and the single Galerkin equation $\int_0^1\big(c_1 L[\varphi_1]-t\big)\varphi_1\,dt=0$ reduces to

$$ c_1\left(-\tfrac{5}{12}\right) - \tfrac{1}{6} = 0 \quad\Longrightarrow\quad c_1 = -\tfrac{2}{5} = -0.4 $$

so the one-term approximation is $x(t)\approx -0.4(1-t)$ — crude, but it already has the right boundary behavior and the right sign.
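For readers who want to check the two numbers in the one-term equation, $-\tfrac{5}{12}$ and $\tfrac16$, the integrals expand as follows:

$$ \int_0^1 L[\varphi_1]\,\varphi_1\,dt = \int_0^1 (-1+t-t^2)(1-t)\,dt = \int_0^1 \left(-1+2t-2t^2+t^3\right)dt = -1+1-\tfrac23+\tfrac14 = -\tfrac{5}{12} $$

$$ \int_0^1 t\,\varphi_1\,dt = \int_0^1 \left(t-t^2\right)dt = \tfrac12-\tfrac13 = \tfrac16 $$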

Adding a second trial function $\varphi_2(t)=t(1-t)$ turns the single equation into a $2\times 2$ system:

$$ \begin{bmatrix}-\tfrac{5}{12} & -\tfrac{2}{15}\\[2pt] -\tfrac{2}{15} & -\tfrac{3}{20}\end{bmatrix}\begin{bmatrix}c_1\\ c_2\end{bmatrix} = \begin{bmatrix}\tfrac16\\[2pt]\tfrac1{12}\end{bmatrix} \quad\Longrightarrow\quad c_1=-\tfrac{50}{161},\;\; c_2=-\tfrac{45}{161} $$

Notice the matrix is symmetric, and that's not a coincidence. The operator $L$ here is self-adjoint (it comes from a Sturm-Liouville problem), and a self-adjoint operator produces a symmetric Galerkin matrix whenever the trial functions satisfy the homogeneous boundary conditions, as these do.
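One way to see the symmetry, sketched here under the assumption that every trial function is smooth on $[0,1]$ and vanishes at $t=1$, is to integrate the derivative term by parts. Writing $A_{ij}$ (a label introduced here) for the entry in row $i$, column $j$:

$$ A_{ij} = \int_0^1 \varphi_i\left[\frac{d}{dt}\left(t\,\varphi_j'\right) + t\,\varphi_j\right]dt = \Big[t\,\varphi_i\,\varphi_j'\Big]_0^1 + \int_0^1 t\left(\varphi_i\varphi_j - \varphi_i'\varphi_j'\right)dt $$

The boundary term is zero at $t=1$ because $\varphi_i(1)=0$ and zero at $t=0$ because of the factor $t$. The remaining integral is unchanged when $i$ and $j$ are swapped, so $A_{ij}=A_{ji}$. As a check, $i=j=1$ gives $\int_0^1 t\left[(1-t)^2-1\right]dt = \tfrac14-\tfrac23 = -\tfrac{5}{12}$, matching the top-left entry.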

▶ In the lab

Slide N from 1 to 6 and watch the maximum error against the exact $J_0$-based solution fall from about $9\times10^{-2}$ at one term to under $2\times10^{-8}$ at six, while the live equations are assembled term by term as you add trial functions.
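Without the lab, the same experiment can be approximated offline. The sketch below is not the lab's implementation. It assumes the matrix entries can be written as $\int_0^1 t(\varphi_i\varphi_j - \varphi_i'\varphi_j')\,dt$, which follows from integrating the derivative term by parts (the boundary term drops out because each $\varphi_i$ vanishes at $t=1$ and the factor $t$ vanishes at $t=0$). For monomial-based trial functions those integrals have closed forms. The error is sampled on an evenly spaced grid, so the printed values are grid maxima and may differ slightly from the figures quoted for the lab.

```python
from fractions import Fraction

def j0(t, terms=30):
    """Bessel J_0 via its power series: sum_k (-1)^k (t^2/4)^k / (k!)^2."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= -(t * t / 4.0) / ((k + 1) ** 2)
    return total

def x_exact(t):
    return 1.0 - j0(t) / j0(1.0)

def entry(p, q):
    """int_0^1 t (phi_i phi_j - phi_i' phi_j') dt for phi_i = t^p (1-t), phi_j = t^q (1-t)."""
    m = p + q
    mass = Fraction(1, m + 2) - Fraction(2, m + 3) + Fraction(1, m + 4)
    stiff = Fraction((p + 1) * (q + 1), m + 2) - Fraction(2 * p * q + p + q, m + 1)
    if p * q:  # the t^(m-1) term only exists when both exponents are nonzero
        stiff += Fraction(p * q, m)
    return mass - stiff

def rhs(p):
    """int_0^1 t * t^p (1 - t) dt."""
    return Fraction(1, p + 2) - Fraction(1, p + 3)

def solve(A, b):
    """Gauss-Jordan elimination in exact rational arithmetic."""
    n = len(b)
    M = [row[:] + [v] for row, v in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                M[r] = [vr - M[r][col] * vc for vr, vc in zip(M[r], M[col])]
    return [row[-1] for row in M]

def max_error(N, samples=200):
    """Largest |approximation - exact| on an evenly spaced grid over [0, 1]."""
    A = [[entry(p, q) for q in range(N)] for p in range(N)]
    c = [float(v) for v in solve(A, [rhs(p) for p in range(N)])]
    def approx(t):
        return sum(ck * t ** k for k, ck in enumerate(c)) * (1.0 - t)
    return max(abs(approx(s / samples) - x_exact(s / samples))
               for s in range(samples + 1))

for N in range(1, 7):
    print(N, max_error(N))
```

Solving in exact fractions and converting to floats only at the end sidesteps the poor conditioning that monomial bases tend to develop as $N$ grows.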

5. Why "Ritz" and "Galerkin" share a name

Ritz's original method starts somewhere completely different: minimize a variational functional (an energy-like integral whose minimum corresponds to the true solution) directly over the trial coefficients. Galerkin's method, derived above, starts from the differential equation itself and forces a weighted residual to vanish. For a self-adjoint operator, both approaches generate the exact same linear system — which is why the combined name persists even though the two methods don't obviously start from the same place. For operators that aren't self-adjoint, only Galerkin's weighted-residual route still applies.
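For this particular problem the equivalence can be checked directly. The functional $I[x]$ below is one choice that works; it is written out here as an illustration and is not taken from the notes above:

$$ I[x] = \int_0^1 \left[\tfrac12\,t\,(x')^2 - \tfrac12\,t\,x^2 + t\,x\right]dt $$

Substituting $x = \sum_j c_j\varphi_j$ and setting each partial derivative to zero gives

$$ \frac{\partial I}{\partial c_i} = \sum_{j=1}^N c_j \int_0^1 t\left(\varphi_i'\varphi_j' - \varphi_i\varphi_j\right)dt + \int_0^1 t\,\varphi_i\,dt = 0,\qquad i=1,\dots,N $$

Integrating the $\varphi_i'\varphi_j'$ term by parts (the boundary term vanishes because $\varphi_i(1)=0$ and because of the factor $t$ at $t=0$) turns the first integral into $-\int_0^1 L[\varphi_j]\,\varphi_i\,dt$, so these are the same $N$ equations as Galerkin's condition. With one term, $I = \tfrac{5}{24}c_1^2 + \tfrac16 c_1$, whose minimum is at $c_1 = -\tfrac25$.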

6. Where this leads

Everything above uses global trial functions: each $\varphi_i$ is nonzero across the whole domain, which is why a handful of well-chosen polynomials can already get within about $2\times10^{-8}$ of the exact solution. The Finite Element Method applies the identical weighted-residual idea to local, piecewise trial functions, each one nonzero only over a small element of the domain. The bookkeeping scales to far more complex geometries that way, but the governing condition, $\int(\text{residual})\cdot(\text{trial function})\,dt=0$, never changes. The same idea also extends directly to PDEs in two or more dimensions: the trial functions become functions of two spatial coordinates, and the integral becomes a double integral over the domain.
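To make the local-versus-global contrast concrete, here is a rough finite element sketch for the same boundary value problem, using piecewise-linear "hat" functions on a uniform mesh. Everything specific in it is an assumption made for illustration: the uniform mesh, the integrated-by-parts (weak) form $\int_0^1 t(\varphi_i\varphi_j - \varphi_i'\varphi_j')\,dt$ for the matrix entries, two-point Gauss-Legendre quadrature on each element, and a small dense solver. The condition $x(1)=0$ is imposed by dropping the last node.

```python
import math

def j0(t, terms=30):
    """Bessel J_0 via its power series (used only to grade the result)."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= -(t * t / 4.0) / ((k + 1) ** 2)
    return total

def x_exact(t):
    return 1.0 - j0(t) / j0(1.0)

def fem_solve(n):
    """Nodal values u[0..n] on n equal elements, with u[n] = x(1) = 0."""
    h = 1.0 / n
    A = [[0.0] * (n + 1) for _ in range(n + 1)]
    b = [0.0] * (n + 1)
    offset = h / (2.0 * math.sqrt(3.0))    # 2-point Gauss-Legendre offset
    for e in range(n):                      # element e spans [lo, hi]
        lo, hi = e * h, (e + 1) * h
        mid = 0.5 * (lo + hi)
        for t in (mid - offset, mid + offset):
            w = 0.5 * h                     # quadrature weight
            shape = [(hi - t) / h, (t - lo) / h]   # the two hats on this element
            slope = [-1.0 / h, 1.0 / h]            # and their derivatives
            for i in range(2):
                b[e + i] += w * t * shape[i]
                for j in range(2):
                    A[e + i][e + j] += w * t * (shape[i] * shape[j] - slope[i] * slope[j])
    # Drop node n (where u = 0) and solve the remaining n x n system
    # by Gaussian elimination with partial pivoting.
    M = [A[r][:n] + [b[r]] for r in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    u = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][k] * u[k] for k in range(r + 1, n))
        u[r] = (M[r][n] - tail) / M[r][r]
    return u + [0.0]

for n in (1, 4, 16):
    u = fem_solve(n)
    err = max(abs(u[k] - x_exact(k / n)) for k in range(n + 1))
    print(n, err)
```

With a single element the only hat function is $1-t$, so the sketch reproduces the one-term Galerkin value $-0.4$ at $t=0$. Refining the mesh reduces the nodal error, though far more slowly per unknown than the global polynomials do on this smooth problem.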

Worked from the site author's own teaching notes on the Ritz-Galerkin method. References cited there: Optimization by Variational Methods, Morton M. Denn, McGraw-Hill Book Company, 1969; Conduction Heat Transfer, V. Arpaci, Addison-Wesley Publishing Company, 1966.

▶ Related

Everything here is in one spatial dimension. The 2D companion applies the identical method to a heated square plate: same trial-function idea, one more integral, and a genuinely interesting catch about trial functions and symmetry.

EngineeringCandy · Learn · the theory behind the playground