Gram-Schmidt Orthogonalization

Computer Solution of Large Linear Systems

In Studies in Mathematics and Its Applications, 1999

Proof

The method to obtain the $y_i$ is known as the Gram–Schmidt orthogonalization process. Let us consider first only two vectors, i.e., n = 2. Let $x_1$ and $x_2$ be given. We define

$$y_1 = \frac{x_1}{\|x_1\|}, \qquad y_2 = x_2 - \frac{(x_1, x_2)}{\|x_1\|^2}\, x_1 = x_2 - (y_1, x_2)\, y_1$$

Note that $\frac{(x_1, x_2)}{\|x_1\|^2}\, x_1$ is the component of $x_2$ in the direction of $x_1$. Clearly, if we subtract this component from $x_2$ we obtain a vector $y_2$ which is orthogonal to $x_1$. The vectors $y_1$ and $y_2$ are linearly independent, are linear combinations of $x_1$ and $x_2$, and span the same subspace. This can be generalized to n vectors giving

$$y_1 = \frac{x_1}{\|x_1\|}, \qquad z_i = x_i - \frac{(y_1, x_i)}{\|y_1\|^2}\, y_1 - \cdots - \frac{(y_{i-1}, x_i)}{\|y_{i-1}\|^2}\, y_{i-1}, \qquad y_i = \frac{z_i}{\|z_i\|}, \quad i = 2, \ldots, n$$

It is easy to check that the $y_i$ are orthogonal and, by induction, that the spanned subspaces are the same.

Note that for the Gram–Schmidt algorithm we have to construct and store the previous i – 1 vectors to compute yi.
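A minimal NumPy sketch of the process just described (the function name and tolerance are illustrative, not from the text):

```python
import numpy as np

def gram_schmidt(X, tol=1e-12):
    """Orthonormalize the columns of X by the Gram-Schmidt process.

    Follows the recurrence above: subtract from x_i its components along the
    previously computed y_1, ..., y_{i-1}, then normalize.
    """
    n_rows, n_cols = X.shape
    Y = np.zeros((n_rows, n_cols))
    for i in range(n_cols):
        z = X[:, i].astype(float)
        for j in range(i):
            z -= np.dot(Y[:, j], X[:, i]) * Y[:, j]   # remove the (y_j, x_i) y_j component
        norm = np.linalg.norm(z)
        if norm < tol:
            raise ValueError("input vectors are (numerically) linearly dependent")
        Y[:, i] = z / norm
    return Y

# Example: Y^T Y should be (close to) the identity.
X = np.random.rand(5, 3)
Y = gram_schmidt(X)
print(np.allclose(Y.T @ Y, np.eye(3)))
```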


URL:

https://www.sciencedirect.com/science/article/pii/S0168202499800022

Geometric Fundamentals

Wolfgang Boehm , Hartmut Prautzsch , in Handbook of Computer Aided Geometric Design, 2002

2.3.2 Gram-Schmidt orthogonalization

A Cartesian system $[a_0, b_1, \ldots, b_d]$ of a subspace or the Euclidean space itself can easily be constructed from an affine system $[a_0, v_1, \ldots, v_d]$ in $E^n$ with Gram-Schmidt's orthogonalization by alternating computation of the coefficients $\lambda_{i,j}$ and $\mu_i$ as follows:

Set $b_1 = \mu_1 v_1$ such that $\|b_1\| = 1$.

Set $b_2 = \mu_2(v_2 + \lambda_{2,1} b_1)$ such that $b_2$ is orthogonal to $b_1$ and $\|b_2\| = 1$.

⋮

Set $b_d = \mu_d(v_d + \lambda_{d,1} b_1 + \cdots + \lambda_{d,d-1} b_{d-1})$ such that $b_d$ is orthogonal to $b_1, \ldots, b_{d-1}$ and $\|b_d\| = 1$.

Note that in a Cartesian system the dot product is written as $u \cdot v = u^{\mathrm{t}} v$.
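As an illustration, a short NumPy sketch of these steps, returning the orthonormal frame $[b_1, \ldots, b_d]$ from the columns of an affine system (names and the interface are assumptions, not from the handbook):

```python
import numpy as np

def cartesian_from_affine(a0, V):
    """Given an affine system [a0, v_1, ..., v_d] (columns of V), return the
    origin a0 and an orthonormal frame [b_1, ..., b_d] computed as above:
    b_i = mu_i (v_i + sum_j lambda_{i,j} b_j)."""
    d = V.shape[1]
    B = np.zeros_like(V, dtype=float)
    for i in range(d):
        w = V[:, i].astype(float)
        for j in range(i):
            lam = -np.dot(B[:, j], V[:, i])   # lambda_{i,j} makes b_i orthogonal to b_j
            w += lam * B[:, j]
        B[:, i] = w / np.linalg.norm(w)       # mu_i normalizes b_i
    return a0, B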


URL:

https://www.sciencedirect.com/science/article/pii/B9780444511041500034

Processing, Analyzing and Learning of Images, Shapes, and Forms: Part 2

Yu Wang , Justin Solomon , in Handbook of Numerical Analysis, 2019

3.1.3 Eigenvalue problem

The eigenvalue problem associated to an operator $\mathcal{A} : \mathcal{H} \to \mathcal{H}$ is defined as follows:

(6) A ϕ = λ ϕ ,

where $\lambda \in \mathbb{R}$ is known as an eigenvalue and $\phi(\cdot)$ is its corresponding eigenfunction. The spectral theorem states that in the most common case, namely when $\mathcal{A}$ is a compact self-adjoint operator and $\mathcal{H}$ is a separable Hilbert space (Zhu, 2007), there are countably many eigenvalues and corresponding eigenfunctions. We mainly consider this case in our survey, and hence we use $\{\lambda_i\}_{i=0}^{\infty}$ and $\{\phi_i(x)\}_{i=0}^{\infty}$ to denote the sets of eigenvalues and corresponding eigenfunctions of $\mathcal{A}$, respectively, sorted in ascending order such that $\lambda_0 \le \lambda_1 \le \lambda_2 \le \cdots$.

By the Courant–Fischer min–max theorem, the strong form (6) can be converted into an equivalent weak form by finding saddle points of the optimization problem

(7) $\min_{\phi(\cdot)}\ a(\phi, \phi) \quad \text{s.t.} \quad \langle \phi, \phi \rangle_{\mathcal{M}} = 1.$

Assuming $a(\cdot, \cdot)$ is symmetric, we can follow the convention that $\{\phi_i(x)\}_{i=0}^{\infty}$ are orthonormal: eigenfunctions corresponding to different $\lambda$'s must be orthogonal, and applying Gram–Schmidt orthogonalization to eigenfunctions with the same $\lambda$, followed by normalization, ensures that $\{\phi_i(x)\}_{i=0}^{\infty}$ are orthonormal. A consequence of the spectral theorem, for many choices of operators $\mathcal{A}$, is that the $\phi_i$'s form a complete orthonormal basis; in classical mathematics, the completeness of the Laplacian is a consequence of the Sturm–Liouville decomposition (Chavel, 1984; Rosenberg, 1997). Laplacian eigenfunctions are also known as manifold harmonics. When the surface is a sphere, the Laplacian eigenfunctions are called spherical harmonics.
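As a finite-dimensional illustration of this convention, the following NumPy sketch orthonormalizes two eigenvectors sharing a repeated eigenvalue of a symmetric matrix; the matrix and names are purely illustrative:

```python
import numpy as np

# For a symmetric (self-adjoint) matrix, eigenvectors of distinct eigenvalues
# are automatically orthogonal; within a repeated eigenvalue, Gram-Schmidt
# (here via QR) restores an orthonormal eigenbasis.
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 3.0]])       # eigenvalue 1 has multiplicity 2

lam, V = np.linalg.eigh(A)            # eigh already returns orthonormal eigenvectors
# Deliberately "un-orthogonalize" the two eigenvectors sharing lambda = 1 ...
U = np.column_stack([V[:, 0], V[:, 0] + 0.5 * V[:, 1]])
# ... and restore orthonormality with Gram-Schmidt, i.e. a thin QR factorization.
Q, _ = np.linalg.qr(U)
print(np.allclose(Q.T @ Q, np.eye(2)))   # orthonormal again
print(np.allclose(A @ Q, Q))             # still eigenvectors for lambda = 1
```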

The spectrum of an operator, $\{\lambda_i\}_{i=0}^{\infty}$, is the generalization of the eigenvalues of a matrix. This spectral decomposition of $\mathcal{A}$, as we will see later, extracts information about $\mathcal{M}$, from large to small scale.


URL:

https://www.sciencedirect.com/science/article/pii/S1570865919300250

Recursive Methods for Electronic States

R. Haydock , in Encyclopedia of Condensed Matter Physics, 2005

Recursive Approaches

A wide variety of recursive methods are used to calculate electronic properties of systems to which band-structure methods do not apply due to lack of periodicity, for example surfaces or other defects and disordered systems. These methods range from Gram–Schmidt orthogonalization to the method of moments, modified moments, orthogonal polynomial expansions such as Chebyshev expansions, the Lanczos method, including the conjugate gradient method, and the recursion method (also related to the Mori projection method). In their simplest applications, recursive methods determine densities of states from path counting, for example, for a tight-binding s-band on a square lattice. The common ingredients of recursive methods are: they start with some initial state of physical significance and apply the Hamiltonian repeatedly to it, generating a sequence of states which span the time evolution of the initial state; and they use the normalizations and overlaps between states in the sequence to construct expansions, such as continued-fraction expansions of electronic Green functions, from which physical properties are calculated. Among the advantages of recursive approaches are efficient computation, because they are usually formulated in position space, where Hamiltonians are sparse, and the well-developed mathematical theory of moments. One of the main challenges for these methods is obtaining reliable information about singularities in the densities of states for macroscopic systems.
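As an illustration of these common ingredients, the following sketch builds a sparse tight-binding s-band Hamiltonian on a small square lattice (hopping set to −1, sizes chosen arbitrarily) and generates recursion coefficients from a localized starting state; it is a schematic example under those assumptions, not a reproduction of any specific method from the article:

```python
import numpy as np
import scipy.sparse as sp

L = 20                                  # L x L lattice, open boundaries (illustrative size)
N = L * L
idx = lambda x, y: x * L + y
rows, cols = [], []
for x in range(L):
    for y in range(L):
        for dx, dy in ((1, 0), (0, 1)):
            if x + dx < L and y + dy < L:
                i, j = idx(x, y), idx(x + dx, y + dy)
                rows += [i, j]; cols += [j, i]
H = sp.csr_matrix((np.full(len(rows), -1.0), (rows, cols)), shape=(N, N))

u_prev = np.zeros(N)
u = np.zeros(N); u[idx(L // 2, L // 2)] = 1.0   # physically meaningful starting state
a, b = [], []
for n in range(30):                     # 30 recursion levels (illustrative)
    v = H @ u                           # apply the Hamiltonian repeatedly
    a.append(u @ v)                     # a_n = <u_n | H | u_n>
    v = v - a[-1] * u - (b[-1] if b else 0.0) * u_prev
    b.append(np.linalg.norm(v))         # b_{n+1}: normalization of the new state
    u_prev, u = u, v / b[-1]

# a and b are the continued-fraction coefficients for the local Green function.
print(a[:5], b[:5])
```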


URL:

https://www.sciencedirect.com/science/article/pii/B0123694019011384

KRYLOV-LANCZOS METHODS

R.R. Craig Jr., in Encyclopedia of Vibration, 2001

Algorithm 1: undamped structure; single load vector

1.

Starting vector: As the starting vector, select the static deflection of the structure due to the load distribution vector, f. That is, solve the equation:

(a) $K q_1 = f$

for the static deflection $q_1$. Mass normalize this to form the starting Lanczos vector:

(b) $\psi_{L1} = \frac{1}{\beta_1} q_1$

where the normalizing factor $\beta_1$ is determined by:

(c) $\beta_1 = \left( q_1^T M q_1 \right)^{1/2}$

2.

Second Lanczos vector: The second Lanczos vector is obtained by first solving for the static deflection of the structure subjected to inertia loading due to the first vector's deflection. In addition, there is a Gram–Schmidt orthogonalization step that removes the starting vector component. First, solve the equation:

(d) $K \tilde{q}_2 = M \psi_{L1}$

for the static deflection $\tilde{q}_2$. Then, use the Gram–Schmidt procedure to remove the $\psi_{L1}$-component of this iterate:

(e) $q_2 = \tilde{q}_2 - \alpha_1 \psi_{L1}$

where:

(f) $\alpha_1 = \psi_{L1}^T M \tilde{q}_2$

Finally, mass normalize the vector $q_2$ to form the second Lanczos vector:

(g) $\psi_{L2} = \frac{1}{\beta_2} q_2$

where the normalizing factor $\beta_2$ is determined by:

(h) $\beta_2 = \left( q_2^T M q_2 \right)^{1/2}$

3.

General Lanczos vector: The general Lanczos vector, ψ Lj , j=3, 4, … is obtained by the following steps. First, solve the equation:

(i) $K \tilde{q}_{j+1} = M \psi_{Lj}$

for the static deflection $\tilde{q}_{j+1}$. Use the Gram–Schmidt procedure to remove both the $\psi_{Lj}$-component and the $\psi_{L(j-1)}$-component of this iterate:

(j) $q_{j+1} = \tilde{q}_{j+1} - \alpha_j \psi_{Lj} - \beta_j \psi_{L(j-1)}$

where:

(k) $\alpha_j = \psi_{Lj}^T M \tilde{q}_{j+1}$

and:

(l) $\beta_j = \psi_{L(j-1)}^T M \tilde{q}_{j+1}$

which can be shown to be just the preceding normalizing factor. Finally, mass normalize the vector $q_{j+1}$ to form the $(j+1)$st Lanczos vector:

(m) $\psi_{L(j+1)} = \frac{1}{\beta_{j+1}} q_{j+1}$

where the normalizing factor $\beta_{j+1}$ is determined by:

(n) $\beta_{j+1} = \left( q_{j+1}^T M q_{j+1} \right)^{1/2}$
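A minimal NumPy sketch of Algorithm 1, under the assumption that K and M are given as dense symmetric matrices (function and variable names are illustrative):

```python
import numpy as np

def lanczos_vectors(K, M, f, m):
    """Generate m mass-normalized Lanczos vectors for the undamped structure
    (K, M) and a single load vector f, following steps (a)-(n) above."""
    n = K.shape[0]
    Psi = np.zeros((n, m))
    q = np.linalg.solve(K, f)                      # (a) static deflection K q1 = f
    beta = np.sqrt(q @ (M @ q))                    # (c) normalizing factor
    Psi[:, 0] = q / beta                           # (b) first Lanczos vector
    for j in range(1, m):
        qt = np.linalg.solve(K, M @ Psi[:, j - 1])     # (d)/(i) static deflection under inertia load
        alpha = Psi[:, j - 1] @ (M @ qt)               # (f)/(k)
        q = qt - alpha * Psi[:, j - 1]                 # Gram-Schmidt step (e)/(j)
        if j >= 2:
            beta_prev = Psi[:, j - 2] @ (M @ qt)       # (l), equals the preceding normalizing factor
            q -= beta_prev * Psi[:, j - 2]
        beta = np.sqrt(q @ (M @ q))                    # (h)/(n)
        Psi[:, j] = q / beta                           # (g)/(m)
    return Psi

# Usage: Psi.T @ M @ Psi should be (close to) the identity.
```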

Figure 3 shows the four Lanczos modes for the cantilever beam shown in Figure 1. The starting vector for these is the same starting vector that was used for the set of Krylov modes.

Figure 3. Four Lanczos modes for the four-DOF cantilever beam.


URL:

https://www.sciencedirect.com/science/article/pii/B0122270851000114

Topics in Multivariate Approximation and Interpolation

Tomas Sauer , in Studies in Computational Mathematics, 2006

4.1 The Newton approach and the finite difference

The first step towards the error formula was originally motivated by a very simple observation: the Vandermonde matrix

$\left[ \xi^{\alpha} : \xi \in \Xi,\ |\alpha| \le n \right]$

with respect to the canonical monomial basis of $\Pi_n$ indexes the nodes and the basis elements in two different ways. So why not re-index the nodes as $\Xi = \{\xi_\alpha : |\alpha| \le n\}$? Clearly, this is always possible in many ways, so that we make the additional requirement that the interpolation problem with respect to the node subsets $\Xi_k := \{\xi_\alpha : |\alpha| \le k\} \subseteq \Xi$, $k = 0, \ldots, n$, is poised for $\Pi_k$. The "dual" polynomials for this arrangement of nodes are the Newton fundamental polynomials $p_\alpha$, $|\alpha| \le n$, defined by the requirement that

(6) $p_\alpha \in \Pi_{|\alpha|}, \qquad p_\alpha(\xi_\beta) = \delta_{\alpha, \beta}, \quad |\beta| \le |\alpha| \le n.$

The Newton basis $P = \{p_\alpha : |\alpha| \le n\}$ of $\Pi_n$ has the property that the matrix $[p_\beta(\xi_\alpha) : |\alpha|, |\beta| \le n]$ is block upper triangular with identity matrices on the diagonal. Moreover, the polynomials in (6), which generalize the ones from (5), can be computed as a by-product of the indexing process of $\Xi$, cf. [6,72], either by means of Gauss elimination or a Gram–Schmidt orthogonalization process; mathematically, both approaches are equivalent, but they differ in implementation details. Even the polynomials $p_\alpha$ do not depend uniquely on $\Xi$: in the generic case, even any $\xi \in \Xi$ can be chosen as $\xi_\alpha$ and the polynomials have to adapt accordingly. This is the point where some subtle problems arise with Hermite interpolation: for a reasonable generalization of the Newton basis it would be necessary to index $\Theta$ as $\{\theta_\alpha : |\alpha| \le n\}$ in such a way that, for $k = 0, \ldots, n$,

(i)

the interpolation problem with respect to Θ k is poised for Π k and

(ii)

ker Θ k is also an ideal.

It is not difficult to ensure each of these properties separately, which corresponds to row or column permutations in Gaussian elimination, but how to satisfy them simultaneously is a tricky issue. Recall that the above conditions permit "putting the interpolation problem into block" [73], hold trivially for Lagrange interpolation problems (since condition ii is always satisfied then) and were essential for extending the Newton approach from Lagrange to Hermite interpolation problems in [73]. At present it is not clear whether or not this can be done for a general set of functionals, but the evidence is fortunately more on the optimist's side than on the pessimist's.

Conjecture 7

Any finite set $\Theta \subset \Pi'$ of linear functionals which admits a poised ideal interpolation problem for $\Pi_n$ can be graded as $\Theta_0 \subset \Theta_1 \subset \cdots \subset \Theta_n = \Theta$ such that for $k = 0, \ldots, n$ the interpolation with respect to $\Theta_k$ is poised for $\Pi_k$ and $\ker \Theta_k$ is an ideal in $\Pi$.

As shown in [72], the interpolant of degree n to f can now be written in terms of the Newton basis as

(7) $L_n f = \sum_{|\alpha| \le n} \lambda\!\left[ \Xi_{|\alpha|-1},\, \xi_\alpha \right] f \cdot p_\alpha,$

where the finite differences $\lambda[\Xi_k, x]\, f$ satisfy the recurrence relation

(8) $\lambda[x]\, f = f(x),$

(9) $\lambda[\Xi_k, x]\, f = \lambda[\Xi_{k-1}, x]\, f - \sum_{|\alpha| = k} \lambda[\Xi_{k-1}, \xi_\alpha]\, f \cdot p_\alpha(x), \qquad k = 0, \ldots, n,$

and

(10) $\lambda[\Xi_k, x]\, f = (f - L_k f)(x), \qquad x \in \mathbb{R}^d$

property of simplex splines were derived, cf. [51], one uses as "directions" differences of successive nodes, and so we specialize de Boor's divided difference to

$[\Xi]\, f := [\Xi;\ \Xi D]\, f, \qquad D = \begin{bmatrix} -1 & & \\ 1 & \ddots & \\ & \ddots & -1 \\ & & 1 \end{bmatrix} \in \mathbb{R}^{(n+1) \times n},$

which will become one essential building block for the remainder formula. The other is the concept of a path of length n, which is a vector $\mu = (\mu_0, \ldots, \mu_n)$ of multiindices such that $|\mu_k| = k$, $k = 0, \ldots, n$. The set of all paths of length n will be denoted by $M_n$. Associated to a path $\mu \in M_n$ and a properly indexed set $\Xi = \{\xi_\alpha : |\alpha| \le n\}$ we obtain a matrix $\Xi_\mu := [\xi_{\mu_j} : j = 0, \ldots, n]$ of sites visited along the path as well as the number

$\pi_\mu := \prod_{j=0}^{n-1} p_{\mu_j}\!\left( \xi_{\mu_{j+1}} \right).$

With this notation at hand, the error formula from [72] takes the convenient form

(12) $(f - L_n f)(x) = \sum_{\mu \in M_n} p_{\mu_n}(x)\, \pi_\mu\, \left[ \Xi_\mu,\, x \right] f.$

This formula illustrates how significantly things change when passing from univariate to multivariate polynomial interpolation. In one variable, there is precisely one path and (12) is the well–known error formula with B–splines as Peano kernels that can be encountered in many textbooks as for example in [25,36]. The multivariate version of this formula, however, contains

$\prod_{j=0}^{n} \binom{j + d - 1}{d - 1}$

terms in the summation, which already becomes $(n+1)!$ for $d = 2$ and grows much faster for higher values of $d$. In particular, these numbers by far exceed even the "dimension curse" factor $d^n$ which is widely accepted as an unavoidable growth rate in multivariate problems. But the number of terms in (12) appears even more peculiar when comparing it to another formula for the error of multivariate polynomial interpolation which is due to Ciarlet and Raviart [23] and uses the Lagrange fundamental polynomials $\ell_\alpha$, $|\alpha| \le n$, to express the error as

(13) $(f - L_n f)(x) = \sum_{|\alpha| \le n} \ell_\alpha(x)\, \left[ \xi_\alpha^n,\, x;\ x - \xi_\alpha^n \right] f,$

where exponentiation indicates n–fold repetition of a column in the respective matrix. This formula, based on a multipoint Taylor expansion, has become an important tool in the analysis of finite elements and it has the apparent advantage that the number of terms in the sum equals the number of interpolation sites, a number much smaller than the number of paths – except, of course, in the univariate case where it becomes a formula due to Kowalewski, [37, eq.(24), p. 24], but not the "standard" error formula. In this respect, (13) is no (direct) generalization of the univariate formula, while (12) definitely is. Another fundamental difference is that the simplex spline integrals in (12) normally run over nondegenerate convex sets in ℝ d , while the one in (13) only integrates along lines. The surprising fact that these two expressions nevertheless describe precisely the same quantity, namely the error of interpolation, shows that there is still quite a bit of magic going on among simplex splines.
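A small script to check the combinatorial claims above numerically, under the reading that the number of paths is the product of the number of multiindices of each length (helper names are illustrative):

```python
from math import comb, factorial

def n_paths(n, d):
    count = 1
    for j in range(n + 1):
        count *= comb(j + d - 1, d - 1)   # multiindices of length j in d variables
    return count

def n_sites(n, d):
    return comb(n + d, d)                 # dim of Pi_n = number of interpolation sites in (13)

for n in (2, 3, 5):
    # In d = 2 the path count equals (n + 1)!, and it far exceeds the site count.
    print(n, n_paths(n, 2) == factorial(n + 1), n_paths(n, 3), n_sites(n, 3))
```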


URL:

https://www.sciencedirect.com/science/article/pii/S1570579X06800091

Computer Solution of Large Linear Systems

In Studies in Mathematics and Its Applications, 1999

6.5 The Lanczos Algorithm

At almost the same time as Hestenes and Stiefel developed the CG algorithm [772], Cornelius Lanczos introduced the method that now bears his name (see Lanczos [879]). This method is well known for computing a few of the extreme eigenvalues of sparse matrices. However, the Lanczos method can also be used to solve linear systems and, as it turns out, CG is nothing else than a particular case of the Lanczos method. This algorithm is explained in full detail in Parlett's book [1062]. We shall mainly follow the exposition of Simon [1172]. In this section, for the sake of simplicity, we do not consider preconditioning (M = I).

A being a symmetric matrix, we consider Krylov spaces

$K_k(A, b) = \operatorname{span}\left( b, A b, \ldots, A^{k-1} b \right)$

Generally, $K_k(A, b)$ is of dimension $k$, $\{b, Ab, \ldots, A^{k-1}b\}$ being a basis. Unfortunately, this basis is not well conditioned and it is much better to construct an orthonormal basis of $K_k(A, b)$. We can use the Gram–Schmidt orthogonalization method to achieve this goal. Suppose we already have orthogonal vectors $\{q_1, \ldots, q_k\}$ as a basis for $K_k(A, b)$; then we have to orthogonalize $A^k b$ against $q_1, \ldots, q_k$. This is the same as orthogonalizing $A q_k$ against $q_1, \ldots, q_k$. It turns out that $A q_k$ is already orthogonal to $q_1, \ldots, q_{k-2}$ because of the symmetry of $A$. We simply have to orthogonalize $A q_k$ against $q_{k-1}$ and $q_k$. Let us denote

$\bar{q}_k = A q_k - \delta_k q_k - \eta_k q_{k-1},$

with $\delta_k = (q_k, A q_k)$, $\eta_k = (q_{k-1}, A q_k)$. To obtain $q_{k+1}$ the vector $\bar{q}_k$ has to be normalized, and it turns out that $\|\bar{q}_k\| = \eta_{k+1}$. Let

$$\bar{T}_k = \begin{pmatrix} \delta_1 & \eta_2 & & & \\ \eta_2 & \delta_2 & \eta_3 & & \\ & \ddots & \ddots & \ddots & \\ & & \eta_{k-1} & \delta_{k-1} & \eta_k \\ & & & \eta_k & \delta_k \end{pmatrix},$$

then, denoting $Q_k = [q_1, \ldots, q_k]$, the relation defining the vectors $q_k$ can be written in matrix form as

$A Q_k - Q_k \bar{T}_k = \eta_{k+1}\, q_{k+1} (e^k)^T$

Note the similarity of this relation with the equation in lemma 6.23. This can also be written as

$A Q_k = Q_{k+1} T_k,$

where

$$T_k = \begin{pmatrix} \bar{T}_k \\ \eta_{k+1} (e^k)^T \end{pmatrix},$$

is a $(k+1) \times k$ upper Hessenberg matrix. Of course, we have $Q_k^T A Q_k = \bar{T}_k$. $\bar{T}_k$ is the orthogonal projection of $A$ onto $K_k(A, b)$. Therefore, it is natural to compute an approximation of the solution in $K_k(A, b)$ as $x_k = Q_k \bar{T}_k^{-1} Q_k^T b$. That is, we project the right hand side $b$ onto $K_k(A, b)$, we solve in $K_k(A, b)$ with the projection of $A$, and we take the solution back to the original space. Note that to compute the approximation we need all the previous basis vectors. However, we do not have to compute $x_k$ at each iteration because we note that $Q_k^T b = \eta_1 e^1$ where $e^1$ is the first column of the identity matrix. Therefore $\bar{T}_k^{-1} Q_k^T b$ is $\eta_1$ times the first column of the inverse of the tridiagonal matrix $\bar{T}_k$ and the residual is

$r_k = b - A x_k = b - A Q_k \bar{T}_k^{-1} Q_k^T b = -\eta_{k+1}\, \phi_k\, q_{k+1},$

where $\phi_k$ is the $k$th (last) component of $\bar{T}_k^{-1} Q_k^T b$ (that is, $\eta_1$ times the last component of the first column of the inverse of $\bar{T}_k$). This implies that

$\|r_k\| = \eta_{k+1}\, |\phi_k|$

This gives a handy way of computing the norm of the residual without even computing the vectors $x_k$. We remark that the vector $x_k$ has its residual $r_k$ orthogonal to $K_k$. Let us now consider the connection with CG. We write the relation $A Q_k - Q_k \bar{T}_k = \eta_{k+1} q_{k+1} (e^k)^T$ as

$A Q_k - Q_k \bar{T}_k = G_k,$

where the matrix $G_k$ has only its last column nonzero, and that column is proportional to $q_{k+1}$. Now, suppose that $A$ is not only symmetric but also positive definite. Then, $\bar{T}_k = Q_k^T A Q_k$ is also positive definite and there exists a Cholesky factorization $\bar{T}_k = L_k D_k L_k^T$ where $L_k$ is lower bidiagonal with ones on the diagonal and $D_k$ is diagonal. Then, with a little algebra, we have

$A Q_k L_k^{-T} D_k^{-1} - Q_k L_k = G_k L_k^{-T} D_k^{-1}$

We denote $P_k = Q_k L_k^{-T}$. This can be computed by solving $P_k L_k^T = Q_k$. With this notation we write

$A P_k D_k^{-1} - Q_k L_k = G_k L_k^{-T} D_k^{-1}$

The matrix $L_k^{-T}$ is an upper triangular matrix with ones on the diagonal, therefore $G_k L_k^{-T} = G_k$. If we write the last column of the previous relation, we see that since $r_k$ is a scalar multiple of $q_{k+1}$, it is a linear combination of $r_{k-1}$ and $A p_k$ (which means that $x_k$ is a linear combination of $x_{k-1}$ and $p_k$). Moreover, writing $L_k P_k^T = Q_k^T$, we see that $p_k$ is a linear combination of $p_{k-1}$ and $r_k$. Therefore, up to a scaling, the Lanczos algorithm is identical to CG (without preconditioning). On a more careful examination, we can see that we have the following relationship. In CG we have

$$A r_k = \frac{1}{\gamma_k}(r_k - r_{k+1}) - \frac{\beta_k}{\gamma_{k-1}}(r_{k-1} - r_k) = -\frac{1}{\gamma_k} r_{k+1} + \left( \frac{1}{\gamma_k} + \frac{\beta_k}{\gamma_{k-1}} \right) r_k - \frac{\beta_k}{\gamma_{k-1}} r_{k-1}$$

As $\beta_k^{1/2} = \|r_k\| / \|r_{k-1}\|$, we divide by the norm of $r_k$ to get

$A q_{k+1} = \frac{\beta_{k+1}^{1/2}}{\gamma_k} q_{k+2} + \left( \frac{1}{\gamma_k} + \frac{\beta_k}{\gamma_{k-1}} \right) q_{k+1} + \frac{\beta_k^{1/2}}{\gamma_{k-1}} q_k,$

where

$q_{k+1} = (-1)^k \frac{r_k}{\|r_k\|}$

This shows that

$\delta_k = \frac{1}{\gamma_{k-1}} + \frac{\beta_{k-1}}{\gamma_{k-2}}, \qquad \beta_0 = 0, \quad \gamma_{-1} = 1, \qquad \eta_{k+1} = \frac{\beta_k^{1/2}}{\gamma_{k-1}}$

This relationship between the two methods shows why (in principle) we cannot use CG for indefinite matrices. In that case, it may happen that the Cholesky factorization of $\bar{T}_k$ does not exist. We shall see how to deal with this problem in the following sections.

There is also another interpretation of the Lanczos algorithm. Let $B$ be the matrix whose columns are $A^j b$, $j = 0, \ldots, n-1$. As Lanczos is nothing other than orthogonalizing the columns of $B$, we can recursively compute the columns of $Q_n$ from those of $B$. Hence, we can write $Q_n = B L^{-T} \Pi^{-1}$ where $L$ is lower triangular with ones on the diagonal and $\Pi$ is a diagonal matrix whose diagonal entries are $\eta_1, \eta_1 \eta_2, \ldots$ Then, we can write

$B^T B = L\, \Pi^2 L^T$

It is easy to see that $(B^T B)_{i,j} = (b, A^{i+j} b)$. Therefore, $B^T B$ is the matrix of the moments of $A$, and the Lanczos algorithm is nothing other than the Cholesky factorization of the matrix of moments in disguise.
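A minimal NumPy sketch of the Lanczos process and residual-norm estimate described in this section, without preconditioning (the function name and test matrix are illustrative assumptions):

```python
import numpy as np

def lanczos_solve(A, b, m):
    """Build the orthonormal Krylov basis Q_k and tridiagonal T_k by the Lanczos
    recurrence, then form x_k = Q_k T_k^{-1} Q_k^T b.  A is assumed symmetric."""
    n = len(b)
    Q = np.zeros((n, m + 1))
    delta = np.zeros(m)
    eta = np.zeros(m + 1)
    eta[0] = np.linalg.norm(b)                 # eta_1 = ||b||
    Q[:, 0] = b / eta[0]
    for k in range(m):
        w = A @ Q[:, k]
        delta[k] = Q[:, k] @ w                 # delta_k = (q_k, A q_k)
        w -= delta[k] * Q[:, k]
        if k > 0:
            w -= eta[k] * Q[:, k - 1]          # eta_k = (q_{k-1}, A q_k)
        eta[k + 1] = np.linalg.norm(w)         # ||q_bar_k|| = eta_{k+1}
        Q[:, k + 1] = w / eta[k + 1]
    T = np.diag(delta) + np.diag(eta[1:m], 1) + np.diag(eta[1:m], -1)
    rhs = np.zeros(m); rhs[0] = eta[0]         # Q_k^T b = eta_1 e^1
    y = np.linalg.solve(T, rhs)
    x = Q[:, :m] @ y
    res_estimate = eta[m] * abs(y[-1])         # eta_{k+1} |phi_k|
    return x, res_estimate

# Usage on a small SPD matrix: the estimate should closely track ||b - A x||.
rng = np.random.default_rng(0)
B = rng.standard_normal((50, 50)); A = B @ B.T + 50 * np.eye(50)
b = rng.standard_normal(50)
x, est = lanczos_solve(A, b, 20)
print(np.linalg.norm(b - A @ x), est)
```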


URL:

https://www.sciencedirect.com/science/article/pii/S0168202499800071

Constructions of Orthogonal and Biorthogonal Scaling Functions and Multiwavelets Using Fractal Interpolation Surfaces

Bruce Kessler , in Advances in Imaging and Electron Physics, 2002

I INTRODUCTION

Fourier analysis, the decomposition of a signal into the different-frequency sine and cosine waves necessary to build the signal, has been a standard tool in signal processing. This approach is particularly useful when analog sound signals are being analyzed. Sounds of a particular frequency can be identified in the signal and then adjusted or even removed from the signal. However, when digital images are being analyzed, standard Fourier analysis has some distinct weaknesses:

Digitized images frequently have a number of sharp edges, whereas sound signals are typically smooth and wavy. Rapid changes in the data are reflected in a greater range of frequencies detected in the Fourier analysis of the signal and a larger number of nonzero Fourier coefficients.

Because the sine and cosine waves used in Fourier analysis have global support, changing or omitting a Fourier coefficient will cause a change in the entire image. Also, although Fourier analysis can detect the presence and size of sharp changes in the image, it cannot identify where they are located.

The introduction of wavelet theory has helped to address these weaknesses.

In a wavelet analysis, the sine and cosine waves of Fourier analysis are replaced with a set of compactly supported functions whose translates and dilates form a complete orthonormal system. Frequencies are determined by applying the bases at different resolutions. With bases of compact support, a nonzero basis coefficient gives an indication of both the presence and the size of a sharp change in the signal, as well as an idea of where the change took place. In addition, the basis being used can be chosen to best suit the type of signal being analyzed and the particular goals of the analysis. A great introduction to wavelets, with a comparison and contrast of Fourier analysis and wavelet analysis, can be found in Hubbard (1998).

The majority of work on wavelets has involved the use of a single analysis function defined over a one-dimensional domain. (The most notable of these is Daubechies' D4 scaling function. See Daubechies, 1992, for complete details. For other constructions, see Donovan et al., 1996c, Hardin et al., 1992, and Strang and Strela, 1994.) By using tensor products, researchers can easily adapt bases of this type to image data defined over two-dimensional domains. Useful functions φ_1(x) and φ_2(x) can be used to construct a useful function φ(x, y) by defining φ(x, y) = φ_1(x)φ_2(y). Such bases are said to be separable.

Many researchers have replaced the single scaling function with a set of functions, which allows greater freedom in the basis design. (A notable example of such a construction is the GHM (Geronimo–Hardin–Massopust) scaling vector. See Geronimo et al., 1994, for complete details.) Also, the condition that the bases be orthogonal has been relaxed. For instance, Hardin and Marasovich (1999) built biorthogonal counterparts to the GHM scaling functions. Likewise, a separable biorthogonal basis is being used by the U.S. government's Federal Bureau of Investigation to compress images of fingerprints and is a part of the new Joint Photographic Experts Group (JPEG) standard. See Daubechies (1992) for a discussion of the role of orthogonality with a single scaling function.

This article outlines the work of Donovan, Geronimo, Hardin, and the author in constructing nonseparable (i.e., not separable) orthogonal and biorthogonal scaling vectors by using well-developed theory in fractal interpolation surfaces. (For other approaches in constructing nonseparable bases, see Belogay and Wang, 1999, and Donovan et al., 2000.) Separable bases are easy to apply (as long as the data are rectangular) but favor horizontal and vertical changes in the data, whereas nonseparable bases may not. Also, the bases constructed in this article can be adapted to arbitrary triangulations (the subject of an upcoming paper by Hardin and the author), which may be better suited to some data sets and applications. The author is hopeful that research in this area will lead to even more useful bases for the analysis of digitized images.

A Notation and Definitions

Let $\epsilon_1$ and $\epsilon_2$ be linearly independent vectors in $\mathbb{R}^2$ and let us define $0 := (0, 0)$. Let $\mathcal{T}$ be the three-directional mesh with directions $\epsilon_1$, $\epsilon_2$, and $\epsilon_2 - \epsilon_1$. Let us define $\Delta_0 \in \mathcal{T}$ as the triangular region with vertices $0$, $\epsilon_1$, and $\epsilon_2$, and $\nabla_0 \in \mathcal{T}$ as the triangular region with vertices $\epsilon_1$, $\epsilon_2$, and $\epsilon_1 + \epsilon_2$. Let us also define the translation function $t_{i,j}(x) := x - i\epsilon_1 - j\epsilon_2$ and the dilation $d_{i,j}(x) := Nx - i\epsilon_1 - j\epsilon_2$ for some fixed integer dilation $N > 1$. Furthermore, let us define the affine reflection function $r : \nabla_0 \to \Delta_0$ which maps the vertices $\epsilon_1$, $\epsilon_2$, and $\epsilon_1 + \epsilon_2$ to the vertices $\epsilon_2$, $0$, and $\epsilon_1$, respectively. The notation $f^{\nabla} := f \circ r$ is used for any $f$ supported in $\Delta_0$.

Definition I.1 A multiresolution analysis (MRA) of L 2(R 2) of multiplicity r is a set of closed linear subspaces such that

1.

$\cdots \subset V_{-2} \subset V_{-1} \subset V_0 \subset V_1 \subset V_2 \subset \cdots$

2.

$\bigcap_{j \in \mathbb{Z}} V_j = \{0\}$

3.

$\overline{\bigcup_{j \in \mathbb{Z}} V_j} = L^2(\mathbb{R}^2)$

4.

$f \in V_j \Longleftrightarrow f(N^{-j}\,\cdot) \in V_0, \quad j \in \mathbb{Z}$

5.

There exists a set of functions $\{\varphi_1, \varphi_2, \ldots, \varphi_r\}$ such that $\{\varphi_k \circ t_i : k = 1, \ldots, r,\ i \in \mathbb{Z}^2\}$ forms a Riesz basis of $V_0$.

The r vector Φ := (φ 1, φ 2,...,φr ) T is referred to as a scaling vector and the individual φk as scaling functions.

Conditions 1, 4, and 5 imply that a scaling vector Φ with compactly supported φk satisfies the dilation equation

(1) $\Phi(x) = N \sum_{i \in \mathbb{Z}^2} g_i\, (\Phi \circ d_i)(x)$

for a finite number of r × r constant matrices g i

Definition I.2 A vector Φ of r linearly independent functions on R 2 is refinable at dilation N if it satisfies Eq. (1) for some sequence of r × r constant matrices g i.

A simple example of an MRA of $L^2(\mathbb{R}^2)$ over the mesh $\mathcal{T}$ is constructed by defining the "hat" function $h$ as the piecewise linear function that satisfies $h(i\epsilon_1 + j\epsilon_2) = \delta_{0,i}\, \delta_{0,j}$ and letting $\Phi = \{h\}$. Using the notation

$S(H) := \operatorname{clos}_{L^2} \operatorname{span}\{f \circ t_i : i \in \mathbb{Z}^2,\ f \in H\} \quad \text{for } H \subset L^2(\mathbb{R}^2),$

let us then define $V_0 := S(\Phi)$. It is easily verified that the scaling vector is refinable for any integer dilation $N > 1$, and that $(V_p)$ is an MRA, where $V_p := S(\Phi(N^p\,\cdot))$.

For function vectors Γ and Λ with elements in L 2(R 2), let us define

$\langle \Gamma, \Lambda \rangle = \int_{\mathbb{R}^2} \Gamma(x)\, \Lambda(x)^T\, dx$

Definition I.3 If $\langle \Phi, \Phi \circ t_{i,j} \rangle = \delta_{0,i}\, \delta_{0,j}\, I$, then let us say that Φ is an orthogonal scaling vector. If the $\varphi_k$ are compactly supported, then the MRA generated by Φ is said to be orthogonal.

Let us define Wn to be the orthogonal complement of Vn in Vn+1 , so that

$V_{n+1} = V_n \oplus W_n \quad \text{for } n \in \mathbb{Z}$

The Wn , referred to as wavelet spaces, are necessarily pairwise orthogonal and are spanned by the orthogonal dilations and translations of a set of functions {ψ 1, ψ 2,...,ψt }, referred to as wavelets, that satisfy the equation

(2) $\Psi(x) = N \sum_{i \in \mathbb{Z}^2} h_i\, (\Phi \circ d_i)(x)$

for some t × r constant matrices hi , where Ψ is the t-vector (ψ 1, ψ 2,...,ψt ) T , called a multiwavelet.

Definition I.4 A pair of n-dimensional function vectors Φ and $\tilde{\Phi}$ is said to be biorthogonal if

$\langle \Phi, \tilde{\Phi} \circ t_{i,j} \rangle = \delta_{0,i}\, \delta_{0,j}\, I, \qquad i, j \in \mathbb{Z}.$

A necessary and sufficient condition for the construction of biorthogonal vectors was given in Hardin and Marasovich (1999) and is stated next without proof.

Lemma I.1 Suppose U and W are m-dimensional subspaces of $\mathbb{R}^n$. There exist dual (biorthogonal) bases for U and W if and only if $U \cap W^{\perp} = \{0\}$.

If the criteria of Lemma I.1 are met, then the Gram–Schmidt orthogonalization process can be modified to provide biorthogonal sets in the following fashion:

1.

Consider the two sets $\{x_1, \ldots, x_n\}$ and $\{y_1, \ldots, y_n\}$ where $\langle x_i, y_i \rangle \ne 0$, $i = 1, \ldots, n$. Let $u_1 = x_1$ and $v_1 = y_1$.

2.

Let

$u_i = x_i - \sum_{j=1}^{i-1} \frac{\langle x_i, v_j \rangle}{\langle u_j, v_j \rangle}\, u_j \quad\text{and}\quad v_i = y_i - \sum_{j=1}^{i-1} \frac{\langle y_i, u_j \rangle}{\langle u_j, v_j \rangle}\, v_j \qquad \text{for } i = 2, \ldots, n$

3.

Let

$z_i = u_i \quad\text{and}\quad \tilde{z}_i = \frac{v_i}{\langle u_i, v_i \rangle} \qquad \text{for } i = 1, \ldots, n$
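A finite-dimensional NumPy sketch of this three-step biorthogonalization, with vectors stored as matrix columns (the function name and test data are illustrative):

```python
import numpy as np

def biorthogonalize(X, Y):
    """Given the columns of X and Y (with <x_i, y_i> != 0), return Z, Zt built by
    the Gram-Schmidt-style steps above, so that Z^T Zt = I."""
    n = X.shape[1]
    U = np.zeros_like(X, dtype=float)
    V = np.zeros_like(Y, dtype=float)
    for i in range(n):
        u, v = X[:, i].astype(float), Y[:, i].astype(float)
        for j in range(i):
            u = u - (X[:, i] @ V[:, j]) / (U[:, j] @ V[:, j]) * U[:, j]   # step 2
            v = v - (Y[:, i] @ U[:, j]) / (U[:, j] @ V[:, j]) * V[:, j]
        U[:, i], V[:, i] = u, v
    Z = U
    Zt = V / np.einsum('ij,ij->j', U, V)      # step 3: scale v_i by <u_i, v_i>
    return Z, Zt

# Usage: Z.T @ Zt should be (close to) the identity when the spanned subspaces
# satisfy the condition of Lemma I.1.
X = np.random.rand(6, 3); Y = np.random.rand(6, 3)
Z, Zt = biorthogonalize(X, Y)
print(np.allclose(Z.T @ Zt, np.eye(3)))
```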

Let us suppose that $X$ and $Y$ are biorthogonal function vectors. Then let us define the projection operator $P_X^Y$ such that $\ker P_X^Y = \mathcal{Y}^{\perp}$ and $\operatorname{range} P_X^Y = \mathcal{X}$. If $\mathcal{X} := S(X)$ and $\mathcal{Y} := S(Y)$ are finite shift-invariant spaces, then

$P_X^Y f := \sum_{j \in \mathbb{Z}^2} \sum_{i=1}^{n} \frac{\langle f, y_i \circ t_j \rangle}{\langle x_i, y_i \rangle}\, x_i \circ t_j$

where $x_i \in X$ and $y_i \in Y$.

B Fractal Interpolation Surfaces

The construction of fractal interpolation surfaces is outlined in Geronimo and Hardin (1993) and Massopust (1990). See Barnsley (1988) for an introduction to fractals in general. The following is a brief introduction to fractal interpolation surfaces.

Let D be a closed triangular region in $\mathbb{R}^2$ and let $\{q_n\}_{n=1}^{r}$ be a set of points in D such that $q_1$, $q_2$, and $q_3$ are the vertices of D. Let $\{\Delta_i\}_{i=1}^{N}$ be a triangulation of $\{q_n\}$ such that the graph has chromatic number 3. (The chromatic number of a graph is the fewest number of symbols needed to cover the vertices of the graph so that any two adjacent vertices have distinct symbols. It is important to note that not all triangulations have chromatic number 3.) Let us assign a symbol $k(n) \in \{1, 2, 3\}$ to each of the $q_n$ so that each subdomain $\Delta_i$ has three distinct symbols at its vertices.

Let {Zn } be a set of real values associated with the {qn }. There exists a unique mapping ui : D → Δ i for i = 1, 2,...,N of the form

(3) $u_i(x) = \begin{bmatrix} a_i & b_i \\ c_i & d_i \end{bmatrix} x + \begin{bmatrix} m_i \\ n_i \end{bmatrix}$

where ai , bi , ci , di , mi , and ni are uniquely determined by

(4) u i ( q k ( n ) ) = q n

for all vertices qn of Δ i . Also, let us define a mapping vi : D × RR for i = 1, 2,...,N of the form

(5) v i ( x , z ) = [ e i f i ] x + s i z + p i

where |si | < 1 and where ei , fi , and pi are uniquely determined by

(6) v i ( q k ( n ) , z k ( n ) ) = z n

for all vertices qn of Δ i .

Let C 0(D) denote the space of continuous functions on R 2 with support in D. Let us define a function Γ : C 0(D) → C 0(D) piecewise by

(7) $\Gamma(f) := v_i\left( u_i^{-1},\, f \circ u_i^{-1} \right)$

for $f \in C^0(D)$. Then the function Γ is contractive in the supremum norm with contractivity $|s| = \max_{i=1,\ldots,N} |s_i|$. By the contraction mapping theorem, there exists an $f^* \in C^0(D)$ such that $\Gamma(f^*) = f^*$. This function interpolates the points $(q_n, z_n)$ and is referred to as a fractal interpolation surface (FIS).

Example I.1 Let us define an FIS over the right triangle with vertices (0, 1), (0, 0), and (1, 0) and additional triangulation points (0, 1/2), (1/2, 1/2), and (1/2, 0). The triangulation and chromatic mappings used are shown in Figure 1. After the various unknowns are solved, progressively finer approximations of the FIS are drawn by repeatedly applying the union of the domain mappings, starting with the linear surface that interpolates the given data.

Figure 1. Triangulation and domain mappings for Example I.1.

The FIS being approximated through successive iterations in Figure 2 interpolates the points (0, 1, 0), (0, 1/2, 1/4), (1/2, 1/2, 3/4), (0, 0, 0), (1/2, 0, 1/2), and (1, 0, 1/4) and has vertical scaling $s_i = 3/5$ for all $i \in \{1, 2, 3, 4\}$.

Figure 2. Successive approximations of the fractal interpolation surface (FIS) in Example I.1.

Under certain circumstances, the matchup conditions along the edges of the subdomains are more easily met and the construction of the FIS is greatly simplified. If the interpolation points along the boundary of D are coplanar, then the requirement that the triangulation have chromatic number 3, along with the requirement that the mappings $u_i$ take vertices of D only to "appropriate" vertices of $\Delta_i$, may be dropped, and we may express the fixed point $f^*$ as

(8) $f^*(x) = \Lambda(x) + s_i\, f^*\!\left( u_i^{-1}(x) \right)$

where Λ is the piecewise linear function defined by

(9) $\Lambda(x) = \begin{bmatrix} e_i & f_i \end{bmatrix} u_i^{-1}(x) + p_i \qquad \text{for } x \in \Delta_i$

Example I.2 This example shows an FIS constructed on the equilateral triangle with vertices (0, 0), $(1/2, \sqrt{3}/2)$, and (1, 0), with additional triangulation points $(1/6, \sqrt{3}/6)$, $(1/3, \sqrt{3}/3)$, $(1/2, \sqrt{3}/10)$, and $(3/4, \sqrt{3}/4)$. The triangulation is shown in Figure 3. Notice that the triangulation has a chromatic number of 4.

Figure 3. Triangulation and domain mappings for Example I.2.

Let the surface be zero along the boundary and let us interpolate $(1/2, \sqrt{3}/10, 1/2)$, with $s_i = 1/2$ for $i \in \{1, 2, 3, 4, 5, 6\}$. The orientation of the mappings $u_i$ and $v_i$ determines the resulting FIS, but not the continuity of the surface. Approximations to the FIS are shown in Figure 4.

Figure 4. Approximations to the FIS constructed in Example I.2.

Notice that if all the interpolation points (qn , zn ) are coplanar, then the resulting FIS is merely the plane containing the points over the domain D. Therefore, the hat function h defined in Section I.A is a union of six FISs.

C Main Results

The following is an extension of ideas which first appeared in Donovan et al. (1996a) and later in Donovan et al. (1996b). Let us define $h_i := h(\cdot - \epsilon_i)\big|_{\Delta_0}$ and let $C^0(\mathbb{R}^2)$ denote the bounded, continuous functions over $\mathbb{R}^2$. Then we have the following result.

Theorem I.1 Suppose there are function vectors $B := \{w_1, \ldots, w_t, w_1^{\nabla}, \ldots, w_t^{\nabla}\}$ and $\tilde{B} := \{\tilde{w}_1, \ldots, \tilde{w}_t, \tilde{w}_1^{\nabla}, \ldots, \tilde{w}_t^{\nabla}\}$ with functions in $C^0(\mathbb{R}^2) \cap L^2(\mathbb{R}^2)$ such that

1.

$B$ and $\tilde{B}$ are biorthogonal.

2.

$B$ and $\tilde{B}$ each extend $\{h\}$.

3.

$\operatorname{supp}(w_i),\ \operatorname{supp}(\tilde{w}_i) \subset \Delta_0$, $i = 1, \ldots, t$.

4.

$(I - P_{B}^{\tilde{B}})\, h_i \perp (I - P_{\tilde{B}}^{B})\, h_j$, $i \ne j$, $i, j \in \{0, 1, 2\}$.

Then there exist biorthogonal scaling vectors Φ and $\tilde{\Phi}$ of length $q := 2t + 1$ such that $V_0 := S(\Phi)$ and $\tilde{V}_0 := S(\tilde{\Phi})$ each contain the piecewise linears on the mesh $\mathcal{T}$.

Proof. The main issue is finding compactly supported functions $\phi_i$ and $\tilde{\phi}_j$ that satisfy the biorthogonality conditions $\langle \phi_i, \tilde{\phi}_j \rangle = \delta_{i,j}$. Let us define the following:

$\phi_i := w_i$ for $i = 1, \ldots, t$; $\quad \tilde{\phi}_i := \tilde{w}_i$ for $i = 1, \ldots, t$; $\quad \phi_{t+i} := w_i^{\nabla}$ for $i = 1, \ldots, t$; $\quad \tilde{\phi}_{t+i} := \tilde{w}_i^{\nabla}$ for $i = 1, \ldots, t$; $\quad \phi_q := \frac{1}{\alpha}\left( I - P_{w}^{\tilde{w}} \right) h$; $\quad \tilde{\phi}_q := \frac{1}{\beta}\left( I - P_{\tilde{w}}^{w} \right) h$

where α, β are constants such that $\alpha\beta := \left\langle \left( I - P_{w}^{\tilde{w}} \right) h,\ \left( I - P_{\tilde{w}}^{w} \right) h \right\rangle$. Let $\Phi := (\phi_1, \ldots, \phi_q)^T$ and $\tilde{\Phi} := (\tilde{\phi}_1, \ldots, \tilde{\phi}_q)^T$. Then let us set $V_p := S(\Phi(N^p\,\cdot))$ and $\tilde{V}_p := S(\tilde{\Phi}(N^p\,\cdot))$.

Condition 1 of Theorem I.1 guarantees that

$\langle \phi_i, \tilde{\phi}_j \rangle = \delta_{i,j}$ for $i, j = 1, \ldots, t$, and $\langle \phi_i, \tilde{\phi}_j \rangle = \delta_{i,j}$ for $i, j = t+1, \ldots, 2t$.

Condition 3 of Theorem I.1 guarantees that

$\langle \phi_i, \tilde{\phi}_j \rangle = 0$ for $i = 1, \ldots, t$, $j = t+1, \ldots, 2t$, and $\langle \phi_i, \tilde{\phi}_j \rangle = 0$ for $i = t+1, \ldots, 2t$, $j = 1, \ldots, t$.

Condition 4 of Theorem I.1 establishes the remaining orthogonality conditions:

$\langle \phi_q, \tilde{\phi}_i \rangle = 0$ for $i = 1, \ldots, 2t$, and $\langle \phi_i, \tilde{\phi}_q \rangle = 0$ for $i = 1, \ldots, 2t$.

Condition 2 of Theorem I.1 guarantees that both Φ and $\tilde{\Phi}$ are refinable and that $V_n \subset V_{n+1}$ and $\tilde{V}_n \subset \tilde{V}_{n+1}$. The requirements that $\bigcap_{j \in \mathbb{Z}} V_j = \{0\}$, $\overline{\bigcup_{j \in \mathbb{Z}} V_j} = L^2(\mathbb{R}^2)$, and $\overline{\bigcup_{j \in \mathbb{Z}} \tilde{V}_j} = L^2(\mathbb{R}^2)$, and that the translates of Φ and $\tilde{\Phi}$ form Riesz bases, are trivially met by compactly supported scaling vectors. Therefore, both $(V_p)$ and $(\tilde{V}_p)$ are MRAs. ■

Section III gives a detailed definition of the wavelet spaces $W_f$, $\tilde{W}_f$, $W_g$, $\tilde{W}_g$, $W_h$, and $\tilde{W}_h$. $W_f$ and $\tilde{W}_f$ have generators supported on triangles, $W_g$ and $\tilde{W}_g$ have generators supported on parallelograms, and $W_h$ and $\tilde{W}_h$ have generators supported on hexagons. The main theorem on the construction of the $q(N^2 - 1)$ wavelets is stated next and proven in Section III.

Theorem I.2 Let $(V_p)$ and $(\tilde{V}_p)$ be biorthogonal MRAs of multiplicity q in $\mathbb{R}^2$ constructed from Theorem I.1. Let us define $W_f$, $\tilde{W}_f$, $W_g$, $\tilde{W}_g$, $W_h$, and $\tilde{W}_h$ as previously. Then $V_1 = V_0 + W_0$ and $\tilde{V}_1 = \tilde{V}_0 + \tilde{W}_0$, where $W_0 = W_f + W_g + W_h$, $\tilde{W}_0 = \tilde{W}_f + \tilde{W}_g + \tilde{W}_h$, and $W_0$ and $\tilde{W}_0$ each have $q(N^2 - 1)$ generators.

In Section V, a useful prefilter for the orthogonal scaling functions constructed in Section II.B is presented. Examples are given that use the prefilter and bases for image compression and denoising.


URL:

https://www.sciencedirect.com/science/article/pii/S1076567002800441

An improved Padé approximant in the ANM algorithm: Application to the post-buckling of shells

Rachida Ayane , ... Noureddine Damil , in Finite Elements in Analysis and Design, 2019

4 Conclusion

In this work, we have presented a numerical investigation of the vectorial Padé approximants used in Ref. [10] for the computation of the post-buckling of shells. In order to determine numerically the efficiency of these vectorial Padé approximants, we have computed the coefficients $b_i$ by a Gram-Schmidt orthogonalization technique. According to this numerical study, we found that the vectorial Padé approximant CPP[1, N − 1] is better than the others. Several numerical tests were carried out to show the efficiency of this vectorial Padé approximant in this type of problem. From the obtained results, we have remarked that when we use this vectorial Padé approximant CPP[1, N − 1] we can take larger step lengths compared to those of the CS and CCP algorithms.


URL:

https://www.sciencedirect.com/science/article/pii/S0168874X19302756

Motivations and realizations of Krylov subspace methods for large sparse linear systems

Zhong-Zhi Bai , in Journal of Computational and Applied Mathematics, 2015

2 The direct methods

In the Gaussian elimination method, we successively apply the Gauss transforms one at a time to the expanded matrix $[A \mid b]$, and finally obtain the target matrix $[I \mid x]$, that is, $[A \mid b] \to [I \mid x]$ symbolically, where $I$ is the identity matrix. This method can solve only one system at a time, requiring approximately the storage $n^2 + n$ ($n^2$ for the coefficient matrix $A$ and $n$ for the right-hand side $b$) and approximately $\frac{2}{3} n^3$ operations. The methodology of the LU factorization is a little different from Gaussian elimination: it first factorizes the coefficient matrix $A$ into the product of a lower-triangular matrix $L$ and an upper-triangular matrix $U$, i.e., $A = LU$, and then computes the exact solution $x$ of the linear system (1) through a forward elimination and a backward substitution. The LU factorization method can solve many linear systems having the same coefficient matrix $A$ but different right-hand sides $b$ at a time.

For the QR factorization method, we first factorize the coefficient matrix $A$ into a product of an orthogonal matrix $Q$ and an upper-triangular matrix $R$, obtaining $A = QR$, and then compute the exact solution $x$ of the linear system (1) through a backward substitution, due to $Rx = Q^T b$. The QR factorization requires approximately the storage $n^2 + n$ and approximately $2 n^3$ operations. Here and in the sequel, we use $(\cdot)^T$ to indicate the transpose of either a vector or a matrix.

Besides the differences in storage and operation counts mentioned above, it has been proved that the LU factorization is guaranteed to exist for strictly diagonally dominant matrices and symmetric positive definite matrices, while the QR factorization exists for any matrix, even a rectangular one. Hence, the QR factorization can be employed to solve linear least-squares problems via, e.g., the seminormal equation $R^T R x = A^T b$; see [14]. Moreover, if $A \in \mathbb{R}^{n \times n}$ is sparse, then both $L$ and $U$ may also be sparse, but $Q$ and $R$ could be dense. Hence, each of these two factorizations has its pros and cons.

As is well known, Givens rotations, Householder reflections and Gram–Schmidt orthogonalization are three classical and typical tools for computing a QR factorization of a given matrix. Below we review the classical Gram–Schmidt orthogonalization process and its stabilized modification, the latter of which is the elementary ingredient of the Krylov subspace iteration methods.

Let

$A = [a_1, a_2, \ldots, a_n], \qquad Q = [q_1, q_2, \ldots, q_n]$

and

$$R = \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ & r_{22} & \cdots & r_{2n} \\ & & \ddots & \vdots \\ & & & r_{nn} \end{bmatrix},$$

where a i and q i are the i th columns of the matrices A and Q , respectively. Then A = Q R or

$$[a_1, a_2, \ldots, a_n] = [q_1, q_2, \ldots, q_n] \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ & r_{22} & \cdots & r_{2n} \\ & & \ddots & \vdots \\ & & & r_{nn} \end{bmatrix}$$

is equivalent to

$$\begin{cases} a_1 = r_{11}\, q_1, \\ a_2 = r_{12}\, q_1 + r_{22}\, q_2, \\ a_3 = r_{13}\, q_1 + r_{23}\, q_2 + r_{33}\, q_3, \\ \qquad \vdots \\ a_n = r_{1n}\, q_1 + r_{2n}\, q_2 + \cdots + r_{nn}\, q_n, \end{cases}$$

which straightforwardly results in the following orthogonalization process, called the classical Gram–Schmidt process.
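As a concrete illustration, a minimal NumPy sketch of the classical Gram–Schmidt QR factorization (the function name is an assumption, not from the article):

```python
import numpy as np

def classical_gram_schmidt(A):
    """Classical Gram-Schmidt QR: column a_j is orthogonalized against
    q_1, ..., q_{j-1} using the coefficients r_ij = q_i^T a_j of the relations above."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].astype(float)
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]     # coefficient taken against the original column a_j
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R

# Usage: for a full-rank A, Q @ R reproduces A and Q has orthonormal columns.
```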

The classical Gram–Schmidt process is numerically unstable. A stabilized modification, called the modified Gram–Schmidt process, is described in the following.
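And a matching sketch of the modified variant, again with an illustrative function name; the only change is that each coefficient is computed from the partially orthogonalized vector:

```python
import numpy as np

def modified_gram_schmidt(A):
    """Modified Gram-Schmidt QR: the same arithmetic as the classical version in
    exact precision, but each coefficient r_ij is computed from the partially
    orthogonalized vector v, which improves numerical stability."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].astype(float)
        for i in range(j):
            R[i, j] = Q[:, i] @ v           # the only line that differs from the classical version
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R
```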

We remark that the modified Gram–Schmidt process is, philosophically speaking, an application of the idea of the Gauss–Seidel sweep used for iteratively solving linear systems.

Of course, besides the LU and the QR factorization methods stated above, the famous Cramer rule gives the most beautiful analytic formula for the solution $x$ of the linear system (1). Precisely speaking, in terms of the determinants of the matrices $A$ and

$A_j = [a_1, \ldots, a_{j-1}, b, a_{j+1}, \ldots, a_n], \qquad j = 1, 2, \ldots, n,$

the j th element x [ j ] of x is given by

(2) $x[j] = \dfrac{\det(A_j)}{\det(A)}, \qquad j = 1, 2, \ldots, n,$

where $\det(\cdot)$ denotes the determinant of the corresponding matrix. As is well known, the cost of this formula is tremendous, like $O(n^2 \cdot n!)$, so it is practically prohibitive, especially when the matrix $A$ is large and sparse. However, by making use of the LU or the QR factorization we propose here a practical implementation of the Cramer rule. To this end, we only consider the general case that $A$ is nonsingular and nonsymmetric, as the special case that $A$ is symmetric positive definite can be treated analogously by utilizing the Cholesky factorization [5] of the matrix $A$ instead of LU or QR. Let $A = LU$ be the LU factorization, $e_j = (0, \ldots, 0, 1, 0, \ldots, 0)^T$ be the $j$th unit basis vector in $\mathbb{R}^n$, and $v_j = e_j - A^{-1} b$. Then we have $A e_j = a_j$ and

$A_j = A - (a_j - b)\, e_j^T = A\, (I - v_j e_j^T).$

So

$\det(A_j) = \det(A)\, \det(I - v_j e_j^T).$

Because

$(I - v_j e_j^T)\, v_j = (1 - e_j^T v_j)\, v_j = (e_j^T A^{-1} b)\, v_j$

and

$(I - v_j e_j^T)\, e_i = e_i, \qquad \text{for } i \ne j,$

the eigenvalues of the matrix $I - v_j e_j^T$ are $1$, with multiplicity $n - 1$, and $e_j^T A^{-1} b$, which implies that

$\det(I - v_j e_j^T) = e_j^T A^{-1} b.$

It follows immediately from (2) that

$x[j] = \det(I - v_j e_j^T) = e_j^T A^{-1} b.$

As a result, we obtain the following procedure for computing x .
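A minimal SciPy sketch of such a procedure: one LU factorization yields every component $x[j] = e_j^T A^{-1} b$, i.e., the Cramer-rule ratios, without forming any determinant (the function name and test data are illustrative):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def cramer_via_lu(A, b):
    """One LU factorization of A gives all components x[j] = e_j^T A^{-1} b,
    which by the derivation above equal det(A_j)/det(A)."""
    lu, piv = lu_factor(A)          # factorization done once
    w = lu_solve((lu, piv), b)      # w = A^{-1} b
    return w                        # x[j] = e_j^T A^{-1} b = w[j]

# Usage: compare against the determinant form of (2) on a small system.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4)); b = rng.standard_normal(4)
x = cramer_via_lu(A, b)
x_det = np.array([
    np.linalg.det(np.column_stack([*A.T[:j], b, *A.T[j + 1:]]))
    for j in range(4)
]) / np.linalg.det(A)
print(np.allclose(x, x_det))
```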

This procedure is the same as the LU factorization method. It implements the Cramer rule in $\frac{2}{3} n^3$ operations, at the same cost as that of either the LU factorization or Gaussian elimination. Alternatively, using the QR instead of the LU factorization of the matrix $A$, we can analogously obtain a corresponding practical implementation of the Cramer rule, too.


URL:

https://www.sciencedirect.com/science/article/pii/S0377042715000370