"The theorem guarantees convergence by averaging out whatever disturbing oscillations might occur in the ordinary Fourier approximation." - Introduction to Calculus and Analysis: Volume I by Richard Courant and Fritz John
Author's Commentary (Hong Kong SAR, 03-02-2025): I had originally intended to publish this blog post on 02-02-2025, but the holiday spirit of the Chinese New Year had negated such plans. Nonetheless, it was important to me to provide another round of editing, which, as is typical, incurred additional hours of work. The topic is one that I have great appreciation for, and one that I was first introduced to, from the perspective of convolutions, via Stein and Shakarchi's Fourier Analysis: An Introduction. I still remember struggling through the first few pages of that text, sitting in a classroom, where an English lesson targeted at international students droned on in the background. But, prior, it was Courant and John's Introduction to Calculus and Analysis: Volume I that had introduced me to such topics, though from a less general perspective, without ever having used the term "convolution". Later, the persistent realization of the intersection of such material was exciting to say the least, and made me feel a continued growth in my own intellectual maturity.
0 Introduction
Consider that, in an earlier blog post titled Discussions on Some Desirable Properties of Measurable Sets and Functions in Real Analysis: The Littlewood Heuristics & Applications, we had suggested approximation methods via the use of simple functions, and via the use of a certain stochastic sampling method. In this blog post, we would like to provide yet another interpretation of approximation theory via the theory of convolutions, namely that an entirely different framework can be provided so as to allow generalized schemes of approximation. In fact, such an interpretation is somewhat distinct from the alternatives: approximations via interpolations/extrapolations, approximations via function mimics utilizing such classes of functions as simple functions or step functions,$^{[1]}$ or approximations via sampling. Approximations via convolutions demonstrate what are essentially notions of "filtering": in a certain sense, certain mathematical devices continuously sift through and filter every part of a function in order to recover it, and it is the convolutions with respect to approximate identity kernels that provide the approximations.
1 An Explicit Example of Convolution Utilizing a Certain Kernel
Before we treat a certain general theory of convolutions, an example can be provided at least, one that is somewhat classical, that provides the first of such classes of examples. So consider,
Theorem 1.1 For a continuous real-valued function $f$ as defined on $[0, 1]$, and for $0 < \alpha < \beta < 1$, there exists a sequence of functions $\{ P_n \}_{n = 1}^{\infty}$ which converges to it at every point $x$ of the open interval $(\alpha, \beta)$, and such a sequence of functions is provided by,$^{[2]}$
$$P_n(x) = \frac{\int_{\alpha}^{\beta}f(u)\left[ 1 - (u - x)^2 \right]^n du}{\int_{-1}^{1} \left( 1 - u^2 \right)^n du}$$
In other words, the sequence of approximations is provided by convolutions with respect to the family of kernels $g_n(x) = \frac{\left( 1 - (-x)^2 \right)^n}{\int_{-1}^{1} \left( 1 - u^2 \right)^n du}$ on $[\alpha, \beta]$ in the sense of,$^{[3]}$
$$f \ast g_n = \frac{1}{\int_{-1}^{1} \left( 1 - u^2 \right)^n du} \int_{\alpha}^{\beta}f(u) \left[ 1 - \left( u - x \right)^2 \right]^n du$$
PROOF. Consider, for $u = v + x$,
$$\int_{\alpha}^{\beta} f(u) \left[ 1 - \left( u - x \right)^2 \right]^n du = \int_{\alpha - x}^{\beta - x} f(v + x) \left[ 1 - v^2 \right]^n dv$$
Then, since the kernels $\{ g_n \}_{n = 1}^{\infty}$ produce discontinuous singularities at the origin for $n \rightarrow \infty$,$^{[4]}$ we fix a real $\delta > 0$ small enough that $\alpha - x < -\delta$ and $\delta < \beta - x$, and we can thus provide our integral, with respect to $v$, over the intervals $[\alpha - x, -\delta]$, $[-\delta, \delta]$, and $[\delta, \beta - x]$. We can denote the integral above over the respective closed intervals as $I_{[\alpha - x, -\delta]}$, $I_{[-\delta, \delta]}$, and $I_{[\delta, \beta - x]}$. We can now provide an estimate of a difference in,
$$\left| f - f \ast g_n \right| = \left| f - \frac{I_{[\alpha - x, -\delta]}}{\int_{-1}^{1} \left( 1 - u^2 \right)^n du} - \frac{I_{[-\delta, \delta]}}{\int_{-1}^{1} \left( 1 - u^2 \right)^n du} - \frac{I_{[\delta, \beta - x]}}{\int_{-1}^{1} \left( 1 - u^2 \right)^n du} \right|$$
We first realize that $\frac{I_{[\alpha - x, -\delta]}}{\int_{-1}^{1} \left( 1 - u^2 \right)^n du}$ and $\frac{I_{[\delta, \beta - x]}}{\int_{-1}^{1} \left( 1 - u^2 \right)^n du}$ can be made arbitrarily small in the sense that,
$$\begin{align} & \left| \frac{I_{[\alpha - x, -\delta]}}{\int_{-1}^{1} \left( 1 - u^2 \right)^n du} \right| < M \frac{\int_{-1}^{-\delta} \left( 1 - v^2 \right)^n dv}{\int_{-1}^{1} \left( 1 - u^2 \right)^n du} \\ & \left| \frac{I_{[\delta, \beta - x]}}{\int_{-1}^{1} \left( 1 - u^2 \right)^n du} \right| < M \frac{\int_{\delta}^{1} \left( 1 - v^2 \right)^n dv}{\int_{-1}^{1} \left( 1 - u^2 \right)^n du} \end{align}$$
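In the above, $M$ denotes a sufficiently large upper bound for $\left| f \right|$, where its existence follows from the extreme value theorem of elementary analysis. In making use of the integral mean value theorem of elementary calculus, together with the facts that $\left( 1 - v^2 \right)^n$ is decreasing on $[\delta, 1]$ and that $1 - u^2 \geq 1 - u$ on $[0, 1]$, we can then determine that, for instance,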
$$\begin{align} \frac{\int_{\delta}^{1} \left( 1 - v^2 \right)^n dv}{\int_{-1}^{1} \left( 1 - u^2 \right)^n du} &< \frac{ \left( 1 - \delta^2 \right)^n (1 - \delta)}{\int_{-1}^{1} \left( 1 - u^2 \right)^n du} \\ &< \frac{ \left( 1 - \delta^2 \right)^n}{\int_{0}^{1} \left( 1 - u \right)^n du} \\ &= \left( n + 1 \right) \left( 1 - \delta^2 \right)^n \end{align}$$
Where the above tends to 0 for $n \rightarrow \infty$ since $\left( 1 - \delta^2 \right)^n$ vanishes to a higher order of magnitude than $\left( n + 1 \right)$ grows to infinity.$^{[5]}$ Then, for $\frac{I_{[-\delta, \delta]}}{\int_{-1}^{1} \left( 1 - u^2 \right)^n du}$, consider the following approach, where, since $f$ is uniformly continuous on $[0, 1]$, for a given real $\epsilon > 0$ we may take $\delta$ small enough that $\left| f(v + x) - f(x) \right| < \epsilon$ whenever $\left| v \right| \leq \delta$,
$$\begin{align} I_{[-\delta, \delta]} &= \int_{-\delta}^{\delta} f(x) \left( 1 - v^2 \right)^n dv + \int_{-\delta}^{\delta} \left[ f(v + x) - f(x) \right] \left( 1 - v^2 \right)^n dv \\ & < f(x) \left[ \int_{-1}^{1} \left( 1 - v^2 \right)^n dv - \int_{\delta \leq \left| v \right| \leq 1} \left( 1 - v^2 \right)^n dv \right] + \epsilon \int_{-\delta}^{\delta} \left(1 - v^2 \right)^n dv \end{align}$$
That is,
$$\begin{align} \frac{I_{[-\delta, \delta]}}{\int_{-1}^{1} \left( 1 - u^2 \right)^n du} & < \frac{f(x) \left[ \int_{-1}^{1} \left( 1 - v^2 \right)^n dv - \int_{\delta \leq \left| v \right| \leq 1} \left( 1 - v^2 \right)^n dv \right] + \epsilon \int_{-\delta}^{\delta} \left(1 - v^2 \right)^n dv}{\int_{-1}^{1} \left( 1 - u^2 \right)^n du} \\ &= f(x) - f(x) \frac{\int_{\delta \leq |v| \leq 1}\left( 1 - v^2 \right)^n dv}{\int_{-1}^{1}(1 - u^2)^n du} + \epsilon \frac{\int_{-\delta}^{\delta}(1 - v^2)^n dv}{\int_{-1}^{1}(1 - u^2)^n du} \end{align}$$
Then, we note that $\frac{\int_{\delta \leq |v| \leq 1}\left( 1 - v^2 \right)^n dv}{\int_{-1}^{1}(1 - u^2)^n du}$ can be made arbitrarily small as $\frac{\int_{\delta}^{1} \left( 1 - v^2 \right)^n dv}{\int_{-1}^{1} \left( 1 - u^2 \right)^n du}$ can be made arbitrarily small for sufficiently large $n$, and, we also have that $\frac{\int_{-\delta}^{\delta} \left(1 - v^2 \right)^n dv}{\int_{-1}^{1} \left( 1 - u^2 \right)^n du}$ is bounded.$^{[6]}$ Hence,
$$\lim_{n \rightarrow \infty} \left| f - f \ast g_n \right| = 0$$ $\square$
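As a quick numerical sanity check of Theorem 1.1, the following is a minimal Python sketch, under stated assumptions, that evaluates $P_n(x)$ by a midpoint quadrature. The names `midpoint_integral`, `kernel_approximation`, and `sample_f`, the test function $\left| u - \frac{1}{2} \right|$, the choice $[\alpha, \beta] = [0.1, 0.9]$, and the evaluation point $x = 0.3$ are all hypothetical choices made for illustration, and are not taken from the cited texts.

```python
def midpoint_integral(func, a, b, steps=4000):
    """Approximate the integral of func over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))


def kernel_approximation(f, x, n, alpha, beta):
    """Evaluate P_n(x) as stated in Theorem 1.1 by numerical quadrature."""
    numerator = midpoint_integral(
        lambda u: f(u) * (1.0 - (u - x) ** 2) ** n, alpha, beta
    )
    normaliser = midpoint_integral(lambda u: (1.0 - u * u) ** n, -1.0, 1.0)
    return numerator / normaliser


def sample_f(u):
    # Hypothetical test function: continuous on [0, 1] with a kink at u = 1/2.
    return abs(u - 0.5)


if __name__ == "__main__":
    x = 0.3  # an interior point of [alpha, beta] = [0.1, 0.9]
    for n in (10, 100, 1000):
        approx = kernel_approximation(sample_f, x, n, 0.1, 0.9)
        # Columns: n, P_n(x), absolute error against f(x)
        print(n, approx, abs(approx - sample_f(x)))
```

The quadrature itself introduces a small error, so the printed differences should only be read as indicative of the trend for growing $n$, and not as sharp error estimates.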
And so, a possible theory is not without precedent, or at least one existence case which allows one to consider more general possibilities. No doubt, there will be those who notice, perhaps relatively quickly, certain key properties and phenomena that may lend themselves to the consideration of more specialized topics, as in possibilities in topics of the theory of singular integrals and more.$^{[7]}$
It may be productive to comment that such kinds of results are not necessarily without a utilitarian aspect. For instance, it may just so happen that, in certain contexts, $f$ may not be given explicitly in closed form, rather, it may be convolutions with appropriate classes of kernels that allow one to gain further information and thus the potential recovery of apt approximate functions. It would also be somewhat irresponsible for me to fail to state that such kinds of mathematical methods have proliferated in many subfields of the sciences and engineering, from such classical subfields as those of electrical engineering,$^{[8]}$ to topics of communications engineering, machine learning, and so on and on. Almost surely, we intend to discuss applications of the theory of convolutions in certain topics of STEM more generally in future blog posts.
2 An Elementary Theory of Convolutions and the Approximation to the Identity
Nonetheless, a general theory can be provided in this section, a framework perhaps of greater simplicity regarding the theory of convolutions, and in the notion of "approximation to the identity" in particular. Recall that in the proof of Theorem 1.1 in the previous section titled 1 An Explicit Example of Convolution Utilizing a Certain Kernel, we had already made use of such terms as "convolution" and "family of kernels", where we suppose that their definitions are reasonably accessed by the reader. In the theory of convolutions, we have a particular result that, if only we are provided a so-called family of "approximate identity" kernels, convolutions with such classes of approximate identity kernels provide "approximations to the identity" in the limit.
More than that is that a general framework actually allows us to avoid making ad-hoc arguments, ad-hoc arguments which would include the one made in the proof of Theorem 1.1. And so, consider the following definition of a family of approximate identity kernels $\{ g_n \}_{n = 1}^{\infty}$ as suited to our purposes,
I. We have that, for $n \in \mathbb{N}$,
$$\int_{-1}^{1} g_n(x) dx = 1$$
II. That there exists a real $C > 0$, independent of $n$, such that, for $n \in \mathbb{N}$,
$$\int_{-1}^{1} \left| g_n(x) \right| dx \leq C$$
III. And, finally, we also have that, for real $\delta > 0$,
$$\lim_{n \rightarrow \infty} \int_{\delta \leq |x| \leq 1} \left| g_n(x) \right| dx = 0$$
A major result of such a generalized framework is then,
Theorem 2.1 If $\{ g_n \}_{n = 1}^{\infty}$ is a family of approximate identity kernels and $f \in \mathcal{L} \left( \left[-1, 1 \right] \right)$ a real-valued function, then we have that $\lim_{n \rightarrow \infty} \left| f(x) - f \ast g_n(x) \right| = 0$ at every point of continuity $x$ of $f$.
PROOF. The argument is somewhat similar to the ad-hoc argument as provided in the proof of Theorem 1.1, that we then realize a general underlying phenomenon in the purposeful isolation of points of singularities, and so we consider, for some real $\delta > 0$,
$$\begin{align}\left| f - f \ast g_n \right| &= \left| \left( \int_{-1}^{1} g_n(u) du \right) \cdot f - \int_{|u| \leq 1} g_n(u) f(x - u) du \right| \\ &= \left| \int_{\left| u \right| < \delta} g_n(u) \left[ f(x) - f(x - u) \right] du + \int_{\delta \leq \left| u \right| \leq 1} g_n(u) \left[ f(x) - f(x - u) \right] du \right| \\ & \leq \int_{\left| u \right| < \delta} \left| g_n(u) \right| \cdot \left| f(x - u) - f(x) \right| du + \int_{\delta \leq \left| u \right| \leq 1} \left| g_n(u) \right| \cdot \left| f(x - u) - f(x) \right| du \\ & \leq \epsilon \int_{-1}^{1} \left| g_n(u) \right| du + 2K \int_{\delta \leq \left| u \right| \leq 1} \left| g_n(u) \right| du \end{align}$$
We note that the first equality above is justified due to property I. of approximate identity kernels. In the final inequality, $K$ denotes an upper bound for $\left| f \right|$, which we suppose to exist, and $\epsilon$ bounds $\left| f(x - u) - f(x) \right|$ for $\left| u \right| < \delta$; by the continuity of $f$ at $x$, for sufficiently small real $\delta > 0$, the real $\epsilon > 0$ can be made arbitrarily small. Finally, since property II. states the boundedness of $\int_{-1}^{1} \left| g_n(u) \right| du$, and, since property III. states the vanishing of $\int_{\delta \leq \left| u \right| \leq 1} \left| g_n(u) \right| du$ for $n \rightarrow \infty$, we have that $\lim_{n \rightarrow \infty} \left| f - f \ast g_n \right| = 0$. $\square$
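To see why Theorem 2.1 is stated only for points of continuity, the following is a minimal Python sketch under assumptions of our own. The box kernels $g_n = \frac{n}{2}$ on $\left| u \right| \leq \frac{1}{n}$ (and $0$ elsewhere) and the step function with a jump at the origin are hypothetical examples chosen for illustration; the names `box_kernel`, `step`, and `convolve_at` are likewise our own. Such box kernels satisfy properties I., II., and III., and the convolution $\int g_n(u) f(x - u) du$ reduces to a local average of $f$ about $x$.

```python
def box_kernel(n, u):
    """Hypothetical approximate identity on [-1, 1]: height n/2 on |u| <= 1/n."""
    return n / 2.0 if abs(u) <= 1.0 / n else 0.0


def step(t):
    """Hypothetical bounded test function with a single jump at t = 0."""
    return 1.0 if t >= 0.0 else 0.0


def convolve_at(f, x, n, steps=2000):
    """Midpoint-rule value of the integral of g_n(u) * f(x - u) over |u| <= 1/n.

    The kernel vanishes outside |u| <= 1/n, so nothing is lost by restricting
    the range of integration to that interval.
    """
    a, b = -1.0 / n, 1.0 / n
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        u = a + (i + 0.5) * h
        total += box_kernel(n, u) * f(x - u)
    return h * total


if __name__ == "__main__":
    for n in (1, 4, 50):
        # x = 0.5 is a point of continuity; x = 0 is the jump.
        print(n, convolve_at(step, 0.5, n), convolve_at(step, 0.0, n))
```

At the point of continuity $x = 0.5$ the values settle at $f(0.5) = 1$, whereas at the jump $x = 0$ the local averages stay at the midpoint of the jump rather than at $f(0)$, which is consistent with the restriction in the statement of the theorem.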
We now come to a particular result that allows us to subsume the approximation scheme in the previous section under a general theory of convolutions in,
Theorem 2.2 The family of kernels $g_n(x) = \frac{\left( 1 - (-x)^2 \right)^n}{\int_{-1}^{1} \left( 1 - u^2 \right)^n du}$ is a family of approximate identity kernels.
PROOF. For the first property of families of approximate identity kernels in I., such is clearly immediately satisfied as $\int_{-1}^{1}g_n dx = \frac{\int_{-1}^{1} \left( 1 - (-x)^2 \right)^n dx}{\int_{-1}^{1} \left( 1 - u^2 \right)^n du} = 1$. Then, for the second property in II., since $g_n \geq 0$ on $[-1, 1]$, we have that $\int_{-1}^{1} \left| g_n \right| dx = \int_{-1}^{1} g_n dx = 1$ for arbitrary $n \in \mathbb{N}$, which gives the boundedness. Then, finally, for the third property in III., since $\left( 1 - (-x)^2 \right)^n$ converges to $0$ for every $x \in [-1, 0) \cup (0, 1]$, and since the estimate in the proof of Theorem 1.1 gives $\int_{\delta \leq |x| \leq 1} g_n dx < 2 \left( n + 1 \right) \left( 1 - \delta^2 \right)^n$, it is then satisfied as a result.$^{[9]}$ $\square$
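The tail estimate behind property III. can also be checked numerically. The following is a minimal Python sketch, assuming a simple midpoint quadrature, that compares the one-sided tail mass $\frac{\int_{\delta}^{1} (1 - v^2)^n dv}{\int_{-1}^{1} (1 - u^2)^n du}$ against the bound $(n + 1)(1 - \delta^2)^n$ from the proof of Theorem 1.1. The function names and the choice $\delta = 0.2$ are hypothetical and only for illustration.

```python
def midpoint_integral(func, a, b, steps=4000):
    """Approximate the integral of func over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))


def normaliser(n):
    """Integral of (1 - u^2)^n over [-1, 1], the denominator of g_n."""
    return midpoint_integral(lambda u: (1.0 - u * u) ** n, -1.0, 1.0)


def one_sided_tail(n, delta):
    """Integral of g_n over [delta, 1], computed numerically."""
    numerator = midpoint_integral(lambda v: (1.0 - v * v) ** n, delta, 1.0)
    return numerator / normaliser(n)


def tail_bound(n, delta):
    """The bound (n + 1) * (1 - delta^2)^n from the proof of Theorem 1.1."""
    return (n + 1) * (1.0 - delta * delta) ** n


if __name__ == "__main__":
    delta = 0.2  # hypothetical choice of delta
    for n in (5, 50, 500):
        # Columns: n, numerical tail mass, bound
        print(n, one_sided_tail(n, delta), tail_bound(n, delta))
```

The bound is far from sharp for small $n$, which is harmless here, since all that property III. requires is that the tail mass vanishes for $n \rightarrow \infty$.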
Hence, Theorem 2.1 in combination with Theorem 2.2 provides another proof for Theorem 1.1, and one that produces a generalization of Theorem 1.1 as the approximations can be provided by $P_n(x) = \frac{\int_{-1}^{1}f(u)\left[ 1 - (u - x)^2 \right]^n du}{\int_{-1}^{1} \left( 1 - u^2 \right)^n du}$ - the difference being that the interval of integration has been extended from $[\alpha, \beta]$ for $0 < \alpha < \beta < 1$ to $[-1, 1]$.$^{[10]}$ And so, in this way, this section has demonstrated that a viable general theory for the theory of convolutions is to simply identify and utilize families of kernels that are sufficiently well-behaved in the approximate identity kernels.
We can observe that approximate identity kernels essentially provide approximations to the identity for $n \rightarrow \infty$ in the sense that we are almost merely filtering $f$ with identity functions as, for the expression $f(x) \ast g(x) = \int f(u) g(x - u) du$, if $g$ was reasonably similar to some kind of an "identity" object, as we integrate with respect to $u$, $g(x - u)$ is essentially continuously translated in such a fashion so as to filter every part of $f$, to be finalized in a "continuous sum" in the form of an integral. In fact, for $g$ the "delta function" in the sense that $g$ is defined to be 1 on the origin and 0 everywhere else,$^{[11]}$ we have that $f(x) \ast g(x) = \int f(u) g(x - u) du$, almost trivially, filters every part of $f(u)$ when integrating with respect to $u$ - that is, as $u$ sifts through all appropriate values of the domain, we merely multiply $f(u)$ by 1 for all $u$ of the domain, and collect all the results in the form of an integral.$^{[12]}$
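The filtering picture above has an exact discrete analogue, which may be sketched in Python as follows. Here the integral is replaced by a finite sum, and the role of the "identity" object is played by the unit impulse `[1.0]`; the function name `discrete_convolution` and the sample data are hypothetical and only for illustration, and the discrete setting is our own simplification rather than something treated in this blog post.

```python
def discrete_convolution(signal, kernel):
    """Full discrete convolution: out[k] = sum over j of signal[j] * kernel[k - j]."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for j, s in enumerate(signal):
        for i, g in enumerate(kernel):
            # The kernel is translated to position j and weighted by signal[j].
            out[j + i] += s * g
    return out


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0]
    # Convolving with the unit impulse returns the signal unchanged.
    print(discrete_convolution(signal, [1.0]))
    # Convolving an impulse with a small averaging kernel spreads it out.
    print(discrete_convolution([0.0, 0.0, 1.0, 0.0, 0.0], [0.25, 0.5, 0.25]))
```

A kernel that is concentrated near a single position, with total mass 1, thus behaves nearly like the unit impulse, which is the discrete counterpart of what approximate identity kernels accomplish for $n \rightarrow \infty$.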
3 Some Applications in Fourier Analysis
Fourier analysis classically constituted the theory of approximations via trigonometric polynomials, where Fourier himself had originally described certain tools of Fourier analysis as the resolution of certain functions into infinite sums of "cosines of multiple arcs".$^{[13]}$ And, although such interpretations can be subsumed under various generalized frameworks, as in the functional analytic framework of orthogonal projections in inner-product spaces, there is another framework that could subsume the theory of Fourier analysis, and given the context of this blog post, perhaps it is not entirely too surprising that the direction that we are interested in is the framework as provided by the theory of convolutions. Before we proceed, we provide a slight altering of property I. of families of approximate identity kernels in,
I. We have that, for $n \in \mathbb{N}$,
$$\frac{1}{2 \pi} \int_{-\pi}^{\pi} g_n(x) dx = 1$$
And, for the other properties II. and III., we intend to integrate over the interval $[-\pi, \pi]$ instead. Such is simply due to the fact that we would like to accommodate the $2\pi$-periodic properties of Fourier series as conventionally stated, and, also, that we would like to ultimately be able to recover the approximation to the identity on $[-\pi, \pi]$. So, consider a first class of kernels that is classical in,
Definition 3.1 (The Dirichlet Kernels) The Dirichlet kernels are defined to be, for $n \in \mathbb{N}$,
$$D_n(x) = \frac{\sin \left[ \left( n + \frac{1}{2} \right)x \right]}{\sin \left( \frac{1}{2}x \right)}$$
We also take it as known, and so by definition, that,$^{[14]}$
$$D_N(x) = \sum_{n = -N}^{N} e^{inx}$$
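The agreement between the closed form and the exponential sum can be spot-checked numerically. The following is a minimal Python sketch; the function names and the sample point $x = 0.7$ are hypothetical choices, and $x$ is assumed not to be a multiple of $2\pi$ so that the closed form is defined.

```python
import cmath
import math


def dirichlet_sum(N, x):
    """D_N(x) as the sum of e^{inx} over n = -N, ..., N (the result is real)."""
    return sum(cmath.exp(1j * n * x) for n in range(-N, N + 1)).real


def dirichlet_closed(N, x):
    """D_N(x) via the closed form sin((N + 1/2) x) / sin(x / 2)."""
    return math.sin((N + 0.5) * x) / math.sin(0.5 * x)


if __name__ == "__main__":
    x = 0.7  # hypothetical sample point, not a multiple of 2*pi
    for N in (0, 3, 10):
        print(N, dirichlet_sum(N, x), dirichlet_closed(N, x))
```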
Definition 3.2 (The Fejér Kernels) The Fejér kernels are defined to be, for $n \in \mathbb{N}$,
$$F_n(x) = \frac{1}{n} \left( \frac{\sin \left( \frac{1}{2}nx \right)}{\sin \left( \frac{1}{2}x \right)} \right)^2$$
Then, in pursuing certain convergence results of Fourier analysis, in recalling the treatment as provided in the previous sections, we can then proceed in the following manner. If Fourier series can be provided in general in terms of convolutions with families of kernels, and if only such families of kernels are approximate identity kernels, then we can guarantee certain convergence results. A natural question is then, are the families of kernels as defined in Definition 3.1 and Definition 3.2 families of approximate identity kernels?
Theorem 3.1 The family of Dirichlet kernels is not a family of approximate identity kernels.
Intuitively, we first comment that it is not entirely clear that the family of Dirichlet kernels is not a family of approximate identity kernels, where we state now that it is in fact the second property in II. which is violated. Such is not entirely clear as families of Dirichlet kernels provide iterated oscillations of positive and negative amplitudes, that is, it is not so straightforward as to whether, for $n \rightarrow \infty$, that the cancellation effect between the positive amplitudes and the negative amplitudes is sufficient to provide convergence. We thus require a quantitative estimate at least and so we consider,
Lemma 3.1 We have that the Dirichlet kernels satisfy $\frac{1}{2 \pi} \int_{-\pi}^{\pi} \left| D_n(x) \right| dx > c \log(n)$ for some real $c > 0$.
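PROOF. Since $\left| \sin(x) \right| \leq \left| x \right|$, we have that $\frac{1}{\left| x \right|} \leq \frac{1}{\left| \sin(x) \right|}$ wherever $\sin(x) \neq 0$, and so, in applying such to $\sin \left( \frac{1}{2}x \right)$, we first immediately obtain,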
$$\begin{align} \int_{-\pi}^{\pi} \left| D_n(x) \right| dx & \geq \int_{-\pi}^{\pi} \frac{\left| \sin \left[ \left( n + \frac{1}{2} \right)x \right] \right|}{\left| x \right|} dx \\ &= \int_{-\left( n + \frac{1}{2} \right) \pi}^{\left( n + \frac{1}{2} \right) \pi} \frac{\left| \sin(u) \right|}{\left| u \right|} du \\ &> \int_{\pi}^{n \pi} \frac{\left| \sin(u) \right|}{\left| u \right|} du \\ &= \sum_{k = 1}^{n - 1} \int_{k \pi}^{(k + 1)\pi} \frac{\left| \sin(u) \right|}{\left| u \right|} du \\ & \geq \frac{1}{\pi} \sum_{k = 1}^{n - 1} \frac{1}{k + 1} \int_{k \pi}^{(k + 1)\pi} \left| \sin(u) \right| du \end{align}$$
Then, finally, since $\int_{k \pi}^{(k + 1)\pi} \left| \sin(u) \right| du = 2$ for every integer $k$, and since $\sum_{k = 1}^{n} \frac{1}{k} > c' \log(n)$ for some appropriate $c' > 0$, we obtain the estimate $\frac{1}{2 \pi} \int_{-\pi}^{\pi} \left| D_n(x) \right| dx > c \log(n)$.$^{[15]}$ $\square$
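The growth asserted in Lemma 3.1 can be observed numerically. The following is a minimal Python sketch, assuming a midpoint quadrature with a hypothetical step count, that approximates $\frac{1}{2 \pi} \int_{-\pi}^{\pi} \left| D_n(x) \right| dx$ for a few values of $n$. The function names are our own, and the guard at $x = 0$ simply substitutes the limiting value $2n + 1$ of the closed form.

```python
import math


def dirichlet(n, x):
    """Closed form of D_n(x), with the limiting value 2n + 1 used at x = 0."""
    s = math.sin(0.5 * x)
    if abs(s) < 1e-12:
        return 2.0 * n + 1.0
    return math.sin((n + 0.5) * x) / s


def mean_abs_dirichlet(n, steps=20000):
    """Midpoint-rule value of (1 / (2 pi)) * integral of |D_n| over [-pi, pi]."""
    h = 2.0 * math.pi / steps
    total = sum(abs(dirichlet(n, -math.pi + (i + 0.5) * h)) for i in range(steps))
    return h * total / (2.0 * math.pi)


if __name__ == "__main__":
    for n in (4, 16, 64, 256):
        value = mean_abs_dirichlet(n)
        # Columns: n, mean absolute value, ratio against log(n)
        print(n, value, value / math.log(n))
```

The printed values keep increasing with $n$, in contrast with what property II. of approximate identity kernels would demand.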
The proof of Theorem 3.1 is then an immediate consequence of Lemma 3.1. Given such an estimate, a quantitative bound is then provided which allows us to realize that the family of Dirichlet kernels consists of exactly those kernels that are unable to control their corresponding iterated oscillations for $n \rightarrow \infty$. Since Fourier series can typically be formulated in terms of convolutions with respect to Dirichlet kernels, what Theorem 3.1 allows us to conclude then essentially is that there exist Fourier series that are not convergent, a particularly simple result, a special case of a general theory in the theory of convolutions.$^{[16]}$
However, there exist other classes of kernels where such iterated oscillations can be controlled. Indeed, the family of Fejér kernels resolves, in a certain sense, exactly the weaknesses of the family of Dirichlet kernels in that what is provided is an appropriate averaging phenomenon so as to control the iterated oscillations of the family of Dirichlet kernels. And so, consider the following formulation below,
Theorem 3.2 The Fejér kernels can be provided by Cesàro sums of families of Dirichlet kernels in the sense of,
$$F_n(x) = \frac{D_0(x) + D_1(x) + \dots + D_{n - 1}(x)}{n}$$
PROOF. First, recall that since the Dirichlet kernels can be given by $D_n(x) = \frac{\sin \left[ \left( n + \frac{1}{2} \right)x \right]}{\sin \left( \frac{1}{2}x \right)}$, we can rewrite the Dirichlet kernels as,
$$\begin{align} \frac{\sin \left[ \left( n + \frac{1}{2} \right)x \right]}{\sin \left( \frac{1}{2}x \right)} &= \frac{\cos \left( \frac{1}{2}x \right) \cos \left[ \left( n + \frac{1}{2} \right)x \right] - \cos \left[ \left( n + 1 \right)x \right]}{\left[ \sin \left( \frac{1}{2}x \right) \right]^2} \\ &= 2 \frac{\cos \left( \frac{1}{2}x \right) \cos \left[ \left( n + \frac{1}{2} \right)x \right] - \cos \left[ \left( n + 1 \right)x \right]}{1 - \cos(x)} \end{align}$$
Where, in the first equality, we have used the trigonometric addition formula in $\sin(\frac{1}{2}x) \sin \left[ \left( n + \frac{1}{2} \right)x \right] = \cos(\frac{1}{2}x) \cos \left[ \left( n + \frac{1}{2} \right)x \right] - \cos \left[ \left( n+ \frac{1}{2} \right)x + \frac{1}{2}x \right]$, and, in the second equality, the half-angle formula in $\left[ \sin \left( \frac{x}{2} \right) \right]^2 = \frac{1 - \cos(x)}{2}$. And, since,
$$\begin{align} \cos(nx) &= \cos \left[ \left( n + \frac{1}{2} \right)x - \frac{1}{2}x \right] \\ &= \cos \left[ \left( n + \frac{1}{2} \right)x \right] \cos \left( \frac{1}{2}x \right) + \sin \left[ \left( n + \frac{1}{2} \right)x \right] \sin \left( \frac{1}{2}x \right) \end{align}$$
$$\begin{align} \cos \left[ (n + 1)x \right] &= \cos \left[ \left( n + \frac{1}{2} \right)x + \frac{1}{2}x \right] \\ &= \cos \left[ \left( n + \frac{1}{2} \right)x \right] \cos \left( \frac{1}{2}x \right) - \sin \left[ \left( n + \frac{1}{2} \right)x \right] \sin \left( \frac{1}{2}x \right) \end{align}$$
We proceed to,
$$\begin{align} \frac{\sin \left[ \left( n + \frac{1}{2} \right)x \right]}{\sin \left( \frac{1}{2}x \right)} &= \frac{\cos(nx) - \cos \left[ (n + 1)x \right]}{1 - \cos(x)} \end{align}$$
The point of the above being, that such an expression then allows us to provide a telescoping sum so as to retrieve the Fejér kernels,
$$\begin{align} \frac{D_0(x) + \dots + D_{n - 1}(x)}{n} &= \frac{1}{n} \left( \frac{\cos(0x) - \cos \left[ \left( 0 + 1 \right)x \right] + \cos(1x) - \cos \left[ \left( 1 + 1 \right)x \right] + \dots - \cos (nx)}{1 - \cos(x)} \right) \\ &= \frac{1}{n} \left( \frac{1 - \cos(nx)}{1 - \cos(x)} \right) \\ &= \frac{1}{n} \left( \frac{\sin \left( \frac{1}{2}nx \right)}{\sin \left ( \frac{1}{2}x \right)} \right)^2 \end{align}$$
$\square$
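The identity of Theorem 3.2 can be spot-checked numerically as well. The following is a minimal Python sketch that compares the closed form of $F_n$ against the arithmetic mean of $D_0, \dots, D_{n - 1}$; the function names and the sample point $x = 1.1$ are hypothetical choices, and $x$ is assumed not to be a multiple of $2\pi$.

```python
import math


def dirichlet(k, x):
    """Closed form of D_k(x), valid where sin(x / 2) is non-zero."""
    return math.sin((k + 0.5) * x) / math.sin(0.5 * x)


def fejer_closed(n, x):
    """Closed form of F_n(x) from Definition 3.2."""
    return (math.sin(0.5 * n * x) / math.sin(0.5 * x)) ** 2 / n


def fejer_cesaro(n, x):
    """Arithmetic mean of D_0(x), ..., D_{n-1}(x)."""
    return sum(dirichlet(k, x) for k in range(n)) / n


if __name__ == "__main__":
    x = 1.1  # hypothetical sample point, not a multiple of 2*pi
    for n in (1, 7, 40):
        print(n, fejer_closed(n, x), fejer_cesaro(n, x))
```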
Notice that in the above Cesàro sums in the providing of Fejér kernels, all that was provided is a certain arithmetic averaging phenomenon. Although it is not entirely clear that such an approach can actually resolve the iterated oscillations of the Dirichlet kernels, it turns out that, in fact, such an averaging technique is sufficient for our current purposes, as formalized in the following,
Theorem 3.3 The family of Fejér kernels is a family of approximate identity kernels.
PROOF. First, since Dirichlet kernels satisfy property I. of approximate identity kernels as,
$$\begin{align} \frac{1}{2 \pi} \sum_{n = -N}^{N} \int_{-\pi}^{\pi} e^{inx} dx &= \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{i \cdot 0 \cdot x} dx + \frac{1}{2 \pi} \sum_{n = -N}^{-1} \int_{-\pi}^{\pi} \left[ \cos(nx) + i \sin(nx) \right] dx + \frac{1}{2 \pi} \sum_{n = 1}^{N} \int_{-\pi}^{\pi} \left[ \cos(nx) + i \sin(nx) \right] dx \\ &= \frac{2 \pi}{2 \pi} \\ &= 1 \end{align}$$
It is clear that the Fejér kernels, being kernels that can be given in terms of Cesàro sums of Dirichlet kernels, also satisfy property I. Property II. is realized almost immediately since Fejér kernels are non-negative and since property I. is satisfied. Next, we demonstrate property III. of approximate identity kernels, which can be realized by noticing that, for $\delta \leq \left| x \right| \leq \pi$, $\left( \frac{\sin \left( \frac{1}{2}nx \right)}{\sin \left( \frac{1}{2}x \right)} \right)^2$ is bounded by $\frac{1}{\left[ \sin \left( \frac{1}{2}\delta \right) \right]^2}$ independently of $n$, and so, away from $x = 0$, the Fejér kernels are at most of the same order of magnitude as $\frac{1}{n}$. $\square$
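The properties verified in the proof of Theorem 3.3 can also be observed numerically. The following is a minimal Python sketch, assuming a midpoint quadrature, that approximates the normalized mass $\frac{1}{2 \pi} \int_{-\pi}^{\pi} F_n(x) dx$ and the tail mass $\frac{1}{2 \pi} \int_{\delta \leq |x| \leq \pi} F_n(x) dx$. The function names and the choice $\delta = 0.5$ are hypothetical, and the evenness of $F_n$ is used to integrate only over $[\delta, \pi]$.

```python
import math


def fejer(n, x):
    """Closed form of F_n(x), with the limiting value n used at x = 0."""
    s = math.sin(0.5 * x)
    if abs(s) < 1e-12:
        return float(n)
    return (math.sin(0.5 * n * x) / s) ** 2 / n


def midpoint_integral(func, a, b, steps=4000):
    """Approximate the integral of func over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))


def normalised_mass(n):
    """(1 / (2 pi)) * integral of F_n over [-pi, pi]; property I. predicts 1."""
    return midpoint_integral(lambda x: fejer(n, x), -math.pi, math.pi) / (2.0 * math.pi)


def tail_mass(n, delta):
    """(1 / (2 pi)) * integral of F_n over delta <= |x| <= pi, using evenness."""
    return 2.0 * midpoint_integral(lambda x: fejer(n, x), delta, math.pi) / (2.0 * math.pi)


if __name__ == "__main__":
    delta = 0.5  # hypothetical choice of delta
    for n in (10, 50, 200):
        # Columns: n, normalized mass, tail mass, crude bound 1 / (n sin^2(delta / 2))
        print(n, normalised_mass(n), tail_mass(n, delta), 1.0 / (n * math.sin(0.5 * delta) ** 2))
```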
In other words, Fourier series are Cesàro summable in the sense that, at every point of continuity of the function in question, the Cesàro sums of the partial Fourier sums converge to the value of the function. Or, alternatively, for entire classes of functions, even if their Fourier series do not converge, at the very least, we can always provide trigonometric polynomials that converge to them in the limit, and these trigonometric polynomials are exactly those Cesàro sums of partial Fourier sums. Such is seemingly quite a powerful and profound result, since it demonstrates that even the class of elementary trigonometric sinusoids is sufficient so as to provide approximations for surprisingly wide classes of functions in general.
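To make the contrast between ordinary partial sums and their Cesàro sums concrete, the following is a minimal Python sketch for the square wave $\text{sign}(x)$ on $(-\pi, \pi)$, whose Fourier series is $\frac{4}{\pi} \sum_{k \text{ odd}} \frac{\sin(kx)}{k}$. The square wave, the function names, and the grid are hypothetical choices of ours for illustration. The partial sums overshoot the value 1 near the jump, whereas the Cesàro sums, being convolutions with the non-negative Fejér kernels, cannot exceed the bounds of the function.

```python
import math


def partial_sum(N, x):
    """Partial Fourier sum S_N of the square wave sign(x) on (-pi, pi)."""
    return (4.0 / math.pi) * sum(math.sin(k * x) / k for k in range(1, N + 1, 2))


def cesaro_mean(N, x):
    """Cesaro mean (S_0 + ... + S_{N-1}) / N, written with the weights 1 - k/N."""
    return (4.0 / math.pi) * sum(
        (1.0 - k / N) * math.sin(k * x) / k for k in range(1, N, 2)
    )


def grid_max(func, N, points=2000):
    """Maximum of func(N, x) over a midpoint grid of the interval (0, pi)."""
    return max(func(N, math.pi * (i + 0.5) / points) for i in range(points))


if __name__ == "__main__":
    for N in (10, 50, 200):
        # Columns: N, maximum of the partial sum, maximum of the Cesaro mean
        print(N, grid_max(partial_sum, N), grid_max(cesaro_mean, N))
```

The first column of maxima stays visibly above 1 as $N$ grows, while the second column stays at or below 1, which is one way of seeing the averaging effect described in the opening quotation from Courant and John.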
For further commentary, notice that all the kernels as provided in this blog post are kernels that in some sense converge closer and closer to functions with graphs that spike at the origin, with masses that are mostly concentrated at the origin, or, in other words, these kernels appear to resemble, more and more, the delta function at the origin for $n \rightarrow \infty$. However, for the Dirichlet kernels, they do not mimic the delta function sufficiently well in the limit in that property II. of approximate identity kernels is not satisfied. Alternatively, approximate identity kernels are kernels that are able to aptly mimic the delta function for $n \rightarrow \infty$.
FOOTNOTES
[1] Recall that simple functions are provided as linear combinations of indicator functions as defined on appropriate subsets of an appropriately generated sigma-algebra - the coefficients of such linear combinations are provided so that the simple function would at least agree with the function that it approximates on some subset of the function domain, thus "mimicking" the function that it approximates. Visually, we would find that corresponding to simple functions are collections of "pathological rectangles", achieving different relative "heights", in order to mimic the function that it approximates. Pathological rectangles, are, of course, to be distinguished from the classical rectangles of step functions.
[2] See page 66 of Courant & Hilbert's Methods of Mathematical Physics: Volume I.
[3] A convolution can be given as $f(x) \ast g(x) = \int f(u) g(x - u) du$.
[4] To see this, we merely need to notice that $(1 - x^2)^n$ as defined on $[0, 1]$ is increasingly "concentrated" at the origin for $n \rightarrow \infty$ as $\lim_{n \rightarrow \infty} (1 - \epsilon^2)^n = 0$ for real $\epsilon \in (0, 1]$, and, that $\int_{-1}^{1} (1 - u^2)^n du$ becomes increasingly small for $n \rightarrow \infty$.
[5] In fact, for any real $\epsilon \in (0, 1)$, $(1 - \epsilon)^n$ decays geometrically, and thus vanishes faster than any fixed power of $n$ grows, so that $n(1 - \epsilon)^n \rightarrow 0$ for $n \rightarrow \infty$.
[6] Intuitively, the fact that $(1 - x^2)^n$ increasingly concentrates mass at the origin would indicate to us that the ratio $\frac{\int_{\delta}^{1} (1 - u^2)^n du}{\int_{-1}^{1} (1 - u^2)^n du}$ tends to 0 for $n \rightarrow \infty$ since the domain of integration of the integral in the numerator is bounded away from the origin. Additionally, the fact that $\frac{\int_{-\delta}^{\delta} (1 - u^2)^n du}{\int_{-1}^{1} (1 - u^2)^n du}$ is bounded is obvious with the same intuitive device, and, that such an expression tends to 1 for $n \rightarrow \infty$.
[7] There is a fantastic exposition in Stein's Singular Integrals and Differentiability Properties of Functions which would indicate further directions in the development of an entire mathematical theory that rests on singular integrals as the fundamental objects.
[8] Actually, a "mathematical engineer" in Oliver Heaviside had once utilized a prototypic theory of the theory of convolutions, and, the theory of generalized functions, in making use of prototypic notions of convolutions in contexts of certain topics of electrical engineering. See for instance Lützen's Heaviside's operational calculus and the attempts to rigorise it.
[9] Recall footnotes [4], [5], and [6].
[10] One ad-hoc reason as to why the family of kernels as originally provided in Theorem 1.1 was stated for the interval $[\alpha, \beta]$ as opposed to $[-1, 1]$ was in part due to the proof that was provided, where we had made use of the substitution $u - x = v$. That is, we were specifically looking to avoid the scenario where we would have translated too far away from the interval $[-1, 1]$, since then $\left( 1 - (-x)^2 \right)^n$ would no longer be as well behaved, and we simply state quickly that $\left( 1 - (-x)^2 \right)^{(2n - 1)}$ would tend to $-\infty$ for $n \rightarrow \infty$ on $x \notin [-\sqrt{2}, \sqrt{2}]$, thus the ad-hoc nature of the provided proof of Theorem 1.1. Additionally, the proof of Theorem 1.1 had made use of the uniform continuity of $f$ on $[0, 1]$, whereas Theorem 2.1 specifically states those points of continuity, and thus the subtlety of Lebesgue integrable $f$ in Theorem 2.1.
[11] Such is actually not the definition of the delta function, even in its original heuristical form. But for the purpose of a quick intuitive discussion in that particular paragraph, we feel it sufficient. Of course, the delta function is intimately related to the properties of an approximate identity kernel.
[12] "But knowing the developed impulsive solution... the solution for a continued force varying anyhow with the time is at once expressible by a definite integral, because the continued force may be regarded as consisting of an infinite series of successive infinitesimal impulses." - Oliver Heaviside
[13] "Regarding the researches of d'Alembert and Euler could one not add that if they knew this expansion, they made but a very imperfect use of it. They were both persuaded that an arbitrary and discontinuous function could never be resolved in series of this kind, and it does not even seem that anyone had developed a constant in cosines of multiple arcs, the first problem which I had to solve in the theory of heat." - Joseph Fourier
[14] See for instance page 37 of Stein and Shakarchi's Fourier Analysis: An Introduction.
[15] See for instance page 505 of Courant and John's Introduction to Calculus and Analysis: Volume I.
[16] Actually, we have not proven that convolutions with families of kernels that are not approximations to the identity could diverge, but, for the moment, we take it as known that there exist Fourier series that do not converge to appropriate limit functions.
REFERENCES
Courant, R., & Hilbert, D. (1989). Methods of Mathematical Physics: Volume I. Wiley-VCH Verlag GmbH & Co. KGaA. (Original work published 1953)
Courant, R., & John, F. (1989). Introduction to Calculus and Analysis: Volume I. Springer New York. (Original work published 1965). https://doi.org/10.1007/978-1-4613-8955-2
Jahnke, H. N. (2003). A History of Analysis. American Mathematical Society; London Mathematical Society. (Original work published 1999). https://doi.org/10.1090/hmath/024
Lützen, J. (1979). Heaviside's operational calculus and the attempts to rigorise it. Archive for History of Exact Sciences, 21, 161-200. https://doi.org/10.1007/BF00330405
Stein, E. M., & Shakarchi, R. (2003). Fourier Analysis: An Introduction. Princeton University Press.