Cointegration and Error Correction Mechanisms for Singular Stochastic Vectors

Abstract: Large-dimensional dynamic factor models and dynamic stochastic general equilibrium models, both widely used in empirical macroeconomics, deal with singular stochastic vectors, i.e., vectors of dimension r which are driven by a q-dimensional white noise, with q < r. The present paper studies cointegration and error correction representations for an I(1) singular stochastic vector y t . It is easily seen that y t is necessarily cointegrated with cointegrating rank c ≥ r − q. Our contributions are: (i) we generalize Johansen's proof of the Granger representation theorem to I(1) singular vectors under the assumption that y t has rational spectral density; (ii) using recent results on singular vectors by Anderson and Deistler, we prove that for generic values of the parameters the autoregressive representation of y t involves a finite-degree matrix polynomial. The relationship between the cointegration of the factors and the cointegration of the observable variables in a large-dimensional factor model is also discussed.


Introduction
An r-dimensional stochastic vector y t such that y t = A 0 u t + A 1 u t−1 + · · · , where the matrices A j are r × q and u t is a q-dimensional white noise, with q < r, is said to be singular. Singular stochastic vectors have been systematically analyzed in a number of papers starting with Anderson and Deistler (2008a, 2008b). A motivation for studying the consequences of singularity, as argued by these authors, is that the factors' vector in large-dimensional dynamic factor models (DFM), such as those introduced in Forni et al. (2000); Forni and Lippi (2001); Stock and Watson (2002a, 2002b), is typically singular. Singularity is also an important feature of dynamic stochastic general equilibrium models (DSGE), see e.g., Sargent (1989), Canova (2007), pp. 230-2. Singularity as it arises in DFMs is presented in some detail below.
DFMs are based on the idea that all the observed variables in an economic system are driven by a few common (macroeconomic) shocks and by idiosyncratic components which may result from measurement errors and sectoral or regional shocks. Formally, each variable in the n-dimensional dataset x it , i = 1, 2, . . . , n, t = 1, 2, . . . , T, is decomposed into the sum of a common component χ it and an idiosyncratic component ε it : x it = χ it + ε it , where χ it and ε js are orthogonal for all i, j, t, s. In the standard version of the DFM the common components are linear combinations of an r-dimensional vector of common factors F t = (F 1t F 2t · · · F rt )′:

χ it = λ i1 F 1t + λ i2 F 2t + · · · + λ ir F rt . (1)

Now suppose that the observable variables x it and the common factors F t are I(1) and that

(1 − L)F t = C(L)u t , (2)

where u t is a nonsingular q-dimensional white-noise vector (see footnote 1), the common shocks. A number of papers analyzing macroeconomic databases find strong empirical support for the assumption that the vector F t is singular, i.e., that q < r. See, for US datasets, Giannone et al. (2005); Amengual and Watson (2007); Forni and Gambetti (2010); Luciani (2015). For a Euro-area dataset, see Barigozzi et al. (2014). Such results can be easily understood by observing that usually the static Equation (1) is just a convenient representation derived from a "primitive" set of dynamic equations linking the common components χ it to the common shocks u t . As a simple example, suppose that the variables x it are driven by a common one-dimensional cyclical process f t , such that (1 − αL) f t = u t , where u t is scalar white noise, and that the variables x it load f t dynamically:

x it = a i0 f t + a i1 f t−1 + ε it . (3)

In this case we can set F 1t = f t , F 2t = f t−1 = F 1,t−1 , λ i1 = a i0 , λ i2 = a i1 , so that Equations (1) and (2) take the form

χ it = λ i1 F 1t + λ i2 F 2t and (1 − L)F t = (1 − L)(1 − αL)^-1 (1 L)′u t ,

respectively. Here r = 2 and q = 1, so that F t is singular.
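The singularity of F t in this example can be checked directly: with F t = (f t f t−1 )′ and (1 − αL) f t = u t , the spectral density of F t is a rank-one 2 × 2 matrix at every frequency. A minimal numerical sketch in Python (the value α = 0.5 and the unit shock variance are illustrative choices, not taken from the text):

```python
import cmath

# F_t = (f_t, f_{t-1})' with (1 - alpha L) f_t = u_t, so F_t = V(L) u_t with
# V(z) = (1/(1 - alpha z), z/(1 - alpha z))'. The spectral density
# Sigma_F(theta) = V(e^{-i theta}) V(e^{-i theta})^* sigma^2 / (2 pi)
# is the outer product of a single vector: rank one at every frequency,
# i.e., r = 2 but q = 1.

def spectral_density(theta, alpha=0.5, sigma2=1.0):
    z = cmath.exp(-1j * theta)
    v = [1.0 / (1.0 - alpha * z), z / (1.0 - alpha * z)]  # V(e^{-i theta})
    return [[v[a] * v[b].conjugate() * sigma2 / (2.0 * cmath.pi)
             for b in range(2)] for a in range(2)]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# The determinant vanishes at every frequency, so the rank is at most one;
# the positive diagonal shows it is exactly one.
dets = [abs(det2(spectral_density(th))) for th in (0.0, 0.7, 1.5, 3.0)]
```

The determinants are zero up to rounding, while the diagonal entries are strictly positive, so the rank of the spectral density is 1 < 2 at all frequencies, which is the defining feature of a singular stochastic vector.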
For a general analysis of the relationship between representation (1) and "deeper" dynamic representations like (3), see e.g., Forni et al. (2009); Stock and Watson (2016). Now suppose that the factors F t have been estimated. Obtaining u t and the impulse-response functions of the variables x it with respect to u t (or structural shocks obtained by a linear transformation of u t ) requires the estimation of a VAR for the singular I(1) vector F t . On the other hand, the latter is necessarily cointegrated with cointegration rank c at least equal to r − q (the rank of the spectral density of (1 − L)F t does not exceed q at all frequencies and, therefore, at frequency zero).
Singular vectors of factors in an I(1) DFM and I(1) singular vectors in DSGE models provide strong motivation for studying singular I(1) vectors in a general time-series context. The main contributions of the paper are: (I) A generalization of Johansen's proof of the Granger Representation Theorem (from MA to AR); this is Proposition 2. Consider an I(1) singular vector y t , with dimension r, rank q < r, and cointegrating rank c ≥ r − q. Assuming that (1 − L)y t has an ARMA structure, S(L)(1 − L)y t = B(L)u t , and that some simple additional conditions hold, y t has a representation as a vector error correction mechanism (VECM) with c error correction terms:

A*(L)(1 − L)y t = −α(β′y t−1 − w) + B(0)u t , (4)

where α and β are both r × c and full rank, β′y t − w is I(0), and A(L) and A*(L) are r × r rational matrices in L. Under the additional assumption that unity is the only zero of B(L), i.e., B(z) is full rank for z ≠ 1, A(L) and A*(L) are finite-degree matrix polynomials. (II) Assuming that the parameters of S(L) and B(L) may vary in an open subset of R^λ, see Section 3.2 for the definition of λ, in Proposition 3 we show that all the assumptions used to obtain (4), and also the assumption that unity is the only possible zero of B(L), hold for generic values of the parameters. This implies that the matrices A(L) and A*(L) are generically of finite degree, which is obviously not the case for nonsingular vectors (see footnote 2).

The paper is organized as follows. Section 2 is preliminary. We firstly recall recent results for stationary singular stochastic vectors with rational spectral density, see Anderson and Deistler (2008a, 2008b). Secondly, we discuss cointegration and the cointegrating rank for I(1) singular stochastic vectors.

1 Usually orthonormality is assumed. This is convenient but not necessary in the present paper.
In Section 3 we prove our main results. We also obtain the permanent-transitory shock representation in the singular case: y t is driven by r − c permanent shocks, i.e., r minus the cointegrating rank, the usual result. However, the number of transitory shocks is c − (r − q), not c as in the nonsingular case.
Section 3 also contains an exercise carried out with simulated singular I(1) vectors. We compare the results obtained by estimating an unrestricted VAR in the levels and a VECM. Though limited to a simple example, the results confirm what has been found for nonsingular vectors: under cointegration, the long-run features of impulse-response functions are better estimated using a VECM rather than an unrestricted VAR in the levels (Phillips 1998).
In Section 4 we analyse cointegration of the observable variables x it in a DFM. Our results on cointegration of the singular vector F t have the implication that p-dimensional subvectors of the n-dimensional common-component vector χ t , with p > r − c, are cointegrated. As a consequence, stationarity of the idiosyncratic components would imply that all p-dimensional subvectors of the n-dimensional dataset x t are cointegrated if p > r − c. For example, if q = 3 and d = 1, then all 3-dimensional subvectors in the dataset are cointegrated, a kind of regularity that we do not observe in actual large macroeconomic datasets. This suggests that an estimation strategy robust to the assumption that the idiosyncratic components can be I(1) has to be preferred (for this aspect we refer to Barigozzi et al. 2019). Section 5 concludes. Some proofs, a discussion of some non-uniqueness problems arising with singularity and details on the simulations are collected in the Appendix.

Stationary Singular Vectors
As in this paper we only consider representation issues it is convenient to assume that all stochastic processes are defined for t ∈ Z. Accordingly, the lag operator L is defined as Ly t = y t−1 for t ∈ Z (Bauer and Wagner (2012) also study I(1) and cointegrated processes for t ∈ Z).
We start by introducing results on singular vectors with an ARMA structure from Anderson and Deistler (2008a, 2008b). Some preliminary definitions are needed.

Definition 1. (Zeros and Poles)
(A) When considering matrices V(z) whose entries are rational functions of z ∈ C we always assume that numerator and denominator of each entry have no common roots. If V(z) is an r × q matrix of rational functions, we say that z * is a pole of V(z) if it is a pole of some entry of V(z). (B) Suppose that V(z) is an r × q matrix whose entries are polynomial functions of z ∈ C, with q ≤ r. We say that z * ∈ C is a zero of V(z) if rank(V(z * )) < q, and that V(z) is zeroless if it has no zeros, i.e., rank(V(z)) = q for all z ∈ C.
2 To our knowledge, the present paper is the first to study cointegration and error correction representations for I(1) singular vectors, the factors of I(1) dynamic factor models in particular. An error correction model in the DFM framework is studied in Banerjee et al. (2014, 2017). However, their focus is on the relationship between the observable variables and the factors. Their error correction term is a linear combination of the variables x it and the factors F t , which is stationary if the idiosyncratic components are stationary (so that the x's and the factors are cointegrated). Because of this and other differences their results are not directly comparable to those in the present paper.
With a minor abuse of language, we may speak of zeros and poles of the corresponding matrix V(L). When an r × r polynomial matrix S(L) has all its zeros outside the unit circle we say that S(L) is stable.
All the stationary vector processes considered have an ARMA structure. Precisely, the r-dimensional process y t has an ARMA structure with rank q, q ≤ r, if there exist (i) a non-singular q-dimensional white-noise process u t , (ii) an r × r stable polynomial matrix S(z), with S(0) = I r , (iii) an r × q matrix B(z) whose rank is q for all z with the exception of a finite subset of C, such that

y t = V(L)u t , (5)

where V(L) = S(L)^-1 B(L).
Suppose that y t also has the representation y t = S̃(L)^-1 B̃(L)ũ t , where ũ t is a q̃-dimensional nonsingular white noise. Denote by Σ y (θ) the spectral density of y t ; the rank of Σ y (θ) is q for all θ, with the exception of a finite subset of [−π, π]. As the spectral density is independent of the ARMA representation, q = q̃ and B̃(z) has rank q except for a finite subset of C.
Remark 1. Let us recall that the equation S(L)ζ t = B(L)u t , in the unknown vector process ζ t , where S(L) is stable, has only one stationary solution, and this is y t = S(L)^-1 B(L)u t . Thus the ARMA process y t can also be defined as the stationary solution of S(L)ζ t = B(L)u t .

Definition 2. (Genericity)
Suppose that a statement Q depends on p ∈ A, where A is an open subset of R^λ. We say that Q holds generically in A, or that Q holds for generic values of p ∈ A, if the subset N of A where it does not hold is nowhere dense in A, i.e., the closure of N in A has empty interior.
For example, assuming that p ∈ A = R, the statement "The roots of the polynomial x 2 + px + 1 are distinct" holds generically in A.
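In this example the exceptional set can be exhibited: the roots of x² + px + 1 coincide exactly when the discriminant p² − 4 vanishes, i.e., on {−2, 2}, a closed set with empty interior. A quick numerical check (the sample values of p are arbitrary):

```python
import cmath

# Roots of x^2 + p x + 1 via the quadratic formula. They coincide iff the
# discriminant p^2 - 4 vanishes, i.e., only on the nowhere dense set {-2, 2}.
def roots(p):
    d = cmath.sqrt(p * p - 4.0)
    return (-p + d) / 2.0, (-p - d) / 2.0

distinct = [roots(p)[0] != roots(p)[1] for p in (-3.0, -0.5, 0.0, 1.9, 5.0)]
coincide = [roots(p)[0] == roots(p)[1] for p in (-2.0, 2.0)]
```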
Definition 3. (Rational reduced-rank family of filters) Assume that r > q and let G be a set of ordered couples (S(L), B(L)), where: (i) B(L) is an r × q polynomial matrix of degree s 1 ≥ 0.
(ii) S(L) is an r × r polynomial matrix of degree s 2 ≥ 0. S(0) = I r . (iii) Denoting by p the vector containing the λ = rq(s 1 + 1) + r 2 s 2 coefficients of the entries of B(L) and S(L), we assume that p ∈ Π, where Π is an open subset of R λ such that for p ∈ Π, (1) S(z) is stable, (2) rank(B(z)) = q with the exception of a finite subset of C.
We say that G is a rational reduced-rank family of filters with parameter set Π.
The notation S p (L), B p (L), though more rigorous, would be heavy and not really necessary. We use it only in Appendix A.1.
Proposition 1. (I) Suppose that V(L) is an r × q matrix polynomial in L. If V(z) is zeroless then V(L) has an r × r finite-degree stable left inverse, i.e., there exists a finite-degree polynomial r × r matrix W(L), with det(W(z)) ≠ 0 for |z| ≤ 1, such that

W(L)V(L) = V(0).

(II) Assume that y t is the stationary solution of S(L)ζ t = B(L)u t , where (S(L), B(L)) belongs to a rational reduced-rank family of filters with parameter set Π. For generic values of the parameters in Π, B(L) is zeroless, so that y t has a finite VAR representation.
For statement (I) see Anderson and Deistler (2008a), Theorem 3. Statement (II) is a modified version of their Theorem 2; see Forni et al. (2009), p. 1327, for a proof.
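Proposition 1 can be made concrete in the smallest singular case r = 2, q = 1, where B(z) is a column of two polynomials, a zero of B(z) is a common root of the two entries (a non-generic event), and a finite-degree left inverse can be computed explicitly. A sketch with illustrative coefficients, not taken from the paper:

```python
# r = 2, q = 1: B(z) = (b10 + b11 z, b20 + b21 z)'. A zero of B(z) is a common
# root of the two entries, which exists iff the resultant b10*b21 - b11*b20
# vanishes -- a non-generic event in the coefficient space. When B(z) is
# zeroless, a degree-zero left inverse w = (w1, w2) with
# w1*(b10 + b11 z) + w2*(b20 + b21 z) = 1 for all z can be solved for.
b10, b11 = 1.0, 0.4
b20, b21 = 0.7, -0.3

res = b10 * b21 - b11 * b20        # resultant of the two entries

# Solve w1*b10 + w2*b20 = 1 and w1*b11 + w2*b21 = 0 (possible since res != 0).
w1, w2 = b21 / res, -b11 / res

def left_inverse_times_B(z):
    return w1 * (b10 + b11 * z) + w2 * (b20 + b21 * z)

checks = [left_inverse_times_B(z) for z in (-1.3, 0.0, 0.8, 2.5)]
```

Since w B(z) = 1 identically, applying w to y t = B(L)u t recovers the shock, w y t = u t : an extreme (degree-zero) instance of the finite VAR representation in statement (II).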

Fundamentalness
Assume that the r-dimensional vector y t has an ARMA structure, rank q and the moving average representation (5). If rank(B(z)) = q for |z| < 1, then u t belongs to the space spanned by y t−k , with k ≥ 0, and representation (5), as well as u t , is called fundamental (for these definitions and results see e.g., Rozanov (1967), pp. 43-7). Note that if (5) is fundamental, then rank(B(0)) = q. Note also that when q = r, the condition that rank(B(z)) = q for |z| < 1 becomes det(B(z)) ≠ 0 for |z| < 1.
Remark 2. Note that in Proposition 1, part (II), we do not assume that u t is fundamental for y t . However, Proposition 1, (II), states that for generic values of p ∈ Π the matrix B(L) is zeroless and therefore u t is fundamental for y t .

I(1) Singular Vectors
To analyze cointegration and the autoregressive representations of singular non-stationary vectors let us first recall the definitions of I(0), I(1) and cointegrated vectors. This requires some preliminary definitions and results.
We denote by L²(Ω, F, P) the space of the square-integrable functions on the probability space (Ω, F, P). Let z t = (z 1t z 2t · · · z rt )′, z ht ∈ L²(Ω, F, P), be an r-dimensional stochastic process and consider the difference equation

(1 − L)ζ t = z t (6)

in the unknown r-dimensional process ζ t . A solution of (6) is

ζ t = z 1 + z 2 + · · · + z t for t > 0, ζ 0 = 0, ζ t = −(z t+1 + · · · + z 0 ) for t < 0,

see e.g., Gregoir (1999), p. 439, Franchi and Paruolo (2019). All the solutions of (6) are obtained by adding to this one a process φ t which is a solution of the homogeneous equation (1 − L)φ t = 0, so that φ t = K, for some r-dimensional stochastic vector K, for all t ∈ Z. We say that the process φ t = K is a constant stochastic process. Obviously a constant stochastic process φ t = K is weakly stationary. Its spectral measure has the jump Σ K at frequency zero. Thus φ t has a spectral density (has an absolutely continuous spectral measure) if and only if Σ K = 0, i.e., if and only if φ t (ω) = k, where k ∈ R^r, for ω almost everywhere in Ω.

Definition 4. (I(0), I(1) and Cointegrated vectors)
I(0). An r-dimensional ARMA process y t with spectral density Σ y (θ) is I(0) if Σ y (0) ≠ 0.
I(1). The r-dimensional vector stochastic process y t is I(1) if it is a solution of (1 − L)ζ t = z t , where z t is an r-dimensional I(0) process. The rank of y t is defined as the rank of z t .

Cointegration.
Assume that the r-dimensional stochastic vector y t is I(1) and denote by Σ ∆y (θ) the spectral density of (1 − L)y t . The vector y t is cointegrated with cointegrating rank c, with 0 < c < r, if rank(Σ ∆y (0)) = r − c.
If q is the rank of y t and r ≥ q, then c = r − q + d, with q > d ≥ 0. Thus in the singular case, r > q, y t is necessarily cointegrated, with cointegrating rank at least equal to r − q.
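The lower bound c ≥ r − q can be seen at work in the smallest singular I(1) case. In the sketch below (illustrative coefficients, not from the paper), r = 2, q = 1, (1 − L)y t = B(L)u t with B(L) = (1, 1 + 0.5L)′, so that B(1) = (1, 1.5)′; the vector c = (1.5, −1)′ annihilates B(1), and c′y t turns out to equal 0.5u t , stationary although both coordinates of y t are I(1):

```python
import random

# r = 2, q = 1: (1 - L) y_t = B(L) u_t with B(L) = (1, 1 + 0.5 L)'.
# B(1) = (1, 1.5)', so c = (1.5, -1)' satisfies c'B(1) = 0 and
# c'(1 - L)y_t = 0.5 (1 - L) u_t. Starting from y_0 = 0 and u_0 = 0,
# c'y_t = 0.5 u_t exactly: a cointegrating relation, so the cointegrating
# rank is 1 = r - q here.
random.seed(42)
T = 2000
u = [0.0] + [random.gauss(0.0, 1.0) for _ in range(T)]

y1, y2 = [0.0], [0.0]
for t in range(1, T + 1):
    y1.append(y1[-1] + u[t])                      # Delta y1_t = u_t
    y2.append(y2[-1] + u[t] + 0.5 * u[t - 1])     # Delta y2_t = (1 + 0.5L) u_t

ecm = [1.5 * y1[t] - y2[t] for t in range(T + 1)]  # c'y_t
max_dev = max(abs(ecm[t] - 0.5 * u[t]) for t in range(T + 1))
```

Here both y 1t and y 2t wander as random walks, while the combination c′y t stays bounded: exactly one cointegrating relation, matching c = r − q with d = 0.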
If y t is I(1) and cointegrated with cointegrating rank c, there exist c linearly independent r × 1 vectors c j , j = 1, . . . , c, such that the spectral density of c j ′(1 − L)y t vanishes at frequency zero. The vectors c j are called cointegrating vectors and the set c j , j = 1, . . . , c, a complete set of cointegrating vectors. Of course a complete set of cointegrating vectors c j , j = 1, . . . , c, can be replaced by the set d j , j = 1, . . . , c, where the vectors d j are c independent linear combinations of the vectors c j .

Lemma 1. (I) Assume that y t has an ARMA structure and has the rational representation (5): y t = V(L)u t . Then y t is I(0) if and only if V(1) ≠ 0. (II) Assume that (1 − L)y t has an ARMA structure and has the rational representation

(1 − L)y t = V(L)u t . (7)

The process y t is I(1) if and only if V(1) ≠ 0. (III) If y t is I(1), cointegrated and has representation (7), the cointegrating rank of y t is c if and only if the rank of V(1) is r − c. Moreover, c is a cointegrating vector for y t if and only if c′V(1) = 0.
(IV) Assume that y t is I(1). c is a cointegrating vector for y t if and only if a scalar stochastic variable w ∈ L²(Ω, F, P) can be determined such that c′y t − w is stationary with an ARMA structure.
The process y t solves (6) with z t = V(L)u t , so that, defining

μ t = u 1 + u 2 + · · · + u t for t > 0, μ 0 = 0, μ t = −(u t+1 + · · · + u 0 ) for t < 0, (8)

and V*(L) by V(L) = V(1) + (1 − L)V*(L), we have

y t = V(1)μ t + V*(L)u t + K, (9)

where (i) the entries of V*(L) are rational functions of L with no poles of modulus less than or equal to unity, (ii) K is a constant r-dimensional stochastic process. If c is a cointegrating vector of y t we have c′V(1) = 0, so that c′y t = c′V*(L)u t + c′K. Setting w = c′K, the process c′y t − w = c′V*(L)u t has the desired properties. Note that w has the equivalent definition w = c′y 0 − c′V*(L)u 0 . Conversely, suppose that w is such that c′y t − w has an ARMA structure. By (9),

(c′y t − w) − c′V*(L)u t + (w − c′K) = c′V(1)μ t .

The three terms on the left-hand side are finite and independent of t. As Σ μ t = |t|Σ u and Σ u is positive definite, the right-hand side diverges for |t| → ∞ unless c′V(1) = 0.
Lemma 1 shows that our definitions of I(0) and I(1) processes are equivalent to Definitions 3.2, and 3.3 in Johansen (1995), p. 35, with two minor differences: (i) our assumption of rational spectral density, (ii) the time span of the stochastic processes is t = 0, 1, . . . in Johansen's book, t ∈ Z in the present paper. Also, under the assumption that (1 − L)y t has an ARMA structure, our definition of cointegration is equivalent to that in Johansen (1995), p. 37.

Representation Theory for Singular I(1) Vectors
In Section 3.1 we prove our generalization to singular vectors of the Granger representation theorem (from MA to AR). We closely follow the proof in Johansen (1995), Theorem 4.5, pp. 55-57. In Section 3.2 we show that, under a suitable parameterization, the matrix of the autoregressive representation is generically of finite degree.

The Granger Representation Theorem (MA to AR)
Suppose that r ≥ q, c > 0 and r > c ≥ r − q. Let B(L) be an r × q polynomial matrix of degree s 1 ≥ 0 and S(L) an r × r polynomial matrix of degree s 2 ≥ 0 with S(0) = I r .

Assumption 2. If z* is a zero of B(z), then |z*| ≥ 1.

Assumption 2 implies that the rank of B(0) is q. The next is a stronger version of Assumption 2:

Assumption 3. If z* is a zero of B(z), then z* = 1.

Under Assumption 1, let y t be a solution of the equation

S(L)(1 − L)ζ t = B(L)u t . (10)

We have y t = S(L)^-1 B(L)μ t + K, where μ t is defined in (8) and K is a constant stochastic process. By Assumption 4, S(1)^-1 B(1) ≠ 0, so that y t is I(1) with cointegrating rank c, see Lemma 1, (II) and (III). Consider the finite Taylor expansion of B(z) around z = 1:

B(z) = B(1) − B′(1)(1 − z) + · · · , (11)

B′ denoting the derivative of B. Assumption 4 implies that B(1) = ξη′, where ξ is r × (r − c), η is q × (r − c) and both are full rank, see Lancaster and Tismenetsky (1985, p. 97, Proposition 3). The Taylor expansion above can be rewritten as

B(z) = ξη′ + (1 − z)B* + (1 − z)²E(z), (12)

where B* = −B′(1) and E(z) is a polynomial matrix. Let ξ⊥ be an r × c full-rank matrix whose columns are orthogonal to all columns of ξ. Then: (i) the columns of ξ⊥ are a complete set of cointegrating vectors for B(L)u t , (ii) the columns of the matrix S(1)′ξ⊥ are a complete set of cointegrating vectors for y t . Regarding (i), using (11) and (12), we have

ξ⊥′S(L)y t = ξ⊥′B(L)μ t + ξ⊥′S(1)K = ξ⊥′(B* + (1 − L)E(L))u t + ξ⊥′S(1)K, (13)

so that ξ⊥′S(L)y t − ξ⊥′S(1)K has an ARMA structure. Regarding (ii), see the proof of Proposition 2.
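The factorization of B(z) around z = 1 and claim (i) can be illustrated numerically in a small case, r = 3, q = 2, c = 2 (so r − c = 1 and d = 1). All numbers below are illustrative, not taken from the paper:

```python
# B(z) = xi*eta' + (1 - z)*Bstar + (1 - z)^2 * E, with xi of dimension 3x1,
# eta 2x1, Bstar and E 3x2 (E of degree zero for simplicity). The two columns
# of xi_perp are orthogonal to xi, hence they annihilate B(1) = xi*eta':
# they are cointegrating vectors for B(L)u_t.
xi = [1.0, 2.0, -1.0]                 # r x (r - c) = 3 x 1
eta = [0.5, 1.0]                      # q x (r - c) = 2 x 1
Bstar = [[0.3, -0.2], [0.1, 0.4], [0.7, 0.0]]
E = [[0.2, 0.1], [-0.3, 0.05], [0.0, 0.6]]

def B(z):
    return [[xi[i] * eta[j] + (1 - z) * Bstar[i][j] + (1 - z) ** 2 * E[i][j]
             for j in range(2)] for i in range(3)]

xi_perp = [[2.0, 1.0], [-1.0, 0.0], [0.0, 1.0]]   # columns orthogonal to xi

B1 = B(1.0)                                       # equals xi*eta', rank 1 = r - c
annihilated = [sum(xi_perp[i][k] * B1[i][j] for i in range(3))
               for k in range(2) for j in range(2)]
```

Here B(1) has rank r − c = 1, so the cointegrating rank of the associated y t is c = 2, one unit above the lower bound r − q = 1 (i.e., d = 1).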
Remark 3. Let y t be a solution of (10), so that (1 − L)y t is stationary and S(L)[(1 − L)y t ] = B(L)u t . Assumption 2, and therefore Assumption 3, implies that u t is fundamental for (1 − L)y t , see Section 2.2.
We are now ready for our main representation result.
Proposition 2. (I) Weak form. Suppose that Assumptions 1, 2, 4, 5 and 6 hold and let y t be a solution of the difference Equation (10), so that y t = S(L)^-1 B(L)μ t + K, with μ t defined in (8) and K a constant stochastic process. Set β = S(1)′ξ⊥. Then a c-dimensional stochastic vector w can be determined such that (i) β′y t − w is I(0), (ii) y t has the error correction representation

A*(L)(1 − L)y t = −α(β′y t−1 − w) + B(0)u t , (14)

where A(L) is a rational r × r matrix with no poles in or on the unit circle, A(0) = I r , A*(L) = (A(L) − A(1)L)(1 − L)^-1 , α is r × c and full rank, and αβ′ = A(1).

(II) Strong form. If, in addition, Assumption 3 holds, then A(L) and A*(L) are finite-degree matrix polynomials.
We obtain (15). Taking the first c rows in (15) yields (16). This implies that ξ⊥′S(1)y t − w has an ARMA structure, where w is a c-dimensional constant stochastic vector. Comparing with (13), w = ξ⊥′S(1)K. On the other hand, using (16), with H(L) a suitable polynomial matrix, β′y t − w = ξ⊥′S(1)y t − w has an ARMA structure. Moreover, by Assumption 6, β′y t − w is I(0).
Joining (16) with the last r − c rows of (15) gives (18), where M(L) is defined in (19). By Assumption 5, M(z) has no zero at z = 1, see (19). On the other hand, by (15) and (19) we see that A(1) = αβ′.

Some remarks are in order.

Remark 4. (I)
Under our assumption of an ARMA structure, Assumption 1 corresponds to Definition 3.1 in Johansen's book, see p. 34. Assumption 2 is Johansen's Assumption 1 (see p. 14), adapted for singularity. Assumption 3 has no counterpart in Johansen's nonsingular framework. In Section 3.2 we show that under the parameterization adopted in Definition 5, Assumption 3 holds generically.
(II) Simplifying the model by taking S(L) = I r , Assumption 5 generalizes to the singular case Johansen's assumption that ξ⊥′C*η⊥ is full rank (see Theorem 4.5, p. 55; C* corresponds to our B*). For, assuming that r = q and multiplying the matrix in Assumption 5 by the nonsingular matrix (η⊥ η), we obtain that Assumption 5 holds if and only if ξ⊥′B*η⊥ is full rank. Assumption 5 is used in the proof of Proposition 2 to invert the matrix M(L), which remains on the right-hand side after the removal of the unit roots, see Equation (18); this is the same rôle played by Johansen's assumption in his proof.
(III) Under S(L) = I r , Assumption 6 simplifies to ξ⊥′B* ≠ 0. If d > 0, Assumption 6 is a consequence of Assumption 5. For, if d > 0 then r − c = q − d < q. On the other hand, r − c is the number of rows of η′, so that Assumption 5 holds only if Assumption 6 holds. In particular, if r = q and c = d > 0, Assumption 6 is redundant. However, if r > q and d = 0, so that the rank of η′ is q, then Assumption 5 holds even if ξ⊥′B* = 0. Assumption 6 is necessary in Proposition 2 to prove that the error correction term is I(0), not only stationary.
Remark 5. Uniqueness issues arise with autoregressive representations of singular vectors. For example, suppose that c = r − q > 0, so that d = 0. Representation (14) has an (r − q)-dimensional error correction term β′y t − w. On the other hand, in this case B(1) has full rank q, so that Proposition 1 (I) applies and, in spite of cointegration, y t has an autoregressive representation in differences. In Appendix B.1 we sketch a proof of the statement that, in general, y t has VECM representations with a number of error correction terms ranging from d to c. However, as we show in Appendix B.2, different autoregressive representations of y t produce the same impulse-response functions. Both in this and the companion paper Barigozzi et al. (2019), the number of error correction terms in the error correction representation for reduced-rank I(1) vectors is always the maximum, c. It is worth reporting that, in our experiments with simulated data, the best results in the estimation of singular VECMs are obtained using c as the number of error correction terms.
Remark 6. Assume for simplicity that S(L) = I r and consider the error term e t in Equation (17). If r = q, Assumption 5 implies that ξ⊥′B* has rank c, so that no c-dimensional vector d ≠ 0 can be determined such that some of the coordinates of d′e t are stationary but not I(0). Thus, according to the definition introduced in Franchi and Paruolo (2019), p. 1181, the error term e t is a "non-cointegrated I(0) process." When r > q and c ≤ q, i.e., r ≤ 2q − d, elementary examples can be produced in which e t is an I(0) but not a non-cointegrated I(0) process (one is given in Appendix A.2). Thus Assumption 6 only implies that e t is I(0). Of course, under c ≤ q, the assumption that ξ⊥′(B* − S*(1)S(1)^-1 ξη′) has rank c, an enhancement of Assumption 6, implies that e t is a non-cointegrated I(0) process. On the other hand, if c > q, i.e., r > 2q − d, e t cannot be a non-cointegrated I(0) process.

Generically, A(L) Is a Finite-Degree Polynomial
Suppose that the couple (S(L), B(L)) is parameterized as in Definition 3. It is easy to see that B(1) generically has rank q, so that generically the cointegrating rank of y t is r − q. In particular, if r = q, cointegration is non-generic.
It is quite easy to see that this paradoxical result only depends on the choice of a parameter set that is unfit to study cointegration. Our starting point here is that a specific value of c between r − q and r − 1 has a motivation in economic theory or in statistical inference, and must therefore be built into the parameter set. Thus, in Definition 5 below, the family of filters is redefined so that generically the cointegrating rank is equal to a given c between r − q and r − 1.

Definition 5. (Rational reduced-rank family of filters with cointegrating rank c)
Assume that r > q, c > 0 and r > c ≥ r − q. Let G be a set of couples (S(L), B(L)), where: (i) The matrix B(L) has the parameterization

B(L) = ξη′ + (1 − L)B* + (1 − L)²E(L),

where ξ and η are r × (r − c) and q × (r − c) respectively, B* is an r × q matrix and E(L) is an r × q matrix polynomial of degree s 1 ≥ 0.
We say that G is a rational reduced-rank family of filters with cointegrating rank c.
Proposition 3. Assume that r > q. Let y t be an I(1) solution of Equation (10), where (S(L), B(L)) belongs to a rational reduced-rank family of filters with cointegrating rank c. For generic values of the parameters in Π, Assumptions 1, 3, 4, 5 and 6 hold. Thus the strong form of Proposition 2 holds and y t has an error correction representation in which A(L) is a finite-degree polynomial matrix.
Proof. Part (iii) of Definition 5 implies that Assumptions 1 and 4 hold for all p ∈ Π. The sets where Assumptions 5 and 6 do not hold are the intersections of the open set Π with the algebraic varieties defined by (a) the rank of the r × q matrix appearing in Assumption 5 being less than q (this variety is obtained by equating to zero the determinants of all the q × q submatrices of that matrix), and (b) ξ⊥′(B* − S*(1)S(1)^-1 ξη′) = 0. It is easy to see that the varieties (a) and (b) are not trivial, i.e., that their dimension is lower than λ. Thus Assumptions 5 and 6 hold generically. The same result holds for Assumption 3: the points of Π where it is not fulfilled belong to a lower-dimensional algebraic variety. This is proved in Appendix A.1, see in particular Lemma A4.
Remark 8. A general comment on genericity results is in order. Theorems like Proposition 3 or Proposition 1, part (II), show that the subset where some statement does not hold belongs to an algebraic variety of lower dimension (see the proof of Proposition 3 in particular), and is therefore negligible from a topological point of view. This suggests the working hypothesis that such a subset is negligible from an economic or statistical point of view as well. If, for example, economic theory produces a singular vector y t with cointegrating rank c, we may find it reasonable to conclude that y t has representation (14) with a finite autoregressive polynomial. However, a greater degree of certainty is obtained by checking that the parameters of (S(L), B(L)) that are implicit in the theory do not lie in one of the three algebraic varieties described in the proof of Proposition 3.
Definition 5 does not assume that B(L) has no zeros inside the unit circle; thus we have not assumed that u t is fundamental for (1 − L)y t , see Section 2.2. However, Proposition 3 shows that for generic values of the parameters in Π the assumptions of Proposition 2, strong form, hold, Assumption 3 in particular, so that B(L) has no zeros other than z = 1, and therefore none inside the unit circle. Thus: Proposition 4. Assume that r > q. Let y t be a solution of Equation (10), where (S(L), B(L)) belongs to a rational reduced-rank family of filters with cointegrating rank c. For generic values of the parameters in Π, u t is fundamental for (1 − L)y t .
Remark 9. Note that Propositions 3 and 4 do not hold in the nonsingular case, where no genericity argument can be used to rule out non-unit zeros of B(L), either inside or outside the unit circle. In particular, fundamentalness of u t for (1 − L)y t is not generic if r = q.

Permanent and Transitory Shocks
Let η⊥ be a q × d matrix whose columns are independent and orthogonal to the columns of η, and normalize η and η⊥ so that the q × q matrix (η⊥ η) is orthogonal. Defining v 1t = η⊥′u t and v 2t = η′u t , we have u t = η⊥v 1t + ηv 2t . Using (12), we have

B(L)u t = ξv 2t + (1 − L)[G 1 (L)v 1t + G 2 (L)v 2t ],

where G 1 (L) = (B* + (1 − L)E(L))η⊥ and G 2 (L) = (B* + (1 − L)E(L))η. All the solutions of the difference equation (1 − L)y t = S(L)^-1 B(L)u t are

y t = S(L)^-1 [ξT t + G 1 (L)v 1t + G 2 (L)v 2t ] + K, (21)

where K is a constant stochastic process and T t = v 21 + v 22 + · · · + v 2t . As ξ is full rank, we see that y t is driven by the q − d = r − c permanent shocks v 2t , and by the d transitory shocks v 1t . In representation (21), the component ξT t is the common trend of Stock and Watson (1988). Note that the number of permanent shocks is obtained as r minus the cointegrating rank, as usual. However, the number of transitory shocks is only d = c − (r − q), as though r − q transitory shocks had a zero coefficient.
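The splitting of u t into transitory and permanent shocks can be checked numerically in the smallest case q = 2, d = 1. The orthonormal pair below is an illustrative choice:

```python
import math, random

# v1_t = eta_perp' u_t (transitory), v2_t = eta' u_t (permanent). With the
# columns of (eta_perp eta) orthonormal, u_t = eta_perp v1_t + eta v2_t, so
# the q common shocks are recovered exactly from the d + (q - d) new shocks.
s = 1.0 / math.sqrt(2.0)
eta = [s, s]           # q x (q - d) = 2 x 1
eta_perp = [s, -s]     # q x d = 2 x 1, orthogonal to eta

random.seed(7)
draws = [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(500)]

max_err = 0.0
for u in draws:
    v1 = eta_perp[0] * u[0] + eta_perp[1] * u[1]
    v2 = eta[0] * u[0] + eta[1] * u[1]
    rec = (eta_perp[0] * v1 + eta[0] * v2, eta_perp[1] * v1 + eta[1] * v2)
    max_err = max(max_err, abs(rec[0] - u[0]), abs(rec[1] - u[1]))
```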

VECMs and Unrestricted VARs in The Levels
Several papers have addressed the issue of whether and when an error correction model or an unrestricted VAR in the levels should be used for estimation in the case of nonsingular cointegrated vectors: Sims et al. (1990) have shown that the parameters of a cointegrated VAR are consistently estimated using an unrestricted VAR in the levels; on the other hand, Phillips (1998) shows that if the variables are cointegrated, the long-run features of the impulse-response functions are consistently estimated only if the unit roots are explicitly taken into account, that is within a VECM specification. The simulation exercise described below provides evidence in favour of the VECM specification in the singular case.
The 4 × 4 matrix A(L) is of degree 2. The impulse-response functions are identified by assuming that the upper 3 × 3 submatrix of B(0) is lower triangular (see Appendix C for details). We replicate the generation of y t 1000 times for T = 100, 500, 1000, 5000. (II) For each replication, we estimate a (misspecified) VAR in differences (DVAR), a VAR in the levels (LVAR) and a VECM, as in Johansen (1988, 1991), assuming known c, the degree of A(L) and that of A*(L). For the VAR in differences, the impulse-response functions for (1 − L)y t are cumulated to obtain the impulse-response functions for y t . The root mean square error between estimated and actual impulse-response functions is computed for each replication using all 12 impulse-responses and averaged over all replications.
The results are shown in Table 1. We see that the RMSE of both the VECM and the LVAR decreases as T increases. However, for all values of T, the RMSE of the VECM stabilizes as the lag increases, whereas it deteriorates for the LVAR, in line with the claim that the long-run responses of the variables are better estimated with the VECM. The performance of the misspecified DVAR is uniformly poor with the exception of lag zero.

Cointegration of the Observable Variables in a DFM
Consider again the factor model x it = χ it + ε it , rewritten here as

x t = ΛF t + ε t , (22)

where Λ is n × r, with n > r. The relationship between cointegration of the factors F t and cointegration of the variables x it is now considered. Let us recall that the common factors F jt are assumed to be orthogonal to the idiosyncratic components ε ks for all j, k, t, s, i.e., E(χ t ε s ′) = 0 n×n for all t, s, see the Introduction. The other assumptions on model (22) are asymptotic, see e.g., Forni et al. (2000); Forni and Lippi (2001); Stock and Watson (2002a, 2002b), and put no restriction on the matrix Λ and the vector ε t for a given finite n. In particular, the first r eigenvalues of the matrix ΛΛ′ must diverge as n → ∞, but this has no implications on the rank of the matrix Λ corresponding to, say, n = 10. However, as we see in Proposition 5 (iii), if the idiosyncratic components are I(0), then, independently of Λ, all p-dimensional subvectors of x t are cointegrated for p > q − d, which is at odds with what is observed in the macroeconomic datasets analyzed in the empirical Dynamic Factor Model literature. This motivates assuming that ε t is I(1). In that case, see Proposition 5 (i), cointegration of x t requires that both the common and the idiosyncratic components are cointegrated. Some results are collected in the statement below.

Denote by x (p) t a p-dimensional subvector of x t , and by χ (p) t and ε (p) t the corresponding subvectors of χ t and ε t . Orthogonality of the common and idiosyncratic components implies

Σ ∆x (p) (θ) = Σ ∆χ (p) (θ) + Σ ∆ε (p) (θ). (23)

Now, (23) implies that

λ p (Σ ∆x (p) (0)) ≥ λ p (Σ ∆χ (p) (0)) + λ p (Σ ∆ε (p) (0)), (24)

where λ p (A) denotes the smallest eigenvalue of the hermitian matrix A; this is one of the Weyl's inequalities, see Franklin (2000), p. 157, Theorem 1. Because the spectral density matrices are non-negative definite, the left-hand side in (24) vanishes only if both terms on the right-hand side vanish; more precisely, by (23), v′Σ ∆x (p) (0)v = 0 for some v ≠ 0 if and only if v′Σ ∆χ (p) (0)v = 0 and v′Σ ∆ε (p) (0)v = 0, i.e., the spectral density of ∆x (p) t is singular at zero if and only if the spectral densities of ∆χ (p) t and ∆ε (p) t are singular at zero with a common null vector. By Definition 4, (i) is proved. Without loss of generality we can assume that S(L) = I r .
By substituting (21) into (22), we obtain (25), where on the right-hand side the only non-stationary terms are T t and possibly ε t . By recalling that the trend term is ξT t = ξ ∑ t s=1 v 2s , where ξ is of dimension r × (q − d) and rank q − d, and by defining G t = Λ[G 1 (L)v 1t + G 2 (L)v 2t + K] and T t = ∑ t s=1 v 2s , we can rewrite (25) in a form in which Λ (p) and G (p) t have an obvious definition. Of course cointegration of the common components χ (p) t is equivalent to cointegration of Λ (p) ξT t , which in turn is equivalent to rank(Λ (p) ξ) < p. Statement (ii) follows from rank(Λ (p) ξ) ≤ min{rank(Λ (p) ), rank(ξ)}.
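The rank inequality behind statement (ii) can be checked numerically. The following numpy sketch uses made-up dimensions (p = 5, r = 4, q − d = 2, not taken from the paper) to show that the product Λ (p) ξ has rank below p, so that some vector annihilates the common trends of the subvector.

```python
import numpy as np

rng = np.random.default_rng(1)
p, r, qd = 5, 4, 2                 # qd stands for q - d

Lam = rng.standard_normal((p, r))  # Lambda^(p): generically of full rank
xi = rng.standard_normal((r, qd))  # xi: r x (q - d), rank q - d

rank_prod = np.linalg.matrix_rank(Lam @ xi)
# rank(Lam xi) <= min(rank(Lam), rank(xi))
assert rank_prod <= min(np.linalg.matrix_rank(Lam), np.linalg.matrix_rank(xi))

# since rank(Lam xi) <= q - d = 2 < p = 5, a nonzero vector annihilates
# Lam^(p) xi T_t: the p-dimensional vector of common components is cointegrated
assert rank_prod < p
print("rank inequality check passed:", rank_prod)
```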
The first part of (iii) is obvious. Assume now that c p > q − d. Then the intersection between V χ and V ε is non-trivial, so that x (p) t is cointegrated.

Summary and Conclusions
The paper studies representation theory for singular I(1) stochastic vectors, in particular the factors of an I(1) Dynamic Factor Model. Singular I(1) vectors are cointegrated, with cointegrating rank c = (r − q) + d, i.e., the dimension of y t minus its rank, plus d, with 0 ≤ d < q.
If (1 − L)y t has rational spectral density, under assumptions that generalize to the singular case those in Johansen (1995), we show that y t has an error correction representation with c error terms, thus generalizing the Granger representation theorem (from MA to AR) to the singular case. Important consequences of singularity are that generically: (i) the autoregressive matrix polynomial of the error correction representation is of finite degree, (ii) the white noise vector driving (1 − L)y t is fundamental.
We find that y t is driven by r − c permanent shocks and d = c − (r − q) transitory shocks, not c as in the nonsingular case.
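The shock-count arithmetic can be checked with a toy long-run matrix (all numbers below are made up for illustration): with r = 3, q = 2, d = 1, the long-run MA matrix C(1) has rank q − d = r − c, leaving c = 2 cointegrating relations, r − c = 1 permanent shock and d = 1 transitory shock.

```python
import numpy as np

r, q, d = 3, 2, 1
c = r - q + d                           # cointegrating rank: c = 2

# a rank-(q - d) long-run matrix C(1), built from an arbitrary factorization
xi = np.array([[1.0], [2.0], [-1.0]])   # r x (q - d)
eta = np.array([[1.0, 0.5]])            # (q - d) x q
C1 = xi @ eta

rank_C1 = np.linalg.matrix_rank(C1)
assert rank_C1 == q - d == r - c        # r - c = 1 permanent shock
assert r - rank_C1 == c                 # cointegrating space has dimension c = 2
assert c - (r - q) == d                 # d = 1 transitory shock
print("shock counts:", r - c, "permanent,", d, "transitory")
```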
Simulated data generated by a simple singular VECM confirm previous results, obtained for nonsingular vectors, showing that under cointegration the long-run features of impulse-response functions are better estimated using a VECM rather than a VAR in the levels.
In Section 4 we argue that stationarity of the idiosyncratic components in a DFM produces an amount of cointegration among the observable variables x it that is not observed in the datasets that are standard in the applied Dynamic Factor Model literature. Thus the idiosyncratic vector in those datasets is likely to be I(1), so that an estimation strategy robust to the assumption that some of the idiosyncratic components ε it are I(1) should be preferred.
The results in this paper are the basis for estimation of I(1) Dynamic Factor Models with cointegrated factors, which is developed in the companion paper (Barigozzi et al. 2019).
A polynomial on R λ vanishes on an open subset if and only if it vanishes on the whole of R λ , which contradicts the existence of a point in R λ where Q is false.

Lemma A3.
Recall that a zero of M(z) is a complex number z * such that rank(M(z * )) < q. If M(z) has two q × q submatrices whose determinants have no common roots, then M(z) is zeroless.
Proof. If z * is a zero of M(z), then z * is a root of the determinant of every q × q submatrix of M(z), hence a common root of any two such determinants, contradicting the assumption.
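Lemma A3 suggests a direct numerical check for zerolessness. The sketch below uses a made-up tall polynomial matrix with r = 2, q = 1 (so its 1 × 1 submatrices are just its entries) and verifies that the two determinants share no root.

```python
import numpy as np

# M(z) = [[1 - 0.5 z], [1 - 0.9 z]]: a tall polynomial matrix, r = 2, q = 1.
# Its two q x q submatrices have determinants 1 - 0.5 z and 1 - 0.9 z.
roots1 = np.roots([-0.5, 1.0])  # root of 1 - 0.5 z, i.e. z = 2
roots2 = np.roots([-0.9, 1.0])  # root of 1 - 0.9 z, i.e. z = 10/9

# no common root => no z* at which both minors vanish => M(z) is zeroless
common = [z1 for z1 in roots1 for z2 in roots2 if abs(z1 - z2) < 1e-8]
assert common == []
print("zeroless: the two minors have no common root")
```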
For the statement and proof of our last result it is convenient to make explicit the dependence of the matrix M(z) and of its submatrices on the vector p. Thus we write M p (z), etc. The parameters of the matrix S(L) play no role here; hence, with no loss of generality, we assume s 2 = 0, so that λ = (r − c)(r + q) + rq(s 1 + 2). Lemmas A2–A4 below imply that Assumption 3 holds generically in Π. Proof. Assume that r = q + 1. To each p ∈ Π there corresponds the matrix

Lemma A4.
Of course, the definition of M p (z) makes sense for all p ∈ R λ , see Equation (19). Let M p 1 (z) and M p 2 (z) be the matrices obtained from M p (z) by removing the first and the last row respectively. We will construct a point p * ∈ R λ such that: (A) the coefficient of z d 1 in det(M p * 1 (z)) and the coefficient of z d 2 in det(M p * 2 (z)) (the leading coefficients) do not vanish; (B) the resultant of det(M p * 1 (z)) and det(M p * 2 (z)) does not vanish. Let us firstly define a family of matrices, denoted by M(z), obtained by specifying η, ξ, ξ ⊥ , B * and E(z) in the following way: the entries e, k i , h i , f i and g i are scalar polynomials of degree s 1 . We denote by q 1 the vector including the coefficients of the polynomials f i , i = 1, . . . , d, and k i , i = 1, . . . , (q − d), a total of q(s 1 + 1) coefficients; by q 2 the vector including the coefficients of the polynomials e, g i , i = 1, . . . , d, and h i , i = 1, . . . , (q − d − 1), a total of q(s 1 + 1) coefficients; and by q 0 the vector including the zeros and the ones in the definition of ξ, η, B * , E. We define q = (q 0 q 1 q 2 ), which is a λ-dimensional parameter vector. We put no restriction on q 1 and q 2 , so that both can take any value in R ν , with ν = q(s 1 + 1). Note that q does not necessarily belong to Π. The matrix M q (z) has zero entries except for the diagonal joining the positions (1, 1) and (q, q), and the diagonal joining (2, 1) and (q + 1, q). The matrices M q 1 (z) and M q 2 (z) are upper- and lower-triangular, respectively. Note that det(M q 1 (z)) does not depend on q 2 , while det(M q 2 (z)) does not depend on q 1 ; thus we use the notation δ q 1 1 (z) = det(M q 1 (z)) and δ q 2 2 (z) = det(M q 2 (z)). Now: (i) Let q * 2 ∈ R ν be such that none of the leading coefficients of the polynomials e, g i and h i vanishes; then the leading coefficient of δ q * 2 2 (z) does not vanish. (ii) Let ž be a root of δ q * 2 2 (z). If ž = 1, then ž is not a root of δ q 1 1 (z) for any q 1 ∈ R ν .
Suppose that ž is a root of g j (z) for some j. As the parameters of the polynomials f i and k i are free to vary in R ν , then, generically in R ν , δ q 1 1 (ž) ≠ 0. Iterating over all roots of δ q * 2 2 (z), generically in R ν , δ q 1 1 (z) and δ q * 2 2 (z) have no roots in common. Moreover, generically in R ν , the coefficient of z d 1 in δ q 1 1 (z) does not vanish. (iii) Now let p * = (q 0 q * 1 q * 2 ), so that det(M p * 1 (z)) = δ q * 1 1 (z) and det(M p * 2 (z)) = δ q * 2 2 (z).
By (i) and (ii): (A) the leading coefficients of det(M p * 1 (z)) and det(M p * 2 (z)) do not vanish; (B) det(M p * 1 (z)) and det(M p * 2 (z)) have no root in common, so that their resultant does not vanish. This proves the proposition for r = q + 1.
Generalizing this result to r > q + 1 is easy. Let us define the family N(z) in the following way: (a) specify η', ξ, E 1 (z) and E 3 (z) as in the definition of M(z); (b) then define the remaining blocks accordingly. It is easy to see that the (q + 1) × q lower submatrix of N(z) is identical to the matrix M q (z) in (A1).
Appendix A.2. If r > q and c ≤ q, Assumptions 5 and 6 Do Not Imply That e t Is a Non-Cointegrated I(0) Process
Let r = 3, q = 2, and S(L) = I 3 . In this case c = 2 and d = 1, so that c = q (see Remark 6). We see that Assumptions 5 and 6 hold. However, rank(ξ ⊥ B * ) = 1, so that e t , though being I(0), is not a non-cointegrated I(0) process. On the other hand, if the (3, 2) entry of B * is 1 instead of 0, then e t is non-cointegrated.

Appendix B. Non-Uniqueness
In Proposition 3 we prove that a singular I(1) vector with cointegrating rank c has a finite error correction representation with c error terms. On the other hand, as we have seen in Remark 5, when c = r − q the singular vector y t also has an autoregressive representation in the differences, i.e., a representation with zero error terms. In Appendix B.1 we give an example hinting that y t has error correction representations with any number of error terms between d and c. However, in Appendix B.2 we show that all such representations produce the same impulse-response functions.

Appendix B.1. Alternative Representations with Different Numbers of Error Terms
Let S(L) = I r and consider the following example, with r = 3, q = 2, c = 2, so that d = 1, where (1 − L)Ê(L) gathers the second and third terms in M(L). If the assumptions of Proposition 2 hold, we obtain an error correction representation with c = 2 error terms. However, we also have an alternative factorization. Under suitable assumptions on the coefficients b * ij and Ě(L), assuming in particular that the matrix is nonsingular, the matrix M̌(L) is zeroless and therefore has a finite-degree left inverse. Proceeding as in Proposition 2, we obtain an alternative error correction representation with just one error term, namely y 1t − y 2t . This example should be sufficient to convey the idea that y t admits error correction representations with a minimum of d and a maximum of c = r − q + d error terms.
The problem of error correction representations with different numbers of error terms has been recently addressed in Deistler and Wagner (2017). An implication of their main result (see Theorem 1, p. 41) is that if y t has the error correction representation Ã(L)y t = Ã * (L)(1 − L)y t + Ã(1)y t−1 = B̃ũ t , and rank(Ã(1)) < c (the number of error terms is not the maximum), then Ã(L) and B̃ are not left coprime.
The consequences of Deistler and Wagner's result have not yet been fully developed. In Propositions 2 and 3 we have only considered representations with c error terms. On the non-uniqueness of autoregressive representations for singular vectors with rational spectral density see also Chen et al. (2011); Anderson et al. (2012); Forni et al. (2015).

Appendix B.2. Uniqueness of Impulse-Response Functions
Suppose that the assumptions of Proposition 2, weak form, hold. Let y t be a solution of Equation (10), so that (1 − L)y t = S(L) −1 B(L)u t , and suppose that y t has the autoregressive representation (A3), where Ã(L) is a rational matrix with poles outside the unit circle, Ã(0) = I r , ũ t is a nonsingular q-dimensional white noise, and B̃ is a full-rank r × q matrix 5 . The assumption that B̃ is full rank and the argument used, e.g., in Brockwell and Davis (1991), p. 111, Problem 3.8, imply that ũ t is fundamental for (1 − L)y t . Thus ũ t = Qu t , where Q is a nonsingular q × q matrix (see Rozanov (1967), p. 57), and B̃ũ t = [B̃Q]u t .
On the other hand, from (A2)

(Footnote 5.) Multiplying both sides of (A3) by (1 − L) and using (A2), we obtain Ã(L)S(L) −1 B(L)u t = (1 − L)B̃ũ t . Comparing the spectral densities of the right- and left-hand terms, it is easy to prove that ũ t must be a q-dimensional, nonsingular white noise and that the rank of B̃ must be q.

Appendix C. Data Generating Process for the Simulations
The simulation results of Section 3.4 are obtained using the following specification of (14): A(L)y t = A * (L)(1 − L)y t + αβ'y t−1 = C(0)u t = GHu t , where r = 4, q = 3, c = 3, and the degree of A(L) is 2, so that the degree of A * (L) is 1
(see Watson 1994). To get a VAR(2) we set U(L) = I r − U 1 L and V(L) = I r ; then, rewriting M(L) = I r − M 1 L, we get A 1 = M 1 + U 1 and A 2 = −M 1 U 1 . Regarding the generation of the data, the diagonal entries of the matrix U 1 are drawn from a uniform distribution between 0.5 and 0.8, while the off-diagonal entries are drawn from a uniform distribution between 0 and 0.3. U 1 is then multiplied by a scalar so that its largest eigenvalue is 0.6. The matrix G is generated as in Bai and Ng (2007): (1) G̃ is an r × r diagonal matrix of rank q whose nonzero entries g̃ ii are drawn from the uniform distribution between 0.8 and 1.2; (2) Ǧ is obtained by orthogonalizing an r × r uniform random matrix; (3) G is equal to the first q columns of the matrix ǦG̃ 1/2 . Lastly, the orthogonal matrix H is such that the upper 3 × 3 submatrix of GH is lower triangular. The results are based on 1000 replications. The matrices U 1 , G and H are generated only once (the numerical values are available on request) so that the set of impulse responses to be estimated is the same for all replications, whereas the vector u t is redrawn from N (0, I 3 ) at each replication.
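The matrix-generation step of this recipe can be sketched in numpy. This is an illustrative reading, not the authors' code: in particular, treating the rank-q diagonal matrix G̃ as having exactly q nonzero entries, and obtaining H from a QR decomposition, are our own interpretations of the description above.

```python
import numpy as np

rng = np.random.default_rng(0)
r, q = 4, 3

# U1: diagonal entries ~ U(0.5, 0.8), off-diagonal entries ~ U(0, 0.3),
# then rescaled so that the largest eigenvalue modulus is 0.6
U1 = rng.uniform(0.0, 0.3, (r, r))
U1[np.diag_indices(r)] = rng.uniform(0.5, 0.8, r)
U1 *= 0.6 / np.abs(np.linalg.eigvals(U1)).max()

# G as in Bai and Ng (2007): Gtilde is r x r diagonal of rank q,
# Gcheck orthogonalizes a random r x r matrix, G = first q columns
# of Gcheck Gtilde^(1/2)
g = np.zeros(r)
g[:q] = rng.uniform(0.8, 1.2, q)
Gcheck, _ = np.linalg.qr(rng.uniform(size=(r, r)))
G = (Gcheck @ np.diag(np.sqrt(g)))[:, :q]

# H orthogonal, chosen so that the upper q x q block of G H is lower
# triangular: the QR decomposition G[:q, :].T = Q R gives
# G[:q, :] @ Q = R.T, which is lower triangular, so H = Q
Q, _ = np.linalg.qr(G[:q, :].T)
H = Q
upper = (G @ H)[:q, :]
print(np.round(upper, 3))
```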