Assessing asset-liability risk with neural networks

We introduce a neural network approach for assessing the risk of a portfolio of assets and liabilities over a given time period. This requires a conditional valuation of the portfolio given the state of the world at a later time, a problem that is particularly challenging if the portfolio contains structured products or complex insurance contracts which do not admit closed form valuation formulas. We illustrate the method on different examples from banking and insurance. We focus on value-at-risk and expected shortfall, but the approach also works for other risk measures.


Introduction
Different financial risk management problems require an assessment of the risk of a portfolio of assets and liabilities over a given time period. Banks, mutual funds and hedge funds usually calculate value-at-risk numbers for different risks over a day, a week or longer time periods. Under Solvency II, insurance companies have to calculate one-year value-at-risk, whereas the Swiss Solvency Test demands a computation of one-year expected shortfall. A determination of these risk figures requires a conditional valuation of the portfolio given the state of the world at a later time, called the risk horizon. This is particularly challenging if the portfolio contains structured products or complicated insurance policies which do not admit closed form valuation formulas. In theory, the problem can be approached with nested simulation, which is a two-stage procedure. In the outer stage, different scenarios are generated to model how the world could evolve until the risk horizon, whereas in the inner stage, cash flows occurring after the risk horizon are simulated to estimate the value of the portfolio conditionally on each scenario; see, e.g., Lee (1998), Glynn and Lee (2003), Gordy and Juneja (2008), Broadie et al. (2011) or Bauer et al. (2012). While nested simulation can be shown to converge for increasing sample sizes, it is often too time-consuming to be useful in practical applications. A more pragmatic alternative, usually used for short risk horizons, is the delta-gamma method, which approximates the portfolio loss with a second order Taylor polynomial; see, e.g., Rouvinez (1997), Britten-Jones and Schaefer (1999) or Duffie and Pan (2001). If first and second derivatives of the portfolio with respect to the underlying risk factors are accessible, the method is computationally efficient, but its accuracy depends on how well the second order Taylor polynomial approximates the true portfolio loss.
Similarly, the replicating portfolio approach approximates future cash flows with a portfolio of liquid instruments that can be priced efficiently; see, e.g., Wüthrich (2016), Pelsser and Schweizer (2016), Natolski and Werner (2017) or Cambou and Filipović (2018). Building on work on American option valuation (see, e.g., Carriere, 1996; Longstaff and Schwartz, 2001; Tsitsiklis and Van Roy, 2001), Broadie et al. (2015) as well as Ha and Bauer (2019) have proposed to regress future cash flows on finitely many basis functions depending on state variables known at the risk horizon. This gives good results in a number of applications. But typically, for it to work well, the basis functions have to be well adapted to the problem.
In this paper we use a neural network approach to approximate the value of the portfolio at the risk horizon. Since our goal is to estimate tail risk measures such as value-at-risk and expected shortfall, we employ importance sampling when training the networks and estimating the risk figures. In addition, we try different regularization techniques and test the adequacy of the neural network approximations using the defining property of the conditional expectation. Neural networks have also been used for the calculation of solvency capital requirements by Hejazi and Jackson (2017), Fiore et al. (2018) and Castellani et al. (2019). But they all exclusively focus on value-at-risk, do not make use of importance sampling and apply neural networks in a slightly different way. Hejazi and Jackson (2017) develop a neural network interpolation scheme within a nested simulation framework. Fiore et al. (2018) and Castellani et al. (2019) both use reduced-size nested Monte Carlo to generate training samples for the calibration of the neural network. Here, we directly regress future cash flows on a neural network, without generating training values through nested simulation.
The remainder of the paper is organized as follows. In Section 2, we set up the underlying risk model, introduce the importance sampling distributions we use in the implementation of our method and recall how value-at-risk and expected shortfall can be estimated from a finite sample of simulated losses. Section 3 discusses the training, validation and testing of our network approximation of the conditional valuation functional. In Section 4 we illustrate our approach on three typical risk calculation problems: the calculation of risk capital for a single put option, a portfolio of different call and put options and a variable annuity contract with guaranteed minimum income benefit. Section 5 concludes.

Asset-liability risk
We denote the current time by 0 and are interested in the value of a portfolio of assets and liabilities at a given risk horizon τ > 0. Suppose all relevant events taking place until time τ are described by a d-dimensional random vector X = (X_1, ..., X_d) defined on a measurable space (Ω, F) that is equipped with two equivalent probability measures, P and Q. We think of P as the real-world probability measure and use it for risk measurement. Q is a risk-neutral probability measure specifying the time-τ value of the portfolio through

V = v(X) + N_τ E_Q[ Σ_{i=1}^{I} C_{t_i}/N_{t_i} | X ],

where v is a measurable function from R^d to R describing the part of the portfolio whose value is directly given by the underlying risk factors X_1, ..., X_d; C_{t_1}, ..., C_{t_I} are random cash flows occurring at times τ < t_1 < · · · < t_I; and N_τ, N_{t_1}, ..., N_{t_I} model the evolution of a numeraire process used for discounting. Our goal is to compute ρ(L) for a risk measure ρ and the time-τ net liability L = −V, which can be written as

L = E_Q[Y | X]   for   Y = −v(X) − N_τ Σ_{i=1}^{I} C_{t_i}/N_{t_i}.

Our approach works for a wide class of risk measures ρ. But for the sake of concreteness, we concentrate on value-at-risk and expected shortfall. We follow the convention of McNeil et al. (2015) and define value-at-risk at a level α ∈ (0, 1), such as 0.95, 0.99 or 0.995, as the left α-quantile

VaR_α(L) = inf{ x ∈ R : P[L ≤ x] ≥ α }

and expected shortfall as

ES_α(L) = (1/(1 − α)) ∫_α^1 VaR_u(L) du.

There exist different definitions of VaR and ES in the literature. But if the distribution function F_L of L is continuous and strictly increasing, they are all equivalent.
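For a finite sample of simulated losses, the left α-quantile and the tail average above it give the standard empirical estimators of VaR and ES. The following sketch illustrates them on a standard normal loss (the sample size and distribution are chosen purely for illustration):

```python
import math
import random

def empirical_var_es(losses, alpha):
    """Empirical left alpha-quantile (VaR) and average of the losses
    at or above it (ES) for a finite sample of simulated losses."""
    n = len(losses)
    srt = sorted(losses)                      # ascending order
    k = max(math.ceil(alpha * n) - 1, 0)      # index of the left alpha-quantile
    var = srt[k]
    tail = srt[k:]                            # losses at or above the VaR level
    es = sum(tail) / len(tail)
    return var, es

random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(1_000_000)]
var, es = empirical_var_es(sample, 0.99)
# For a standard normal loss, VaR_0.99 ≈ 2.326 and ES_0.99 ≈ 2.665
```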

Conditional expectations as minimizing functions
Let P ⊗ Q be the probability measure on F given by

(P ⊗ Q)[A] = ∫_{R^d} Q[A | X = x] π(dx),   A ∈ F,

where π is the distribution of X under P and Q[A | X = x] is a regular conditional version of Q given X. We assume that Y belongs to L²(Ω, F, P ⊗ Q). Then L is of the form L = l(X) for a measurable function l: R^d → R minimizing the mean squared distance

E_{P⊗Q}[(l(X) − Y)²];   (3)

see, e.g., Bru and Heinich (1985). Note that l minimizes (3) if and only if l(x) = ∫_R y Q^Y_x(dy) for π-almost all x ∈ R^d, where Q^Y_x is a regular conditional Q-distribution of Y given X. This shows that l is unique up to π-almost sure equality and can alternatively be characterized by l(x) = ∫_R y Q^Y_x(dy) for ν-almost all x ∈ R^d, for any probability measure ν on R^d that is equivalent to π. In particular, if P_ν is the probability measure on σ(X) under which X has distribution ν, and Y is in L²(Ω, F, P_ν ⊗ Q), then l can be determined by minimizing

E_{P_ν⊗Q}[(l(X) − Y)²]   (4)

instead of (3). Since we are going to approximate the expectation (4) with averages of Monte Carlo samples, this will give us some flexibility in simulating (X, Y).
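The invariance of the minimizer under a change of the X-distribution can be checked numerically. In the toy model below (an assumption for illustration), Y = X² + noise, so the true conditional expectation is l(x) = x²; a least-squares quadratic fit recovers approximately the same coefficients whether X is sampled from π or from an equivalent measure ν:

```python
import random
import numpy as np

def fit(loc):
    """Least-squares quadratic regression of Y = X^2 + noise on X,
    with X ~ N(loc, 1)."""
    random.seed(5)
    xs = np.array([random.gauss(loc, 1.0) for _ in range(100_000)])
    ys = xs**2 + np.array([random.gauss(0.0, 0.5) for _ in range(100_000)])
    return np.polyfit(xs, ys, 2)        # coefficients, highest degree first

c0 = fit(0.0)    # X ~ N(0, 1), playing the role of pi
c1 = fit(2.0)    # X ~ N(2, 1), playing the role of an equivalent nu
# Both fits should be close to the coefficients (1, 0, 0) of l(x) = x^2
```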

Importance sampling
Assume now that X is of the form X = h(Z) for a transformation h: R^k → R^d and a k-dimensional random vector Z with density f: R^k → R_+. To give more weight to important outcomes, we introduce a second density g on R^k satisfying g(z) > 0 for all z ∈ R^k with f(z) > 0. Let Z_g be a k-dimensional random vector with density g and Z_{g,1}, ..., Z_{g,n} independent simulations of Z_g. We reorder the simulations such that

l∘h(Z_{g,(1)}) ≥ l∘h(Z_{g,(2)}) ≥ · · · ≥ l∘h(Z_{g,(n)})

and consider the random weights

w^(i) = (1/n) f(Z_{g,(i)})/g(Z_{g,(i)}),   i = 1, ..., n.
Since f(Z_g)/g(Z_g) is integrable, one obtains from the law of large numbers that for every threshold x ∈ R,

Σ_{i=1}^{n} w^(i) 1_{{l∘h(Z_{g,(i)}) ≥ x}}   (6)

is an unbiased consistent estimator of the exceedance probability P[L ≥ x]. If, in addition, g can be chosen so that f(Z_g)/g(Z_g) is square-integrable, it follows from the central limit theorem that (6) is asymptotically normal with a standard deviation of order n^{−1/2}. In any case, Σ_{i=1}^{n} w^(i) converges to 1 and the random measure

Σ_{i=1}^{n} w^(i) δ_{l∘h(Z_{g,(i)})}

approximates the distribution of L. Accordingly, we adapt the P-simulation estimators (5) by replacing i/n with Σ_{m=1}^{i} w^(m). This yields the g-simulation estimates

VaR-estimate l∘h(Z_{g,(i_α)})   and   ES-estimate (1/(1 − α)) Σ_{m=1}^{i_α} w^(m) l∘h(Z_{g,(m)}),   where i_α = min{ i : Σ_{m=1}^{i} w^(m) ≥ 1 − α }.

If the α-quantile l_α of L = l∘h(Z) were known, the exceedance probability P[L ≥ l_α] could be estimated by means of (6) with x = l_α. To make the procedure efficient, one would try to find a density g on R^k from which it is easy to sample and such that the variance of this estimator becomes as small as possible. Since its mean E[(f(Z_g)/g(Z_g)) 1_{{l∘h(Z_g) ≥ l_α}}] = P[L ≥ l_α] does not depend on g, it can be seen that g is a good importance sampling (IS) density if the second moment

E[(f(Z_g)/g(Z_g))² 1_{{l∘h(Z_g) ≥ l_α}}]   (7)

is small. We use the same criterion as a basis to find a good IS density for estimating VaR_α and ES_α; see, e.g., Glasserman (2003) for more background on importance sampling.
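The reweighted estimators can be sketched as follows: the losses are sorted in decreasing order and the empirical tail weights i/n are replaced by cumulative weights built from f/g (tie handling and interpolation at the quantile are simplified in this sketch; the function and variable names are illustrative):

```python
import random

def is_var_es(z_samples, loss, f_over_g, alpha):
    """Importance-sampled VaR/ES estimates: sort losses in decreasing order
    and accumulate the weights w_i = f(Z_i)/g(Z_i) / n until they reach
    the tail mass 1 - alpha."""
    n = len(z_samples)
    pairs = sorted(((loss(z), f_over_g(z) / n) for z in z_samples), reverse=True)
    cum = 0.0        # cumulative tail weight, replacing i/n
    tail_sum = 0.0   # weighted sum of tail losses, for ES
    for L, w in pairs:
        cum += w
        tail_sum += w * L
        if cum >= 1 - alpha:
            return L, tail_sum / (1 - alpha)
    return pairs[-1][0], tail_sum / (1 - alpha)

# Sanity check with g = f (all weights equal 1/n): standard normal loss.
random.seed(1)
zs = [random.gauss(0.0, 1.0) for _ in range(500_000)]
var, es = is_var_es(zs, lambda z: z, lambda z: 1.0, 0.99)
# var and es should be close to 2.326 and 2.665, respectively
```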

The case of an underlying multivariate normal distribution
In many applications X can be modeled as a deterministic transformation of a random vector with a multivariate normal distribution; see, e.g., the examples in Section 4 below. In this case, it can be written as

X = u(AZ)   (8)

for a function u: R^p → R^d, a p × k-matrix A and a k-dimensional random vector Z with a standard normal density f. To keep the problem tractable, we look for a suitable IS density g in the class of N_k(m, I_k)-densities f_m with different mean vectors m ∈ R^k and covariance matrix equal to the k-dimensional identity matrix I_k. To determine a good choice of m, let v ∈ R^p be a vector of sensitivities of the loss with respect to the components of u(AZ), and consider the half-space D = { z ∈ R^k : vᵀAz ≥ c_α }, where the threshold c_α is chosen such that Z lies in D with probability 1 − α, and l∘u(Az) tends to be large for z ∈ D. We choose m as the maximizer of f(z) subject to z ∈ D, which leads to

m = c_α Aᵀv / |Aᵀv|².

It is easy to see that this yields

f(z)/f_m(z) = exp(−mᵀz + |m|²/2) ≤ exp(−|m|²/2) < 1   for all z ∈ D.

So, if D is sufficiently similar to the region { z ∈ R^k : l∘u(Az) ≥ l_α }, it can be seen from (7) that this choice of IS density will yield a reduction in variance.
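In one dimension the half-space D reduces to a tail interval and the mean shift amounts to centering the sampling density at the tail threshold. A minimal sketch (the threshold and sample size are chosen for illustration only): estimate P[Z ≥ c] for a standard normal Z by sampling from N(m, 1) with m = c and reweighting with the likelihood ratio f/f_m:

```python
import math
import random
from statistics import NormalDist

# Estimate the exceedance probability P[Z >= c] for a standard normal Z
# by sampling from the shifted density f_m of N(m, 1) with m = c.
# The likelihood ratio is f(z)/f_m(z) = exp(-m*z + m**2/2).
random.seed(2)
c = NormalDist().inv_cdf(0.99)          # threshold, ≈ 2.326
m = c                                   # shift the mean into the tail
n = 100_000
est = 0.0
for _ in range(n):
    z = random.gauss(m, 1.0)            # draw from the IS density
    if z >= c:
        est += math.exp(-m * z + m * m / 2) / n
# est should be close to 1 - 0.99 = 0.01, with much smaller variance
# than plain Monte Carlo at the same sample size
```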

Neural network approximation
Usually, the distribution of the risk factor vector X is assumed to be known. For instance, in all our examples in Section 4, it is of the form (8). On the other hand, in many real-world applications, there is no closed form expression for the loss function l: R^d → R mapping X to L. Therefore, we approximate L = l(X) with l_θ(X) for a neural network l_θ: R^d → R; see, e.g., Goodfellow et al. (2016). We concentrate on feedforward neural networks of the form

l_θ = ψ ∘ a^θ_J ∘ ϕ ∘ a^θ_{J−1} ∘ · · · ∘ ϕ ∘ a^θ_1,

where
• q_0 = d, q_J = 1, and q_1, ..., q_{J−1} are the numbers of nodes in the hidden layers 1, ..., J − 1;
• a^θ_j are affine functions of the form a^θ_j(x) = A_j x + b_j for matrices A_j ∈ R^{q_j × q_{j−1}} and vectors b_j ∈ R^{q_j}, for j = 1, ..., J;
• ϕ is a non-linear activation function used in the hidden layers and applied component-wise. In the examples in Section 4 we choose ϕ = tanh;
• ψ is the final activation function. For a portfolio of assets and liabilities a natural choice is ψ = id. To keep the presentation simple, we will consider pure liability portfolios with loss L > 0 in all our examples below. Accordingly, we choose ψ = exp.
The parameter vector θ consists of the components of the matrices A_j and vectors b_j, j = 1, ..., J, so it lives in R^q for q = Σ_{j=1}^{J} q_j (q_{j−1} + 1). It remains to determine the architecture of the network (that is, J and q_1, ..., q_{J−1}) and to calibrate θ ∈ R^q. Then VaR and ES figures can be estimated as described in Section 2 by simulating l_θ(X).

Training and validation
In a first step we take the network architecture (J and q_1, ..., q_{J−1}) as given and try to find a minimizer of θ ↦ E[(l_θ(X) − Y)²], where the expectation is either with respect to P ⊗ Q or P_ν ⊗ Q for an IS distribution ν on R^d. To do that we simulate realizations (X_m, Y_m), m = 1, ..., M_1 + M_2, of (X, Y) under the corresponding distribution. The first M_1 simulations are used for training and the other M_2 for validation. More precisely, we employ a stochastic gradient descent method to minimize the Monte Carlo approximation based on the training samples,

(1/M_1) Σ_{m=1}^{M_1} (l_θ(X_m) − Y_m)².   (9)

At the same time we use the validation samples to check whether

(1/M_2) Σ_{m=M_1+1}^{M_1+M_2} (l_θ(X_m) − Y_m)²   (10)

is decreasing as we are updating θ.

Regularization through tree structures
If the number q of parameters is large, one needs to be careful not to overfit the neural network. For instance, in the extreme case, the network could be so flexible that it brings (9) down to zero even in cases where the true conditional expectation l(X) = E_Q[Y | X] differs from Y. To prevent this, one can generate the training samples by first simulating N_1 realizations X_i of X and then, for every X_i, drawing N_2 simulations Y_{i,j} from the conditional distribution of Y given X_i. In the simple example of Section 4.1, we chose N_2 = 1. In Sections 4.2 and 4.3 we used N_2 = 5.
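The tree-structured sampling can be sketched as follows (the toy model with Y | X ~ N(X², 1) is an assumption for illustration; only the N_1 outer / N_2 inner structure matters):

```python
import random

def simulate_tree(n1, n2, draw_x, draw_y_given_x, seed=0):
    """Generate training samples with a tree structure: N1 outer draws of X,
    and for each X_i, N2 inner draws of Y from its conditional distribution."""
    random.seed(seed)
    samples = []
    for _ in range(n1):
        x = draw_x()
        for _ in range(n2):
            samples.append((x, draw_y_given_x(x)))
    return samples

# Toy model for illustration: X ~ N(0, 1) and Y | X ~ N(X^2, 1).
data = simulate_tree(1000, 5,
                     lambda: random.gauss(0.0, 1.0),
                     lambda x: random.gauss(x * x, 1.0))
# 1000 * 5 = 5000 training pairs; each simulated X value appears 5 times.
```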

Stochastic gradient descent
In principle, one can use any numerical method to minimize (9). But stochastic gradient descent methods have proven to work well for neural networks. We refer to Ruder (2016) for an overview of different (stochastic) gradient descent algorithms. Here, we randomly split the M_1 training samples into b mini-batches of size B. Then we update θ based on the θ-gradients of the mini-batch losses

(1/B) Σ_{m ∈ batch} (l_θ(X_m) − Y_m)².

We use batch normalization and Adam updating with the default values from TensorFlow. After b gradient steps, all of the training data have been used once and the first epoch is complete. For further epochs, we reshuffle the training data, form new mini-batches and perform b more gradient steps. The procedure is repeated until the training error (9) stops decreasing or the validation error (10) starts to increase.
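The epoch and mini-batch logic can be sketched as follows, using a one-parameter linear model l_θ(x) = θx as a stand-in for the network and plain gradient steps in place of Adam and batch normalization (the model, learning rate and data are illustrative assumptions):

```python
import random

# Mini-batch stochastic gradient descent for the empirical squared distance.
random.seed(3)
M1, B = 10_000, 100
data = []
for _ in range(M1):
    x = random.gauss(0.0, 1.0)
    data.append((x, 2.0 * x + random.gauss(0.0, 0.1)))   # true theta = 2

theta, lr = 0.0, 0.05
for epoch in range(5):
    random.shuffle(data)                                 # reshuffle each epoch
    for start in range(0, M1, B):                        # b = M1/B mini-batches
        batch = data[start:start + B]
        grad = sum(2.0 * (theta * x - y) * x for x, y in batch) / B
        theta -= lr * grad                               # one gradient step
# After a few epochs, theta should be close to the true value 2.0
```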

Initialization
We follow standard practice and initialize the parameters of the network randomly. The final operation of the network is x ↦ ψ(A_J x + b_J). Since the network tries to approximate Y, and in all our examples below we use ψ = exp, we initialize the last bias as

b_J = log( (1/M_1) Σ_{m=1}^{M_1} Y_m ),

so that ψ(b_J) matches the empirical mean of the simulated values Y_m. For the other parameters we use Xavier initialization; see Glorot and Bengio (2010).

Backtesting the network approximation
After having determined an approximate minimizer θ ∈ R^q for a given network architecture, one can test the adequacy of the approximation l_θ(X) of the true conditional expectation l(X) = E_Q[Y | X]. The quality of the approximation depends on different aspects: (i) Generalization error: The true conditional expectation E_Q[Y | X] is of the form l(X) for the unique measurable function l: R^d → R minimizing the mean squared distance E[(l(X) − Y)²]. To approximate l we choose a network architecture and try to find a θ ∈ R^q that minimizes the empirical squared distance

(1/M_1) Σ_{m=1}^{M_1} (l_θ(X_m) − Y_m)².   (11)

But if the samples (X_m, Y_m), m = 1, ..., M_1, do not represent the distribution of (X, Y) well, (11) might not be a good approximation of the true expectation E[(l_θ(X) − Y)²].
(ii) Numerical minimization method: The minimization of (11) usually is a complex problem, and one has to employ a numerical method to find an approximate solution θ. The quality of l θ (X) will depend on the performance of the numerical method being used.
(iii) Network architecture: It is well known that feedforward neural networks with one hidden layer have the universal approximation property; see, e.g., Cybenko (1989), Hornik et al. (1989) or Leshno et al. (1993). That is, they can approximate any continuous function uniformly on compacts to any degree of accuracy if the activation function is of a suitable form and the hidden layers contain sufficiently many nodes. As a consequence, E[(l_θ(X) − l(X))²] can be made arbitrarily small if the hidden layer is large enough and θ is chosen appropriately. However, we do not know in advance how many nodes we need. And moreover, feedforward neural networks with two or more hidden layers have been shown to yield better results in different applications.
Since we simulate from an underlying model, we are able to choose a large training sample size M_1 and train extensively. In addition, for any given network architecture, we also evaluate the empirical squared distance (11) on the validation set (X_m, Y_m), m = M_1 + 1, ..., M_1 + M_2. So we suppose the generalization error is small and our numerical method finds a good approximate minimizer θ of (11). But since we do not know whether a given network architecture is flexible enough to provide a good approximation to the true loss function l, we test for each trained network whether it satisfies the defining properties of a conditional expectation.
The loss function l: R^d → R is characterized by the property

E[(Y − l(X)) ξ(X)] = 0   (12)

for all measurable functions ξ: R^d → R such that Y ξ(X) is integrable. Ideally, we would like l_θ to satisfy the same condition. But there will be an approximation error, and (12) cannot be checked for all measurable functions ξ: R^d → R satisfying the integrability condition. Therefore, we test (12) for the following choices of ξ: (a) ξ ≡ 1; (b) ξ = l_θ; (c) ξ = 1_{B_i} for finitely many measurable subsets B_i ⊆ R^d, i = 1, ..., I. We generate M_3 more samples (X_m, Y_m) of (X, Y) and test whether the differences

(1/M_3) Σ_{m=1}^{M_3} (Y_m − l_θ(X_m)) ξ(X_m)

are sufficiently close to zero. If this is not the case, we change the network architecture and train again.
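The backtest amounts to averaging the weighted residuals over fresh samples. In the sketch below, the toy conditional expectation l(x) = x² and the test sets are assumptions for illustration; the differences should then be close to zero for any choice of ξ:

```python
import random

def backtest(l_theta, samples, test_functions):
    """Evaluate (1/M3) * sum over m of (Y_m - l_theta(X_m)) * xi(X_m)
    for each test function xi."""
    m3 = len(samples)
    return [sum((y - l_theta(x)) * xi(x) for x, y in samples) / m3
            for xi in test_functions]

# Toy model for illustration: Y = X^2 + noise, so l(x) = x^2 exactly.
random.seed(4)
samples = []
for _ in range(200_000):
    x = random.gauss(0.0, 1.0)
    samples.append((x, x * x + random.gauss(0.0, 0.5)))
diffs = backtest(lambda x: x * x, samples,
                 [lambda x: 1.0,                        # the whole space
                  lambda x: 1.0 if x > 1.28 else 0.0])  # an indicator of a tail set
# All entries of diffs should be close to zero
```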

Examples
As examples, we study three different risk assessment problems from banking and insurance. For comparison we generated realizations (X m , Y m ) of (X, Y ) under P ⊗ Q as well as P ν ⊗ Q, where ν is the IS distribution on R d obtained by changing the distribution of X. In all our examples, X has a transformed normal distribution as in Section 2.4. In each case we used 1.5 million simulations with mini-batches of size 10,000 for training, 500,000 simulations for validation and 500,000 for backtesting. After training and testing the network, we simulated another 500,000 realizations of X, once under P and then under P ν , to estimate VaR 99.5% (L) and ES 99% (L).
We implemented the algorithms in Python. To train the networks we used the TensorFlow package.

Single put option
As a first example, we consider a liability consisting of a single put option with strike price K = 100 and maturity T = 1/3 on an underlying asset starting from s_0 = 100 and evolving according to

dS_t = S_t (µ dt + σ dW^P_t)   under P   and   dS_t = S_t (r dt + σ dW^Q_t)   under Q,

for an interest rate r = 1%, a drift µ = 5%, a volatility σ = 20%, a P-Brownian motion W^P and a Q-Brownian motion W^Q. As risk horizon we choose τ = 1/52. The time-τ value of this liability is

L = E_Q[ e^{−r(T−τ)} (K − S_T)^+ | S_τ ].

Using Itô's formula, one can write

S_τ = s_0 exp( (µ − σ²/2) τ + σ √τ Z )   and   S_T = S_τ exp( (r − σ²/2)(T − τ) + σ √(T − τ) V ),

where Z and V are two independent standard normal random variables under P ⊗ Q. It is well known that L is of the form P(S_τ, r, σ, T − τ), where P is the Black-Scholes formula for a put option. This allows us to calculate reference values for VaR_α(L) and ES_α(L).
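Since the put value is monotonically decreasing in S_τ, the reference VaR can be computed by plugging the corresponding quantile of S_τ into the Black-Scholes formula. The sketch below reproduces the reference value quoted later in this section (Python's statistics.NormalDist supplies the normal cdf and quantile function):

```python
import math
from statistics import NormalDist

def bs_put(s, k, r, sigma, t):
    """Black-Scholes price of a European put option."""
    d1 = (math.log(s / k) + (r + sigma**2 / 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    N = NormalDist().cdf
    return k * math.exp(-r * t) * N(-d2) - s * N(-d1)

# Model parameters from the text.
s0, K, r, mu, sigma, T, tau = 100.0, 100.0, 0.01, 0.05, 0.20, 1/3, 1/52

# VaR_99.5%: the put price is decreasing in S_tau, so plug the
# 0.5%-quantile of S_tau into the Black-Scholes formula.
z = NormalDist().inv_cdf(0.005)
s_q = s0 * math.exp((mu - sigma**2 / 2) * tau + sigma * math.sqrt(tau) * z)
var_ref = bs_put(s_q, K, r, sigma, T - tau)
# var_ref ≈ 8.3356, the reference value quoted in the text
```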
To train the neural network approximation of L, we simulated realizations of the pair (X, Y) for X = S_τ and Y = e^{−r(T−τ)} (K − S_T)^+. For comparison, we first simulated according to P ⊗ Q and then according to P_ν ⊗ Q for an IS distribution P_ν. Clearly, L is decreasing in Z. Therefore, we chose P_ν so as to make Z normally distributed with mean equal to the (1 − α)-quantile of a standard normal and variance 1. Since x ↦ P(x, r, σ, T − τ) is a simple one-dimensional function, we selected a network with a single hidden layer containing 5 nodes. This is sufficient to approximate P and faster to train than a network with more hidden layers. We did not use the tree structure of Section 3.1.1 for training (that is, N_2 = 1) and trained the network over 40 epochs.
It can be seen in Figure 1 that the empirical squared distance on both the training and validation data decreases with a very similar decaying pattern. This provides a first validation of our approximation procedure. Figure 2 shows the empirical evaluation of (a) and (b) of Section 3.2 on the test data after each training epoch.
Similarly, Figure 3 illustrates the empirical evaluation of (c) of Section 3.2 on the test data after each training epoch for sets B_i defined in terms of β-quantiles s_β of X = S_τ. Training, validation and testing with IS worked similarly.
Once the network has been trained and tested, one can estimate VaR and ES numbers. Figure 4 shows our results for increasing sample sizes. The left panel shows our estimate of VaR 99.5% (L) without and with IS. Plugging the 0.5%-quantile of S τ into the Black-Scholes formula gives a reference value of 8.3356. Our method yielded 8.3358 without and 8.3424 with importance sampling. The right panel shows our results for ES 99% (L). Transforming simulations of S τ with the Black-Scholes formula and using the empirical ES estimate (5) resulted in a reference value of 8.509. Without importance sampling, the neural network learned a value of 8.456 versus 8.478 with importance sampling. It can be seen that in both cases, IS made the method more efficient. It has to be noted that for increasing sample sizes, the VaR and ES estimates converge to the corresponding risk figures in the neural network model, which are not exactly equal to their analogs in the correct model. But it can be seen that the blue lines are close to their final values after very few simulations.

Portfolio of call and put options
In our second example we introduce a larger set of risk factors. We consider a portfolio of 20 short call and put options on different underlying assets with initial prices s^i_0 > 0 and dynamics

dS^i_t = S^i_t (µ_i dt + σ_i dW^{P,i}_t)   under P   and   dS^i_t = S^i_t (r dt + σ_i dW^{Q,i}_t)   under Q,

for P-Brownian motions W^{P,i} and Q-Brownian motions W^{Q,i}, i = 1, ..., 20, such that (W^{P,1}, ..., W^{P,20}) is a multivariate Gaussian process under P with an instantaneous correlation of 30% between different components and (W^{Q,1}, ..., W^{Q,20}) is a multivariate Gaussian process under Q, also with an instantaneous correlation of 30% between different components. We set s^i_0 = 100 for all i = 1, ..., 20 and r = 1%. The drift and volatility parameters are assumed to be µ_i = µ_{10+i} = (2.5 + i/2)% and σ_i = σ_{10+i} = (14 + i)%, i = 1, ..., 10. As in the first example, we choose a maturity of T = 1/3 and a risk horizon of τ = 1/52. We assume all options have the same strike price K = 100. Then the time-τ value of the liability is

L = E_Q[ e^{−r(T−τ)} ( Σ_{i=1}^{10} (S^i_T − K)^+ + Σ_{i=11}^{20} (K − S^i_T)^+ ) | X ],

where X is the vector (S^1_τ, ..., S^20_τ). In this example we trained a neural network with two hidden layers containing 15 nodes each. We first simulated according to P ⊗ Q and trained for 100 epochs. Figure 5 shows the decay of the empirical squared distance on the training and validation data set. After training and testing the network under P ⊗ Q, we did the same under P_ν ⊗ Q for the IS distributions ν resulting from the procedure of Section 2.4 for α = 99.5% (for VaR) and 99% (for ES). The two plots in Figure 6 show the empirical evaluations of (a) and (b) on the test data under the IS measure P_ν ⊗ Q corresponding to α = 99%.
As an additional test, we consider two sets B_1 and B_2 defined in terms of β-quantiles s^i_β of S^i_τ under P_ν ⊗ Q, and evaluate (c) on the test data generated under the IS distribution P_ν ⊗ Q corresponding to α = 99%. The results are depicted in Figure 7.
After training and testing, we generated simulations of X to estimate VaR 99.5% (L) and ES 99% (L). The convergence for increasing sample sizes is shown in Figure 8. The reference values, 104.92 for 99.5%-VaR and 105.59 for 99%-ES, were obtained from the empirical estimates (5) by simulating S τ and using the Black-Scholes formula for each of the 20 options. The neural network estimates of 99.5%-VaR were 104.56 without and 104.48 with IS. Those of 99%-ES were 105.03 without and 104.65 with IS. In both cases, IS made the procedure more efficient.

Variable annuity with GMIB
As a third example we study a variable annuity (VA) with guaranteed minimum income benefit (GMIB). We consider the contract analyzed by Ha and Bauer (2019) using polynomial regression. At time 0 the contract is sold to an x-year old policyholder. If she is still alive at maturity T, she can choose between the balance of an underlying investment account and a lifetime annuity. Therefore, in case of survival, the time-T value of the contract is

max( S_T , b a_{x+T}(T) ),

where S_T is the account value, b a guaranteed rate and a_{x+T}(T) the time-T value of an annuity paying one unit of currency to a (x + T)-year old policyholder at times T + 1, T + 2, ... for as long as the person lives. The contract is exposed to three types of risk: investment risk, interest rate risk and mortality risk. We suppose the log-account value q_t = log(S_t), the interest rate r_t and the mortality rate µ_{x+t} of our policyholder start from known constants q_0, r_0, µ_x and, for x + t ≤ 120, have P-dynamics

dq_t = (m − σ_S²/2) dt + σ_S dW^{P,S}_t,
dr_t = ζ(γ − r_t) dt + σ_r dW^{P,r}_t,
dµ_{x+t} = κ µ_{x+t} dt + σ_µ dW^{P,µ}_t,

for given parameters m, ζ, γ, κ, σ_S, σ_r, σ_µ and P-Brownian motions W^{P,S}, W^{P,r} and W^{P,µ} forming a three-dimensional Gaussian process with instantaneous correlations ρ_12, ρ_13 and ρ_23. We assume that our policyholder does not live longer than 120 years. Therefore, we set µ_{x+t} ≡ ∞ for x + t > 120. The dynamics for x + t ≤ 120 under the risk-neutral probability Q are assumed to be

dq_t = (r_t − σ_S²/2) dt + σ_S dW^{Q,S}_t,
dr_t = ζ(γ̃ − r_t) dt + σ_r dW^{Q,r}_t,
dµ_{x+t} = κ µ_{x+t} dt + σ_µ dW^{Q,µ}_t,

where W^{Q,S}, W^{Q,r}, W^{Q,µ} are Q-Brownian motions constituting a three-dimensional Gaussian process with the same instantaneous correlations as the corresponding P-Brownian motions. As in Ha and Bauer (2019), we assume there is no risk premium for mortality and a constant risk premium λ for interest rate risk, such that γ̃ = γ − λσ_r/ζ. Provided that the policyholder is still alive at the risk horizon τ < T, the value of the contract at that time is

L = E_Q[ exp( −∫_τ^T (r_s + µ_{x+s}) ds ) max( S_T , b a_{x+T}(T) ) | X ],

where we denote X = (q_τ, r_τ, µ_{x+τ}). Discounting with r_s + µ_{x+s} takes into account that the policyholder might die between τ and T.
On the other hand, a possible death between 0 and τ is not considered. This results in a conservative estimate of the capital requirement for the issuer of the contract. Alternatively, one could model the loss as 1_A L, where A ⊆ Ω is the event that the policyholder survives until time τ. We follow Ha and Bauer (2019) and set x = 55, τ = 1, T = 15, b = 10.792, q_0 = 4.605, m = 5%, σ_S = 18%, r_0 = 2.5%, ζ = 25%, γ = 2%, σ_r = 1%, λ = 2%, µ_x = 1%, κ = 7%, σ_µ = 0.12%, ρ_12 = −30%, ρ_13 = 6%, ρ_23 = −4%. Then

_kE_{x+t}(t) = E_Q[ exp( −∫_t^{t+k} (r_s + µ_{x+s}) ds ) | r_t, µ_{x+t} ]

is the time-t value of a pure endowment contract with maturity t + k. Since r and µ are affine, one has _kE_{x+t}(t) = F(t, k, r_t, µ_{x+t}) for the function F(t, k, r_t, µ_{x+t}) = A(t, t + k) e^{−B_r(t,t+k) r_t − B_µ(t,t+k) µ_{x+t}}, with A, B_r, and B_µ as given in the Appendix of Ha and Bauer (2019). Moreover, one can write

L = E_{Q_E}[ F(τ, T − τ, r_τ, µ_{x+τ}) max( e^{q_T} , b Σ_{k=1}^{50} F(T, k, r_T, µ_{x+T}) ) | X ]

for the probability measure Q_E given by

dQ_E/dQ = exp( −∫_τ^T (r_s + µ_{x+s}) ds ) / E_Q[ exp( −∫_τ^T (r_s + µ_{x+s}) ds ) | X ].

Under P, X = (q_τ, r_τ, µ_{x+τ}) is a three-dimensional normal vector, and the conditional Q_E-distribution of (q_T, r_T, µ_{x+T}) given X is normal too (the precise form of these distributions is given in the Appendix of Ha and Bauer (2019)). This makes it possible to efficiently simulate (X, Y) under P ⊗ Q_E, where Y = F(τ, T − τ, r_τ, µ_{x+τ}) max( e^{q_T} , b Σ_{k=1}^{50} F(T, k, r_T, µ_{x+T}) ).
To approximate L we chose a network with two hidden layers containing 4 nodes each and trained it for 40 epochs. Figure 9 shows the empirical squared distance without IS on the training and validation data set. The panels in Figure 10 illustrate the empirical evaluations of the test criteria (a) and (b) from Section 3.2. To test (c) from Section 3.2 we considered the sets B_1 = { x ∈ R³ : x_1 > q_{70%} and x_2 < r_{30%} } and B_2 = { x ∈ R³ : x_1 < q_{30%} and x_2 > r_{70%} }, where q_β and r_β denote the β-quantiles of q_τ and r_τ under P ⊗ Q_E; the results are shown in Figure 11.
We also trained and tested under P ν ⊗ Q E for an IS distribution ν on R d . Our results are reported in Figure 12. Our estimate of VaR 99.5% (L) was 138.64 without and 138.52 with IS. In comparison, the VaR estimate obtained by Ha and Bauer (2019) using 37 monomials and 40 million simulations is 139.74. Our estimates of ES 99% (L) came out as 141.12 without and 142.12 with IS. There exist no reference values for this case.

Conclusion
In this paper we have developed a deep learning method for assessing the risk of an asset-liability portfolio over a given time horizon. It first computes a neural network approximation of the portfolio value at the risk horizon. Then the approximation is used to estimate a risk measure, such as value-at-risk or expected shortfall, from Monte Carlo simulations. We have investigated how to choose the architecture of the network, how to learn the network parameters under a suitable importance sampling distribution and how to test the adequacy of the network approximation. We have illustrated the approach by computing value-at-risk and expected shortfall in three typical risk assessment problems from banking and insurance. In all cases the approach has worked efficiently and produced accurate results.