Multi-Objective Defocus Robust Source and Mask Optimization Using Sensitive Penalty

Abstract: The continuous shrinking of lithographic technology nodes has driven the development of source and mask optimization (SMO) and has made defocus control increasingly stringent in the actual lithography process. Because of multiple contributing factors, defocus is always variable and uncertain in the real exposure process. Conventional SMO, however, assumes that the lithography system is ideal and only compensates for the optical proximity effect (OPE) in the best focus plane. Therefore, to solve the inverse lithography problem with better pattern uniformity across defocus variations, we propose, for the first time, a defocus robust SMO (DRSMO) approach driven by a defocus sensitivity penalty function. This multi-objective optimization samples a wide range of defocus disturbances and can be solved effectively by the mini-batch gradient descent (MBGD) algorithm. The simulation results show that a more defocus-robust source and mask can be designed through DRSMO. The defocus sensitivity factor $S_\beta$ decreased by up to 63.5% compared with conventional SMO, and, owing to the low error sensitivity and the depth of focus (DOF), the process window (PW) was further enlarged. Compared with conventional SMO, the exposure latitude (EL) increased at most from 4.5% to 10.5% and the DOF increased by up to 54.5% (at EL = 5%), which demonstrates the validity of the DRSMO method in improving focusing performance.


Introduction
With the shrinking of the critical dimension (CD), the impact of the optical proximity effect (OPE) becomes pronounced: it distorts the exposure pattern and reduces pattern fidelity and contrast, so it must be corrected effectively. In addition, continuous shrinking makes the control of defocus in lithography increasingly stringent. In actual lithography processes, defocus is always uncertain at the wafer level because of the unevenness of the wafer surface [1]. Moreover, aberrations, thermal aberrations [2], thermal mask effects [3], and thick mask effects [4] all inevitably cause the best-focus plane to shift constantly and further increase the OPE in off-focus conditions. Meanwhile, the continuous shrinkage of technology nodes has promoted the introduction of resolution enhancement technology (RET). Conventional RETs, such as optical proximity correction (OPC) and source and mask optimization (SMO), generate the best exposure conditions by optimizing the mask or by simultaneously optimizing the source and mask, respectively [5,6]. However, general RET methods assume that the lithography system is ideal and only compensate for the OPE in the nominal condition [7][8][9]. With CD shrinkage, the imaging quality becomes more sensitive to defocus.
Figure 1 illustrates the DRSMO optimization framework, which is composed of the forward calculation to evaluate pattern fidelity and the inverse optimization to update the source and mask parameters. In current lithography processes, the source and mask are freeform pixel-based configurations. Therefore, the source and mask can be represented in matrix form by J and m, respectively. Each source element $J_{x_s,y_s} \in (0, 1)$ represents the normalized light intensity, and each mask element $m_{rs}$ follows a 0-1 binary distribution.
To overcome the complexity of the constrained optimization problem, a parameter transformation converts $J^k$ and $m^k$ into the unconstrained source and mask parameters $\Omega_J^k$ and $\Omega_M^k$ in the kth iteration, respectively (see Appendix B and Equation (A9)) [8].
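As an illustration of such a parameter transformation, the sketch below uses the cosine mapping that is common in the pixelated SMO literature. Whether Equation (A9) takes exactly this form is an assumption here, and the function names and the 3 × 3 source are hypothetical.

```python
import numpy as np

def to_constrained(omega):
    """Map unconstrained parameters to values in [0, 1] (assumed cosine form)."""
    return 0.5 * (1.0 + np.cos(omega))

def to_unconstrained(values):
    """Inverse mapping on the principal branch, useful for initializing omega."""
    return np.arccos(2.0 * np.clip(values, 0.0, 1.0) - 1.0)

def chain_rule(grad_wrt_values, omega):
    """Convert a gradient with respect to J or m into one with respect to omega."""
    return grad_wrt_values * (-0.5 * np.sin(omega))

# Hypothetical 3 x 3 source with normalized intensities.
J = np.array([[0.0, 0.5, 1.0],
              [0.2, 0.8, 0.2],
              [1.0, 0.5, 0.0]])
omega_J = to_unconstrained(J)     # unconstrained parameters
J_back = to_constrained(omega_J)  # recovers J up to rounding
```

With such a mapping, a gradient step on omega can never push a pixel outside the allowed range, which is why the constrained problem can be treated as unconstrained.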

DRSMO Modeling
Appl. Sci. 2019, 9, x FOR PEER REVIEW 3 of 17
As for the forward calculation process, the printed resist pattern is calculated from the given light source and mask parameters through the corresponding physical process. The printed pattern is then compared with the target resist pattern to evaluate the pattern fidelity and CD error. Under the Abbe imaging principle, the aerial image that takes defocus into account is denoted $I_{defocus}(\beta)$. For the model-based SMO method, the scalar imaging model is inaccurate in hyper-NA (NA > 1) immersion lithography systems [22]. Thus, we adopt the vector imaging model that we studied previously for the aerial image calculation [16]. In that model, $J_{sum}$ is the summation of all the source intensity and is used as a normalization factor, and $E_p^{wafer}$ denotes the electric field components (x-, y-, and z-directions) in the exposure plane. In the expression for $E_p^{wafer}$, $n_w$ is the index of refraction, R is the magnification (normally R = 4), $V(x_s, y_s)$ is the vector matrix for hyper-NA systems, C is the irradiance correction factor, $U_{ideal}$ is the ideal pupil filter, $F[M_{near}(x_s, y_s)]$ is the mask diffraction near field, and $E_i(x_s, y_s)$ is the electric field of the source, represented by a 2 × 1 vector. The operators $\odot$, $F[\,\cdot\,]$, and $F^{-1}[\,\cdot\,]$ denote matrix entry-by-entry multiplication, the forward Fourier transform, and the inverse Fourier transform, respectively.
It should be noted that the impact of defocus on the resulting aerial image, denoted Def, can be described as a kind of aberration [23]. It introduces a phase change distributed over the ideal aperture that depends on $y_i$, the direction cosine in the propagation direction, and on the defocus value β.
Next, the resist image is obtained with the continuous, differentiable sigmoid model [24], which can be expressed as

$$Z(\beta) = \mathrm{sig}\left(I_{defocus}(\beta)\right) = \frac{1}{1 + \exp\left[-a\left(I_{defocus}(\beta) - t_r\right)\right]}$$

where a indicates the steepness of the sigmoid function, and $t_r$ is the threshold.
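The sigmoid resist model described above can be sketched in a few lines. The default values of the steepness a and the threshold t_r below are illustrative assumptions, not the values used in the simulations, and the intensity profile is hypothetical.

```python
import numpy as np

def resist_image(aerial_image, a=25.0, t_r=0.25):
    """Map aerial-image intensity to a soft 0-1 resist pattern with a sigmoid.

    a   : steepness of the sigmoid (illustrative default)
    t_r : intensity threshold (illustrative default)
    """
    aerial_image = np.asarray(aerial_image, dtype=float)
    return 1.0 / (1.0 + np.exp(-a * (aerial_image - t_r)))

# Hypothetical intensities below, at, and above the threshold.
intensity = np.array([0.05, 0.25, 0.60])
z = resist_image(intensity)
```

Because the sigmoid is differentiable everywhere, the cost function built on it admits analytic gradients, which a hard threshold would not.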

DRSMO Inverse Optimization Framework
As for the inverse optimization process in Figure 1, the source and mask parameters are updated continuously to meet the final target resist pattern and overcome the OPE. The inverse optimization process of DRSMO relies on establishing a suitable cost function. In DRSMO, the multi-objective cost function is composed of statistical expectations over many defocus disturbances, and the total cost function can be divided into two parts: the pattern fidelity part and the defocus sensitivity part.
The pattern fidelity part F follows Jia's approach [19]. It is defined as the expectation of the Euclidean distance between the target resist pattern $\tilde{Z}$ and the resist pattern $Z(\beta_i)$ over many defocus disturbances. Here $\beta_i$ is a stochastic variable representing the defocus disturbance. It follows a uniform distribution over a certain range, namely $\beta_i \sim U(-\alpha, \alpha)$, and $\beta = \{\beta_i\}$ denotes the whole training set.
It should be noted that the sample range ±α is selected according to the actual situation: a larger α can theoretically lead to a wider DOF, but too large a DOF target is beyond what the optimization can achieve. $\tilde{Z}$ and $Z(\beta_i)$ are the target resist pattern and the resist pattern under the defocus $\beta_i$, respectively, and $\varepsilon\{\cdot\}$ denotes the mathematical expectation.
The core idea of DRSMO is to introduce the defocus sensitivity penalty function Y, which aims at minimizing the expected quadratic rate of change of the aerial image with respect to defocus. This penalty function directly controls the rate of change of the aerial image with defocus, which improves the consistency of the pattern and CD under different defocus disturbances. Therefore, it helps to optimize a more robust source and mask with lower defocus sensitivity. In addition, because the process robustness of the optimized system improves, the PW also enlarges. Appendix A and Equation (A5) give the details of the analytical sensitivity penalty $Y_i$. The total cost function G is the weighted sum of F and Y,

$$G = F + \omega Y,$$

where ω is the weighting factor of the sensitivity part. In particular, ω = 0 means that the optimization operates only on the fidelity part F. In Section 3, we compare the optimization results obtained with the fidelity part alone and with the fidelity and sensitivity parts together.
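To make the structure of the two-part cost function concrete, the following sketch evaluates a fidelity term and a sensitivity term over sampled defocus values and combines them with the weight ω. It is a toy under stated assumptions: `toy_aerial_image` is a hypothetical 1-D Gaussian-blur stand-in for the vector imaging model, the fidelity term uses a squared Euclidean distance, and the sensitivity term uses a central finite difference in place of the analytic expression of Equation (A5).

```python
import numpy as np

def toy_aerial_image(mask, beta, sigma0=2.0, beta0=50.0):
    """Hypothetical stand-in for the imaging model: blur widens with |beta| (nm)."""
    sigma = sigma0 * np.sqrt(1.0 + (beta / beta0) ** 2)
    x = np.arange(-15, 16)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    return np.convolve(mask, kernel, mode="same")

def resist(intensity, a=25.0, t_r=0.5):
    """Sigmoid resist model with illustrative parameters."""
    return 1.0 / (1.0 + np.exp(-a * (intensity - t_r)))

def fidelity_term(mask, target, beta):
    """F_i: squared distance between target and printed pattern at defocus beta."""
    printed = resist(toy_aerial_image(mask, beta))
    return float(np.sum((target - printed) ** 2))

def sensitivity_term(mask, beta, h=1.0):
    """Y_i: squared rate of change of the aerial image with defocus (finite difference)."""
    diff = toy_aerial_image(mask, beta + h) - toy_aerial_image(mask, beta - h)
    return float(np.sum((diff / (2.0 * h)) ** 2))

def total_cost(mask, target, betas, omega):
    """G = F + omega * Y, with expectations replaced by sample means."""
    F = np.mean([fidelity_term(mask, target, b) for b in betas])
    Y = np.mean([sensitivity_term(mask, b) for b in betas])
    return F + omega * Y

rng = np.random.default_rng(0)
betas = rng.uniform(-100.0, 100.0, size=30)  # defocus samples from U(-alpha, alpha)
target = np.zeros(64)
target[24:40] = 1.0                          # a single 1-D line feature
mask = target.copy()
cost_fidelity_only = total_cost(mask, target, betas, omega=0.0)
cost_with_penalty = total_cost(mask, target, betas, omega=0.1)
```

Setting `omega=0.0` recovers the fidelity-only case, so the same routine covers both settings compared in the simulations.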

DRSMO Optimization Algorithm
In our method, the DRSMO process can be regarded as a machine learning process, in which the variable $\beta_i$ is treated as a stochastic disturbance in the cost function to solve this multi-objective optimization problem. Table 1 illustrates the optimization flows of the SGD and MBGD algorithms. In our previous works [25,26], SGD was adopted to calculate the gradient for a single training sample $\beta_{i,k}$ in each iteration, which is fast. However, because of the wider sample range of defocus disturbances, the SGD algorithm cannot guarantee that each iteration moves in the globally optimal direction.
To balance optimization speed and accuracy, mini-batch gradient descent (MBGD) is adopted to traverse a subset of the random defocus samples $\beta_{i,k}, \beta_{i+1,k}, \cdots, \beta_{i+l_{batch}-1,k}$ in each iteration, where $l_{batch}$ is the number of samples per batch [27]. Unlike SGD, MBGD computes each update from part of the training set, which gives a more reliable search direction, so it converges more easily to the global optimal solution.
Both the SGD and MBGD algorithms need the analytic gradient of the cost function, which can be formulated as

$$\nabla_J G_i = \nabla_J F_i + \omega \nabla_J Y_i, \qquad \nabla_M G_i = \nabla_M F_i + \omega \nabla_M Y_i,$$

where $\nabla_J$ and $\nabla_M$ are the gradients with respect to the source parameter $\Omega_J$ and the mask parameter $\Omega_M$, respectively. We devoted considerable effort to deriving the analytic gradients of the sensitivity penalty, $\nabla_J Y_i$ and $\nabla_M Y_i$; details can be found in Appendix B, Equations (A11) and (A13). Similarly, the expansions of $\nabla_J F_i$ and $\nabla_M F_i$ can be found in Appendix C, Equations (A19) and (A20).
Table 1. Stochastic gradient descent (SGD) and mini-batch gradient descent (MBGD) optimization procedure.
SGD procedure
1. Initialization: Assign the starting source parameter $\Omega_J$, the mask parameter $\Omega_M$, the source step size $s_J$, the mask step size $s_M$, and the upper limit iteration number $l_{smo}$.
2. Optimization: Simultaneously update the source and mask patterns: update the source and mask parameters.
3. Output: the optimized source and mask parameters.
MBGD procedure
1. Initialization: Assign the starting source parameter $\Omega_J$, the mask parameter $\Omega_M$, the source step size $s_J$, the mask step size $s_M$, the upper limit iteration number $l_{smo}$, and the batch number $l_{batch}$.
2. Optimization: Simultaneously update the source and mask patterns: calculate the gradients $\nabla_J G^{k-1}_i, \cdots, \nabla_J G^{k-1}_{i+l_{batch}-1}$ and $\nabla_M G^{k-1}_i, \cdots, \nabla_M G^{k-1}_{i+l_{batch}-1}$, respectively; update the source and mask parameters.
3. Output: the optimized source and mask parameters.
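The two procedures in Table 1 differ only in how many defocus samples feed each update, which the following sketch makes explicit. It is not the DRSMO implementation: the per-sample cost is a hypothetical convex toy (so it cannot reproduce the search-direction issue discussed above), the batch gradient is averaged (an assumption), and all names are illustrative.

```python
import numpy as np

def minibatch_descent(grad_fn, theta0, samples, batch_size, step):
    """One pass over the training set; batch_size=1 reduces to SGD."""
    theta = np.array(theta0, dtype=float)
    n_iter = 0
    for start in range(0, len(samples), batch_size):
        batch = samples[start:start + batch_size]
        # Average the per-sample gradients of the current batch (assumed rule).
        grad = np.mean([grad_fn(theta, b) for b in batch], axis=0)
        theta -= step * grad
        n_iter += 1
    return theta, n_iter

def center(beta):
    """Hypothetical defocus-dependent optimum of the toy per-sample cost."""
    return np.array([np.cos(beta / 100.0), beta / 100.0])

def toy_grad(theta, beta):
    """Gradient of the toy cost 0.5 * ||theta - center(beta)||^2."""
    return theta - center(beta)

rng = np.random.default_rng(0)
training_set = rng.uniform(-100.0, 100.0, size=900)  # 900 defocus samples (nm)

# MBGD: 3 samples per iteration, 300 iterations.
theta_mbgd, it_mbgd = minibatch_descent(toy_grad, [0.0, 0.0], training_set,
                                        batch_size=3, step=0.05)
# SGD: 1 sample per iteration, 900 iterations.
theta_sgd, it_sgd = minibatch_descent(toy_grad, [0.0, 0.0], training_set,
                                      batch_size=1, step=0.05)
```

Averaging over a batch reduces the variance of each step at the price of more gradient evaluations per iteration, which is the speed and accuracy trade-off the text describes.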

Simulation Conditions
We illustrate the DRSMO method on two test patterns, as shown in Figure 2. The critical dimension (CD) of each pattern was 45 nm. The resist patterns and the mask were each represented by a 201 × 201 matrix, with pixel sizes of 5.625 nm × 5.625 nm and 22.500 nm × 22.500 nm, respectively. The imaging system parameters were λ = 193 nm and NA = 1.2. The freeform source was represented by a 21 × 21 matrix with TE-polarized illumination. In this paper, the whole training set $\beta = \{\beta_i\}$ consisted of 900 random sampling points in the range (−100 nm, 100 nm), a range far larger than the initial DOF before optimization. To balance optimization speed and accuracy, the batch number $l_{batch}$ was set to three samples per iteration, with 300 iterations in total.
To evaluate imaging fidelity, the pattern error (PAE) is defined as the Euclidean distance between the binary target resist pattern $\tilde{Z}$ and the actual resist pattern $Z(\beta)$ under defocus β. Generally, a smaller PAE means higher fidelity of the lithographic imaging. Meanwhile, to evaluate the defocus sensitivity quantitatively, we define the defocus sensitivity factor $S_\beta$ as the rate of change of PAE with respect to defocus. Moreover, to evaluate the process robustness in the actual exposure process, the PW is introduced to describe the trade-off between dose variation and focus variation. It is characterized by two parameters, DOF and exposure latitude (EL). The EL is the allowable range of dose variation under a fixed defocus. Similarly, the DOF is the largest acceptable defocus range under a fixed dose. Thus, the PW consists of all pairs of DOF and EL that satisfy the exposure quality specification. Generally, the DOF at an EL of 5% or 10% is taken as the process evaluation standard. The representative positions used for the PW calculation are marked with yellow lines in Figure 2.
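The two evaluation metrics defined above can be computed as in the sketch below. The exact formulas are not reproduced here, so two assumptions are made and labeled in the code: PAE is taken as the squared Euclidean distance, and $S_\beta$ is estimated as the slope of a least-squares line through the defocus-PAE curve. The curve data are hypothetical.

```python
import numpy as np

def pattern_error(target, printed):
    """PAE as the squared Euclidean distance between two patterns (assumed form)."""
    diff = np.asarray(target, dtype=float) - np.asarray(printed, dtype=float)
    return float(np.sum(diff ** 2))

def defocus_sensitivity(defocus_nm, pae_values):
    """S_beta as the slope of a linear fit of PAE against defocus (assumed estimator)."""
    slope, _intercept = np.polyfit(defocus_nm, pae_values, 1)
    return float(slope)

def relative_decrease(reference, value):
    """Percentage decrease of value relative to reference."""
    return 100.0 * (reference - value) / reference

# Hypothetical defocus-PAE curves for two optimized systems.
defocus = np.array([0.0, 25.0, 50.0, 75.0, 100.0])
pae_a = 300.0 + 4.0 * defocus   # steeper curve: more sensitive to defocus
pae_b = 280.0 + 1.5 * defocus   # flatter curve: more robust
s_a = defocus_sensitivity(defocus, pae_a)
s_b = defocus_sensitivity(defocus, pae_b)
improvement = relative_decrease(s_a, s_b)  # percentage reduction of S_beta
```

A flatter defocus-PAE curve gives a smaller slope, which is how the curves in the results section are compared.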

Optimization Results and Analysis
To illustrate the negative influence of defocus, Figure 3a-f shows the optimization performed by the initial SMO, which operates only at the best focus plane [7], with the printed image evaluated in different defocus planes. Figure 3a,b shows the optimized source and the optimized mask for the initial SMO, respectively. Figure 3c shows the printed image at the best focus plane, and Figure 3d-f shows the printed images under 50 nm, 70 nm, and 100 nm defocus, respectively. The PAE clearly increases sharply with increasing defocus, which shows that the initial SMO cannot compensate for the defocus distortion because its cost function contains no defocus term. However, defocus error inevitably exists in the actual lithography process, so it is necessary to obtain better defocus robustness via DRSMO.
Similarly, Figure 3g-l illustrates Peng's SMO method [14], which operates only at an assigned defocus plane (100 nm defocus). In this method, the cost function is formulated as the weighted sum of a nominal term and a defocus term. The distortion and PAE were evaluated in each defocus plane. However, since this method operates only at an assigned defocus plane, the global fidelity was limited. In Figure 3j (50 nm defocus) and Figure 3k (70 nm defocus), apparent hot spots exist at the centers of the red circles.
Finally, the optimization results of the proposed DRSMO with ω = 0.2 are shown in Figure 3m-r. The distortion and PAE clearly declined further in each defocus plane, so a more robust source and mask were designed through this optimization. Compared with SMO at an assigned defocus plane, DRSMO guaranteed better global fidelity over a wide range of defocus.
To further demonstrate the robustness improvement, Figure 4 depicts the defocus-PAE curves over the evaluation range of 0-100 nm for the target 1 optimized systems. Each weight factor ω corresponds to one set of optimized source and mask. The slope of each curve reflects the process robustness: a lower slope means that the optimized system is less sensitive to focus shifts. The slope of the curves gradually decreases in the order of initial SMO (blue curve), DRSMO with ω = 0 (green curve), DRSMO with ω = 0.1 (red curve), DRSMO with ω = 0.2 (azure curve), and DRSMO with ω = 0.3 (purple curve). We conclude that DRSMO helps to reduce defocus sensitivity and to obtain a more uniform exposure pattern over a long defocus range, which means better system robustness against uncertain and variable focus shifts in a real lithography process.
The core idea of the DRSMO is to introduce the defocus sensitivity penalty Y to constrain the uniformity of printed patterns under different defocus variations. Thus, the simulations further compare DRSMO operating only on the fidelity part F (ω = 0) with DRSMO driven by the sensitivity penalty (ω ≠ 0). For instance, comparing DRSMO with ω = 0 (green curve) and DRSMO with ω = 0.1 (red curve) in Figure 4, the slope of the red curve is lower than that of the green curve, which implies that introducing the sensitivity penalty Y further improves pattern uniformity over a wide range of defocus variations. This confirms that introducing the sensitivity penalty Y further improves the optimization performance. In brief, to maximize the DRSMO optimization performance, both the fidelity part F and the penalty term Y must be introduced into the optimization framework, and the weight factor ω must be chosen appropriately.
In actual lithography processes, the PW is one of the critical evaluation criteria and reflects the exposure error tolerance. Figure 5 shows the PWs for the conventional SMO (blue curve) and for the DRSMO with the weight factors ω = 0 (green curve), ω = 0.1 (red curve), ω = 0.2 (azure curve), and ω = 0.3 (purple curve). The PW of the proposed DRSMO is evidently larger than that of the initial SMO. For the initial SMO, the maximal EL was less than 5%, far below actual exposure requirements. With DRSMO, the maximal EL increased from 4.5% to 10.5%. Similar results were found when comparing DRSMO without and with the sensitivity penalty. For example, when comparing the PWs with ω = 0 (green curve) and ω = 0.1 (red curve), a wider PW was found for the red curve than for the green curve. It is inferred that the sensitivity penalty helps to improve system robustness and thereby indirectly enlarges the PW. However, because the cost function does not contain terms directly related to EL and DOF, the relationship between the weight factor ω and the PW is not clear-cut. Thus, although DRSMO with ω = 0.3 had the best defocus robustness, its PW shrank because the weight factor was so large that it led to overfitting during the training process. This illustrates that a well-chosen weight factor ω is important for improving robustness and PW simultaneously.
Table 2 summarizes the target 1 optimization results for conventional SMO and for DRSMO with ω = 0, ω = 0.1, ω = 0.2, and ω = 0.3, respectively. The PAE sensitivity factor $S_\beta$ declines significantly as ω increases. Compared with the initial SMO, the largest decrease of $S_\beta$, obtained by DRSMO with ω = 0.3, is 63.5%. Considering the improvement of both $S_\beta$ and PW together, ω = 0.1 is a relatively reasonable weight factor for maximizing optimization performance.
Table 2. The target 1 optimized values of $S_\beta$ and depth of focus (DOF) (nm) corresponding to exposure latitude (EL) equal to 5% and 8% for conventional SMO and DRSMO with ω = 0, ω = 0.1, ω = 0.2, and ω = 0.3, respectively.
Target 2 consisted of a series of mixed vertical and horizontal lines. Figure 6 shows the defocus-PAE curves over the evaluation range of (0 nm, 100 nm) for target 2; the optimizations were performed with the MBGD algorithm. The slope for DRSMO with ω = 0.2 (azure curve) is lower than those for DRSMO with ω = 0 (green curve) and the initial SMO (blue curve), which demonstrates the effectiveness of the sensitivity penalty.
Meanwhile, the improvement of the PW for target 2 remained apparent with the DRSMO approach. Figure 7 shows the PWs for the initial SMO (blue curve) and the DRSMO with the weight factors ω = 0 (green curve) and ω = 0.2 (azure curve). The PW showed no significant improvement for DRSMO with ω = 0 compared with the initial SMO, but, owing to the lower defocus sensitivity, the PW of DRSMO with ω = 0.2 was enlarged. Evidently, introducing the sensitivity penalty benefits further improvement of the PW. Similarly, Table 3 summarizes the target 2 optimization results for the initial SMO and DRSMO with ω = 0 and ω = 0.2, respectively. It shows that the DOF corresponding to EL = 5% increased by up to 54.5%.
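The DOF values quoted at a fixed EL can be read off a PW curve by interpolation. The sketch below assumes the PW is available as sampled (DOF, EL) pairs in which EL decreases monotonically as DOF grows; the helper name and the sample curve are hypothetical and do not reproduce the data of Figures 5 and 7.

```python
import numpy as np

def dof_at_el(dof_nm, el_percent, el_level):
    """Interpolate the DOF (nm) at which a PW curve reaches a given EL (%).

    Assumes el_percent decreases monotonically as dof_nm increases.
    Returns 0.0 when the curve never reaches el_level. Levels below the
    smallest sampled EL are clamped to the largest sampled DOF.
    """
    dof_nm = np.asarray(dof_nm, dtype=float)
    el_percent = np.asarray(el_percent, dtype=float)
    if el_level > el_percent.max():
        return 0.0
    # np.interp needs increasing x-coordinates, so reverse both arrays.
    return float(np.interp(el_level, el_percent[::-1], dof_nm[::-1]))

# Hypothetical PW curve: EL shrinks as the required DOF grows.
dof = np.array([0.0, 50.0, 100.0, 150.0])
el = np.array([10.0, 6.0, 2.0, 0.0])
dof_5 = dof_at_el(dof, el, 5.0)    # DOF at EL = 5%
dof_12 = dof_at_el(dof, el, 12.0)  # EL level never reached by this curve
```

A system whose maximal EL stays below the chosen level has no usable DOF at that level, which is the situation reported for the initial SMO of target 1 at EL = 5%.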

Comparison of SGD and MBGD Algorithm for DRSMO
We have previously used the SGD algorithm to solve multi-objective SMO [25,26], and it converged well and quickly. However, because of the wider sampling range of defocus in the DRSMO framework, it is hard for the SGD algorithm to find the globally optimal direction when each iteration is driven by the gradient of only one sample in the training set. To compare the SGD and MBGD optimization performance for DRSMO in terms of speed and accuracy, we generated the same training set of 900 sample points for both optimization processes: the MBGD algorithm used three samples per iteration for 300 iterations, and the SGD algorithm used one sample per iteration for 900 iterations. Figure 8 shows the defocus-PAE curves for the target 1 optimization performed by the SGD and MBGD algorithms with the same weight factor, ω = 0.1. The slope of the curve for DRSMO with MBGD (red curve) is lower than that with SGD (jasper curve), which indicates better optimization performance for MBGD on the same training set. Similarly, Figure 9 shows the PWs obtained with the SGD and MBGD algorithms for the same weight factor. The MBGD algorithm provided a wider PW owing to the lower defocus sensitivity.
Table 4 summarizes the optimization performance for target 1 obtained with MBGD and SGD, respectively. It clearly shows that MBGD optimization achieved a lower $S_\beta$ and a larger DOF. Table 4 also compares the run times: although SGD had relatively weaker optimization performance, it converged faster. All computations were carried out on a server with an Intel Core i5-8400 CPU at 2.8 GHz and 16.0 GB of RAM.
Figure 8. The defocus-PAE curves of target 1 for the DRSMO with the same weight factor ω = 0.1, obtained with the SGD algorithm (jasper curve) and the MBGD algorithm (red curve), respectively.
Figure 9. PWs of target 1 for the DRSMO with the same weight factor ω = 0.1, obtained with the SGD algorithm (jasper curve) and the MBGD algorithm (red curve), respectively.

In conclusion, both MBGD and SGD helped to improve the DOF and PW. However, the large number of samples drawn from the defocus disturbances made it hard for SGD to converge toward a global search direction. Therefore, the MBGD algorithm is the more effective choice for the DRSMO multi-objective optimization problem.
Table 4. The values of $S_\beta$, DOF (nm) corresponding to ELs equal to 5% and 8%, and run time (seconds) for DRSMO performed with MBGD and SGD, respectively.

Conclusions
In conclusion, we proposed the DRSMO to compensate for uncertain defocus and the OPE in real lithography processes. The inverse optimization framework is based on a new cost function that constrains the uniformity of the aerial image under different defocus disturbances, and it yields a more robust lithographic source and mask with lower defocus sensitivity. With this method, the robustness against focus shifts improved markedly: $S_\beta$ decreased by up to 63.5%, and the DOF and PW were substantially enlarged as well. The method provides a larger exposure tolerance in the actual lithography process and is especially applicable to high-fidelity exposures at cutting-edge technology nodes.