### Refine

#### Document Type

- Doctoral Thesis (4) (remove)

#### Language

- English (4) (remove)

#### Keywords

- Monte-Carlo-Simulation (4) (remove)

#### Institute

The reduction of information contained in model time series through the use of aggregating statistical performance measures is very high compared to the amount of information that one would like to draw from it for model identification and calibration purposes. It is readily known that this loss imposes important limitations on model identification and -diagnostics and thus constitutes an element of the overall model uncertainty as essentially different model realizations with almost identical performance measures (e.g. r-² or RMSE) can be generated. In three consecutive studies the present work proposes an alternative approach towards hydrological model evaluation based on the application of Self-Organizing Maps (SOM; Kohonen, 2001). The Self-Organizing Map is a type of artificial neural network and unsupervised learning algorithm that is used for clustering, visualization and abstraction of multidimensional data. It maps vectorial input data items with similar patterns onto contiguous locations of a discrete low-dimensional grid of neurons. The iterative training of the SOM causes the neurons to form a discrete, data-compressed representation of the high-dimensional input data. Using appropriate visualization techniques, information on distributions, patterns and relationships in complex data sets can be extracted. Irrespective of their potential, SOM applications have earned very little attention in hydrological modelling compared to other artificial neural network techniques. Therefore, the aim of the present work is to demonstrate that the application of Self-Organizing Maps has very high potential to address fundamental issues of model evaluation: It is shown that the clustering and classification of model time series by means of SOM can provide useful insights into model behaviour. In combination with the diagnostic properties of Signature Indices (Gupta et al., 2008; Yilmaz et al., 2008) SOM provides a novel tool for interpreting the model parameters in the hydrological context and identifying parameter sets that simultaneously meet multiple objectives, even if the corresponding model realizations belong to different models. Moreover, the presented studies and reviews also encourage further studies on the application of SOM in hydrological modelling.

This thesis introduces a calibration problem for financial market models based on a Monte Carlo approximation of the option payoff and a discretization of the underlying stochastic differential equation. It is desirable to benefit from fast deterministic optimization methods to solve this problem. To be able to achieve this goal, possible non-differentiabilities are smoothed out with an appropriately chosen twice continuously differentiable polynomial. On the basis of this so derived calibration problem, this work is essentially concerned about two issues. First, the question occurs, if a computed solution of the approximating problem, derived by applying Monte Carlo, discretizing the SDE and preserving differentiability is an approximation of a solution of the true problem. Unfortunately, this does not hold in general but is linked to certain assumptions. It will turn out, that a uniform convergence of the approximated objective function and its gradient to the true objective and gradient can be shown under typical assumptions, for instance the Lipschitz continuity of the SDE coefficients. This uniform convergence then allows to show convergence of the solutions in the sense of a first order critical point. Furthermore, an order of this convergence in relation to the number of simulations, the step size for the SDE discretization and the parameter controlling the smooth approximation of non-differentiabilites will be shown. Additionally the uniqueness of a solution of the stochastic differential equation will be analyzed in detail. Secondly, the Monte Carlo method provides only a very slow convergence. The numerical results in this thesis will show, that the Monte Carlo based calibration indeed is feasible if one is concerned about the calculated solution, but the required calculation time is too long for practical applications. Thus, techniques to speed up the calibration are strongly desired. As already mentioned above, the gradient of the objective is a starting point to improve efficiency. Due to its simplicity, finite differences is a frequently chosen method to calculate the required derivatives. However, finite differences is well known to be very slow and furthermore, it will turn out, that there may also occur severe instabilities during optimization which may lead to the break down of the algorithm before convergence has been reached. In this manner a sensitivity equation is certainly an improvement but suffers unfortunately from the same computational effort as the finite difference method. Thus, an adjoint based gradient calculation will be the method of choice as it combines the exactness of the derivative with a reduced computational effort. Furthermore, several other techniques will be introduced throughout this thesis, that enhance the efficiency of the calibration algorithm. A multi-layer method will be very effective in the case, that the chosen initial value is not already close to the solution. Variance reduction techniques are helpful to increase accuracy of the Monte Carlo estimator and thus allow for fewer simulations. Storing instead of regenerating the random numbers required for the Brownian increments in the SDE will be efficient, as deterministic optimization methods anyway require to employ the identical random sequence in each function evaluation. Finally, Monte Carlo is very well suited for a parallelization, which will be done on several central processing units (CPUs).

The demand for reliable statistics has been growing over the past decades, because more and more political and economic decisions are based on statistics, e.g. regional planning, allocation of funds or business decisions. Therefore, it has become increasingly important to develop and to obtain precise regional indicators as well as disaggregated values in order to compare regions or specific groups. In general, surveys provide the information for these indicators only for larger areas like countries or administrative divisions. However, in practice, it is more interesting to obtain indicators for specific subdivisions like on NUTS 2 or NUTS 3 levels. The Nomenclature of Units for Territorial Statistics (NUTS) is a hierarchical system of the European Union used in statistics to refer to subdivisions of countries. In many cases, the sample information on such detailed levels is not available. Thus, there are projects such as the European Census, which have the goal to provide precise numbers on NUTS 3 or even community level. The European Census is conducted amongst others in Germany and Switzerland in 2011. Most of the participating countries use sample and register information in a combined form for the estimation process. The classical estimation methods of small areas or subgroups, such as the Horvitz-Thompson (HT) estimator or the generalized regression (GREG) estimator, suffer from small area-specific sample sizes which cause high variances of the estimates. The application of small area methods, for instance the empirical best linear unbiased predictor (EBLUP), reduces the variance of the estimates by including auxiliary information to increase the effective sample size. These estimation methods lead to higher accuracy of the variables of interest. Small area estimation is also used in the context of business data. For example during the estimation of the revenues of specific subgroups like on NACE 3 or NACE 4 levels, small sample sizes can occur. The Nomenclature statistique des activités économiques dans la Communauté européenne (NACE) is a system of the European Union which defines an industry standard classification. Besides small sample sizes, business data have further special characteristics. The main challenge is that business data have skewed distributions with a few large companies and many small businesses. For instance, in the automotive industry in Germany, there are many small suppliers but only few large original equipment manufacturers (OEM). Altogether, highly influential units and outliers can be observed in business statistics. These extreme values in connection with small sample sizes cause severe problems when standard small area models are applied. These models are generally based on the normality assumption, which does not hold in the case of outliers. One way to solve these peculiarities is to apply outlier robust small area methods. The availability of adequate covariates is important for the accuracy of the above described small area methods. However, in business data, the auxiliary variables are hardly available on population level. One of several reasons for that is the fact that in Germany a lot of enterprises are not reflected in business registers due to truncation limits. Furthermore, only listed enterprises or companies which trespass specific thresholds are obligated to publish their results. This limits the number of potential auxiliary variables for the estimation. Even though there are issues with available covariates, business data often include spatial dependencies which can be used to enhance small area methods. Next to spatial information based on geographic characteristics, group-specific similarities like related industries based on NACE codes can be used. For instance, enterprises from the same NACE 2 level, e.g. sector 47 retail trade, behave more similar than two companies from different NACE 2 levels, e.g. sector 05 mining of coal and sector 64 financial services. This spatial correlation can be incorporated by extending the general linear mixed model trough the integration of spatially correlated random effects. In business data, outliers as well as geographic or content-wise spatial dependencies between areas or domains are closely linked. The coincidence of these two factors and the resulting consequences have not been fully covered in the relevant literature. The only approach that combines robust small area methods with spatial dependencies is the M-quantile geographically weighted regression model. In the context of EBLUP-based small area models, the combination of robust and spatial methods has not been considered yet. Therefore, this thesis provides a theoretical approach to this scientific and practical problem and shows its relevance in an empirical study.

This thesis is divided into three main parts: The description of the calibration problem, the numerical solution of this problem and the connection to optimal stochastic control problems. Fitting model prices to given market prices leads to an abstract least squares formulation as calibration problem. The corresponding option price can be computed by solving a stochastic differential equation via the Monte-Carlo method which seems to be preferred by most practitioners. Due to the fact that the Monte-Carlo method is expensive in terms of computational effort and requires memory, more sophisticated stochastic predictor-corrector schemes are established in this thesis. The numerical advantage of these predictor-corrector schemes ispresented and discussed. The adjoint method is applied to the calibration. The theoretical advantage of the adjoint method is discussed in detail. It is shown that the computational effort of gradient calculation via the adjoint method is independent of the number of calibration parameters. Numerical results confirm the theoretical results and summarize the computational advantage of the adjoint method. Furthermore, provides the connection to optimal stochastic control problems is proven in this thesis.