Abstract
In this dissertation we first study a distributionally robust least squares problem. We present three robust frameworks that use probabilistic ambiguity descriptions of the data in least squares problems. These ambiguity descriptions are given by: 1) a confidence region over the first two moments; 2) bounds on the probability measure with moment constraints; 3) the Kantorovich probability distance from a given measure. Next, we study a two-stage stochastic convex programming model that uses moments to define the probability ambiguity set for the objective function coefficients in both stages. A decomposition-based algorithm is given. We show that this two-stage model can be solved to any precision in polynomial time. A special case is considered in which the probability ambiguity sets are described by exact information on the first two moments and the convex functions are piecewise linear utility functions. Computational results on the performance of this model in a portfolio optimization application are given. The results show that the two-stage modeling is effective when forecasting models have predictive power. The second part of this dissertation focuses on process mining techniques applied to electronic medical record audits. We model patterns of patient record usage and evaluate our approach using several months of data from a large academic medical center. Empirical results show that a small portion of accesses constitute outliers from care workflows. We simulate anomalies in the context of real accesses to illustrate the efficiency of the proposed method for different medical services. The results suggest that our approach outperforms the existing state of the art in outlier detection and is more efficient. Next, we present a four-step framework to analyze process models with noisy data at an abstract level.
Traces are separated into blocks, which are then clustered into several groups, and the original traces are transformed into traces consisting of high-level blocks. Based on the occurrence of the abstract events, these traces are then clustered into several subgroups, each of which can be used to analyze the process model with mining tools such as ProM. Empirical results show that our framework can extract the process models effectively.