Prior to joining eBay, Uwe was a senior research scientist at Yahoo, and before that was a director of Analytic Science at FICO. He has been a professor of mathematics at universities in both the U.S. and Germany.

Uwe received his MA and PhD in mathematics from the University of Utah, where he was a Fulbright scholar, with an extended research stay at the Institute for Advanced Study in Princeton. He pursued his undergraduate studies in Germany with a double major in Mathematics and Computer Science. Bringing his academic career full circle, from computer science to mathematics and back to computing, Uwe has also co-advised a PhD student in data mining at the University of California, San Diego.

His research interests include parallel and distributed computing, text mining, and, more generally, data mining and machine learning applied to recommendations, merchandising, and demand and supply estimation.

### Canary in the e-Commerce Coal Mine: Detecting and Predicting Poor Experiences Using Buyer-to-Seller Messages

Reputation and feedback systems in online marketplaces are often biased, making it difficult to ascertain the quality of sellers. We use post-transaction buyer-to-seller message traffic to detect signals of unsatisfactory transactions on eBay. We posit that a message sent after the item has been paid for is a reliable indicator that the buyer may be unhappy with the purchase, particularly when the message includes words associated with a negative experience. The fraction of a seller's message traffic that is negative predicts whether a buyer who transacts with that seller will stop purchasing on eBay, implying that platforms can use these messages as an additional signal of seller quality.
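
As a minimal sketch of the seller-level feature described above, the following code computes, for each seller, the fraction of post-payment buyer messages that contain a word from a negative lexicon. The `Message` record, the `NEGATIVE_WORDS` list, and the function names are hypothetical. The study's actual lexicon and its churn-prediction model are not reproduced here.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical lexicon of negative-experience words (an assumption, not the study's list).
NEGATIVE_WORDS = {"refund", "broken", "damaged", "wrong", "missing", "return", "complaint", "scam"}


@dataclass
class Message:
    seller_id: str
    sent_after_payment: bool  # only post-payment messages count as a signal
    text: str


def is_negative(text: str) -> bool:
    """Flag a message if any lower-cased word token is in the negative lexicon."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return any(tok in NEGATIVE_WORDS for tok in tokens)


def negative_fraction_by_seller(messages):
    """Map each seller to the fraction of their post-payment messages flagged negative."""
    totals = defaultdict(int)
    negatives = defaultdict(int)
    for m in messages:
        if not m.sent_after_payment:
            continue  # pre-payment questions are ignored
        totals[m.seller_id] += 1
        negatives[m.seller_id] += int(is_negative(m.text))
    return {seller: negatives[seller] / totals[seller] for seller in totals}
```

For example, a seller with one complaint among two post-payment messages gets a score of 0.5. Messages sent before payment do not count toward the score.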

### Bootstrapped Language Identification For Multi-Site Internet Domains

We present an algorithm for language identification, particularly of short documents, for an Internet domain whose sites serve multiple countries with different languages.

The algorithm is significantly faster than standard language identification methods while providing state-of-the-art identification accuracy. We bootstrap the algorithm from language labels based on the site alone, a methodology suitable for any supervised language identification algorithm.

We demonstrate the bootstrapping and the algorithm on eBay email data and on Twitter status updates. The algorithm is deployed at eBay as part of the back-office development data repository.
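
The bootstrapping works with any supervised classifier, so it can be sketched with a simple stand-in. First, label each training text with the language of the site it came from. Then fit a character-trigram naive Bayes model on those noisy labels. The naive Bayes classifier is a swapped-in example, not the fast algorithm described above. `SITE_LANGUAGE` and the other names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical mapping from site to its language; the real site list is an assumption.
SITE_LANGUAGE = {"ebay.de": "de", "ebay.fr": "fr", "ebay.com": "en"}


def char_ngrams(text, n=3):
    """Character n-grams of the lower-cased text, padded with spaces at both ends."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesLangId:
    """Multinomial naive Bayes over character n-grams with add-alpha smoothing."""

    def __init__(self, n=3, alpha=1.0):
        self.n, self.alpha = n, alpha
        self.counts = defaultdict(Counter)  # language -> n-gram counts
        self.docs = Counter()               # language -> number of training documents
        self.vocab = set()

    def fit(self, texts, labels):
        for text, lang in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.counts[lang].update(grams)
            self.docs[lang] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.docs.values())
        vocab_size = len(self.vocab)
        best_lang, best_score = None, -math.inf
        for lang, grams in self.counts.items():
            denom = sum(grams.values()) + self.alpha * vocab_size
            score = math.log(self.docs[lang] / total_docs)  # class prior
            for g in char_ngrams(text, self.n):
                score += math.log((grams[g] + self.alpha) / denom)
            if score > best_score:
                best_lang, best_score = lang, score
        return best_lang


def bootstrap(site_texts):
    """Train on (site, text) pairs, using each site's language as a noisy label."""
    texts = [text for _, text in site_texts]
    labels = [SITE_LANGUAGE[site] for site, _ in site_texts]
    return NaiveBayesLangId().fit(texts, labels)
```

Calling `bootstrap(pairs).predict(text)` on a list of `(site, text)` pairs produces a model that labels new short texts independently of the site on which they appear.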

### Classifying non-Gaussian and Mixed Data Sets in their Natural Parameter Space

We consider the problem of both supervised and unsupervised classification for multidimensional data that are non-Gaussian and of mixed types (continuous and/or discrete). An important subclass of graphical model techniques, called Generalized Linear Statistics (GLS), is used to capture the underlying statistical structure of these complex data.

GLS exploits the properties of exponential family distributions, which are assumed to describe the data components, and constrains latent variables to a lower dimensional parameter subspace.

Based on the latent variable information, classification is performed in the natural parameter subspace with classical statistical techniques. The benefits of decision making in parameter space are illustrated with examples of text categorization of categorical data and of mixed-type data classification.

As a text document preprocessing tool, we present an extension, from binary to categorical data, of the feature selection algorithm based on conditional mutual information maximization.
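
The conditional mutual information maximization (CMIM) criterion needs only plug-in entropy estimates. These extend naturally from binary to categorical features by counting joint category frequencies. The following is a minimal sketch under that assumption, not the implementation from the paper. The first feature maximizes `I(Y; X_i)`, and each later pick maximizes the minimum of `I(Y; X_i | X_j)` over the already selected `X_j`. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(*columns):
    """Joint Shannon entropy (in nats) of one or more aligned categorical columns."""
    joint = Counter(zip(*columns))
    n = sum(joint.values())
    return -sum(c / n * math.log(c / n) for c in joint.values())


def mutual_info(y, x):
    # I(Y; X) = H(Y) + H(X) - H(Y, X)
    return entropy(y) + entropy(x) - entropy(y, x)


def cond_mutual_info(y, x, z):
    # I(Y; X | Z) = H(Y, Z) + H(X, Z) - H(Y, X, Z) - H(Z)
    return entropy(y, z) + entropy(x, z) - entropy(y, x, z) - entropy(z)


def cmim(features, y, k):
    """Greedy CMIM: pick the feature whose worst-case conditional MI with y,
    given any already selected feature, is largest."""
    selected = []
    candidates = list(range(len(features)))
    while candidates and len(selected) < k:
        def score(i):
            if not selected:
                return mutual_info(y, features[i])
            return min(cond_mutual_info(y, features[i], features[j]) for j in selected)
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

A feature that merely duplicates an already selected one scores zero conditional information. CMIM therefore prefers a feature that contributes new information about the class.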

### A Unifying Viewpoint of some Clustering Techniques Using Bregman Divergences and Extensions to Mixed Data Sets

We present a general viewpoint, based on Bregman divergences and exponential family properties, that contains the following three algorithms as special cases: 1) exponential family Principal Component Analysis (exponential PCA), 2) Semi-Parametric exponential family Principal Component Analysis (SP-PCA), and 3) Bregman soft clustering. This framework is equivalent to a mixed data-type hierarchical Bayes graphical model assumption with latent variables constrained to a low-dimensional parameter subspace. We show that within this framework exponential PCA and SP-PCA are similar to the Bregman soft clustering technique with the addition of a linear constraint in the parameter space. We implement the resulting modifications to SP-PCA and Bregman soft clustering for mixed (continuous and/or discrete) data sets, and add a nonparametric estimation of the point-mass probabilities to exponential PCA. Finally, we compare the relative performance of the three algorithms in a clustering setting for mixed data sets.
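
To make the Bregman viewpoint concrete, the sketch below builds `d_phi(x, y) = phi(x) - phi(y) - phi'(y) (x - y)` from a convex function `phi`. It then runs the standard EM-style Bregman soft clustering iteration in one dimension. The choices `phi(x) = x^2 / 2` (unit-variance Gaussian) and `phi(x) = x log x - x` (Poisson, positive data only) are illustrative assumptions. The sketch omits the mixed-data extension and the linear parameter-space constraint that relate this technique to exponential PCA and SP-PCA.

```python
import math


def bregman(phi, dphi):
    """Build d_phi(x, y) = phi(x) - phi(y) - phi'(y) * (x - y) from a convex phi."""
    return lambda x, y: phi(x) - phi(y) - dphi(y) * (x - y)


# Unit-variance Gaussian: phi(x) = x^2 / 2 gives half the squared distance.
squared = bregman(lambda x: 0.5 * x * x, lambda x: x)
# Poisson: phi(x) = x log x - x gives the generalized I-divergence (requires x > 0).
poisson = bregman(lambda x: x * math.log(x) - x, lambda x: math.log(x))


def bregman_soft_clustering(xs, means, div, iters=50):
    """EM-style Bregman soft clustering in one dimension (assumes no cluster empties)."""
    means = list(means)
    k = len(means)
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities proportional to pi_h * exp(-d_phi(x, mu_h)).
        resp = []
        for x in xs:
            logs = [math.log(w) - div(x, m) for w, m in zip(weights, means)]
            top = max(logs)  # subtract the max for numerical stability
            e = [math.exp(v - top) for v in logs]
            s = sum(e)
            resp.append([v / s for v in e])
        # M-step: for any Bregman divergence the optimal representative is the weighted mean.
        for h in range(k):
            r = [row[h] for row in resp]
            total = sum(r)
            weights[h] = total / len(xs)
            means[h] = sum(ri * x for ri, x in zip(r, xs)) / total
    return means, weights
```

Switching `squared` for `poisson` changes only the divergence. The M-step stays a weighted mean, which is what makes this family of clustering algorithms uniform.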

### Generalized Statistical Methods for Unsupervised Minority Class Detection in Mixed Data Sets

Minority class detection is the problem of detecting the occurrence of rare key events that differ from the majority of a data set. This paper considers unsupervised minority class detection for multidimensional data that are highly non-Gaussian, mixed (continuous and/or discrete), noisy, and nonlinearly related, as occurs, for example, in fraud detection on typical financial data.

We propose a statistical modeling approach that is a subclass of graphical model techniques. It exploits the properties of exponential family distributions and generalizes techniques from classical linear statistics into a framework referred to as Generalized Linear Statistics (GLS). The methodology exploits the split between the data space and the parameter space for exponential family distributions, and it solves a nonlinear problem by applying classical linear statistical tools to data that have been mapped into the parameter space.

We propose a fraud detection technique for data of mixed type that operates in the parameter space. It uses low-dimensional information learned with an Iteratively Reweighted Least Squares (IRLS) based approach to GLS. We present ROC curves from an initial simulation on synthetic data, which indicate the results to be expected on actual financial data sets.
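
Iteratively Reweighted Least Squares repeatedly solves a weighted least-squares problem on a working response. The sketch below shows textbook IRLS for a single-feature logistic model (Bernoulli distribution, canonical link). It illustrates only the fitting idea and is not the paper's IRLS-based GLS procedure for mixed data. The function name is hypothetical.

```python
import math


def irls_logistic(xs, ys, iters=25):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by Iteratively Reweighted Least Squares."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # Accumulate the 2x2 weighted normal equations (X^T W X) b = X^T W z.
        a00 = a01 = a11 = r0 = r1 = 0.0
        for x, y in zip(xs, ys):
            eta = b0 + b1 * x
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)          # Bernoulli variance is the IRLS weight
            z = eta + (y - p) / w      # working response
            a00 += w
            a01 += w * x
            a11 += w * x * x
            r0 += w * z
            r1 += w * x * z
        # Solve the 2x2 system by Cramer's rule.
        det = a00 * a11 - a01 * a01
        b0 = (a11 * r0 - a01 * r1) / det
        b1 = (a00 * r1 - a01 * r0) / det
    return b0, b1
```

At convergence the score equations `sum(y - p) = 0` and `sum(x * (y - p)) = 0` hold, which provides a convenient sanity check. The data must not be perfectly separable, otherwise the maximum-likelihood estimate does not exist.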

### Data-Pattern Discovery Methods for Detection in Nongaussian High-dimensional Data Sets

Many important expert system applications depend on the ability to accurately detect or predict the occurrence of key events given a data set of observations. We concentrate on multidimensional data that are highly non-Gaussian (continuous and/or discrete), noisy, and nonlinearly related.

We investigate the feasibility of data-pattern discovery and event detection by applying generalized principal component analysis (GPCA) techniques for pattern extraction based on an exponential family probability distribution assumption.

We develop theoretical extensions of the GPCA model by exploiting results from the theory of generalized linear models and nonparametric mixture density estimation.

### Experimental Design for Solicitation Campaigns

Data mining techniques are routinely used by fundraisers to select, from a large pool of candidates, the prospects who are most likely to make a financial contribution. These techniques often rely on statistical models built from trial performance data.

This trial performance data is typically obtained by soliciting a smaller sample of the prospect pool. Collecting this trial data involves a cost. The fundraiser therefore wants to keep the trial small while still collecting enough data to build a reliable statistical model that will be used to evaluate the remainder of the prospects.

We describe an experimental design approach to optimally choose the trial prospects from an existing large pool of prospects. Prospects are clustered to render the problem practically tractable. We modify the standard D-optimality algorithm to prevent repeated selection of the same prospect cluster, since each prospect can be solicited at most once. We assess the benefits of this approach on the KDD-98 data set by comparing the performance of a model built on the optimal trial data set with that of a model built on a randomly selected trial data set of equal size.
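
The following is a simplified sketch of the no-repeat constraint. Each candidate row stands for one prospect cluster (for example, an intercept plus cluster-centroid features). The design greedily adds the unused cluster that most increases `det(X^T X)`. Greedy sequential selection is swapped in here for the standard exchange-type D-optimality algorithm. The small `ridge` term is an assumption that keeps the determinant nonzero before enough points have been chosen. All names are hypothetical.

```python
def det(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d  # a row swap flips the sign
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= f * a[c][j]
    return d


def greedy_d_optimal(candidates, k, ridge=1e-6):
    """Pick k distinct candidate rows (e.g. cluster centroids) that greedily maximize
    det(ridge * I + sum of v v^T); each candidate can be chosen at most once."""
    dim = len(candidates[0])
    info = [[ridge if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    chosen = []
    for _ in range(k):
        best, best_det = None, -1.0
        for idx, v in enumerate(candidates):
            if idx in chosen:
                continue  # no repeated selection of the same cluster
            trial = [[info[i][j] + v[i] * v[j] for j in range(dim)] for i in range(dim)]
            d = det(trial)
            if d > best_det:
                best, best_det = idx, d
        chosen.append(best)
        v = candidates[best]
        info = [[info[i][j] + v[i] * v[j] for j in range(dim)] for i in range(dim)]
    return chosen
```

Consider a straight-line model with candidates at x = -1, 0, 1. The two picks are the two extremes, which is the classical D-optimal design for that model.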

### Self-intersections for the Willmore Flow

We prove that the Willmore flow can drive embedded surfaces to self-intersections in finite time.

### A numerical scheme for axisymmetric solutions of curvature driven free boundary problems, with applications to the Willmore Flow

We present a numerical scheme for radially symmetric solutions to curvature-driven moving boundary problems governed by a local law of motion, such as the mean curvature flow, the surface diffusion flow, and the Willmore flow. We then present several numerical experiments for the Willmore flow. In particular, we provide numerical evidence that the Willmore flow can develop singularities in finite time.

### Numerical solutions for the surface diffusion flow in three space dimensions

The surface diffusion flow is a moving boundary problem with a gradient flow structure. This structure suggests an implicit finite-difference approach to computing numerical solutions.

The resulting numerical scheme allows us to compute the flow for any smooth, orientable, immersed initial surface. Observations include the loss of embeddedness for some initially embedded surfaces, the creation of singularities, and the long-term behavior of solutions.

### Loss of convexity for a modified Mullins-Sekerka model arising in diblock copolymer melts

The modified (two-sided) Mullins-Sekerka model is a nonlocal evolution model for closed hypersurfaces. It appears as a singular limit of a modified Cahn-Hilliard equation describing microphase separation of diblock copolymers.

Under this evolution, the propagating interfaces maintain the enclosed volumes of the two phases. We show by means of an example that this model does not preserve convexity in two space dimensions.

### A singular example for the averaged mean curvature flow

We present an example of an embedded curve which, in a numerical simulation of the averaged mean curvature flow, first loses embeddedness and then develops a singularity where the curvature becomes infinite, all in finite time.

This leads to the conjecture that not all smooth embedded curves persist for all times under the averaged mean curvature flow.

### A numerical scheme for free boundary problems that are gradient flows for the area functional

Many moving boundary problems that are driven in some way by the curvature of the free boundary are gradient flows for the area of the moving interface. Examples are the Mullins-Sekerka flow, the Hele-Shaw flow, flow by mean curvature, and flow by averaged mean curvature. This gradient flow structure suggests an implicit finite-difference approach to computing numerical solutions.

The proposed numerical scheme allows us to treat such free boundary problems in both $\mathbb{R}^2$ and $\mathbb{R}^3$. The advantage of this approach is that much of the setup can be reused for all of the different problems. As an example of the method, we compute solutions to the averaged mean curvature flow that exhibit the formation of a singularity.
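
The implicit time-stepping idea can be seen in the simplest possible setting: a circle of radius r under flow by mean curvature (curve shortening) in the plane. Here the evolution reduces to `dr/dt = -1/r`, with exact solution `r(t) = sqrt(r0^2 - 2t)`, which vanishes in finite time. The sketch below applies backward Euler to this ODE only. It does not reproduce the finite-difference scheme for general curves and surfaces, and the function name is hypothetical.

```python
import math


def implicit_circle_radius(r0, tau, n_steps):
    """Backward-Euler steps for dr/dt = -1/r, the radius of a circle under
    curve-shortening (mean curvature) flow in the plane.

    Each step solves r_new = r_old - tau / r_new, i.e. r_new**2 - r_old * r_new + tau = 0,
    and keeps the larger root, which tends to r_old as tau -> 0. Iteration stops early
    when the implicit equation has no real root, signalling that the circle is about
    to shrink to a point.
    """
    radii = [r0]
    for _ in range(n_steps):
        r_old = radii[-1]
        disc = r_old * r_old - 4.0 * tau
        if disc < 0.0:
            break  # no real solution: the finite-time singularity has been reached
        radii.append(0.5 * (r_old + math.sqrt(disc)))
    return radii
```

Each step requires solving an equation for the new radius. As a result, the scheme detects the approaching singularity when that equation no longer has a real solution.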

### Self-intersections for the surface diffusion and the volume preserving mean curvature flow

We prove that the surface diffusion flow and the volume preserving mean curvature flow can drive embedded hypersurfaces to self-intersections.

### On diffusion-induced grain-boundary motion

We consider a sharp interface model that describes diffusion-induced grain-boundary motion in a polycrystalline material. This model leads to a fully nonlinear coupled system of partial differential equations. We show existence and uniqueness of smooth solutions.

### Classical solutions for diffusion-induced grain-boundary motion

We prove existence and uniqueness of classical solutions for the motion of hypersurfaces driven by mean curvature and diffusion of a solute along the surface. This free boundary problem involves solving a coupled system of fully nonlinear partial differential equations.

### On the surface diffusion flow

In this paper we present recent existence, uniqueness, and stability results for the motion of immersed hypersurfaces driven by surface diffusion. We provide numerical simulations for curves and surfaces that exhibit the creation of singularities.

Moreover, our numerical simulations show that the flow causes a loss of embeddedness for some initially embedded configurations.

### The surface diffusion flow for immersed hypersurfaces

We show existence and uniqueness of classical solutions for the motion of immersed hypersurfaces driven by surface diffusion. If the initial surface is embedded and close to a sphere, we prove that the solution exists globally and converges exponentially fast to a sphere. Furthermore, we provide numerical simulations showing the creation of singularities for immersed curves.

### Gradient flows on nonpositively curved metric spaces and harmonic maps

The notion of gradient flows is generalized to a metric space setting without any linear structure. The metric spaces considered are a generalization of Hilbert spaces. The properties of such metric spaces are used to set up a finite-difference scheme of variational form.

The proof of the Crandall-Liggett generation theorem is adapted to show convergence. The resulting flow generates a strongly continuous semigroup of Lipschitz-continuous mappings, is Lipschitz continuous in time for positive time, and decreases the energy functional along a path of steepest descent.

In case the underlying metric space is a Hilbert space, the solutions resulting from this new theory coincide with those obtained by classical methods. As an application, the harmonic map flow problem for maps from a manifold into a nonpositively curved metric space is considered, and the existence of a solution to the initial boundary value problem is established.
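
In the Hilbert-space case, the variational finite-difference scheme reduces to the implicit step `J_tau(u) = argmin_v E(v) + |v - u|^2 / (2 tau)`. The flow is obtained as the limit of n steps of size t/n, as in the Crandall-Liggett exponential formula. The sketch below carries this out on the real line for a convex energy, using ternary search as a simple (assumed) minimizer. It does not capture the nonpositively curved metric-space setting, and the names are hypothetical.

```python
def resolvent(energy, u, tau, lo=-1e3, hi=1e3, iters=200):
    """One implicit (minimizing-movement) step on the real line:
    argmin_v energy(v) + (v - u)^2 / (2 tau), found by ternary search.
    Ternary search is valid because the objective is strictly convex
    whenever the energy is convex."""
    f = lambda v: energy(v) + (v - u) ** 2 / (2.0 * tau)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


def gradient_flow(energy, u0, t, n):
    """Approximate the flow at time t by n implicit steps of size t / n."""
    u = u0
    for _ in range(n):
        u = resolvent(energy, u, t / n)
    return u
```

For `E(v) = v^2 / 2` each step maps u to `u / (1 + tau)`. Consequently, n steps give `u0 (1 + t/n)^(-n)`, which approaches the classical solution `u0 e^(-t)`. For `E(v) = |v|` each step is soft thresholding.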

### Two-sided Mullins-Sekerka flow does not preserve convexity

The (two-sided) Mullins-Sekerka model is a nonlocal evolution model for closed hypersurfaces, which was originally proposed as a model for phase transitions of materials of negligible specific heat. Under this evolution the propagating interfaces maintain the enclosed volume while the area of the interfaces decreases.

We show by means of an example that the Mullins-Sekerka flow does not preserve convexity in two space dimensions. We consider both the Mullins-Sekerka model on a bounded domain and the Mullins-Sekerka model on the whole plane.

### One-sided Mullins-Sekerka flow does not preserve convexity

The Mullins-Sekerka model is a nonlocal evolution model for hypersurfaces that arises as a singular limit of the Cahn-Hilliard equation. Assuming the existence of sufficiently smooth solutions, we show that the one-sided Mullins-Sekerka flow does not preserve convexity. The main tool is the strong maximum principle for second-order elliptic differential equations.
