Publications

We strongly believe in open source and giving to our community. We work directly with researchers in academia and seek out new perspectives with our intern and fellowship programs. We generalize our solutions and release them to the world as open source projects. We host discussions and publish our results.

Physician Incentives and Treatment Choices in Heart Attack Management

Dominic Coey

We estimate how physicians’ financial incentives affect their treatment choices in heart attack management, using a large dataset of private health insurance claims. Different insurance plans pay physicians different amounts for the same services, generating the required variation in financial incentives.

We begin by presenting evidence that, unconditionally, plans that pay physicians more for more invasive treatments are associated with a considerably larger fraction of such treatments. To interpret this correlation as causal, we continue by showing that it survives conditioning on a rich set of diagnosis and provider-specific variables.

We perform a host of additional checks verifying that differences in unobservable patient or provider characteristics across plans are unlikely to be driving our results. We find that physicians’ treatment choices respond positively to the payments they receive, and that the response is quite large.

If physicians received bundled payments instead of fee-for-service incentives, for example, heart attack management would become considerably more conservative. Our estimates imply that 20 percent of patients would receive different treatments, physician costs would decrease by 27 percent, and social welfare would increase.

To appear in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Mobile Vision Workshop, 2013.

Style Finder: Fine-Grained Clothing Style Recognition and Retrieval

Wei Di, Catherine Wah, Anurag Bhardwaj, Robinson Piramuthu, Neel Sundaresan

With the rapid proliferation of smartphones and tablet computers, search has moved beyond text to other modalities like images and voice. For many applications like Fashion, visual search offers a compelling interface that can capture stylistic visual elements beyond color and pattern that cannot be as easily described using text.

However, extracting and matching such attributes remains an extremely challenging task due to high variability and deformability of clothing items. In this paper, we propose a fine-grained learning model and multimedia retrieval framework to address this problem.

First, an attribute vocabulary is constructed using human annotations obtained on a novel fine-grained clothing dataset. This vocabulary is then used to train a fine-grained visual recognition system for clothing styles.

We report benchmark recognition and retrieval results on the Women's Fashion Coat Dataset and illustrate potential mobile applications for attribute-based multimedia retrieval of clothing items and image annotation.

KDD-2013

Palette Power: Enabling Visual Search through Colors

Anurag Bhardwaj, Atish Das Sarma, Wei Di, Raffay Hamid, Robinson Piramuthu, Neel Sundaresan

With the explosion of mobile devices with cameras, online search has moved beyond text to other modalities like images, voice, and writing. For many applications such as fashion, image-based search offers a more compelling interface than text because it better captures visual attributes.

In this paper we present a simple and fast search algorithm that uses color as the main feature for building visual search. We show that low level cues such as color can be used to quantify image similarity and also to discriminate among products with different visual appearances.

We demonstrate the effectiveness of our approach through a mobile shopping application: the eBay Fashion App, available at https://itunes.apple.com/us/app/ebay-fashion/id378358380?mt=8, whose eBay image swatch feature indexes millions of real-world fashion images.

Our approach outperforms several other state-of-the-art image retrieval algorithms for large scale image data.
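To make the idea of color as a similarity cue concrete, here is a minimal sketch that ranks items by the overlap of quantized RGB histograms. It is not the algorithm from the paper, whose details the abstract does not give; the function names, the bin count, and the toy pixel data are all assumptions chosen for illustration.

```python
from collections import Counter

def color_histogram(pixels, bins_per_channel=4):
    """Quantize RGB pixels (0-255 per channel) into a normalized histogram."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    return {bin_id: n / total for bin_id, n in counts.items()}

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: the overlap between two normalized histograms."""
    return sum(min(w, h2.get(bin_id, 0.0)) for bin_id, w in h1.items())

def rank_by_color(query_pixels, catalog):
    """Return catalog item names sorted from most to least similar in color."""
    q = color_histogram(query_pixels)
    scored = [(histogram_intersection(q, color_histogram(px)), name)
              for name, px in catalog.items()]
    return [name for _, name in sorted(scored, reverse=True)]

# Hypothetical toy images, each given as a flat list of RGB pixels.
red_dress = [(250, 10, 10)] * 8 + [(255, 255, 255)] * 2
red_coat = [(240, 20, 30)] * 7 + [(0, 0, 0)] * 3
blue_jeans = [(10, 20, 240)] * 10

catalog = {"red_coat": red_coat, "blue_jeans": blue_jeans}
ranking = rank_by_color(red_dress, catalog)
print(ranking)
```

A coarse histogram like this is cheap to compute and compare, which is one plausible reason a color-first approach can be fast at scale; a production system would also need to handle background pixels and lighting.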

To appear in Proceedings of IEEE International Conference on Computer Vision & Pattern Recognition (CVPR) 2013

Dense Non-Rigid Point-Matching Using Random Projections

Raffay Hamid, Dennis DeCoste, Chih-Jen Lin

We present a robust and efficient technique for matching dense sets of points undergoing non-rigid spatial transformations. Our main intuition is that the subset of points that can be matched with high confidence should be used to guide the matching procedure for the rest.

We propose a novel algorithm that incorporates these high-confidence matches as a spatial prior to learn a discriminative subspace that simultaneously encodes both the feature similarity as well as their spatial arrangement.

Conventional subspace learning usually requires spectral decomposition of the pair-wise distance matrix across the point-sets, which can become inefficient even for moderately sized problems. To this end, we propose the use of random projections for approximate subspace learning, which can provide significant time improvements at the cost of minimal precision loss.

This efficiency gain allows us to iteratively find and remove high-confidence matches from the point sets, resulting in high recall. To show the effectiveness of our approach, we present a systematic set of experiments and results for the problem of dense non-rigid image-feature matching.
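The general principle behind random projections can be illustrated with a short sketch: multiplying points by a random Gaussian matrix maps them to far fewer dimensions while approximately preserving pairwise distances. This shows only that generic property, not the subspace-learning procedure of the paper; the dimensions, seeds, and function names below are assumptions.

```python
import math
import random

def random_projection_matrix(d, k, seed=0):
    """Build a k x d matrix of Gaussian entries scaled by 1/sqrt(k)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]

def project(matrix, point):
    """Map a d-dimensional point to k dimensions."""
    return [sum(w * x for w, x in zip(row, point)) for row in matrix]

def distance(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two hypothetical 500-dimensional feature vectors, projected to 200 dimensions.
rng = random.Random(1)
d, k = 500, 200
p = [rng.random() for _ in range(d)]
q = [rng.random() for _ in range(d)]

R = random_projection_matrix(d, k)
ratio = distance(project(R, p), project(R, q)) / distance(p, q)
print(f"projected / original distance: {ratio:.3f}")  # close to 1 with high probability
```

The trade-off is the one the abstract names: the projection is cheap and avoids a full spectral decomposition, at the price of a small, random distortion that shrinks as `k` grows.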

Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL), 2012, pp. 805-814.

Structuring E-Commerce Inventory

Karin Maugé, Khashayar Rohanimanesh, Jean-David Ruvini

Large e-commerce enterprises feature millions of items entered daily by a large variety of sellers. While some sellers provide rich, structured descriptions of their items, the vast majority provide unstructured natural language descriptions.

In this paper we present a two-step method for structuring items into descriptive properties. The first step consists of unsupervised property discovery and extraction. The second step involves supervised property synonym discovery using a maximum-entropy-based clustering algorithm.

We evaluate our method on a year's worth of e-commerce data and show that it achieves excellent precision with good recall.

CIKM 2012: 596-604

Large-scale Item Categorization for e-Commerce

Dan Shen, Jean-David Ruvini, Badrul Sarwar

This paper studies the problem of leveraging computationally intensive classification algorithms for large scale text categorization problems. We propose a hierarchical approach which decomposes the classification problem into a coarse level task and a fine level task.

A simple yet scalable classifier is applied to perform the coarse level classification while a more sophisticated model is used to separate classes at the fine level. However, instead of relying on a human-defined hierarchy to decompose the problem, we use a graph algorithm to automatically discover groups of highly similar classes.

As an illustrative example, we apply our approach to real-world industrial data from eBay, a major e-commerce site where the goal is to classify live items into a large taxonomy of categories.

In such an industrial setting, classification is very challenging due to the number of classes, the amount of training data, the size of the feature space and the real-world requirements on response time. We demonstrate through extensive experimental evaluation that (1) the proposed hierarchical approach is superior to flat models, and (2) the data-driven extraction of latent groups works significantly better than the existing human-defined hierarchy.
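One simple way to discover groups of highly similar classes from data is to link classes that a coarse classifier often confuses and take the connected components of that graph. The abstract does not say which graph algorithm the authors use, so the sketch below is an assumption throughout: the confusion rates, the threshold, the class names, and the use of connected components are all illustrative.

```python
def discover_groups(classes, confusion, threshold):
    """Group classes whose pairwise confusion rate is at least `threshold`.

    `confusion` maps (class_a, class_b) to a confusion rate. Groups are the
    connected components of the thresholded graph, found with union-find.
    """
    parent = {c: c for c in classes}

    def find(c):
        # Follow parent links to the root, compressing the path on the way.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for (a, b), rate in confusion.items():
        if rate >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for c in classes:
        groups.setdefault(find(c), []).append(c)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical categories and confusion rates from a coarse classifier.
classes = ["laptops", "tablets", "sneakers", "boots", "stamps"]
confusion = {
    ("laptops", "tablets"): 0.30,
    ("sneakers", "boots"): 0.25,
    ("tablets", "stamps"): 0.01,
}
groups = discover_groups(classes, confusion, threshold=0.10)
print(groups)
```

Each resulting group would then get its own fine-level classifier, while the coarse classifier only has to pick the group.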

SDM 2012

Multi-Skill Collaborative Teams based on Densest Subgraphs

Atish Das Sarma, Amita Gajewar

We consider the problem of identifying a team of skilled individuals for collaboration, in the presence of a social network, with the goal of maximizing the collaborative compatibility of the team. Each node in the social network is associated with skills, and edge weights specify the affinity between the respective nodes. We measure the collaborative compatibility objective as the density of the subgraph induced by the selected nodes.

This problem is NP-hard even when the team requires individuals of only one skill. We present a 3-approximation algorithm for the single-skill team formulation problem. We show the same approximation can be extended to a special case of multiple skills.

Our problem generalizes the formulation studied by Lappas et al. [KDD ’09], who measure team compatibility in terms of diameter or spanning tree. The experimental results show that the density-based algorithms outperform the diameter-based objective on several metrics.
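For intuition about the density objective, the sketch below implements the classical greedy peeling heuristic for the unconstrained densest-subgraph problem: repeatedly remove the node with the smallest weighted degree and keep the densest intermediate subgraph. It is not the 3-approximation algorithm from the paper and it ignores skill requirements entirely. Density is assumed here to be total edge weight divided by the number of nodes, and the example graph is hypothetical.

```python
def densest_subgraph_peeling(edges):
    """Greedy peeling on a weighted undirected graph with at least one edge.

    `edges` maps (u, v) to a non-negative weight. Returns (nodes, density)
    for the densest subgraph seen while peeling, where density is the total
    edge weight divided by the number of nodes.
    """
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w

    degree = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    remaining = set(adj)
    total_weight = sum(edges.values())
    best_nodes, best_density = set(remaining), total_weight / len(remaining)

    while len(remaining) > 1:
        # Remove the node with the smallest weighted degree (ties by name).
        u = min(remaining, key=lambda n: (degree[n], n))
        remaining.remove(u)
        total_weight -= degree[u]
        for v, w in adj[u].items():
            if v in remaining:
                degree[v] -= w
        density = total_weight / len(remaining)
        if density > best_density:
            best_nodes, best_density = set(remaining), density

    return best_nodes, best_density

# Hypothetical affinity graph: a tight group of four plus one loosely tied person.
edges = {
    ("a", "b"): 1.0, ("a", "c"): 1.0, ("a", "d"): 1.0,
    ("b", "c"): 1.0, ("b", "d"): 1.0, ("c", "d"): 1.0,
    ("a", "e"): 1.0,
}
team, density = densest_subgraph_peeling(edges)
print(sorted(team), density)
```

In this toy graph the loosely connected node is peeled first, leaving the tightly knit group of four as the densest candidate team.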
