Publications

We strongly believe in open source and giving to our community. We work directly with researchers in academia and seek out new perspectives with our intern and fellowship programs. We generalize our solutions and release them to the world as open source projects. We host discussions and publish our results.

International Conference on Machine Learning (ICML), 2015

Bayesian and Empirical Bayesian Forests

Matt Taddy, Chun-Sheng Chen, Jun Yu, Mitch Wyle

We derive ensembles of decision trees through a nonparametric Bayesian model, allowing us to view random forests as samples from a posterior distribution. This insight provides large gains in interpretability, and motivates a class of Bayesian forest (BF) algorithms that yield small but reliable performance gains.

Based on the BF framework, we are able to show that high-level tree hierarchy is stable in large samples. This leads to an empirical Bayesian forest (EBF) algorithm for building approximate BFs on massive distributed datasets, and we show that EBFs outperform subsampling-based alternatives by a large margin.

CVPR, June, 2015

ConceptLearner: Discovering Visual Concepts from Weakly Labeled Image Collections

Bolei Zhou, Vignesh Jagadeesh, Robinson Piramuthu
Discovering visual knowledge from weakly labeled data is crucial to scaling up computer vision recognition systems, since it is expensive to obtain fully labeled data for a large number of concept categories, while weakly labeled data can be collected from the Internet cheaply and at massive scale.
 
In this paper, we propose a scalable approach to discovering visual concepts from weakly labeled image collections, learning thousands of visual concept detectors. We then show that the learned detectors can be applied to recognize concepts at the image level and to detect concepts at the image-region level accurately.
 
Under domain-selected supervision, we further evaluate the learned concepts for scene recognition on the SUN database and for object detection on Pascal VOC 2007. They show promising performance compared to fully supervised and weakly supervised methods.
 
KDD 2014

Large Scale Visual Recommendations From Street Fashion Images

Vignesh Jagadeesh, Robinson Piramuthu, Anurag Bhardwaj, Wei Di, Neel Sundaresan

We describe a completely automated large scale visual recommendation system for fashion. Our focus is to efficiently harness the availability of large quantities of online fashion images and their rich meta-data.

Specifically, we propose two classes of data-driven models, Deterministic Fashion Recommenders (DFR) and Stochastic Fashion Recommenders (SFR), for solving this problem. We analyze the relative merits and pitfalls of these algorithms through extensive experimentation on a large-scale data set and baseline them against existing ideas from color science.

We also illustrate key fashion insights learned through these experiments and show how they can be employed to design better recommendation systems.

The industrial applicability of the proposed models is in the context of mobile fashion shopping. Finally, we also outline a large-scale annotated data set of fashion images (Fashion-136K) that can be exploited for future research in data-driven visual fashion.

WSDM, 2014

Is a picture really worth a thousand words? - On the role of images in e-commerce

Wei Di, Neel Sundaresan, Anurag Bhardwaj, Robinson Piramuthu

In online peer-to-peer marketplaces, where physical examination of the goods is infeasible, textual descriptions, product images, and the reputation of the participants play key roles. Images are a powerful channel for conveying crucial information to e-shoppers and influencing their choices.

In this paper, we investigate a well-known online marketplace where millions of products change hands and most are described with the help of one or more images. We present a systematic data mining and knowledge discovery approach that aims to quantitatively dissect the role of images in e-commerce in great detail. Our goal is two-fold.

First, we aim to get a thorough understanding of the impact of images across various dimensions: product categories, user segments, and conversion rate. We present a quantitative evaluation of the influence of images and show how to leverage different image aspects, such as quantity and quality, to effectively raise sales. Second, we study the interaction of image data with other selling dimensions by jointly modeling them with user behavior data.

Results suggest that "watch" behavior encodes complex signals combining both attention and hesitation from the buyer, in which images still play an important role compared to other selling variables, especially for products for which appearance is important. We conclude with how these findings can benefit sellers in a highly competitive online e-commerce market.

In proceedings of the Workshop on Log-based Personalization (the 4th WSCD workshop) at WSDM 2014

A Large Scale Query Logs Analysis for Assessing Personalization Opportunities in E-commerce Sites

Neel Sundaresan, Zitao Liu

Personalization offers the promise of improving the online search and shopping experience. In this work, we perform a large-scale analysis of a sample of eBay query logs, comprising 9.24 billion session records spanning 12 months (08/2012-07/2013), and address the following topics:

(1) What user information is useful for personalization;

(2) The importance of per-query personalization;

(3) The importance of recency in query prediction.

In this paper, we study these problems and provide some preliminary conclusions.

CHIMoney (Workshop at CHI-2014)

Shopping with Bonus Money: eBay, loyalty schemes and consumer spending

Darrell Hoy, Elizabeth Churchill, Atish Das Sarma, Kamal Jain


In ECIR 2014 (To Appear)

A Study of Query Term Deletion using Large-scale E-commerce Search Logs

Bishan Yang, Nish Parikh, Gyanit Singh, Neel Sundaresan

Query term deletion is one of the commonly used strategies for query rewriting. In this paper, we study the problem of query term deletion using large-scale e-commerce search logs. In particular, we focus on queries that do not lead to user clicks and aim to predict a reduced, better query that can lead to clicks through term deletion. Accurate prediction of term deletion can potentially help users recover from poor search results and improve the shopping experience.

To achieve this, we use various term-dependent and query-dependent measures as features and build a classifier to predict which term is most likely to be deleted from a given query. Unlike previous work on query term deletion, we compute the features not only from the query history and the available document collection, but also conditioned on the query category, which captures the high-level context of the query.

We validate our approach using a large collection of query session logs from a leading e-commerce site and show that it provides promising performance in term deletion prediction, significantly outperforming baselines that rely on query history and corpus-based statistics without incorporating the query context information.
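
The idea of conditioning term deletion on the query category can be illustrated with a small sketch. This is not the paper's classifier: it swaps in a single hand-written heuristic (delete the term seen least often in the category's logged queries), and the log contents and function names below are hypothetical.

```python
from collections import Counter


def category_term_frequency(logged_queries):
    """Count how many logged queries in one category contain each term."""
    counts = Counter()
    for query in logged_queries:
        # Use a set so a term repeated inside one query is counted once.
        counts.update(set(query.lower().split()))
    return counts


def pick_term_to_delete(query, term_counts):
    """Return the query term seen least often in the category's logs.

    Stand-in heuristic, not a trained classifier. Ties go to the earliest term.
    """
    terms = query.lower().split()
    return min(terms, key=lambda term: term_counts[term])


def delete_term(query, term):
    """Rebuild the query without the chosen term."""
    return " ".join(t for t in query.lower().split() if t != term)


# Hypothetical logged queries for a single (assumed) "shoes" category.
shoe_logs = [
    "red running shoes",
    "nike running shoes",
    "running shoes men",
    "red shoes",
]

counts = category_term_frequency(shoe_logs)
query = "red running shoes xl"
term = pick_term_to_delete(query, counts)
reduced = delete_term(query, term)
print(term, "->", reduced)
```

A real system along the lines of the abstract would feed several such term-dependent and query-dependent measures into a trained classifier instead of relying on one frequency count.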

International Symposium on Electronic Imaging, February 2016

Im2Fit: Fast 3D Model Fitting and Anthropometrics using Single Consumer Depth Camera and Synthetic Data

Qiaosong Wang, Vignesh Jagadeesh, Bryan Ressler, Robinson Piramuthu

Recent advances in consumer depth sensors have created many opportunities for human body measurement and modeling. Estimation of 3D body shape is particularly useful for fashion e-commerce applications such as virtual try-on or fit personalization.

In this paper, we propose a method for capturing accurate human body shape and anthropometrics from a single consumer grade depth sensor. We first generate a large dataset of synthetic 3D human body models using real-world body size distributions.

Next, we estimate key body measurements from a single monocular depth image. We combine body measurement estimates with local geometry features around key joint positions to form a robust multi-dimensional feature vector.

This allows us to conduct a fast nearest-neighbor search over the samples in the dataset and return the closest one. Compared to existing methods, our approach is able to predict accurate full body parameters from a partial view using measurement parameters learned from the synthetic dataset.

Furthermore, our system is capable of generating 3D human mesh models in real time, which is significantly faster than methods that attempt to model shape and pose deformations.

To validate the efficiency and applicability of our system, we collected a dataset that contains frontal and back scans of 83 clothed people with ground truth height and weight. Experiments on this real-world dataset show that the proposed method achieves real-time performance with competitive results, with an average error of 1.9 cm in estimated measurements.
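
The nearest-neighbor lookup described in this abstract can be sketched as a brute-force search over feature vectors. This is a minimal illustration, assuming plain Euclidean distance and a tiny made-up dataset. The feature layout (height, waist, shoulder width) and the model names are hypothetical and are not taken from the paper.

```python
import math


def nearest_model(query, dataset):
    """Return (model_id, distance) for the dataset entry closest to `query`.

    Brute-force Euclidean search, used here only for illustration.
    """
    best_id, best_dist = None, math.inf
    for model_id, features in dataset.items():
        dist = math.dist(query, features)
        if dist < best_dist:
            best_id, best_dist = model_id, dist
    return best_id, best_dist


# Hypothetical synthetic body models, each described by an assumed feature
# vector of (height_cm, waist_cm, shoulder_width_cm).
synthetic_models = {
    "model_a": (160.0, 70.0, 38.0),
    "model_b": (175.0, 82.0, 44.0),
    "model_c": (188.0, 95.0, 48.0),
}

# Hypothetical measurements estimated from a single depth image.
query_features = (174.0, 80.0, 43.0)
match_id, distance = nearest_model(query_features, synthetic_models)
print(match_id)
```

With a large synthetic dataset, a spatial index or approximate search would normally replace the linear scan. The abstract does not say which search structure the authors use.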
