Search Technology

Search Technology
Creating the best finding experience for our customers.

There are a lot of research problems behind constructing a good search results page. This includes what items to show (recall), in what order to show them (ranking), how to detect users who are gaming our system (spam), what to show in the left-hand control panel (search suggestions). I'll mention two examples to give the flavor of the problems. Some sellers game the system with shill buying rings: they have shills purchase their items to get artificially high feedback. Detecting this is challenging. Second example is ranking. In web search, results are ranked by relevance. But in eBay we have to balance relevance with value. The item that is most relevant to a query could be from a seller with bad feedback, or at an unreasonable price. How should that be ranked compared to an item that is slightly less relevant, but has better value?

Advances in Neural Information Processing Systems (NIPS), 2014

Parallel Feature Selection inspired by Group Testing

Yingbo Zhou, Utkarsh Porwal, Ce Zhang, Hung Q Ngo, Long Nguyen, Christopher Ré, Venu Govindaraju

This paper presents a parallel feature selection method for classification that scales up to very high dimensions and large data sizes. Our original method is inspired by group testing theory, under which the feature selection procedure consists of a collection of randomized tests to be performed in parallel. Each test corresponds to a subset of features, for which a scoring function may be applied to measure the relevance of the features in a classification task. We develop a general theory providing sufficient conditions under which true features are guaranteed to be correctly identified. Superior performance of our method is demonstrated on a challenging relation extraction task from a very large data set that have both redundant features and sample size in the order of millions. We present comprehensive comparisons with state-of-the-art feature selection methods on a range of data sets, for which our method exhibits competitive performance in terms of running time and accuracy. Moreover, it also yields substantial speedup when used as a pre-processing step for most other existing methods.