Recent works on LLMs:
Automatic Engineering of Long Prompts, ACL Findings 2024 [arXiv]
ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent [arXiv]
Two-stage LLM Fine-tuning with Less Specialization and More Generalization, ICLR 2024 [arXiv]
SpecTr: Fast Speculative Decoding via Optimal Transport, ICML 2023 [arXiv]
Large Language Models with Controllable Working Memory, ACL Findings 2023 [arXiv]
The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers, ICLR 2023 [arXiv]
Recent works:
Serving Graph Compression for Graph Neural Networks, ICLR 2023
Correlated quantization for distributed mean estimation and optimization, ICML 2022
RankDistil: Knowledge Distillation for Ranking, AISTATS 2021
Modifying Memories in Transformer Models [arXiv]
Recent works:
Federated learning with only positive labels, ICML 2020 [arXiv]
Pre-training tasks for embedding-based large-scale retrieval, ICLR 2020 [arXiv]
Learning discrete distributions: user-level vs item-level privacy, NeurIPS 2020 [arXiv]
Semantic label smoothing for sequence to sequence problems, EMNLP 2020 [arXiv]

About Me

I am a Sr. Staff Research Scientist at Google, New York. I work on deep retrieval, efficient LLMs, and their intersection. The deep retrieval algorithms and models developed by my team and me are widely used at Google (Search, Ads, YouTube, Play, and the new GenAI experience). Before Google, I received my Ph.D. from EE@Columbia University in 2015. I serve as an area chair (AC) for ICML and NeurIPS.
Google Scholar LinkedIn


Research Interests

Embedding-based deep retrieval and ranking
    Loss function and negative sampling (e.g. stochastic negative mining, RFF-softmax)
    Regularization and optimization (e.g. spreadout)
    Distillation (e.g. RankDistil)
    Pre-training tasks (e.g. paragraph-level tasks)

Topics in LLMs
    Parameter-efficient fine-tuning
    Automated prompt engineering (e.g. long context prompt optimization)
    Prompt tuning (e.g. ProMoT)
    Efficient decoding (e.g. SpecTr)
    Agent and tool use (e.g. ReST + ReAct)
    Memory (e.g. controllable working memory)
    Sparsity (e.g. activation sparsity)

I have also worked on (and am still interested in) many other topics, including structured random/trained matrices, communication-efficient distributed learning, privacy, and learning with weakly supervised data.


Selected Papers

Cho-Jui Hsieh, Si Si, Felix X. Yu, Inderjit S. Dhillon
Automatic Engineering of Long Prompts
ACL Findings 2024 [arXiv]
Yihan Wang, Si Si, Daliang Li, Michal Lukasik, Felix X. Yu, Cho-Jui Hsieh, Inderjit S. Dhillon, Sanjiv Kumar
Two-stage LLM Fine-tuning with Less Specialization and More Generalization
ICLR 2024 [arXiv]
Renat Aksitov, Sobhan Miryoosefi, Zonglin Li, Daliang Li, Sheila Babayan, Kavya Kopparapu, Zachary Fisher, Ruiqi Guo, Sushant Prakash, Pranesh Srinivasan, Manzil Zaheer, Felix X. Yu, Sanjiv Kumar
ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent
[arXiv]
Ziteng Sun, Ananda Theertha Suresh, Jae Hun Ro, Ahmad Beirami, Himanshu Jain, Felix X. Yu
SpecTr: Fast Speculative Decoding via Optimal Transport
ICML 2023 [arXiv]
Daliang Li, Ankit Singh Rawat, Manzil Zaheer, Xin Wang, Michal Lukasik, Andreas Veit, Felix X. Yu, Sanjiv Kumar
Large Language Models with Controllable Working Memory
ACL Findings 2023 [arXiv]
Zonglin Li, Chong You, Srinadh Bhojanapalli, Daliang Li, Ankit Singh Rawat, Sashank J. Reddi, Ke Ye, Felix Chern, Felix X. Yu, Ruiqi Guo, Sanjiv Kumar
The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers
ICLR 2023 [arXiv]
Si Si, Felix X. Yu, Ankit Singh Rawat, Cho-Jui Hsieh, Sanjiv Kumar
Serving Graph Compression for Graph Neural Networks
ICLR 2023 [PDF]
Ananda Theertha Suresh, Ziteng Sun, Jae Hun Ro, Felix X. Yu
Correlated quantization for distributed mean estimation and optimization
ICML 2022 [arXiv]
Chen Zhu, Ankit Singh Rawat, Manzil Zaheer, Srinadh Bhojanapalli, Daliang Li, Felix X. Yu, Sanjiv Kumar
Modifying Memories in Transformer Models
[arXiv]
Sashank Reddi, Rama Kumar Pasumarthi, Aditya Menon, Ankit Singh Rawat, Felix X. Yu, Seungyeon Kim, Andreas Veit, Sanjiv Kumar
RankDistil: Knowledge Distillation for Ranking
AISTATS 2021 [PDF]
Felix X. Yu, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar
Federated learning with only positive labels
ICML 2020 [arXiv]
Wei-Cheng Chang, Felix X. Yu, Yin-Wen Chang, Yiming Yang, Sanjiv Kumar
Pre-training tasks for embedding-based large-scale retrieval
ICLR 2020 [arXiv]
Ankit Singh Rawat, Jiecao Chen, Felix X. Yu, Ananda Theertha Suresh, Sanjiv Kumar
Sampled softmax with random Fourier features
NeurIPS 2019 [arXiv]
Sashank J. Reddi, Satyen Kale, Felix X. Yu, Dan Holtmann-Rice, Jiecao Chen, Sanjiv Kumar
Stochastic Negative Mining for learning with large output spaces
AISTATS 2019 [arXiv]
Naman Agarwal, Ananda Theertha Suresh, Felix X. Yu, Sanjiv Kumar, H. Brendan McMahan
cpSGD: Communication-efficient and differentially-private distributed SGD
NeurIPS 2018 spotlight [arXiv]
Ian E.H. Yen, Satyen Kale, Felix X. Yu, Daniel Holtmann-Rice, Sanjiv Kumar, Pradeep Ravikumar
Loss decomposition for fast learning in large output spaces
ICML 2018 [PDF]
Felix X. Yu, Aditya Bhaskara, Sanjiv Kumar, Yunchao Gong, Shih-Fu Chang
On binary embedding using circulant matrices
JMLR 2018 [PDF]
Ananda Theertha Suresh, Felix X. Yu, Sanjiv Kumar, H. Brendan McMahan
Distributed mean estimation with limited communication
ICML 2017 [arXiv]
Felix X. Yu, Ananda Theertha Suresh, Krzysztof Choromanski, Daniel Holtmann-Rice, Sanjiv Kumar
Orthogonal random features
NIPS 2016 oral [arXiv]
Yu Cheng*, Felix X. Yu*, Rogerio Feris, Sanjiv Kumar, Alok Choudhary, Shih-Fu Chang
An exploration of parameter redundancy in deep networks with circulant projections
ICCV 2015 [arXiv]
Felix X. Yu, Sanjiv Kumar, Yunchao Gong, Shih-Fu Chang
Circulant Binary Embedding
ICML 2014 oral [arXiv]