About me
- I am currently a postdoctoral researcher at the Gaoling School of Artificial Intelligence, Renmin University of China. I earned my PhD from the University of Montreal, where I was mentored by Prof. Jian-Yun Nie.
- I completed my master’s (2019) and bachelor’s (2016) degrees at Renmin University of China, under the guidance of Prof. Zhicheng Dou and Prof. Ji-Rong Wen, delving into various NLP challenges.
- Research interests: Retrieval-augmented generation, large language models for information retrieval, session-based document ranking
News
- 2024.11: Congrats! Our paper “A Text-guided Protein Design Framework” has been accepted by Nature Machine Intelligence!
- 2024.10: We wrote a new survey on conversational search. See more details.
- 2024.7: We released YuLan-Base-12B and YuLan-Chat-3-12B, a series of new LLMs trained from scratch! See more details.
- 2024.5: We proposed a new lightweight tuning method for RAG. Adding only one token can significantly improve LLMs’ RAG performance! See more details.
- 2024.5: We released a new toolkit, ⚡FlashRAG, which helps implement RAG methods quickly! See more details.
- 2024.5: Congrats! Three of our papers have been accepted by ACL 2024!
- 2024.4: We wrote a new survey on generative information retrieval. See more details.
Publications
* for corresponding author.
2024
NMI
A Text-guided Protein Design Framework, Shengchao Liu, Yanjing Li, Zhuoxinran Li, Anthony Gitter, Yutao Zhu, Jiarui Lu, Zhao Xu, Weili Nie, Arvind Ramanathan, Chaowei Xiao, Jian Tang, Hongyu Guo, and Anima Anandkumar.
KDD
Embedding Prior Task-specific Knowledge into Language Models for Context-aware Document Ranking, Shuting Wang, Yutao Zhu, and Zhicheng Dou.
TKDE
CAGS: Context-Aware Document Ranking with Contrastive Graph Sampling, Zhaoheng Huang, Yutao Zhu*, Zhicheng Dou, and Ji-Rong Wen.
arXiv
A Survey of Conversational Search, Fengran Mo, Kelong Mao, Ziliang Zhao, Hongjin Qian, Haonan Chen, Yiruo Cheng, Xiaoxi Li, Yutao Zhu, Zhicheng Dou, and Jian-Yun Nie.
arXiv
Toward General Instruction-Following Alignment for Retrieval-Augmented Generation, Guanting Dong, Xiaoshuai Song, Yutao Zhu, Runqi Qiao, Zhicheng Dou, and Ji-Rong Wen.
arXiv
From Pixels to Tokens: Revisiting Object Hallucinations in Large Vision-Language Models, Yuying Shang, Xinyi Zeng, Yutao Zhu, Xiao Yang, Zhengwei Fang, Jingyuan Zhang, Jiawei Chen, Zinan Liu, and Yu Tian.
arXiv
Root Defence Strategies: Ensuring Safety of LLM at the Decoding Level, Xinyi Zeng, Yuying Shang, Yutao Zhu, Xiao Yang, Jiawei Chen, and Yu Tian.
arXiv
LLMs + Persona-Plug = Personalized LLMs, Jiongnan Liu, Yutao Zhu, Shuting Wang, Xiaochi Wei, Erxue Min, Yu Lu, Shuaiqiang Wang, Dawei Yin, and Zhicheng Dou.
arXiv
Towards Effective and Efficient Continual Pre-training of Large Language Models, Jie Chen, Zhipeng Chen, Jiapeng Wang, Kun Zhou, Yutao Zhu, Jinhao Jiang, Yingqian Min, Wayne Xin Zhao, Zhicheng Dou, Jiaxin Mao, Yankai Lin, Ruihua Song, Jun Xu, Xu Chen, Rui Yan, Zhewei Wei, Di Hu, Wenbing Huang, and Ji-Rong Wen.
arXiv
YuLan: An Open-source Large Language Model, Yutao Zhu, Kun Zhou, Kelong Mao, Wentong Chen, Yiding Sun, Zhipeng Chen, Qian Cao, Yihan Wu, Yushuo Chen, Feng Wang, Lei Zhang, Junyi Li, Xiaolei Wang, Lei Wang, Beichen Zhang, Zican Dong, Xiaoxue Cheng, Yuhan Chen, Xinyu Tang, Yupeng Hou, Qiangqiang Ren, Xincheng Pang, Shufang Xie, Wayne Xin Zhao, Zhicheng Dou, Jiaxin Mao, Yankai Lin, Ruihua Song, Jun Xu, Xu Chen, Rui Yan, Zhewei Wei, Di Hu, Wenbing Huang, Ze-Feng Gao, Yueguo Chen, Weizheng Lu, and Ji-Rong Wen.
arXiv
Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation, Guanting Dong, Yutao Zhu, Chenghao Zhang, Zechen Wang, Zhicheng Dou, and Ji-Rong Wen.
arXiv
DemoRank: Selecting Effective Demonstrations for Large Language Models in Ranking Task, Wenhan Liu, Yutao Zhu, and Zhicheng Dou.
arXiv
RichRAG: Crafting Rich Responses for Multi-faceted Queries in Retrieval-Augmented Generation, Shuting Wang, Xin Xu, Mang Wang, Weipeng Chen, Yutao Zhu, and Zhicheng Dou.
arXiv
DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation, Shuting Wang, Jiongnan Liu, Shiren Song, Jiehan Cheng, Yuqi Fu, Peidong Guo, Kun Fang, Yutao Zhu, and Zhicheng Dou.
TKDE
Query-oriented Data Augmentation for Session Search, Haonan Chen, Zhicheng Dou, Yutao Zhu, and Ji-Rong Wen.
arXiv
One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models, Yutao Zhu, Zhaoheng Huang, Zhicheng Dou, and Ji-Rong Wen.
arXiv
FlashRAG: A Modular Toolkit for Efficient Retrieval-Augmented Generation Research, Jiajie Jin, Yutao Zhu, Xinyu Yang, Chenghao Zhang, and Zhicheng Dou.
ACL 2024
INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning, Yutao Zhu, Peitian Zhang, Chenghao Zhang, Yifei Chen, Binyu Xie, Zheng Liu, Ji-Rong Wen, and Zhicheng Dou.
ACL 2024
Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs, Jiejun Tan, Zhicheng Dou, Yutao Zhu, Peidong Guo, Kun Fang, and Ji-Rong Wen.
ACL 2024 Findings
BIDER: Bridging Knowledge Inconsistency for Efficient Retrieval-Augmented LLMs via Key Supporting Evidence, Jiajie Jin, Yutao Zhu, Yujia Zhou, and Zhicheng Dou.
KAIS
How to Personalize and Whether to Personalize? Candidate Documents Decide, Wenhan Liu, Yujia Zhou, Yutao Zhu, and Zhicheng Dou.
arXiv
From Matching to Generation: A Survey on Generative Information Retrieval, Xiaoxi Li, Jiajie Jin, Yujia Zhou, Yuyao Zhang, Peitian Zhang, Yutao Zhu, and Zhicheng Dou.
SIGIR 2024 Resource
JDivPS: A Diversified Product Search Dataset, Zhirui Deng, Zhicheng Dou, Yutao Zhu, Xubo Qin, Pengchao Cheng, Jiangxu Wu, and Hao Wang.
SIGIR 2024 Demo
An Integrated Data Processing Framework for Pretraining Foundation Models, Yiding Sun, Feng Wang, Yutao Zhu, Wayne Xin Zhao, and Jiaxin Mao.
TOIS
Passage-aware Search Result Diversification, Zhan Su, Zhicheng Dou, Yutao Zhu, and Ji-Rong Wen.
arXiv
UFO: a Unified and Flexible Framework for Evaluating Factuality of Large Language Models, Zhaoheng Huang, Zhicheng Dou, Yutao Zhu, and Ji-Rong Wen.
WWW 2024
Mining Exploratory Queries for Conversational Search, Wenhan Liu, Ziliang Zhao, Yutao Zhu, and Zhicheng Dou.
WSDM 2024
CL4DIV: A Contrastive Learning Framework for Search Result Diversification, Zhirui Deng, Zhicheng Dou, Yutao Zhu, and Ji-Rong Wen.
2023
EMNLP 2023 Findings
Joint Semantic and Strategy Matching for Persuasive Dialogue, Chuhao Jin, Yutao Zhu, Lingzhen Kong, Shijie Li, Xiao Zhang, Ruihua Song, Xu Chen, Huan Chen, Yuchong Sun, Yu Chen, and Jun Xu.
KDD 2023
Learning to Relate to Previous Turns in Conversational Search, Fengran Mo, Jian-Yun Nie, Kaiyu Huang, Kelong Mao, Yutao Zhu, Peng Li, and Yang Liu.
ACL 2023
ConvGQR: Generative Query Reformulation for Conversational Search, Fengran Mo, Kelong Mao, Yutao Zhu, Yihong Wu, Kaiyu Huang, and Jian-Yun Nie.
ACL 2023 Findings
Hence, Socrates is mortal: A Benchmark for Natural Language Syllogistic Reasoning, Yongkang Wu, Meng Han, Yutao Zhu, Lei Li, Xinyu Zhang, Ruofei Lai, Xiaoguang Li, Yuanhang Ren, Zhicheng Dou, and Zhao Cao.
TOIS
Contrastive Learning for Legal Judgment Prediction, Han Zhang, Zhicheng Dou, Yutao Zhu, and Ji-Rong Wen.
AAAI 2023
Learning from the Wisdom of Crowds: Exploiting Similar Sessions for Session Search, Yuhang Ye, Zhonghua Li, Zhicheng Dou, Yutao Zhu, Changwang Zhang, Shangquan Wu, and Zhao Cao.
WSDM 2023
Heterogeneous Graph-based Context-aware Document Ranking, Shuting Wang, Zhicheng Dou, and Yutao Zhu.
arXiv
Don’t Make Your LLM an Evaluation Benchmark Cheater, Kun Zhou, Yutao Zhu, Zhipeng Chen, Wentong Chen, Wayne Xin Zhao, Xu Chen, Yankai Lin, Ji-Rong Wen, and Jiawei Han.
arXiv
Large Language Models for Information Retrieval: A Survey, Yutao Zhu, Huaying Yuan, Shuting Wang, Jiongnan Liu, Wenhan Liu, Chenlong Deng, Zhicheng Dou, and Ji-Rong Wen.
arXiv
WebBrain: Learning to Generate Factually Correct Articles for Queries by Grounding on Large Web Corpus, Hongjin Qian*, Yutao Zhu*, Zhicheng Dou, Haoqi Gu, Xinyu Zhang, Zheng Liu, Ruofei Lai, Zhao Cao, Jian-Yun Nie, and Ji-Rong Wen.
arXiv
An Empirical Study of Uniform-Architecture Knowledge Distillation in Document Ranking, Xubo Qin, Xiyuan Liu, Xiongfeng Zheng, Jie Liu, and Yutao Zhu.
2022
EMNLP 2022 Findings
MCP: Self-supervised Pre-training for Personalized Chatbots with Multi-level Contrastive Sampling, Zhaoheng Huang, Zhicheng Dou, Yutao Zhu, and Zhengyi Ma.
COLING 2022
Coarse-to-Fine: Hierarchical Multi-task Learning for Natural Language Understanding, Zhaoye Fei, Yu Tian, Yongkang Wu, Xinyu Zhang, Yutao Zhu, Zheng Liu, Jiawen Wu, Dejiang Kong, Ruofei Lai, Zhao Cao, Zhicheng Dou, and Xipeng Qiu.
CIKM 2022
From Easy to Hard: A Dual Curriculum Learning Framework for Context-Aware Document Ranking, Yutao Zhu, Jian-Yun Nie, Yixuan Su, Haonan Chen, Xinyu Zhang, and Zhicheng Dou.
CIKM 2022
Enhancing User Behavior Sequence Modeling by Generative Tasks for Session Search, Haonan Chen, Zhicheng Dou, Yutao Zhu, Zhao Cao, Xiaohua Cheng, and Ji-Rong Wen.
TOIS
GDESA: Greedy Diversity Encoder with Self-Attention for Search Results Diversification, Xubo Qin, Zhicheng Dou, Yutao Zhu, and Ji-Rong Wen.
KDD 2022
Knowledge Enhanced Search Result Diversification, Zhan Su, Zhicheng Dou, Yutao Zhu, and Ji-Rong Wen.
NAACL-HLT 2022
Less is More: Learning to Refine Dialogue History for Personalized Dialogue Generation, Hanxun Zhong, Zhicheng Dou, Yutao Zhu, Hongjin Qian, and Ji-Rong Wen.
TOIS
Leveraging Narrative to Generate Movie Script, Yutao Zhu, Ruihua Song, Jian-Yun Nie, Pan Du, Zhicheng Dou, and Jin Zhou.
arXiv
PReGAN: Answer Oriented Passage Ranking with Weakly Supervised GAN, Du Pan, Jian-Yun Nie, Yutao Zhu, Hao Jiang, Lixin Zou, and Xiaohui Yan.
2021
CIKM 2021
Contrastive Learning of User Behavior Sequence for Context-Aware Document Ranking, Yutao Zhu, Jian-Yun Nie, Zhicheng Dou, Zhengyi Ma, Xinyu Zhang, Pan Du, Xiaochen Zuo, and Hao Jiang.
CIKM 2021
PSSL: Self-supervised Learning for Personalized Search with Contrastive Sampling, Yujia Zhou, Zhicheng Dou, Yutao Zhu, and Ji-Rong Wen.
CIKM 2021
Learning Implicit User Profile for Personalized Retrieval-Based Chatbot, Hongjin Qian, Zhicheng Dou, Yutao Zhu, Yueyuan Ma, and Ji-Rong Wen.
CCIR 2021
(Best Paper Award) Interaction-Based Document Matching for Implicit Search Result Diversification, Xubo Qin, Zhicheng Dou, Yutao Zhu, and Ji-Rong Wen.
TOIS
Graph Neural Collaborative Topic Model for Citation Recommendation, Qianqian Xie, Yutao Zhu, Jimin Huang, Pan Du, and Jian-Yun Nie.
CCL 2021
Few-Shot Charge Prediction with Multi-grained Features and Mutual Information, Han Zhang, Zhicheng Dou, Yutao Zhu, and Ji-Rong Wen.
SIGIR 2021 Short
Proactive Retrieval-based Chatbots based on Relevant Knowledge and Goals, Yutao Zhu, Jian-Yun Nie, Kun Zhou, Pan Du, Hao Jiang, and Zhicheng Dou.
SIGIR 2021
Modeling Intent Graph for Search Result Diversification, Zhan Su, Zhicheng Dou, Yutao Zhu, Xubo Qin, and Ji-Rong Wen.
SIGIR 2021
One Chatbot Per Person: Creating Personalized Chatbots based on Implicit User Profiles, Zhengyi Ma, Zhicheng Dou, Yutao Zhu, Hanxun Zhong, and Ji-Rong Wen.
SIGIR 2021 Resource
Pchatbot: A Large-Scale Dataset for Personalized Chatbot, Hongjin Qian, Xiaohe Li, Hanxun Zhong, Yu Guo, Yueyuan Ma, Yutao Zhu, Zhanliang Liu, Zhicheng Dou, and Ji-Rong Wen.
ECIR 2021
Content Selection Network for Document-grounded Retrieval-based Chatbots, Yutao Zhu, Jian-Yun Nie, Kun Zhou, Pan Du, and Zhicheng Dou.
AAAI 2021
Neural Sentence Ordering Based on Constraint Graphs, Yutao Zhu, Kun Zhou, Jian-Yun Nie, Shengchao Liu, and Zhicheng Dou.
arXiv
Emotion Eliciting Machine: Emotion Eliciting Conversation Generation based on Dual Generator, Hao Jiang, Yutao Zhu, Xinyu Zhang, Zhicheng Dou, Pan Du, Te Pi, and Yantao Jia.
arXiv
BERT4SO: Neural Sentence Ordering by Fine-tuning BERT, Yutao Zhu, Jian-Yun Nie, Kun Zhou, Shengchao Liu, Yabo Ling, and Pan Du.
2020
CIKM 2020
S^3-Rec: Self-Supervised Learning for Sequential Recommendation with Mutual Information Maximization, Kun Zhou, Hui Wang, Wayne Xin Zhao, Yutao Zhu, Sirui Wang, Fuzheng Zhang, Zhongyuan Wang, and Ji-Rong Wen.
ACL 2020
ScriptWriter: Narrative-Guided Script Generation, Yutao Zhu, Ruihua Song, Zhicheng Dou, Jian-Yun Nie, and Jin Zhou.
PAKDD 2020
Improving Multi-Turn Response Selection Models with Complementary Last-Utterance Selection by Instance Weighting, Kun Zhou, Wayne Xin Zhao, Yutao Zhu, Ji-Rong Wen, and Jingsong Yu.
2019
IRJ
ReBoost: A Retrieval-Boosted Sequence-to-Sequence Model for Neural Response Generation, Yutao Zhu, Zhicheng Dou, Jian-Yun Nie, and Ji-Rong Wen.
IRJ
Deep Cross-platform Product Matching in E-commerce, Juan Li, Zhicheng Dou, Yutao Zhu, Xiaochen Zuo, and Ji-Rong Wen.
NTCIR 2019
A Hybrid Framework of Emotion-Aware Seq2Seq Model for Emotional Conversation Generation, Xiaohe Li, Jiaqing Liu, Weihao Zheng, Xiangbo Wang, Yutao Zhu, and Zhicheng Dou.
2018
SIGIR 2018 Short
An Attribute-aware Neural Attentive Model for Next Basket Recommendation, Ting Bai, Jian-Yun Nie, Wayne Xin Zhao, Yutao Zhu, Pan Du, and Ji-Rong Wen.
Experiences
- 2021.12 - 2022.12, Research Intern, Poisson Lab, Huawei. Supervised by Xinyu Zhang
- 2018.8 - 2019.6, Research Intern, XiaoIce, Microsoft Asia. Supervised by Ruihua Song
- 2016.9 - 2019.6, Research Assistant, Beijing Key Lab of Big Data Management and Analysis Methods. Supervised by Zhicheng Dou and Ji-Rong Wen
- 2016.6 - 2016.9, Software Engineer, Infosys Technology Limited. Supervised by Anjaneyulu Pasala
Academic Services
- PC Member: ACL, SIGIR, WWW, NeurIPS, ICLR, SIGKDD, AAAI, EMNLP, CIKM, WSDM, COLING, COLM
- Journal Reviewer: TOIS, JASIST, KAIS, TALLIP, Computing Surveys, ACL Rolling Review