Xizhou Zhu

11.4k total citations · 8 hit papers
25 papers, 2.7k citations indexed

About

Xizhou Zhu is a scholar working on Computer Vision and Pattern Recognition, Artificial Intelligence, and Electrical and Electronic Engineering. According to data from OpenAlex, Xizhou Zhu has authored 25 papers receiving a total of 2.7k indexed citations (citations by other indexed papers that have themselves been cited), including 22 papers in Computer Vision and Pattern Recognition, 13 papers in Artificial Intelligence and 2 papers in Electrical and Electronic Engineering. Recurrent topics in Xizhou Zhu's work include Advanced Neural Network Applications (14 papers), Multimodal Machine Learning Applications (12 papers) and Domain Adaptation and Few-Shot Learning (8 papers). Xizhou Zhu is often cited by papers focused on Advanced Neural Network Applications (14 papers), Multimodal Machine Learning Applications (12 papers) and Domain Adaptation and Few-Shot Learning (8 papers). Xizhou Zhu collaborates with scholars based in China, Hong Kong and the United Kingdom. Co-authors include Jifeng Dai, Lu Yuan, Yichen Wei, Lewei Lu, Yu Qiao, Yuwen Xiong, Yujie Wang, Stephen Lin, Dazhi Cheng and Zheng Zhang. Xizhou Zhu has published in journals and conference proceedings such as IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE Transactions on Visualization and Computer Graphics and the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

In The Last Decade

Xizhou Zhu

21 papers receiving 2.6k citations

Hit Papers

5 papers align trajectories

What are hit papers?

Hit papers significantly outperform the citation benchmark for their cohort. A paper qualifies if it has ≥500 total citations, achieves ≥1.5× the top-1% citation threshold for papers in the same subfield and year (this is the minimum needed to enter the top 1%, not the average within it), or reaches the top citation threshold in at least one of its specific research topics.
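
The rule above can be sketched in code. This is a minimal illustration, not the site's implementation: it assumes the three criteria are alternatives (meeting any one is enough), and the `Paper` fields and the `is_hit_paper` name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    # All field names are hypothetical, chosen only for this sketch.
    total_citations: int
    # Minimum citations needed to enter the top 1% for the same subfield and year.
    top1pct_threshold: float
    # True if the paper reaches the top citation threshold in at least one topic.
    tops_a_topic: bool


def is_hit_paper(paper: Paper) -> bool:
    """Return True if any one of the three criteria holds (assumed to be alternatives)."""
    return (
        paper.total_citations >= 500
        or paper.total_citations >= 1.5 * paper.top1pct_threshold
        or paper.tops_a_topic
    )
```

Under this reading, a paper with 300 total citations in a cohort whose top-1% entry threshold is 200 would qualify through the second criterion, since 300 is at least 1.5 × 200.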

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

2023 · 479 citations · Wenhai Wang, Jifeng Dai et al.

Deep Feature Flow for Video Recognition

2017 · 416 citations · Xizhou Zhu, Yuwen Xiong et al.

Flow-Guided Feature Aggregation for Video Object Detection

2017 · 413 citations · Xizhou Zhu, Yujie Wang et al.

Planning-oriented Autonomous Driving

2023 · 349 citations · Yihan Hu, Li Chen et al.

An Empirical Study of Spatial Attention Mechanisms in Deep Networks

2019 · 341 citations · Xizhou Zhu, Dazhi Cheng et al.

BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision

2023 · 147 citations · Chenyu Yang, Yuntao Chen et al. · The HKU Scholars Hub (University of Hong Kong)

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

2024 · 105 citations · Zhe Chen, Jiannan Wu et al.

Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications

2024 · 66 citations · Yuwen Xiong, Zhiqi Li et al.

Peers

Peers by citation overlap. Values marked × are relative to the corresponding value for Xizhou Zhu (the reference row).

Name	h						Papers	Cites
Xizhou Zhu China	15	1.9k	644	321	249	227	25	2.7k
Yassine Ruichek France	27	1.6k 0.9×	436 0.7×	437 1.4×	393 1.6×	248 1.1×	176	2.6k
Yuanqing Lin United States	27	2.4k 1.3×	749 1.2×	460 1.4×	263 1.1×	238 1.0×	44	3.2k
Zhaowei Cai United States	9	1.6k 0.9×	588 0.9×	260 0.8×	207 0.8×	152 0.7×	15	2.2k
Ming Tang China	25	1.7k 0.9×	558 0.9×	263 0.8×	216 0.9×	129 0.6×	110	2.3k
Tai‐Jiang Mu China	18	1.6k 0.8×	568 0.9×	386 1.2×	323 1.3×	114 0.5×	56	2.8k
Kaiwen Duan China	10	1.8k 0.9×	429 0.7×	486 1.5×	274 1.1×	120 0.5×	19	2.6k
Jan Hosang Germany	9	2.0k 1.0×	562 0.9×	324 1.0×	191 0.8×	195 0.9×	10	2.4k
Eduardo Romera Spain	18	1.5k 0.8×	394 0.6×	407 1.3×	170 0.7×	376 1.7×	27	2.0k
Longyin Wen China	27	2.4k 1.3×	554 0.9×	508 1.6×	192 0.8×	190 0.8×	49	2.8k

Countries citing papers authored by Xizhou Zhu

This map shows the geographic impact of Xizhou Zhu's research. It shows the number of citations coming from papers published by authors working in each country. You can also color the map by specialization and compare the number of citations received by Xizhou Zhu with the expected number of citations based on a country's size and research output (numbers larger than one mean the country cites Xizhou Zhu more than expected).
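
As a rough illustration of the comparison described above, the sketch below divides observed citations by expected citations. How the expected count is derived from a country's size and research output is not specified here, so it is passed in as a given; the function name and the numbers are hypothetical.

```python
def specialization_ratio(observed_citations: float, expected_citations: float) -> float:
    """Observed / expected citations from one country.

    A value above 1 means the country cites the scholar more than expected;
    a value below 1 means less than expected.
    """
    if expected_citations <= 0:
        raise ValueError("expected_citations must be positive")
    return observed_citations / expected_citations


# Hypothetical numbers: 120 observed citations against 80 expected.
print(specialization_ratio(120, 80))  # 1.5
```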

Fields of papers citing papers by Xizhou Zhu

Physical Sciences · Health Sciences · Life Sciences · Social Sciences

This network shows the impact of papers produced by Xizhou Zhu. Nodes represent research fields, and links connect fields that are likely to share authors. Colored nodes show fields that tend to cite the papers produced by Xizhou Zhu. The network helps show where Xizhou Zhu may publish in the future.

Co-authorship network of co-authors of Xizhou Zhu

This figure shows the co-authorship network connecting the top 25 collaborators of Xizhou Zhu. A scholar is included among the top collaborators of Xizhou Zhu based on the total number of citations received by their joint publications. Widths of edges represent the number of papers authors have co-authored together. Node borders signify the number of papers an author published with Xizhou Zhu. Xizhou Zhu is excluded from the visualization to improve readability, since they are connected to all nodes in the network.

All Works

20 of 20 papers shown

1. Tao, Chenxin, Xizhou Zhu, Chenyu Zhang, et al. (2025). HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding. 14559–14569.

2. Shao, Jie, Xizhou Zhu, Zhaokai Wang, et al. (2025). SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding. 29767–29779.

3. Luo, Gen, Xue Yang, Zhaokai Wang, et al. (2025). Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training. 24960–24971. 1 indexed citations

4. Wang, Zhaokai, Xizhou Zhu, Xue Yang, et al. (2025). Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding. IEEE Transactions on Pattern Analysis and Machine Intelligence. 47(11). 10142–10159. 1 indexed citations

5. Chen, Zhe, Jiannan Wu, Wenhai Wang, et al. (2024). InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks. 24185–24198. 105 indexed citations

6. Xiong, Yuwen, Zhiqi Li, Yuntao Chen, et al. (2024). Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications. 5652–5661. 66 indexed citations

7. Li, Hao, Xue Yang, Zhaokai Wang, et al. (2024). Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft. 16426–16435. 6 indexed citations

8. Li, Hao, Xiaohu Jiang, Xizhou Zhu, et al. (2023). Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks. 2691–2700. 18 indexed citations

9. Hu, Yihan, Li Chen, Keyu Li, et al. (2023). Planning-oriented Autonomous Driving. 17853–17862. 349 indexed citations

10. Yang, Chenyu, Yuntao Chen, Hao Tian, et al. (2023). BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision. The HKU Scholars Hub (University of Hong Kong). 17830–17839. 147 indexed citations

11. Wang, Wenhai, Jifeng Dai, Zhe Chen, et al. (2023). InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions. 14408–14419. 479 indexed citations

12. Li, Hao, et al. (2022). AutoLoss-Zero: Searching Loss Functions from Scratch for Generic Tasks. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 999–1008. 14 indexed citations

13. Tao, Chenxin, Honghui Wang, Xizhou Zhu, et al. (2022). Exploring the Equivalence of Siamese Self-Supervised Learning via A Unified Gradient Framework. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 14411–14420. 29 indexed citations

14. Gao, Hang, Xizhou Zhu, Stephen Lin, & Jifeng Dai. (2020). Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation. arXiv (Cornell University). 8 indexed citations

15. Li, Hao, Chenxin Tao, Xizhou Zhu, et al. (2020). Auto Seg-Loss: Searching Metric Surrogates for Semantic Segmentation. arXiv (Cornell University). 6 indexed citations

16. Su, Weijie, Xizhou Zhu, Yue Cao, et al. (2020). VL-BERT: Pre-training of Generic Visual-Linguistic Representations. International Conference on Learning Representations. 112 indexed citations

17. Zhu, Xizhou, Dazhi Cheng, Zheng Zhang, Stephen Lin, & Jifeng Dai. (2019). An Empirical Study of Spatial Attention Mechanisms in Deep Networks. 6687–6696. 341 indexed citations

18. Zhu, Xizhou, Yujie Wang, Jifeng Dai, Lu Yuan, & Yichen Wei. (2017). Flow-Guided Feature Aggregation for Video Object Detection. 408–417. 413 indexed citations

19. Zhu, Xizhou, Yuwen Xiong, Jifeng Dai, Lu Yuan, & Yichen Wei. (2017). Deep Feature Flow for Video Recognition. 4141–4150. 416 indexed citations

20. Liu, Mengchen, et al. (2015). An Uncertainty-Aware Approach for Exploratory Microblog Retrieval. IEEE Transactions on Visualization and Computer Graphics. 22(1). 250–259. 52 indexed citations

Rankless uses publication and citation data sourced from OpenAlex, an open and comprehensive bibliographic database. While OpenAlex provides broad and valuable coverage of the global research landscape, it—like all bibliographic datasets—has inherent limitations. These include incomplete records, variations in author disambiguation, differences in journal indexing, and delays in data updates. As a result, some metrics and network relationships displayed in Rankless may not fully capture the entirety of a scholar's output or impact.
