Chenhui Gou

Monash University

Melbourne, Australia

I am a PhD student at Monash University and a research intern with ByteDance Seed Edge. My work focuses on agentic and multimodal foundation models, especially self-evolving agents, unified multimodal understanding and generation, long-video understanding, and evaluation of multimodal models.

Working on self-evolving AI.
AI Agents · LLMs · VLMs · Unified Multimodal Models

AI Agents · Self-Evolving AI · Unified Multimodal Models · Long Video Understanding · Multimodal Evaluation · Efficient MLLMs

news

Jul 23, 2026	Paper released: Sample-Efficient Learning from Agent Experience.
Jul 02, 2026	EdgeBench released: unveiling scaling laws of learning from real-world environments.
Mar 01, 2026	Two papers accepted at CVPR 2026: VQ-VA World and An Empirical Study on How Video-LLMs Answer Video Questions.
Jan 01, 2025	BAGEL (Emerging Properties in Unified Multimodal Pretraining) released.

🚀 experience

2020-12 — 2021-06

Aibee Inc

Research Intern

Beijing, China

2021-06 — 2021-12

NIO Inc, Autonomous Driving

Research Intern

Beijing, China

2021-12 — 2022-08

Baidu, Vision Technology Department

Research Intern

Beijing, China

2022-08 — 2022-10

University of Technology Sydney

Research Project

Sydney, Australia

2022-07 — 2023-07

Australian National University

Research Project

Canberra, Australia

2022-12 — 2023-03

SenseTime Inc

Research Intern

Shenzhen, China

2023-06 — 2024-12

Vision-CAIR Group, KAUST

Research Intern

2024-12 — 2025-10

ByteDance Seed VLM-BAGEL Group

Research Intern

2025-10 — Now

ByteDance Seed Edge

Research Intern

selected publications

For a full list, please refer to Google Scholar or the publications page.

Preprint

Sample-Efficient Learning from Agent Experience

Chenhui Gou, Haoqin Tu, Yunhao Fang, and 2 more authors

Jul 2026

@article{gou2026sampleefficient,
  title = {Sample-Efficient Learning from Agent Experience},
  author = {Gou, Chenhui and Tu, Haoqin and Fang, Yunhao and Cai, Jianfei and Rezatofighi, Hamid},
  month = jul,
  year = {2026},
}

Tech Report

Seed1.5-VL Technical Report

ByteDance Seed Team

Contributor. 2025

BAGEL

Emerging Properties in Unified Multimodal Pretraining

ByteDance BAGEL Team

Core contributor. May 2025

CVPR

VQ-VA World: Towards High-Quality Visual Question-Visual Answering

Chenhui Gou^*, Zilong Chen^*, Zeyu Wang^*, and 10 more authors

In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026

@inproceedings{gou2026vqva,
  title = {VQ-VA World: Towards High-Quality Visual Question-Visual Answering},
  author = {Gou, Chenhui and Chen, Zilong and Wang, Zeyu and Li, Feng and Zhu, Deyao and Duan, Zicheng and Li, Kunchang and Deng, Chaorui and Yuan, Hongyi and Fan, Haoqi and Xie, Cihang and Cai, Jianfei and Rezatofighi, Hamid},
  year = {2026},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
}

CVPR

DrVideo: Document Retrieval Based Long Video Understanding

Ziyu Ma^*, Chenhui Gou^*, Hengcan Shi, and 4 more authors

In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025

@inproceedings{ma2025drvideo,
  title = {DrVideo: Document Retrieval Based Long Video Understanding},
  author = {Ma, Ziyu and Gou, Chenhui and Shi, Hengcan and Sun, Bin and Li, Shutao and Rezatofighi, Hamid and Cai, Jianfei},
  year = {2025},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
}

CVPR

An Empirical Study on How Video-LLMs Answer Video Questions

Chenhui Gou, Ziyu Ma, Zicheng Duan, and 6 more authors

In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026

@inproceedings{gou2026empirical,
  title = {An Empirical Study on How Video-LLMs Answer Video Questions},
  author = {Gou, Chenhui and Ma, Ziyu and Duan, Zicheng and He, Haoyu and Chen, Feng and Liu, Akide and Zhuang, Bohan and Cai, Jianfei and Rezatofighi, Hamid},
  year = {2026},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
}

Preprint

How Well Can Vision Language Models See Image Details?

Chenhui Gou, Faizan Khan, Deyao Zhu, and 4 more authors

2024

@article{gou2024howwell,
  title = {How Well Can Vision Language Models See Image Details?},
  author = {Gou, Chenhui and Khan, Faizan and Zhu, Deyao and Felemban, Abdulwahab and Cai, Jianfei and Rezatofighi, Hamid and Elhoseiny, Mohamed},
  year = {2024},
}

NeurIPS

RTFormer: Efficient Design for Real-Time Semantic Segmentation with Transformer

Jian Wang^*, Chenhui Gou^*, Qiman Wu^*, and 4 more authors

In Advances in Neural Information Processing Systems (NeurIPS), 2022. Spotlight Presentation.

@inproceedings{wang2022rtformer,
  title = {RTFormer: Efficient Design for Real-Time Semantic Segmentation with Transformer},
  author = {Wang, Jian and Gou, Chenhui and Wu, Qiman and Feng, Haocheng and Han, Junyu and Ding, Errui and Wang, Jingdong},
  year = {2022},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
}
