publications

publications by category in reverse chronological order.

2026

  1. CVPR
    VQ-VA World: Towards High-Quality Visual Question-Visual Answering
    Chenhui Gou*, Zilong Chen*, Zeyu Wang*, and 10 more authors
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026
  2. CVPR
    An Empirical Study on How Video-LLMs Answer Video Questions
    Chenhui Gou, Ziyu Ma, Zicheng Duan, and 6 more authors
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026
  3. AAAI
    Where and What Matters: Sensitivity-Aware Task Vectors for Many-Shot Multimodal In-Context Learning
    Ziyu Ma*, Chenhui Gou*, Yiming Hu, and 4 more authors
    In AAAI Conference on Artificial Intelligence (AAAI), 2026
  4. ICLR
    Sparsity Forcing: Reinforcing Token Sparsity of MLLMs
    Feng Chen, Yefei He, Lequan Lin, and 4 more authors
    In International Conference on Learning Representations (ICLR), 2026
  5. CVPR
    Evaluating and Advancing Multimodal Large Language Models in Ability Lens
    Feng Chen*, Chenhui Gou, Jing Liu, and 6 more authors
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings, 2026

2025

  1. Tech Report
    Seed1.5-VL Technical Report
    ByteDance Seed Team
    Contributor, 2025
  2. BAGEL
    Emerging Properties in Unified Multimodal Pretraining
    ByteDance BAGEL Team
    Core contributor, May 2025
  3. Preprint
    LightBagel: A Light-Weighted, Double Fusion Framework for Unified Multimodal Understanding and Generation
    Zeyu Wang*, Zilong Chen*, Chenhui Gou*, and 8 more authors
    2025
  4. Preprint
    UniMedVL: Unifying Medical Multimodal Understanding and Generation Through Observation-Knowledge-Analysis
    Contributor, 2025
  5. CVPR
    DrVideo: Document Retrieval Based Long Video Understanding
    Ziyu Ma*, Chenhui Gou*, Hengcan Shi, and 4 more authors
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
  6. EMNLP
    InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding
    Kirolos Ataallah, Chenhui Gou, Eslam Mohamed Bakr, and 3 more authors
    In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
  7. Preprint
    LiveWorld: Simulating Out-of-Sight Dynamics in Generative Video World Models
    Zicheng Duan, Jiatong Xia, Zeyu Zhang, and 7 more authors
    2025
  8. CVPR
    Point-Cache: Test-time Dynamic and Hierarchical Cache for Robust and Generalizable Point Cloud Analysis
    Hongyu Sun, Qiuhong Ke, Ming Cheng, and 4 more authors
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025

2024

  1. Preprint
    Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model
    Abdelrahman Shaker, Muhammad Maaz, Chenhui Gou, and 3 more authors
    2024
  2. Preprint
    How Well Can Vision Language Models See Image Details?
    Chenhui Gou, Faizan Khan, Deyao Zhu, and 4 more authors
    2024
  3. CVPR
    JRDB-PanoTrack: An Open-World Panoptic Segmentation and Tracking Robotic Dataset in Crowded Human Environments
    Duy-Tho Le*, Chenhui Gou*, Stavya Datta, and 4 more authors
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

2023

  1. Preprint
    Strong and Controllable Blind Image Decomposition
    Zeyu Zhang*, Junlin Han*, Chenhui Gou*, and 2 more authors
    2023

2022

  1. NeurIPS
    RTFormer: Efficient Design for Real-Time Semantic Segmentation with Transformer
    Jian Wang*, Chenhui Gou*, Qiman Wu*, and 4 more authors
    In Advances in Neural Information Processing Systems (NeurIPS), 2022. Spotlight Presentation.