I am an incoming PhD student at HKUST. I received my B.E. degree from Beihang University. I am currently a research intern at Microsoft Research Asia, and I previously interned at SenseTime Research. My research interests include efficient large vision/language models, video generation, and world models.

I’m always open to internship and collaboration opportunities. If you are interested, please feel free to contact me 😎. Here’s my CV.

🔥 News

  • 2024.10:  🎉🎉 Our LLMC is accepted to EMNLP Industry Track.
  • 2024.07:  🎉🎉 Our PTSBench is accepted to ACM MM.
  • 2024.06:  Graduated from Beihang University.
  • 2024.02:  🎉🎉 Our TFMQ-DM is accepted to CVPR as a Highlight Poster (Top 2.8%).

📝 Publications

(* indicates equal contribution, 📧 indicates corresponding author.)

arXiv

HarmoniCa: Harmonizing Training and Inference for Better Feature Cache in Diffusion Transformer Acceleration

Yushi Huang*, Zining Wang*, Ruihao Gong📧, Jing Liu, Xinjie Zhang, Jun Zhang📧

  • Uncover two discrepancies between training and inference for the existing learning-based feature cache method.
  • Propose HarmoniCa built upon two training techniques to alleviate the discrepancies.
  • Extensive experiments on 2 tasks across 7 models and 4 samplers, with resolutions ranging from $256\times256$ to $2048\times2048$, prove the superiority and universality of our framework.
[paper] [abstract]
arXiv

Temporal Feature Matters: A Framework for Diffusion Model Quantization

Yushi Huang, Ruihao Gong, Xianglong Liu📧, Jing Liu, Yuhang Li, Jiwen Lu, Dacheng Tao

  • Compare and analyze the sensitivity and disturbance for temporal and non-temporal features.
  • Propose TIB-based and Cache-based Maintenance with Disturbance-aware Selection for temporal feature maintenance.
  • Reduce the FID score by 5.61 under the w4a8 configuration for SD-XL. Additionally, achieve 2.20$\times$ and 5.76$\times$ speedup on CPU and GPU, respectively.
[paper] [abstract]
EMNLP 2024 Industry Track

LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit

Ruihao Gong*, Yang Yong*, Shiqiao Gu*, Yushi Huang*, Chengtao Lv, Yunchen Zhang, Dacheng Tao, Xianglong Liu📧

  • LLMC, a versatile LLM compression toolkit, supports dozens of algorithms and models and multiple inference backends, with strong extensibility and all-around evaluation, enabling users to compress 100-billion-parameter LLMs with just a single GPU.
  • Modularly and fairly benchmark LLM quantization considering calibration data, algorithms, and data types.
  • Through detailed observation and analysis, provide various novel insights for improving performance and methods under different configurations.
[paper] [code] [abstract]
ACM MM 2024

PTSBench: A Comprehensive Post-Training Sparsity Benchmark Towards Algorithms and Models

Zining Wang, Jinyang Guo, Ruihao Gong, Yang Yong, Aishan Liu, Yushi Huang, Jiaheng Liu, Xianglong Liu📧

  • The first systematic benchmark to conduct a comprehensive evaluation of PTS methods.
  • Uncover and summarize several useful insights and takeaways, which can serve as guidance for future PTS method design.
  • Serve as a well-organized codebase for future research of PTS algorithms.
[paper] [code] [abstract]
CVPR 2024 (Highlight)

TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models

Yushi Huang*, Ruihao Gong*, Jing Liu, Tianlong Chen, Xianglong Liu📧

  • Are the first to observe temporal disturbance and provide detailed analyses of it.
  • Propose TIAR and FSC for temporal feature maintenance.
  • Reduce FID by 6.71 and 2.26 for CelebA-HQ $256\times256$ and LSUN-Bedrooms $256\times256$, respectively.
[paper] [code] [abstract] [project page]

📋 Services

  • Conference Reviews: NeurIPS 2024, ICLR 2025

📖 Education

  • 2020.09 - 2024.06, B.Eng. in Computer Science and Engineering, Shenyuan Honors College, Beihang University.

💻 Internships

  • 2024.12 - Now, Microsoft Research Asia.
  • 2023.05 - 2024.12, SenseTime Research.