I am a first-year Ph.D. student at HKUST, supervised by Prof. Jun Zhang. I received my B.E. degree from Beihang University. I am also a research intern at SenseTime Research, working closely with Dr. Ruihao Gong. Previously, I interned at Microsoft Research Asia and SenseTime Research. My research interests include efficient large vision/language models and image/video generation.

I am always open to collaboration opportunities. If you are interested, please feel free to contact me 😎.

🔥 News

  • 2024.10:  🎉🎉 Our LLMC is accepted to the EMNLP 2024 Industry Track.
  • 2024.07:  🎉🎉 Our PTSBench is accepted to ACM MM 2024.
  • 2024.02:  🎉🎉 Our TFMQ-DM is accepted to CVPR 2024 as a Highlight Poster (Top $2.8\%$).

📝 Publications

(* indicates equal contribution, 📧 indicates corresponding author.)

Preprint

HarmoniCa: Harmonizing Training and Inference for Better Feature Caching in Diffusion Transformer Acceleration

Yushi Huang*, Zining Wang*, Ruihao Gong📧, Jing Liu, Xinjie Zhang, Jinyang Guo, Xianglong Liu, Jun Zhang📧

  • Uncover two discrepancies between training and inference in existing learning-based feature caching methods.
  • Propose HarmoniCa, built upon two training techniques, to alleviate these discrepancies.
  • Extensive experiments on $2$ tasks across $8$ models and $4$ samplers, with resolutions ranging from $256\times256$ to $2048\times2048$, demonstrate the superiority and universality of our framework.
  • Achieve over $40\%$ latency reduction (i.e., $2.07\times$ theoretical speedup) and improved performance on PixArt-$\alpha$. Remarkably, our image-free approach reduces training time by $25\%$ compared with the previous method.
[paper] [abstract]
Preprint

Temporal Feature Matters: A Framework for Diffusion Model Quantization

Yushi Huang, Ruihao Gong, Xianglong Liu📧, Jing Liu, Yuhang Li, Jiwen Lu, Dacheng Tao

  • Compare and analyze the sensitivity and disturbance of temporal and non-temporal features.
  • Propose TIB-based and Cache-based Maintenance with Disturbance-aware Selection for temporal feature maintenance.
  • Reduce the FID score by $5.61$ under the W4A8 configuration for SD-XL. Additionally, achieve $2.20\times$ and $5.76\times$ speedup on CPU and GPU, respectively.
[paper] [abstract]
EMNLP 2024 Industry Track

LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit

Ruihao Gong*, Yang Yong*, Shiqiao Gu*, Yushi Huang*, Chengtao Lv, Yunchen Zhang, Dacheng Tao, Xianglong Liu📧

  • A versatile LLM compression toolkit, LLMC, supports dozens of algorithms, models, and multiple inference backends, offering powerful expandability and all-around evaluation, and enabling users to compress LLMs (e.g., DeepSeek-V3 and LLaMA-3.1 $405$B) with just a single GPU.
  • Modularly and fairly benchmark LLM quantization, considering calibration data, algorithms, and data types.
  • Through detailed observation and analysis, provide novel insights for performance and method improvements under different configurations.
[paper] [code] [abstract]
ACM MM 2024

PTSBench: A Comprehensive Post-Training Sparsity Benchmark Towards Algorithms and Models

Zining Wang, Jinyang Guo, Ruihao Gong, Yang Yong, Aishan Liu, Yushi Huang, Jiaheng Liu, Xianglong Liu📧

  • The first systematic benchmark to conduct a comprehensive evaluation of PTS methods.
  • Uncover and summarize several useful insights and takeaway conclusions, which can serve as guidance for future PTS method design.
  • Serve as a well-organized codebase for future research on PTS algorithms.
[paper] [code] [abstract]
CVPR 2024 (Highlight)

TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models

Yushi Huang*, Ruihao Gong*, Jing Liu, Tianlong Chen, Xianglong Liu📧

  • First to observe temporal disturbance in diffusion models and provide detailed analyses.
  • Propose TIAR and FSC for temporal feature maintenance.
  • Reduce FID by $6.71$ and $2.26$ for CelebA-HQ $256\times256$ and LSUN-Bedrooms $256\times256$, respectively.
[paper] [code] [abstract] [project page]

📋 Services

  • Conference Reviews: NeurIPS, ICLR, ICML, COLM.

📖 Education

  • 2025.02 - Now, Ph.D. in Electronic and Computer Engineering, Hong Kong University of Science and Technology.
  • 2020.09 - 2024.06, B.Eng. in Computer Science and Engineering, Shenyuan Honors College, Beihang University.

💻 Internships

  • 2025.02 - Now, SenseTime Research.
  • 2024.12 - 2025.02, Microsoft Research Asia.
  • 2023.05 - 2024.12, SenseTime Research.