I am an incoming PhD student at HKUST. I received my B.E. degree from Beihang University. I am currently working as a research intern at Microsoft Research Asia. Previously, I interned at SenseTime Research. My research interests include efficient large vision/language models, video generation, and world models.
I am actively seeking internship and collaboration opportunities. If you are interested, please feel free to contact me 😎. Here’s my CV.
🔥 News
- 2024.10: 🎉🎉 Our LLMC is accepted to EMNLP Industry Track.
- 2024.07: 🎉🎉 Our PTSBench is accepted to ACM MM.
- 2024.06: Graduated from Beihang University.
- 2024.02: 🎉🎉 Our TFMQ-DM is accepted to CVPR as a Highlight Poster (Top 2.8%).
📝 Publications
(* indicates equal contribution, 📧 indicates corresponding author.)
Yushi Huang*, Zining Wang*, Ruihao Gong📧, Jing Liu, Xinjie Zhang, Jun Zhang📧
- Uncover two discrepancies between training and inference for the existing learning-based feature cache method.
- Propose HarmoniCa built upon two training techniques to alleviate the discrepancies.
- Extensive experiments on 2 tasks across 7 models and 4 samplers, with resolutions ranging from $256\times256$ to $2048\times2048$, demonstrate the superiority and universality of our framework.
Temporal Feature Matters: A Framework for Diffusion Model Quantization
Yushi Huang, Ruihao Gong, Xianglong Liu📧, Jing Liu, Yuhang Li, Jiwen Lu, Dacheng Tao
- Compare and analyze the sensitivity and disturbance for temporal and non-temporal features.
- Propose TIB-based and Cache-based Maintenance with Disturbance-aware Selection for temporal feature maintenance.
- Reduce the FID score by 5.61 under the w4a8 configuration for SD-XL. Additionally, achieve 2.20$\times$ and 5.76$\times$ speedup on CPU and GPU, respectively.
LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit
Ruihao Gong*, Yang Yong*, Shiqiao Gu*, Yushi Huang*, Chengtao Lv, Yunchen Zhang, Dacheng Tao, Xianglong Liu📧
- LLMC, a versatile LLM compression toolkit, supports dozens of algorithms and models as well as multiple inference backends, with powerful expandability and all-around evaluation, enabling users to compress 100-billion-parameter LLMs with just a single GPU.
- Modularly and fairly benchmark LLM quantization with respect to calibration data, algorithms, and data types.
- Through detailed observation and analysis, provide various novel insights for performance and method improvements under different configurations.
PTSBench: A Comprehensive Post-Training Sparsity Benchmark Towards Algorithms and Models
Zining Wang, Jinyang Guo, Ruihao Gong, Yang Yong, Aishan Liu, Yushi Huang, Jiaheng Liu, Xianglong Liu📧
- The first systematic benchmark to conduct a comprehensive evaluation of PTS methods.
- Uncover and summarize several useful insights and takeaway conclusions, which can serve as guidance for future PTS method design.
- Serve as a well-organized codebase for future research on PTS algorithms.
TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models
Yushi Huang*, Ruihao Gong*, Jing Liu, Tianlong Chen, Xianglong Liu📧
- Observe temporal disturbance for the first time and provide detailed analyses.
- Propose TIAR and FSC for temporal feature maintenance.
- Reduce FID by 6.71 and 2.26 for CelebA-HQ $256\times256$ and LSUN-Bedrooms $256\times256$, respectively.
📋 Services
- Conference Reviews: NeurIPS 2024, ICLR 2025
📖 Education
- 2020.09 - 2024.06, B.Eng. in Computer Science and Engineering, Shenyuan Honors College, Beihang University.
💻 Internships
- 2024.12 - Now, Microsoft Research Asia.
- 2023.05 - 2024.12, SenseTime Research.