About
Hi! I'm Sicheng Lai, a senior undergrad at the Chinese University of Hong Kong, Shenzhen.
I had the pleasure of spending a semester at UC San Diego, where I got to work with Dr. Yadi Cao in Prof. Rose Yu's lab on SimulCost, a cost-aware benchmarking framework for evaluating LLMs on scientific simulation optimization. I'm genuinely proud of this work, and I'm thrilled that it is finally going open-source.
At CUHK-Shenzhen, under the supervision of Prof. Benyou Wang, I was fortunate to collaborate with Dr. Dingjie Song on multimodal data contamination detection. We developed MM-Detect, a systematic framework for detecting data contamination in multimodal LLMs across both text and image modalities.
Publications
SimulCost: A Cost-Aware Benchmark for Automating Physics Simulations with LLMs
Yadi Cao†, Sicheng Lai†, Jiahe Huang†, Yang Zhang†, Zach Lawrence†, Rohan Bhakta†, Izzy F. Thomas†, Mingyun Cao†, Chung-Hao Tsai, Zihao Zhou, Yidong Zhao, Hao Liu, Alessandro Marinoni, Alexey Arefiev, Rose Yu*
† equal contribution · * corresponding author
arXiv preprint
TL;DR A cost-aware benchmark for LLM-guided simulation parameter tuning that measures both accuracy and simulation compute across 12 physics simulators in fluid dynamics, solid mechanics, and plasma physics.
Both Text and Images Leaked! A Systematic Analysis of Data Contamination in Multimodal LLM
Sicheng Lai†, Dingjie Song†, Mingxuan Wang, Shunian Chen, Lichao Sun, Benyou Wang*
† equal contribution · * corresponding author
ICML 2025 DIG-BUGs Workshop (Oral) · EMNLP 2025 (Findings)
TL;DR A systematic framework for detecting multimodal data contamination in MLLMs, revealing significant contamination across 12 models and 5 benchmarks, some of it originating even from unimodal pre-training.
Experiences
Education
B.Sc. Computer Science and Engineering
The Chinese University of Hong Kong, Shenzhen
Sept. 2022 – June 2026
- Dean's List (Outstanding academic merit), 2022–2025
- Undergraduate Research Awards (Key research contributions), 2024–2025
Visiting Student
University of California San Diego
March 2025 – July 2025
Research Experience
SimulCost: A Cost-Aware Playground
UC San Diego — Supervisor: Prof. Rose Yu
April 2025 – Present
- Developed SimulCost, a comprehensive benchmarking framework that evaluates the cost-accuracy trade-off of LLMs on scientific simulation optimization across 13 physics solvers and 40+ hyperparameter tuning tasks.
- Engineered a scalable Playground API with caching and pandas DataFrame-based persistence, reducing redundant computations by 50%.
- Benchmarked LLMs against brute-force (grid search) and Bayesian optimization (Gaussian process) baselines across 6 real-world scenarios, conducting statistical analysis and ablation studies.
Data Contamination in Multimodal LLMs
CUHK-Shenzhen — Supervisor: Prof. Benyou Wang
June 2024 – Dec. 2024
- Pioneered MM-Detect, a systematic multimodal contamination detection framework with two novel methods: Option Order Sensitivity Test and Slot Guessing for Perturbed Caption.
- Validated the framework by training LLaVA-1.5-7B variants with controlled leakage at early, mid, and late training stages, at contamination ratios of 10%, 50%, and 100%.
- Developed a heuristic method to trace contamination origins, revealing unimodal contamination in base LLMs and cross-modal contamination via training data overlap analysis.
Internships
Multimodal Algorithm Engineer
Tsinghua Shenzhen International Graduate School, Shenzhen
Dec. 2024 – March 2025
- Optimized MLLM configurations through systematic experimentation with LLMs (Qwen2.5/Vicuna) and visual encoders (CLIP/MLCD/SigLIP), building on the LLaVA-NeXT framework.
- Evaluated model performance using VLMEvalKit and lmms-eval; identified the optimal configuration (Qwen2.5 + SigLIP), which reached a top-20% ranking on the OpenCompass MLLM Leaderboard.
- Trained the Qwen2.5 + SigLIP model on agricultural datasets curated and augmented with Data-Juicer, achieving expert-level agricultural capabilities while maintaining general-domain performance.
Teaching
Undergraduate Student Teaching Fellow
STA2001: Probability and Statistics I · CUHK-Shenzhen · Prof. Tianshi Chen
CV
My full curriculum vitae is available as a PDF document. You can view it inline or download it for your records.
Misc
A collection of miscellaneous things I find interesting or worth sharing.
Hobbies
- 🏋️ Gym — LIGHT WEIGHT BABY!
- 🏀 Basketball
- ⚽ Soccer
- 🎤 Singing (shower concerts mostly, ngl)
- 🌄 Traveling — especially when it means chasing breathtaking scenery
Fun Facts
When I’m not coding, you’ll probably find me playing Valorant — maining Raze and Phoenix, and pretending I’m the best Radiant in North America. (Give me some time 🤫)