About

Hi! I'm Sicheng Lai, a senior undergrad at the Chinese University of Hong Kong, Shenzhen.

I had the pleasure of spending a semester at UC San Diego, where I got to work with Dr. Yadi Cao in Prof. Rose Yu's lab on SimulCost, a cost-aware benchmarking framework for evaluating LLMs on scientific simulation optimization. I'm thrilled that this work — one I'm genuinely proud of — is finally going open-source.

At CUHK-Shenzhen, I was fortunate to collaborate with Dr. Dingjie Song, supervised by Prof. Benyou Wang, on multimodal data contamination detection. We developed MM-Detect, a systematic framework for detecting data contamination in multimodal LLMs across both text and image modalities.

Publications

SimulCost: A Cost-Aware Benchmark for Automating Physics Simulations with LLMs

Yadi Cao†, Sicheng Lai†, Jiahe Huang†, Yang Zhang†, Zach Lawrence†, Rohan Bhakta†, Izzy F. Thomas†, Mingyun Cao†, Chung-Hao Tsai, Zihao Zhou, Yidong Zhao, Hao Liu, Alessandro Marinoni, Alexey Arefiev, Rose Yu*

† equal contribution  ·  * corresponding author

arXiv preprint

TL;DR  A cost-aware benchmark for LLM-guided simulation parameter tuning — measuring both accuracy and simulation compute across 12 physics simulators in fluid dynamics, solid mechanics, and plasma physics.

Both Text and Images Leaked! A Systematic Analysis of Data Contamination in Multimodal LLM

Sicheng Lai†, Dingjie Song†, Mingxuan Wang, Shunian Chen, Lichao Sun, Benyou Wang*

† equal contribution  ·  * corresponding author

ICML 2025 DIG-BUGs Workshop (Oral)  ·  EMNLP 2025 (Findings)

TL;DR  A systematic framework for detecting multimodal data contamination in MLLMs, revealing significant contamination — even originating from unimodal pre-training — across 12 models and 5 benchmarks.

Experiences

Education

B.Sc. Computer Science and Engineering

The Chinese University of Hong Kong, Shenzhen

Sept. 2022 – June 2026

  • Dean's List (Outstanding academic merit), 2022–2025
  • Undergraduate Research Awards (Key research contributions), 2024–2025

Visiting Student

University of California San Diego

March 2025 – July 2025

Research Experience

SimulCost: A Cost-Aware Playground

UC San Diego — Supervisor: Prof. Rose Yu

April 2025 – Present

  • Developed SimulCost, a comprehensive benchmarking framework that evaluates the cost-accuracy trade-off of LLMs on scientific simulation optimization across 13 physics solvers and 40+ hyperparameter tuning tasks.
  • Engineered a scalable Playground API with caching and pandas DataFrame-based persistence, reducing redundant computations by 50%.
  • Benchmarked LLMs against brute-force (grid search) and Bayesian optimization (Gaussian process) baselines across 6 real-world scenarios, conducting statistical analysis and ablation studies.

Data Contamination in Multimodal LLMs

CUHK-Shenzhen — Supervisor: Prof. Benyou Wang

June 2024 – Dec. 2024

  • Pioneered MM-Detect, a systematic multimodal contamination detection framework with two novel methods: Option Order Sensitivity Test and Slot Guessing for Perturbed Caption.
  • Validated the framework by training LLaVA-1.5-7B variants with controlled leakage at early, mid, and late training stages, at contamination ratios of 10%, 50%, and 100%.
  • Developed a heuristic method to trace contamination origins, revealing unimodal contamination in base LLMs and cross-modal contamination via training data overlap analysis.

Internships

Multimodal Algorithm Engineer

Tsinghua Shenzhen International Graduate School, Shenzhen

Dec. 2024 – March 2025

  • Optimized MLLM configurations through systematic experimentation with LLMs (Qwen2.5/Vicuna) and visual encoders (CLIP/MLCD/SigLIP) based on the LLaVA-Next framework.
  • Evaluated model performance using VLMEvalKit and lmms-eval; identified the optimal configuration (Qwen2.5 + SigLIP), achieving a top-20% ranking on the OpenCompass MLLM Leaderboard.
  • Trained Qwen2.5 + SigLIP on agricultural datasets curated and augmented with data-juicer, achieving expert-level agricultural capabilities while maintaining general-domain performance.

Teaching

Undergraduate Student Teaching Fellow

STA2001: Probability and Statistics I · CUHK-Shenzhen · Prof. Tianshi Chen

Spring 2024

CV

My full curriculum vitae is available as a PDF document. You can view it inline or download it for your records.

Misc

A collection of miscellaneous things I find interesting or worth sharing.

Hobbies

  • 🏋️ Gym — LIGHT WEIGHT BABY!
  • 🏀 Basketball
  • ⚽ Soccer
  • 🎤 Singing (shower concerts mostly, ngl)
  • 🌄 Traveling — especially when it means chasing breathtaking scenery

Fun Facts

When I’m not coding, you’ll probably find me playing Valorant — maining Raze and Phoenix, and pretending I’m the best Radiant in North America. (Give me some time 🤫)