Projects

DREAMoR - Diffusion-based REconstruction And Motion prioR

Human Motion Perception

We propose DREAMoR: a diffusion-based motion prior framework for reconstructing physically plausible human motion from corrupted sequences.

May 11, 2025

Mesh Any Gaussians

Computer Graphics

We have developed a pipeline that extracts high-quality meshes of user-specified objects from an input video. Our approach builds upon 3D Gaussian Splatting (3DGS), a powerful method for accurate scene reconstruction. While Gaussian splats serve as an implicit geometric primitive, converting them into explicit mesh representations to enable compatibility with modern industrial pipelines remains challenging. Existing approaches like SuGaR and GS2Mesh often suffer from poor surface quality and undesirable object adhesion, which significantly limits their practicality.

May 4, 2025

Facial Keypoint Detection

Computer Vision

This project explores three deep learning approaches for facial keypoint detection. The objective is to accurately localize each keypoint based on the input image. I investigate: (1) direct coordinate regression using a custom CNN, (2) transfer learning with pretrained ResNet18 and self-supervised DINO models, and (3) heatmap-based prediction using a U-Net architecture.
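As a rough illustration of the heatmap-based formulation in approach (3), the sketch below encodes a keypoint as a 2D Gaussian target and decodes it again by taking the location of the maximum. It is not the project's code: the function names, the Gaussian width `sigma`, and the 32x24 grid are assumptions chosen for illustration, and a real pipeline would typically use a tensor library and have the U-Net predict one heatmap channel per keypoint.

```python
import math


def keypoint_to_heatmap(x, y, width, height, sigma=2.0):
    """Render one keypoint (pixel coordinates) as a 2D Gaussian heatmap.

    Returns a list of `height` rows, each holding `width` floats in (0, 1],
    with the peak value 1.0 at column x, row y.
    """
    return [
        [
            math.exp(-((col - x) ** 2 + (row - y) ** 2) / (2.0 * sigma ** 2))
            for col in range(width)
        ]
        for row in range(height)
    ]


def heatmap_to_keypoint(heatmap):
    """Decode a heatmap to (x, y) by locating its maximum value (argmax)."""
    best_value, best_xy = -1.0, (0, 0)
    for row, values in enumerate(heatmap):
        for col, value in enumerate(values):
            if value > best_value:
                best_value, best_xy = value, (col, row)
    return best_xy


# Hypothetical example: a keypoint at (x=12, y=7) on a 32x24 grid.
target = keypoint_to_heatmap(x=12, y=7, width=32, height=24)
decoded = heatmap_to_keypoint(target)
```

In this setup the network would be trained to regress `target` (for example with a per-pixel loss), and `heatmap_to_keypoint` would be applied to its output at inference time. Direct coordinate regression, by contrast, skips the heatmap and predicts the (x, y) values themselves.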

Apr 15, 2025

Computer Graphics Projects

Computer Graphics

Here are my CS184 course projects, including Ray Tracing, Cloth Simulation, Mesh Manipulation, and Rasterization.

Jan 21, 2025

SPARK - A Scavenger Hunt Game for LLM Agents

We propose a novel open-source testing framework and benchmark in the field of Vision-Language Navigation (VLN) to evaluate the goal-seeking capabilities of Large Language Model (LLM) agents in real-world environments. To this end, we designed a QA agent that operates without relying on human supervision or data annotations, serving as a semantic heuristic function to provide navigational cues to the agent under evaluation. Additionally, we leveraged techniques such as Reinforcement Learning with AI Feedback (RLAIF) to develop new metrics for detailed analysis of the agent’s progressive information acquisition, multimodal cross-inference, and spatial reasoning abilities. Experimental results demonstrate significant room for improvement in current LLM agents across these dimensions. Future work may explore enhancing LLMs’ visual perception capabilities and their alignment of spatial information with semantic understanding.

Dec 20, 2024

Computer Vision Projects

Here are my CS180 course projects.

Sep 1, 2024