We propose DREAMoR: a diffusion-based motion prior framework for reconstructing physically plausible human motion from corrupted sequences.
We have developed a pipeline that extracts high-quality meshes of user-specified objects from an input video. Our approach builds upon 3D Gaussian Splatting (3DGS), a powerful method for accurate scene reconstruction. While Gaussian splats serve as an implicit geometric primitive, converting them into explicit mesh representations to enable compatibility with modern industrial pipelines remains challenging. Existing approaches like SuGaR and GS2Mesh often suffer from poor surface quality and undesirable object adhesion, which significantly limits their practicality.
This project explores three deep learning approaches for facial keypoint detection. The objective is to accurately localize each keypoint based on the input image. I investigate: (1) direct coordinate regression using a custom CNN, (2) transfer learning with pretrained ResNet18 and self-supervised DINO models, and (3) heatmap-based prediction using a U-Net architecture.
Here are my CS184 course projects, including Ray Tracing, Cloth Simulation, Mesh Manipulation, and Rasterization.
We propose a novel open-source testing framework and benchmark in the field of Vision-Language Navigation (VLN) to evaluate the goal-seeking capabilities of Large Language Model (LLM) agents in real-world environments. To this end, we designed a QA agent that operates without relying on human supervision or data annotations, serving as a semantic heuristic function to provide navigational cues to the agent under evaluation. Additionally, we leveraged techniques such as Reinforcement Learning with AI Feedback (RLAIF) to develop new metrics for detailed analysis of the agent’s progressive information acquisition, multimodal cross-inference, and spatial reasoning abilities. Experimental results demonstrate significant room for improvement in current LLM agents across these dimensions. Future work may explore enhancing LLMs’ visual perception capabilities and their alignment of spatial information with semantic understanding.
Here are my CS180 course projects.