
Junhee Park

AI research engineer


About Me

Research Interests: Egocentric Video Understanding, Robot Vision

My ultimate goal is to enable seamless and natural interaction between humans and robots, allowing robots to integrate smoothly into human society and provide meaningful assistance in everyday life.

Publications


XNav-Pipe: Cross-Platform Robot Navigation Data Generation Pipeline

Sungwoong Kim, Minseo Kim, Junhee Park, Siyeol Kim, Jihwan Yu, and Youngjae Yu (2025)

Research in robot navigation has been accelerating rapidly with the advent of vision-language models (VLMs) that can explicitly reason about the diverse variables present in real-world driving scenarios. Despite the growing success of AI-based robot navigation, existing datasets and policies remain tightly coupled to specific hardware embodiments, which limits cross-platform generalization. As a result, platforms such as legged robots, for which data are substantially harder to collect and far less publicly available than for wheeled robots, remain chronically under-represented, aggravating the dataset imbalance across robot morphologies. To resolve this data discrepancy, we propose XNav-Pipe, a two-stage synthetic data generation framework that expands robot-specific driving datasets into general-purpose navigation data across robot types.


GuideDog: A Real-World Egocentric Multimodal Dataset for Blind and Low-Vision Accessibility-Aware Guidance

Junhyeok Kim*, Jaewoo Park*, Junhee Park, Sangeyl Lee, Jiwan Chung, Jisung Kim, Ji Hoon Joung, and Youngjae Yu (2025)

Mobility remains a significant challenge for the 2.2 billion people worldwide affected by blindness and low vision (BLV), with 7% of visually impaired individuals experiencing falls at least once a month. While recent advances in Multimodal Large Language Models (MLLMs) offer promising opportunities for BLV assistance, their development has been hindered by limited datasets.

arXiv:2503.12844

Work Experience

Computational Intelligence & Photography Lab | Advisor: Seonjoo Kim

Graduate Researcher - (Sep. 2025 – Present)

Multimodal Intelligence Research Lab | Advisor: Youngjae Yu

Research Intern - (Dec. 2024 – Aug. 2025)

XNav-Pipe: Cross-Platform Robot Navigation Data Generation Pipeline
Sungwoong Kim, Minseo Kim, Junhee Park, Siyeol Kim, Jihwan Yu

  • Reconstructed the VLA pipeline from CANVAS (prior work) and redesigned the architecture around Qwen2.5-VL.
  • Integrated real-time communication with the Unitree Go2 via ROS2 topics, using camera and sensor (coordinate) inputs for multimodal processing.
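
A minimal sketch of the ROS2 side of this integration, assuming rclpy and hypothetical topic names for the camera and odometry streams (the actual Unitree Go2 topics and the downstream VLA inference hook are not shown):

```python
# Sketch only: subscribe to camera and odometry topics and cache the latest
# messages for a downstream multimodal policy. Topic names are assumptions.
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from nav_msgs.msg import Odometry


class Go2Bridge(Node):
    """Collects camera frames and pose inputs for VLA-style inference."""

    def __init__(self):
        super().__init__('go2_bridge')
        self.latest_frame = None
        self.latest_pose = None
        # Hypothetical topic names; substitute the robot's actual topics.
        self.create_subscription(Image, '/camera/image_raw', self.on_image, 10)
        self.create_subscription(Odometry, '/odom', self.on_odom, 10)

    def on_image(self, msg: Image):
        self.latest_frame = msg            # raw frame; decode before feeding the VLM

    def on_odom(self, msg: Odometry):
        self.latest_pose = msg.pose.pose   # coordinate input for the policy


def main():
    rclpy.init()
    rclpy.spin(Go2Bridge())


if __name__ == '__main__':
    main()
```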

GuideDog: A Real-World Egocentric Multimodal Dataset for Blind and Low-Vision Accessibility-Aware Guidance
Junhyeok Kim, Jaewoo Park, Junhee Park, Sanyeol Lee, Jiwan Chung

  • Distilled global BLV association guidelines into a structured, machine-readable format for VLM integration.
  • Built and managed Label Studio pipelines for high-quality description generation, including IAA validation.
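
A minimal sketch of how such IAA validation could be run on a Label Studio export, assuming pairwise Cohen's kappa over categorical choices; the metric and the export layout used in the actual pipeline are assumptions:

```python
# Sketch only: compute pairwise Cohen's kappa between two annotators from a
# Label Studio JSON export (export layout for choice tasks is assumed).
import json
from sklearn.metrics import cohen_kappa_score


def pairwise_kappa(export_path, annotator_a, annotator_b):
    """Compare two annotators' categorical labels on the tasks they both did."""
    with open(export_path) as f:
        tasks = json.load(f)
    labels_a, labels_b = [], []
    for task in tasks:
        by_user = {ann["completed_by"]: ann["result"][0]["value"]["choices"][0]
                   for ann in task.get("annotations", []) if ann.get("result")}
        if annotator_a in by_user and annotator_b in by_user:
            labels_a.append(by_user[annotator_a])
            labels_b.append(by_user[annotator_b])
    return cohen_kappa_score(labels_a, labels_b)
```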


XL8.ai | U.S. Headquarters

AI Research Intern - (Oct. 2023 – Apr. 2024)

AI-driven media localization with context-aware translations

  • Optimized ParaCrawl Dataset: Filtered for high-quality translation pairs using LaBSE similarity with language-specific thresholds (see the sketch after this list), automated the process, and uploaded the results to AWS S3.
  • Enhanced Translation Accuracy: Improved German, Portuguese, and Vietnamese translations by incorporating number-word augmentation and adding targeted test cases.
  • Integrated Translation Evaluation: Combined translation code from XL8, Google, and DeepL into a unified evaluation harness scored with BLEU, COMET, and MetricX-23.
  • Developed Transliteration Module: Integrated English-to-Korean and English-to-Japanese transliteration modules selected through comparative analysis.
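
A minimal sketch of the LaBSE-based filtering step, assuming the sentence-transformers LaBSE checkpoint and illustrative per-language-pair thresholds (the production thresholds and the S3 upload logic are omitted):

```python
# Sketch only: keep sentence pairs whose LaBSE cosine similarity clears a
# per-language-pair threshold. Threshold values here are illustrative.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/LaBSE")

THRESHOLDS = {"en-de": 0.80, "en-pt": 0.78, "en-vi": 0.75}  # assumed values


def filter_pairs(pairs, lang_pair):
    """pairs: list of (source, target) strings -> filtered list."""
    src_emb = model.encode([s for s, _ in pairs],
                           convert_to_tensor=True, normalize_embeddings=True)
    tgt_emb = model.encode([t for _, t in pairs],
                           convert_to_tensor=True, normalize_embeddings=True)
    sims = (src_emb * tgt_emb).sum(dim=1)        # cosine similarity of aligned rows
    keep = sims >= THRESHOLDS[lang_pair]
    return [p for p, k in zip(pairs, keep.tolist()) if k]
```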


RealSmart Corporation

Software Engineer - (Nov. 2020 – Dec. 2021)

OMR-based testing, survey analysis, and smart tools for education and business

  • Developed the WordTEST Android app using Java and MSSQL with optimized DB performance.
  • Managed and maintained 90+ MSSQL databases, performing performance tuning, backups, and security operations.
  • Refined a VSTO-based reporting program widely adopted in education and enterprise sectors.
  • Enhanced a Windows application used by 500+ academies, 200+ universities, and 300+ companies, with clients including Samsung and Seoul National University.

Projects

GemmaPaperQA - Google Machine Learning Bootcamp | Hugging Face

Jul. 2024 – Oct. 2024

  • Developed a QA system for academic papers using the Gemma2 open-source model and LangChain.
  • Fine-tuned the model with LoRA, substantially improving QA performance.
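
A minimal sketch of the LoRA setup, assuming Hugging Face transformers and PEFT with an illustrative Gemma 2 checkpoint and adapter hyperparameters (not the exact configuration used):

```python
# Sketch only: attach LoRA adapters to a Gemma 2 checkpoint for causal-LM
# fine-tuning. Checkpoint name, rank, and target modules are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_id = "google/gemma-2-2b-it"                 # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()               # only adapter weights are trained
```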

ISIC 2024: Skin Cancer Detection with 3D-TBP - Kaggle

Aug. 2024 – Oct. 2024

  • Developed a skin cancer detection model from 3D total-body photography images using an ensemble of LGBM, XGBoost, and CatBoost (see the sketch below).
  • Achieved Top 15% performance in the competition.
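
A minimal sketch of the ensembling idea, assuming tabular lesion features and a simple unweighted probability average (the actual feature engineering and blend weights are not shown):

```python
# Sketch only: train three gradient-boosting classifiers and average their
# malignancy probabilities. Hyperparameters are illustrative.
import numpy as np
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier
from catboost import CatBoostClassifier


def fit_and_blend(X_train, y_train, X_test):
    models = [
        LGBMClassifier(n_estimators=500, learning_rate=0.05),
        XGBClassifier(n_estimators=500, learning_rate=0.05, eval_metric="auc"),
        CatBoostClassifier(iterations=500, learning_rate=0.05, verbose=False),
    ]
    preds = []
    for m in models:
        m.fit(X_train, y_train)
        preds.append(m.predict_proba(X_test)[:, 1])   # malignant-class probability
    return np.mean(preds, axis=0)                     # unweighted blend (assumed)
```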

MATEY (Friend-matching app for co-purchasing) - Silicon Valley Bootcamp

San Francisco, CA | Jun. 2023 – Sep. 2023

  • Presented the project at Google’s San Francisco office and won 1st place among participating bootcamp teams.
  • Led a team of six engineers to develop a friend-matching app based on shared purchasing preferences.
  • Developed a user similarity algorithm using Word2Vec and created visual designs using the generative AI tool Midjourney.
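
A minimal sketch of the user-similarity idea, assuming gensim Word2Vec trained on keyword sequences from purchase histories and mean-pooled user vectors (data layout and names are illustrative):

```python
# Sketch only: learn keyword embeddings with Word2Vec, average them per user,
# and compare users by cosine similarity.
import numpy as np
from gensim.models import Word2Vec


def build_user_vectors(user_histories):
    """user_histories: {user_id: [purchase keywords, ...]} -> {user_id: vector}"""
    model = Word2Vec(sentences=list(user_histories.values()),
                     vector_size=100, window=5, min_count=1, workers=4)
    return {u: np.mean([model.wv[w] for w in words], axis=0)
            for u, words in user_histories.items()}


def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
```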

Deepfake Detection Challenge - Kaggle

Jan. 2020 – Mar. 2020

  • Developed a deepfake detection model using TensorFlow and PyTorch, combining CNN and RNN components (see the sketch after this list).
  • Improved model performance by 52.3% while managing a team of 7 engineers.
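
A minimal sketch of the frame-level CNN plus temporal RNN idea in PyTorch; the backbone, head, and sizes are illustrative assumptions, not the architecture actually submitted:

```python
# Sketch only: per-frame ResNet-18 features fed to an LSTM, with a single
# real/fake logit from the final hidden state.
import torch.nn as nn
from torchvision.models import resnet18


class CnnRnnDetector(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()              # 512-d feature per frame
        self.cnn = backbone
        self.rnn = nn.LSTM(512, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)         # real/fake logit

    def forward(self, clips):                    # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(b, t, -1)
        _, (h, _) = self.rnn(feats)
        return self.head(h[-1])
```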

Benchmarking CNN Models for Image Classification

Nov. 2018 – Apr. 2019

  • Evaluated 13 CNN models for image classification, achieving 97.7% accuracy with ResNet-152.

Oneday Voca - English Vocabulary Learning App (Android)

Jul. 2018 – Oct. 2018

  • Developed an English vocabulary app with OCR-based word capture using OpenCV and the Vision API (see the sketch below).
  • Built the end-to-end server infrastructure on AWS EC2 with a Linux, Apache, PHP, and MySQL stack.
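
A minimal sketch of the OCR path, assuming OpenCV preprocessing followed by the Google Cloud Vision text-detection endpoint (credential setup and the app-side flow are omitted; which Vision API was used is an assumption):

```python
# Sketch only: binarize an image with OpenCV, then request word-level text
# detection from Google Cloud Vision.
import cv2
from google.cloud import vision


def extract_words(image_path):
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    _, encoded = cv2.imencode(".png", binary)

    client = vision.ImageAnnotatorClient()
    response = client.text_detection(image=vision.Image(content=encoded.tobytes()))
    # The first annotation is the full text block; the rest are individual words.
    return [t.description for t in response.text_annotations[1:]]
```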