Youngbin Kim
LLM Engineer / Data Scientist (Clinical AI)
About
Who I Am
I'm a Senior Data Scientist at NewYork-Presbyterian, where I was the first dedicated LLM hire. I build clinical NLP pipelines on HIPAA-compliant LLM infrastructure (local and cloud) and design agentic workflows that help clinicians make better decisions. I earned my PhD in Biomedical Engineering from Columbia University in 2024, where I focused on machine learning approaches to cardiac signal analysis in disease models built from stem cell-derived engineered heart tissues.
Before NYP and Columbia, I interned at Genentech building ML models and studied Bioengineering and EECS at UC Berkeley. During my PhD I published 9 peer-reviewed papers and built BeatProfiler, an open-source ML platform for cardiac analysis that has been adopted by external researchers. When I'm not wrangling data, you'll find me singing with the Young New Yorkers' Chorus, biking across the city, or planning my next trip.
What I Do
- LLM Engineering: HIPAA-compliant infrastructure, LangGraph pipelines, agentic workflows
- Clinical NLP: Medical text extraction, clinical decision support, EHR integration
- Cardiac Signal Analysis: ML for ECG-like/calcium transient signals, BeatProfiler platform
- Data Science: Python, deep learning, statistical modeling
Experience
2024 – Present
Senior Data Scientist
NewYork-Presbyterian
First dedicated LLM hire. Built clinical NLP pipelines, HIPAA-compliant LLM infrastructure, and agentic workflows.
2019 – 2024
PhD, Biomedical Engineering
Columbia University
ML for cardiac signal analysis. Created BeatProfiler. 9 peer-reviewed publications, including in IEEE venues.
2022
Machine Learning Intern
Genentech
Built multimodal ML models for biomarker discovery from Alzheimer's drug clinical trial data.
2015 – 2019
BS, Bioengineering & EECS
UC Berkeley
Foundation in engineering and computer science.
Projects
ChartExtract
In Development
Extracting structured data from clinical documents like discharge summaries, operative notes, and pathology reports is one of the most labor-intensive tasks in healthcare. ChartExtract is an agentic platform that lets clinical staff design, test, and optimize multi-step extraction pipelines through a visual interface and conversational AI instead of hand-building brittle review workflows.
Pipeline Builder Workflow
[Figure: final pipeline, an example generated DAG with two steps]
- Cancer Type Extraction: Extracts cancer type from clinical text (pathology report, discharge summary).
- Biomarker Extraction: Extracts HER2, ER, and PR receptor status from the pathology report for breast cancer cases.
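The two-step example DAG (cancer type extraction feeding biomarker extraction) could be sketched roughly as follows. This is a minimal illustration only, not ChartExtract's actual implementation: the step functions, the `STEPS` registry, and `run_pipeline` are hypothetical names, and simple keyword and regex matching stands in for the LLM calls a real pipeline would make.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def extract_cancer_type(ctx):
    """Hypothetical stand-in for an LLM step: keyword match on the note text."""
    text = ctx["note"].lower()
    for cancer in ("breast", "lung", "colon"):
        if f"{cancer} cancer" in text or f"{cancer} carcinoma" in text:
            return cancer
    return None


def extract_biomarkers(ctx):
    """Hypothetical stand-in: read HER2/ER/PR status, only for breast cancer cases."""
    if ctx["cancer_type"] != "breast":
        return {}
    pattern = re.compile(r"\b(HER2|ER|PR)\b[\s:]*(positive|negative)", re.IGNORECASE)
    return {m.group(1).upper(): m.group(2).lower() for m in pattern.finditer(ctx["note"])}


# Each step maps to (function, list of upstream steps it depends on).
STEPS = {
    "cancer_type": (extract_cancer_type, []),
    "biomarkers": (extract_biomarkers, ["cancer_type"]),
}


def run_pipeline(note):
    """Run every step in dependency order, sharing results through one context dict."""
    graph = {name: set(deps) for name, (_, deps) in STEPS.items()}
    ctx = {"note": note}
    for name in TopologicalSorter(graph).static_order():
        ctx[name] = STEPS[name][0](ctx)
    return ctx


result = run_pipeline("Invasive breast carcinoma. ER positive, PR negative, HER2: negative.")
print(result["cancer_type"], result["biomarkers"])
```

Keeping the dependencies in a plain mapping means a new step can be added without touching the runner, which is the property a visual pipeline builder would rely on.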
BeatProfiler
Open Source
An end-to-end machine learning platform for automated cardiac signal analysis. BeatProfiler takes raw contractile videos or calcium transient / field potential recordings from stem cell-derived cardiomyocytes, processes them through automated pipelines, and classifies disease phenotypes, turning hours of manual analysis into seconds.
Processing Pipeline
Calcium Transient Signal
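To give a feel for what automated analysis of a calcium transient trace involves, here is a minimal sketch that detects beats in a synthetic signal and derives two simple features. It is not BeatProfiler's actual algorithm: the synthetic trace, the threshold-based peak detector, and all function names are assumptions made for illustration.

```python
import math


def synthetic_trace(duration_s=5.0, fs=100, rate_hz=1.0):
    """Hypothetical test signal: one Gaussian-shaped transient per beat."""
    n = int(duration_s * fs)
    trace = []
    for i in range(n):
        phase = (i / fs * rate_hz) % 1.0  # position within the current beat
        trace.append(math.exp(-((phase - 0.2) ** 2) / (2 * 0.05 ** 2)))
    return trace


def detect_peaks(trace, threshold=0.5):
    """Return indices of local maxima that exceed the threshold."""
    return [
        i
        for i in range(1, len(trace) - 1)
        if trace[i] > threshold and trace[i] > trace[i - 1] and trace[i] >= trace[i + 1]
    ]


def beat_features(trace, fs=100):
    """Summarize a trace as beat rate (beats per minute) and mean peak amplitude."""
    peaks = detect_peaks(trace)
    intervals = [(b - a) / fs for a, b in zip(peaks, peaks[1:])]
    mean_interval = sum(intervals) / len(intervals)
    return {
        "beat_rate_bpm": 60.0 / mean_interval,
        "mean_amplitude": sum(trace[i] for i in peaks) / len(peaks),
    }


trace = synthetic_trace()
peaks = detect_peaks(trace)
features = beat_features(trace)
print(peaks, features)
```

In practice, features like these would be computed per recording and passed to a classifier; real recordings would also need filtering and baseline correction first, which this sketch omits.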