Youngbin Kim
LLM Engineer / Data Scientist (Clinical AI)
About
Who I Am
I'm a Senior Data Scientist at NewYork-Presbyterian, where I was the first dedicated LLM hire. I build clinical NLP pipelines within HIPAA-compliant LLM infrastructure (local and cloud) and design agentic workflows that help clinicians make better decisions. I earned my PhD in Biomedical Engineering from Columbia University in 2024, focusing on machine learning approaches to cardiac signal analysis in disease models built from stem cell-derived engineered heart tissues.
Before NYP and Columbia, I interned at Genentech building ML models and studied Bioengineering and EECS at UC Berkeley. I published 9 peer-reviewed papers during my PhD and built BeatProfiler, an open-source ML platform for cardiac analysis that has been adopted by external researchers. When I'm not wrangling data, you'll find me singing with the Young New Yorkers' Chorus, biking across the city, or planning my next trip.
What I Do
- LLM Engineering: HIPAA-compliant infrastructure, LangGraph pipelines, agentic workflows
- Clinical NLP: Medical text extraction, clinical decision support, EHR integration
- Cardiac Signal Analysis: ML for ECG-like/calcium transient signals, BeatProfiler platform
- Data Science: Python, deep learning, statistical modeling
Experience
2024 – Present
Senior Data Scientist
NewYork-Presbyterian
First dedicated LLM hire. Built clinical NLP pipelines, HIPAA-compliant LLM infrastructure, and agentic workflows.
2019 – 2024
PhD, Biomedical Engineering
Columbia University
ML for cardiac signal analysis. Created BeatProfiler. 9 peer-reviewed publications, including in IEEE venues.
2022
Machine Learning Intern
Genentech
Built multimodal ML models for biomarker discovery from Alzheimer's drug clinical trial data.
2015 – 2019
BS, Bioengineering & EECS
UC Berkeley
Foundation in engineering and computer science.
Projects
BeatProfiler
Open Source
An end-to-end machine learning platform for automated cardiac signal analysis. BeatProfiler takes raw contractile videos or calcium transient / field potential recordings from stem cell-derived cardiomyocytes, processes them through automated pipelines, and classifies disease phenotypes, turning hours of manual analysis into seconds.
Processing Pipeline
Calcium Transient Signal