URTEXT is building the world's first dataset of real programming behavior (keystrokes, edits, and reasoning) to help AI systems and researchers better understand how programmers think
Our pipeline transforms raw programming sessions into clean, structured datasets
Developers record their screen, keyboard, and voice as they solve problems
Sensitive data is automatically removed or masked
Each recording is synchronized and labeled for context
Experts check for quality, consistency, and privacy before use
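As a rough illustration of the masking and synchronization steps above, here is a minimal Python sketch. It is not URTEXT's actual pipeline: the `KeyEvent` record, the `mask` and `sanitize` helpers, and the two regex patterns are all hypothetical, and a real system would need far broader detection than an email pattern and a few secret-like assignments.

```python
import re
from dataclasses import dataclass


@dataclass
class KeyEvent:
    """Hypothetical record for one chunk of typed text."""
    t_ms: int   # milliseconds since the start of the session
    text: str   # text typed in this event


# Illustrative patterns only (assumptions, not URTEXT's real rules).
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SECRET = re.compile(r"(?i)\b(api[_-]?key|token|password)\s*=\s*\S+")


def mask(text: str) -> str:
    """Replace email addresses and secret-like assignments with placeholders."""
    text = EMAIL.sub("<EMAIL>", text)
    return SECRET.sub(lambda m: f"{m.group(1)}=<REDACTED>", text)


def sanitize(events: list[KeyEvent]) -> list[KeyEvent]:
    """Order events by timestamp and mask the text of each one."""
    ordered = sorted(events, key=lambda e: e.t_ms)
    return [KeyEvent(e.t_ms, mask(e.text)) for e in ordered]
```

Calling `sanitize` on a list of events returns new records sorted by `t_ms` with masked text; the input list is left unchanged, so the raw recording can stay private while only the cleaned copy moves on to review.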
Study real debugging, refactoring, and problem-solving processes
Show students how experts think while coding, not just the final result
Train systems that reason like real programmers — grounded in process, not static code
Most programming datasets capture finished code. URTEXT captures how that code was created — including mistakes, reasoning, and iteration
This provides new insight into learning, productivity, and intelligence in software creation
We design for privacy from the start:
→ You decide what's recorded and what's redacted.
→ Raw recordings are never shared publicly.
→ All data goes through multi-stage anonymization before use.
→ Contributors can withdraw participation anytime.
URTEXT is a student-led initiative exploring AI, programming, and data-driven projects
Technical development, including coding and data structuring, is carried out by a network of trusted software engineers and researchers
Grounded in academic rigor and real-world developer insights
Built by developers who understand the craft of programming
Your data stays yours, with full transparency and control
We're currently preparing small pilot collaborations with select research and industry partners
If you're interested in early access, sign up below — we'll contact you when applications open
Join our pilot phase as a partner or contributor
Capturing real programming behavior for AI and research