Work

Built on a decade of shipped work

Tools and research already reaching 15 million students across the US and India. Everything we build produces open data, open tools, and open research — evidence for the whole field.

15M+

students reached

US + India

75+

publications

CHI, AIED, EDM, L@S, JCAL

34K+

open assessment items

CC-licensed

5M+

assessments via SmartPaper

Rajasthan, India

Open Items

Open assessment infrastructure for K-12 education

The open applied layer on top of open education infrastructure. 34K+ CC-licensed assessment items with AI generation, LLM evaluation, adaptive practice, and 18 interactive math widgets — built on the CZI Learning Commons Knowledge Graph. The first serious applied project demonstrating what's possible when you build openly on shared infrastructure.

34K+

CC-licensed assessment items

18

interactive math widgets

250K

standards via Knowledge Graph

—Built on the CZI Learning Commons Knowledge Graph (250K standards, 2K learning components, 273K relationships)
—LLM-as-Judge evaluation pipeline: 5-dimension scoring, 85% auto-approve rate, 98% mathematical accuracy
—9 item types: multiple choice, multiple select, T/F, short answer, numeric, essay, cloze, matching, ordering
—AI content generation using Gemini 3 Flash — full K-12 curriculum at $25-50 total cost
—Adaptive practice with Elo-rated difficulty calibration
—PDF worksheet generation with interactive widget renderers
—REST API and embeddable SDKs for integration by other tools
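
As an illustration of Elo-rated difficulty calibration, here is a minimal sketch of the standard Elo update applied to a student and an item. It is not the Open Items implementation: the function names, the 400-point scale, and the K-factor of 32 are assumptions borrowed from conventional Elo practice.

```python
def expected_correct(student_rating: float, item_rating: float) -> float:
    """Probability the student answers correctly under the Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((item_rating - student_rating) / 400.0))


def elo_update(student_rating: float, item_rating: float,
               correct: bool, k: float = 32.0) -> tuple[float, float]:
    """Return updated (student_rating, item_rating) after one response.

    k is a hypothetical step size; a real system would tune it.
    """
    expected = expected_correct(student_rating, item_rating)
    delta = k * ((1.0 if correct else 0.0) - expected)
    # A correct answer raises the student's rating and lowers the item's
    # difficulty rating by the same amount; an incorrect answer does the reverse.
    return student_rating + delta, item_rating - delta


if __name__ == "__main__":
    student, item = elo_update(1000.0, 1000.0, correct=True)
    print(student, item)
```

With equal starting ratings the expected score is 0.5, so a correct answer moves each rating by half of K in opposite directions. Over many responses, item ratings settle toward values that reflect how hard the items are for the population answering them.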

/openitems →

SmartPaper

Bridging paper and digital learning at national scale

Computer vision tool that lets teachers print worksheets, students write by hand, and AI scores instantly. Deployed across government schools in Rajasthan, India — bridging the gap between paper-based classrooms and digital assessment infrastructure.

5M+

student assessments processed

120K

item responses in longitudinal dataset

longitudinal assessments conducted

Students in a Rajasthan classroom using SmartPaper worksheets

—Deployed in partnership with the state government of Rajasthan, India
—Measurable reduction in learning poverty across participating schools
—UNESCO-recognized assessment innovation
—Generates open psychometric data on assessment items at scale

getsmartpaper.com →

PlayPower

Free games, quests, and AI tools reaching 10M+ students

50+ free K-8 math games, 45 high school quests, AI teacher tools — distributed by Savvas Learning to schools nationwide. The commercial track record that demonstrated the need for open infrastructure.

50+

free math games (K-8)

45

high school quests

10M+

US students reached

PlayPower — free math games and AI teacher tools

—Distributed by Savvas Learning (one of the largest US K-12 publishers)
—Free games at playpowergames.com covering half of K-8 math topics, in English and Spanish
—Teacher AI tools: lesson planning, activity planning, worksheet creation, text leveling
—Experience building at scale informed the design of Open Items

playpowergames.com →

UpGrade

Open-source A/B testing for education

An open-source experimentation platform purpose-built for educational software, enabling researchers and developers to run rigorous A/B tests within learning environments. Originally built on Carnegie Learning's infrastructure.

Open

source

2

major funders

—Funded by the Gates Foundation and Schmidt Futures
—Built on Carnegie Learning's infrastructure
—Purpose-built for educational experimentation
—Supports multi-armed bandit and factorial designs
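
To make the multi-armed bandit idea concrete, here is a small self-contained sketch of Thompson sampling over experimental conditions with binary outcomes. It does not use or represent UpGrade's API: the function names, the Beta(1, 1) prior, and the simulated success rates are all assumptions made for illustration.

```python
import random


def thompson_pick(successes: list[int], failures: list[int],
                  rng: random.Random) -> int:
    """Pick a condition by sampling each one's Beta posterior (uniform prior)."""
    samples = [rng.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)


def simulate(true_rates: list[float], n_students: int,
             seed: int = 0) -> tuple[list[int], list[int]]:
    """Assign simulated students to conditions one at a time.

    true_rates are hypothetical per-condition success probabilities,
    unknown to the algorithm and used only to generate outcomes.
    """
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(n_students):
        arm = thompson_pick(successes, failures, rng)
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    wins, losses = simulate([0.2, 0.8], n_students=2000)
    print([w + l for w, l in zip(wins, losses)])  # students per condition
```

Unlike a fixed 50/50 A/B split, this kind of design shifts assignment toward the better-performing condition as evidence accumulates, which trades some statistical simplicity for fewer students exposed to the weaker condition.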

AI Research Tools

Qualitative and quantitative instruments for studying AI in education

A suite of research tools for studying AI's impact — including AI-powered qualitative interviews (text and voice), adaptive testing platforms, and synthetic student simulation for rapid psychometric feedback.

Text + Voice

AI interview modalities

Adaptive

testing platform

Synthetic

student simulation

—AI-powered qualitative interviews by text and voice
—Adaptive testing engine with real-time item selection
—Synthetic student simulation for instant psychometric calibration
—Open evaluation frameworks for assessing AI tool quality

Selected publications

75+ publications across top venues

LLM Difficulty Estimation

200 experimental conditions, 15+ large language models evaluated for math item difficulty estimation.

AIED 2026

A/B Testing at Scale Workshops

Four annual workshops convening researchers on experimentation methodology in learning at scale.

L@S 2020-2023

CHI Honorable Mention

Large-scale experiment with 70,000+ subjects on design choices in educational games.

CHI

The full publication list spans CHI, AIED, EDM, L@S, JCAL, and more, with work published by ACM, Springer, IEEE, and ISLS.

See where we're headed

Our roadmap for the next phase of research and tools.

Roadmap →