Research — Archival Intelligence

Why are New Orleans' archives in danger?

New Orleans' cultural archives — newspapers, sheet music, personal letters, photographs, and early recordings documenting the birth of jazz — are deteriorating. Many exist only in fragile physical form in under-resourced institutions. Without intervention, this irreplaceable record of one of America's most significant cultural contributions will be permanently lost. Standard digitization approaches cannot handle the volume, condition, and diversity of these materials at the scale and speed required.

The problem extends beyond cultural loss. AI systems trained predominantly on digitized, English-language, post-internet text systematically underrepresent the cultural knowledge embedded in physical archives. When these archives disappear, the cultural perspectives they contain are erased not just from human memory but from the training data that shapes how AI understands the world. Preserving endangered archives is both a cultural imperative and an AI equity issue.

The Buddy Bolden Band, New Orleans, ca. 1905, the earliest known jazz photograph. Public domain. Bolden, considered the first jazz musician, was institutionalized in 1907 and never recorded. The archival photographs, sheet music, and personal letters this project preserves are often the only record that survives.

How does AI preserve degraded historical documents?

The project develops AI pipelines combining computer vision, OCR optimized for degraded historical documents, and multimodal analysis to process archival materials that resist conventional digitization. These are not clean, machine-readable texts — they are crumbling newspapers with faded ink, handwritten letters in multiple languages, fragile sheet music with annotations, and early recordings in obsolete formats. Each material type requires specialized processing that off-the-shelf tools cannot provide.

The pipeline moves from raw physical materials through digitization, enhancement, recognition, and semantic analysis to produce structured, searchable archives that preserve the richness and context of the originals. Machine learning models are trained and fine-tuned specifically for the degraded, heterogeneous materials found in New Orleans' cultural collections.
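The staged flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's implementation: the `ArchivalItem` schema, the stage functions, and their placeholder string transformations are all hypothetical, and a real pipeline would call trained vision, OCR, and language models where this sketch only relabels text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class ArchivalItem:
    """One object moving through the pipeline (hypothetical schema)."""
    item_id: str
    material_type: str                  # e.g. "newspaper", "letter", "sheet_music"
    content: str = ""                   # stand-in for image/audio data or extracted text
    metadata: Dict[str, str] = field(default_factory=dict)
    stages_done: List[str] = field(default_factory=list)


# A stage takes an item and returns the (updated) item.
Stage = Callable[[ArchivalItem], ArchivalItem]


def digitize(item: ArchivalItem) -> ArchivalItem:
    # Placeholder for scanning or audio transfer of the physical original.
    item.content = f"raw-scan:{item.item_id}"
    item.stages_done.append("digitize")
    return item


def enhance(item: ArchivalItem) -> ArchivalItem:
    # Placeholder for restoration steps such as contrast or noise correction.
    item.content = item.content.replace("raw-scan", "enhanced-scan")
    item.stages_done.append("enhance")
    return item


def recognize(item: ArchivalItem) -> ArchivalItem:
    # Placeholder for OCR, handwriting recognition, or transcription.
    item.content = f"text extracted from {item.content}"
    item.stages_done.append("recognize")
    return item


def analyze(item: ArchivalItem) -> ArchivalItem:
    # Placeholder for semantic analysis; here it only records searchable metadata.
    item.metadata["material_type"] = item.material_type
    item.metadata["searchable"] = "true"
    item.stages_done.append("analyze")
    return item


# Stage order follows the text: digitization, enhancement, recognition, analysis.
PIPELINE: Sequence[Stage] = (digitize, enhance, recognize, analyze)


def run_pipeline(item: ArchivalItem, stages: Sequence[Stage] = PIPELINE) -> ArchivalItem:
    """Apply each stage in order and return the processed item."""
    for stage in stages:
        item = stage(item)
    return item
```

Calling `run_pipeline(ArchivalItem("letter-001", "letter"))` returns an item whose `stages_done` list records each stage in order. Keeping stages as plain functions means a material-specific variant (for example, a different `recognize` for sheet music) can be swapped in by passing a different `stages` sequence.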

Community-Governed Data Sovereignty

Central to the project's methodology is community-governed data sovereignty: the communities whose heritage is being preserved maintain control over how their archives are accessed, represented, and used. This is not extraction — it is partnership. The people whose histories are contained in these archives determine the terms of preservation, access, and representation.

This approach reflects a broader conviction that AI systems for cultural heritage must be designed in collaboration with the people whose histories they preserve, not imposed from outside. In a city whose cultural contributions have historically been appropriated without credit or compensation, governance is not an afterthought — it is foundational to the work.
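One way to picture how community-set terms could be enforced in software is a default-deny policy check. The sketch below is hypothetical throughout: the `Use` categories, the `CommunityPolicy` fields, and the `is_permitted` function are illustrative assumptions, not a description of the project's actual governance mechanism, which the text describes only in terms of principles.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Use(Enum):
    """Hypothetical categories of use a requester might ask for."""
    VIEW = "view"
    RESEARCH = "research"
    AI_TRAINING = "ai_training"


@dataclass(frozen=True)
class CommunityPolicy:
    """Terms set by the community that stewards a collection (hypothetical)."""
    steward: str
    allowed_uses: FrozenSet[Use]
    requires_attribution: bool = True


def is_permitted(policy: CommunityPolicy, use: Use, will_attribute: bool) -> bool:
    """Default-deny check: a use is allowed only if the policy names it."""
    if use not in policy.allowed_uses:
        return False
    if policy.requires_attribution and not will_attribute:
        return False
    return True


# Example policy: viewing and research are allowed, AI training is not.
example_policy = CommunityPolicy(
    steward="Example community council",   # placeholder name
    allowed_uses=frozenset({Use.VIEW, Use.RESEARCH}),
)
```

With `example_policy`, a research request that commits to attribution is permitted, while any AI-training request is refused because the community has not listed that use. The design choice worth noting is the default: anything the stewards have not explicitly allowed is denied.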

Why focus on jazz archives?

Jazz is America's original art form, and New Orleans is where it was born. The city's archives hold the primary sources — newspaper accounts, concert programs, personal correspondence, photographs, sheet music, and recordings — that document this emergence. These materials tell the story of how African American, Creole, and immigrant communities created something unprecedented through musical innovation, cultural exchange, and creative resistance.

Much of this record is held by institutions like the New Orleans Jazz Museum that operate on minimal budgets, in buildings vulnerable to hurricanes and flooding, and in formats that degrade with each passing year. The urgency is real: what is lost in the next decade cannot be recovered. Coverage of the project has appeared on NPR/WOSU (February 2026).

What makes this approach human-centered?

The Archival Intelligence project is conducted through the Human-Centered AI Lab, a 501(c)(3) nonprofit research organization co-founded by Katherine Elkins and Jon Chun. The Lab's broader research spans AI safety and LLM evaluation (as PIs in the NIST AI Safety Institute Consortium), computational social science, and AI governance.

The archival work draws on a decade of experience integrating AI with humanistic inquiry. Since 2016, Elkins and Chun have mentored over 300 student research projects applying computational methods across the humanities and social sciences. That work has been downloaded more than 95,000 times, from institutions in 198 countries, via the Digital Kenyon repository. The conviction that AI is most productive when guided by deep domain expertise, rather than treated as a discipline-free tool, is central to how the archival project is designed and conducted. The project is funded by a $330,000 Schmidt Sciences HAVI grant, one of 23 teams selected worldwide.

Related work

Tan, Vincent, Elkins, Sahlgren. "If Open Source Is to Win, It Must Go Public." ICML 2025. arXiv:2507.09296. Argues that open-source AI weights alone cannot democratize access — public institutions must govern, host, and maintain AI infrastructure in the public interest.