# Archival Methodology

*Building a precise digital archive from original publications*

---

The archive aims to be a faithful digital representation of Srila Prabhupada's original publications. This means working from the source (the printed books themselves) rather than relying solely on existing digital versions.

> **In this article:** Why original sources matter · What we extract · The verification process · How you can help

### Why Original Sources Matter

Digital copies accumulate errors over time. Each conversion (from print to early digital, through database migrations, to web exports) can introduce mistakes. By working from original book scans, we can:

- Verify text accuracy against the printed page
- Preserve the original indexes exactly as published
- Capture page references for scholarship
- Detect and correct errors in existing digital versions

### What We Extract

**Scripture Indexes**

The original BBT publications include carefully compiled indexes at the back of each volume. These indexes were created by scholars who read every page, identifying key topics, names, and concepts.

We extract these indexes using AI-assisted text recognition, then convert page numbers to verse references.

**Verse Content**

Each verse is verified against original publications:

- Devanagari/Bengali script
- Romanized transliteration
- Word-by-word synonyms
- English translation
- Purport (commentary)

### Current Status

| Content | Status |
|---------|--------|
| Bhagavad-gita indexes | Complete |
| Srimad-Bhagavatam indexes | Complete (12 cantos) |
| Caitanya-caritamrta indexes | Complete (3 divisions) |
| Full verse verification | In progress |
| Letter transcripts | Imported and cleaned |
| Lecture transcripts | Imported with timecodes |

### Future Plans

With [[archive/support|additional support]], we aim to:

1. **Complete Full-Text Verification**: Character-by-character comparison with original prints
2. **Page-Accurate Digital Edition**: Original page numbers preserved for citation
3. **Multi-Edition Comparison**: Track differences between print editions

---

> [!question] Frequently Asked Questions
>
> **Q: Why not just use existing digital versions?**
> Existing versions have passed through multiple conversions and may contain accumulated errors. Working from originals ensures accuracy.
>
> **Q: Which editions do you use?**
> We prioritize first editions and editions printed during Srila Prabhupada's presence.
>
> **Q: How accurate is AI text recognition?**
> AI recognition is typically 95-99% accurate, but every extraction is verified by human reviewers. Sanskrit diacritics require special attention.
>
> **Q: Can I help with verification?**
> Yes. We welcome volunteers who can compare digital text against book scans. Contact us for more information.
>
> **Q: How do I report an error?**
> Use the feedback link at the bottom of any page.

---

> [!abstract]- Technical Implementation Details
>
> **Scan Processing Pipeline**
>
> 1. High-resolution book scans (300+ DPI)
> 2. Google Cloud Vision API for OCR with Sanskrit/Bengali support
> 3. JSON extraction with word positions and confidence scores
> 4. Transformation scripts to parse entries and map page numbers
> 5. Verification against existing digital versions
>
> **Page-to-Verse Mapping**
>
> Original indexes use page numbers, not verse references. We build mapping tables by scanning PDF pages for TEXT and TRANSLATION markers using Claude Haiku (a cost-efficient approach that only needs verse boundaries, not full text extraction).
>
> **Font Conversion**
>
> Early digital versions used the Balarama font for Sanskrit. We convert to Unicode using a specific mapping order (Ṣ/ṣ must be converted before Ñ/ñ to avoid conflicts).
>
> **Git-Based Audit Trail**
>
> All changes are tracked with detailed commit messages including source reference, previous reading, and corrected reading.

---

*Accuracy is a form of devotion. Every character verified against the original is an offering of service.*
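
As an illustration of the page-to-verse mapping described under Technical Implementation Details, the following is a minimal sketch of how an index page number might be resolved to candidate verse references once a mapping table exists. It is not the archive's actual script: the table `VERSE_STARTS`, its values, and the function `verses_on_page` are hypothetical, and the sketch assumes the table records only the page on which each verse's TEXT marker appears.

```python
from bisect import bisect_right

# Hypothetical mapping table: (page on which a verse's TEXT marker appears, verse reference).
# The values are invented for illustration and are not taken from any edition.
VERSE_STARTS = [
    (10, "1.1"),
    (12, "1.2"),
    (12, "1.3"),
    (15, "1.4"),
]


def verses_on_page(page, verse_starts=VERSE_STARTS):
    """Return candidate verse references for an index entry that cites `page`.

    Candidates are every verse that starts on the page, plus the last verse
    that started on an earlier page, since its purport may still be running.
    `verse_starts` must be sorted by page number.
    """
    pages = [p for p, _ in verse_starts]
    # Verses whose TEXT marker falls on this page.
    refs = [ref for p, ref in verse_starts if p == page]
    # Index of the last verse that started strictly before this page.
    idx = bisect_right(pages, page - 1) - 1
    if idx >= 0:
        refs.insert(0, verse_starts[idx][1])
    return refs
```

Returning a list of candidates rather than a single reference keeps the ambiguity visible: when several verses share a page, a human reviewer can pick the verse the index entry actually refers to.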