Biotech: Clinical Data Pipeline with 21 CFR Part 11 Compliance
A clinical-stage biotech company needed a validated data pipeline for processing trial results before submitting to the FDA. Their existing pipeline was a mix of Python scripts and Excel files with no audit trail. We rebuilt it with full 21 CFR Part 11 compliance in 10 weeks.
The Challenge
21 CFR Part 11 requires electronic record systems used in FDA submissions to maintain tamper-evident audit logs, enforce access controls, and ensure data integrity across the full pipeline. The existing pipeline had none of these properties. Trial data moved through shared S3 buckets, was processed by unversioned Python scripts, and the outputs were manually reviewed in Excel. The FDA would not accept submissions from this system.
The Approach
We redesigned the pipeline with compliance constraints as first-class requirements: every data transformation must be versioned and reproducible, every access must be logged and attributable, and every record must have a tamper-evident hash. We used AWS as the underlying infrastructure and purpose-built the compliance controls on top.
The Implementation
Immutable data storage with S3 Object Lock
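All raw trial data lands in an S3 bucket with Object Lock in Compliance mode - locked object versions cannot be overwritten or deleted by anyone, including the account root user, until their retention period expires. Each file is SHA-256 hashed on ingest and the hash is stored in DynamoDB, so any later change to the content shows up as a hash mismatch. Any modification attempt is detectable and logged.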
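The hash-on-ingest step can be sketched roughly as follows. This is a minimal illustration, not the production code: a plain dict stands in for the DynamoDB table, and the function names and the object key are hypothetical.

```python
import hashlib
import io
from typing import BinaryIO, Dict

CHUNK_SIZE = 1024 * 1024  # read 1 MiB at a time so large files are never loaded whole


def sha256_of_stream(stream: BinaryIO) -> str:
    """Return the hex SHA-256 digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(CHUNK_SIZE), b""):
        digest.update(chunk)
    return digest.hexdigest()


def register_on_ingest(registry: Dict[str, str], key: str, stream: BinaryIO) -> str:
    """Record the hash of a newly ingested object; refuse to overwrite an entry."""
    if key in registry:
        raise ValueError(f"hash already registered for {key!r}")
    registry[key] = sha256_of_stream(stream)
    return registry[key]


def verify(registry: Dict[str, str], key: str, stream: BinaryIO) -> bool:
    """Return True if the content still matches the hash recorded at ingest."""
    return registry.get(key) == sha256_of_stream(stream)


if __name__ == "__main__":
    registry: Dict[str, str] = {}  # assumption: stand-in for the DynamoDB hash table
    original = b"subject_id,value\n001,4.2\n"
    register_on_ingest(registry, "site-01/visit-1.csv", io.BytesIO(original))
    print(verify(registry, "site-01/visit-1.csv", io.BytesIO(original)))
    print(verify(registry, "site-01/visit-1.csv", io.BytesIO(original + b"002,9.9\n")))
```

Keeping the hash registry separate from the object store means a mismatch can be detected even if someone only has read access to the data itself.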
Versioned, containerised processing pipeline
Each data transformation step runs as a Docker container with a pinned image digest. Apache Airflow orchestrates the pipeline. Every run records: input file hash, processing container image digest, parameters, output file hash, run timestamp, and operator ID. Runs are immutable records.
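One way to model the per-run record described above is a frozen dataclass that rejects unpinned image references. This is a sketch under assumptions: the class name, field names, and validation rules are illustrative and not taken from the actual pipeline.

```python
import hashlib
import json
import re
from dataclasses import asdict, dataclass
from typing import Any, Dict

# A pinned reference looks like "repo/name@sha256:<64 hex chars>".
# A tag such as "repo/name:latest" does not match and is rejected.
_PINNED_IMAGE = re.compile(r"^[^@\s]+@sha256:[0-9a-f]{64}$")
_SHA256_HEX = re.compile(r"^[0-9a-f]{64}$")


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical record of one pipeline run; fields cannot be reassigned."""

    input_sha256: str
    image_ref: str
    parameters: Dict[str, Any]
    output_sha256: str
    run_timestamp: str  # ISO 8601, UTC
    operator_id: str

    def __post_init__(self) -> None:
        if not _PINNED_IMAGE.match(self.image_ref):
            raise ValueError("image must be pinned by digest, not by tag")
        for name in ("input_sha256", "output_sha256"):
            if not _SHA256_HEX.match(getattr(self, name)):
                raise ValueError(f"{name} must be a 64-character hex SHA-256")

    def fingerprint(self) -> str:
        """Hash a canonical JSON form of the record so later edits are detectable."""
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    record = RunRecord(
        input_sha256="a" * 64,  # placeholder hashes for illustration only
        image_ref="registry.example.com/transform@sha256:" + "b" * 64,
        parameters={"unit": "mg/dL"},
        output_sha256="c" * 64,
        run_timestamp="2024-01-01T00:00:00+00:00",
        operator_id="operator-001",
    )
    print(record.fingerprint())
```

Note that `frozen=True` only blocks reassigning fields; the `parameters` dict itself is still mutable, so storing the fingerprint alongside the record is what makes a later change visible.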
Audit logging and electronic signature
We implemented an electronic signature workflow using Amazon Cognito with MFA for the final data review step. The Cognito token for each sign-off is logged to CloudTrail together with the record being signed. The signature, timestamp, and reviewer identity form part of the submission package.
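The binding between a signature and the record it covers can be illustrated with a small sketch. It assumes the reviewer has already been authenticated upstream (for example by the MFA step) and only shows how the signature, timestamp, and reviewer identity might be tied to a hash of the reviewed content. All names here are hypothetical. The `meaning` field reflects the Part 11 expectation that a signed record shows the signer's name, the date and time, and what the signature means (such as review or approval).

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class SignatureRecord:
    """Hypothetical signature entry bound to one exact version of a record."""

    reviewer_id: str    # unique ID of the authenticated reviewer
    reviewer_name: str  # name shown on the signed record
    meaning: str        # what the signature attests, e.g. "review" or "approval"
    signed_at: str      # ISO 8601 timestamp, UTC
    record_sha256: str  # hash of the exact content that was reviewed


def sign_record(
    record_bytes: bytes,
    reviewer_id: str,
    reviewer_name: str,
    meaning: str,
    now: Optional[datetime] = None,
) -> SignatureRecord:
    """Create a signature entry for the given record content."""
    if not meaning.strip():
        raise ValueError("a signature must state its meaning")
    signed_at = (now or datetime.now(timezone.utc)).isoformat()
    return SignatureRecord(
        reviewer_id=reviewer_id,
        reviewer_name=reviewer_name,
        meaning=meaning,
        signed_at=signed_at,
        record_sha256=hashlib.sha256(record_bytes).hexdigest(),
    )


def signature_matches(signature: SignatureRecord, record_bytes: bytes) -> bool:
    """Return True only if the record is byte-for-byte what was signed."""
    return signature.record_sha256 == hashlib.sha256(record_bytes).hexdigest()


if __name__ == "__main__":
    reviewed = b"final analysis dataset v1"
    sig = sign_record(reviewed, "user-123", "A. Reviewer", "approval")
    print(signature_matches(sig, reviewed))
    print(signature_matches(sig, reviewed + b" (edited)"))
```

Because the signature carries the content hash, it cannot be silently moved onto a different or later-edited record.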
Validation documentation (IQ/OQ/PQ)
We produced the Installation Qualification, Operational Qualification, and Performance Qualification documents used to demonstrate that the system is validated for FDA purposes. Each document references the specific infrastructure components and test evidence, and is stored in a document-controlled system.
Key Takeaways
- 21 CFR Part 11 compliance is primarily about immutability and attribution - AWS S3 Object Lock handles immutability without custom code
- Pin container images by digest, not by tag - a tag can be repointed to a different image, so only a digest guarantees that the same processing code runs every time
- IQ/OQ/PQ documentation takes as long as the technical implementation - budget equal time for both
- Starting with compliant data storage is non-negotiable - retrofitting audit trails onto an existing pipeline is 5× the effort
Facing Similar Challenges?
Book a free 30-minute audit and I will tell you what I see.