Database Cache
Overview
The Database (DB) Cache is an alternative cache implementation in Porch designed for larger deployments. It stores both package metadata and repository data in a PostgreSQL database rather than in memory or Kubernetes Custom Resources. This implementation provides better scalability and persistence characteristics for production environments with high package counts.
Key characteristics:
- Alternative implementation: Used when explicitly configured with database connection details
- Database-backed storage: All repository, package, and package revision data stored in PostgreSQL
- External dependency: Requires PostgreSQL database instance
- Suitable for: Large deployments with thousands of packages and package revisions
- Better persistence: Survives Porch server restarts without re-fetching from Git
- Git interaction: Interacts with Git only during approval/publish and sync operations
Implementation Details
Storage Architecture
The DB Cache uses a database-centric storage model:
PostgreSQL Database
│
├─ repositories table
│ └─ Repository metadata
│
├─ packages table
│ └─ Package metadata
│
├─ package_revisions table
│ └─ Package revision metadata
│
└─ package_revision_resources table
└─ Package resources (KRM YAML)
Database Storage:
- Repository connections and metadata
- Package metadata (names, paths)
- Package revision metadata (lifecycle, tasks, upstream locks)
- Package resources (full KRM resource content)
- Timestamps and user tracking for all entities
No In-Memory Cache:
- All data retrieved from database on demand
- No in-memory caching of package revisions
- Database query performance critical for responsiveness
- Relies on PostgreSQL query optimization and indexing
Database Handler
The DB Cache uses a singleton database handler that manages the PostgreSQL connection:
Connection management:
- Single database connection shared across all repositories
- Connection opened during cache initialization
- Connection pooling handled by PostgreSQL driver
- Ping check on connection open to verify connectivity
- Connection closed when cache is shut down
Configuration:
- Driver: PostgreSQL driver (pgx)
- DataSource: Connection string with host, port, database, credentials
- Configured via CacheOptions at Porch server startup
Singleton pattern:
- One DBHandler instance per Porch server
- GetDB() returns the singleton instance
- OpenDB() creates the singleton if not already open
- CloseDB() closes connection and clears singleton
Repository Storage
Each repository is stored in the repositories table with metadata stored as JSON:
Storage approach:
- Repository metadata (namespace, name, spec) persisted in database
- Metadata stored as JSON for flexibility
- Timestamps track last update and user
- Deployment flag indicates repository type
Repository lifecycle:
- OpenRepository checks if repository exists in database
- If not found, creates external repository adapter
- Writes repository metadata to database
- Starts background sync manager
- CloseRepository stops sync and deletes all cached packages
- Removes repository row from database
Package and Package Revision Storage
The DB Cache uses a relational model with four tables:
Relational structure:
- Repositories → Packages (one-to-many)
- Packages → Package Revisions (one-to-many)
- Package Revisions → Resources (one-to-one)
- Foreign key relationships enforce referential integrity
Key storage characteristics:
- Composite primary keys (namespace, name) match Kubernetes naming
- Metadata and specs stored as JSON for flexibility
- Lifecycle state stored as dedicated column for efficient filtering
- Latest revision tracked with boolean flag for query performance
- Resources stored in separate table to reduce row size
Background Synchronization
The DB Cache includes a sync manager for each repository:
Sync process:
- Periodically triggers repository sync (configurable frequency)
- Fetches cached package revisions from database (Published and DeletionProposed only)
- Fetches external package revisions from Git
- Compares cached vs external package revisions
- Deletes package revisions only in cache (removed from Git)
- Caches package revisions only in external repo (new in Git)
- Updates Repository CR condition with sync status
Sync scope:
- Only syncs Published and DeletionProposed package revisions
- Draft and Proposed revisions excluded from sync
- Aligns with database-first approach (drafts don’t exist in Git)
- Reduces sync overhead by ignoring work-in-progress packages
Version tracking:
- Caches external repository version (Git commit SHA)
- Only re-fetches from Git if version changed
- Reuses last external package revision map if version unchanged
- Reduces Git operations during frequent syncs
Change detection:
- Compares package revision keys between cached and external
- Identifies: cached-only, both, external-only
- Cached-only: Deleted from Git, remove from database
- External-only: New in Git, write to database
- Both: Already synchronized, no action needed
Sync statistics:
- Tracks count of cached-only, both, external-only
- Logs sync duration and statistics
- Reports sync errors to Repository CR condition
Concurrency control:
- Per-repository mutex prevents simultaneous syncs
- TryLock pattern: fails fast if sync already in progress
- Prevents database contention and duplicate work
Latest Revision Tracking
The DB Cache tracks the latest package revision:
Identification logic:
- Latest revision determined during sync
- Only considers Published package revisions
- Highest revision number wins
- Stored as boolean flag in
package_revisions.latestcolumn
Database flag:
latest=TRUEset on latest revisionlatest=FALSEon all other revisions- Updated during sync when new revisions added
- Enables efficient queries for latest revisions
Query optimization:
- Can filter by
latest=TRUEin SQL WHERE clause - Avoids scanning all revisions to find latest
- Improves performance for latest revision queries
Key Design Decisions
Relational Database Model
Why a relational model:
- Enforces referential integrity through foreign keys
- Prevents orphaned packages or package revisions
- Enables efficient joins to retrieve related data
- Supports complex filtering at database level
Composite primary keys:
- All tables use (k8s_name_space, k8s_name) as primary key
- Matches Kubernetes resource naming convention
- Enables multi-tenancy with namespace isolation
JSON Storage Strategy
Why JSON for metadata:
- Flexible schema without database migrations
- Stores arbitrary Kubernetes metadata (labels, annotations, etc.)
- Simplifies storage of complex nested structures
- Trade-off: Less efficient queries on JSON fields
What’s stored as JSON:
- Repository, package, and package revision metadata
- Package specs and tasks
- Upstream locks (external package revision IDs)
- Full KRM resource content
Separate Resources Table
Why separate resources:
- Package resources can be large (multiple KRM YAML files)
- Reduces row size in package_revisions table
- Improves query performance when resources not needed
- Allows fetching metadata without loading full content
Latest Revision Flag
Why a boolean flag:
- Pre-computed during sync for performance
- Enables fast filtering:
WHERE latest=TRUE - Avoids scanning all revisions to find latest
- Trade-off: Must be maintained during updates
Query Patterns
SQL joins for related data:
- Single query retrieves package revision with repository context
- Joins avoid multiple round-trips to database
- Filtering at database level reduces data transfer
- Resources fetched separately only when needed
Storage Mechanism
Draft Package Handling
The DB Cache has a database-first approach to draft packages:
Draft lifecycle:
- CreatePackageRevisionDraft creates package revision in database only
- Draft packages stored entirely in PostgreSQL
- All draft modifications update database, not Git
- UpdatePackageRevision operations modify database records
- ClosePackageRevisionDraft saves to database without Git interaction
Git interaction pattern:
- Draft creation: No Git interaction (database only)
- Draft updates: No Git interaction (database only)
- Proposed → Published transition: Pushes to Git repository
- Background sync: Pulls published packages from Git
Implications:
- Draft work isolated in database until approval
- Git repository only contains approved/published packages
- Lower network latency for draft operations
- Git repository can be temporarily unavailable during draft work
- Drafts survive Porch server restarts (persisted in database)
Publish workflow:
- Draft created and modified in database
- Lifecycle transitions: Draft → Proposed (database only)
- Lifecycle transitions: Proposed → Published (triggers Git push)
- Package revision pushed to Git with assigned revision number
- External package revision ID stored in database
- Placeholder package revision created for latest tracking
Cache Invalidation
The DB Cache uses targeted deletion for cache invalidation:
Package deletion:
- Deletes specific package and all its revisions from database
- No cache flush required
- Database foreign key constraints ensure referential integrity
- Orphaned records automatically prevented by database
Implications:
- Efficient deletion without affecting other packages
- No need to rebuild cache after deletion
- Database handles cleanup automatically
- Minimal overhead for targeted deletion
Data Persistence
The DB Cache provides true persistence:
Porch server restart:
- All repository, package, and package revision data survives restart
- No need to re-fetch from Git on startup
- Repositories automatically reconnect on first access
- Background sync resumes after reconnection
Database backup:
- Standard PostgreSQL backup procedures apply
- Point-in-time recovery possible
- Disaster recovery through database restore
- No dependency on Git availability for recovery
Data consistency:
- Database transactions ensure atomic updates
- Foreign key constraints prevent orphaned records
- Referential integrity maintained automatically
- Rollback on error prevents partial updates
Memory Management
The DB Cache has minimal memory footprint:
Memory characteristics:
- No in-memory caching of package revisions
- Only active repository connections in memory
- Database connection pool managed by driver
- Memory usage independent of package count
Scalability:
- Suitable for thousands of repositories
- Tens of thousands of package revisions
- Limited only by database capacity
- Horizontal scaling via database replication
Trade-offs:
- Lower memory usage than CR Cache
- Higher latency due to database queries
- Requires external PostgreSQL instance
- Additional operational complexity
When to Use DB Cache
Use DB Cache when:
- Managing hundreds of repositories
- Thousands of package revisions per repository
- Memory constraints on Porch server
- Need for data persistence across restarts
- Existing PostgreSQL infrastructure available
- Backup and disaster recovery requirements
Use CR Cache when:
- Small to medium deployments
- Prefer Kubernetes-native storage
- No external database dependencies desired
- Lower operational complexity preferred
- etcd capacity sufficient for package metadata