Caching Behavior

Detailed architecture of cache population, structure, and consistency mechanisms.

Overview

The Package Cache optimizes performance by storing repository data in memory (CR Cache) or in a database (DB Cache) to avoid redundant Git operations. The caching system uses lazy loading, version-based refresh, and concurrency control to balance performance with data freshness.

High-Level Architecture

┌─────────────────────────────────────────────────────────┐
│                    Caching System                       │
│                                                         │
│  ┌──────────────┐      ┌──────────────┐      ┌──────┐   │
│  │   Cache      │      │   Version    │      │ Git  │   │
│  │  Population  │ ───> │   Tracking   │ ───> │ Repo │   │
│  │              │      │              │      │      │   │
│  │ • Lazy Load  │      │ • Compare    │      │      │   │
│  │ • Refresh    │      │ • Refresh    │      │      │   │
│  └──────────────┘      └──────────────┘      └──────┘   │
│         │                      │                        │
│         └──────────┬───────────┘                        │
│                    ↓                                    │
│         ┌──────────────────┐                            │
│         │  Cache Structure │                            │
│         │                  │                            │
│         │  • Maps          │                            │
│         │  • Mutex         │                            │
│         │  • Consistency   │                            │
│         └──────────────────┘                            │
└─────────────────────────────────────────────────────────┘

Cache Population

The cache uses lazy loading and version-based refresh to minimize Git operations:

Initial Population

CaDEngine Request
        ↓
  OpenRepository
        ↓
   Cache Empty? ──No──> Return Cached Data
        │
       Yes
        ↓
  Fetch from Git
        ↓
  Build Cache Maps
        ↓
  Store in Cache
        ↓
  Return Data

Process:

Repository opened on first access from CaDEngine
Cache initially empty (lazy loading strategy)
First operation triggers fetch from Git repository
All package revisions loaded into cache
Subsequent operations served from cached data
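The lazy population steps above can be sketched in Go as follows. This is a minimal illustration, not Porch's implementation: the lazyRepo type, its fields, and the fetchFromGit hook are hypothetical, and a single mutex guards both the loaded flag and the map.

```go
package main

import "sync"

// PackageRevision is a hypothetical, minimal stand-in for a cached package revision.
type PackageRevision struct {
	Key string
}

// lazyRepo populates its cache on the first List call and serves later calls
// from memory. fetchFromGit is a hypothetical hook, not a Porch API.
type lazyRepo struct {
	mu           sync.Mutex
	loaded       bool
	revisions    map[string]PackageRevision
	fetchFromGit func() ([]PackageRevision, error)
	gitFetches   int // counts Git round trips, for illustration only
}

// List returns all cached package revisions, loading them on first use.
func (r *lazyRepo) List() ([]PackageRevision, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if !r.loaded {
		// Cache empty: fetch from Git, build the map, store it.
		revs, err := r.fetchFromGit()
		r.gitFetches++
		if err != nil {
			return nil, err // stay unloaded so the next call retries
		}
		r.revisions = make(map[string]PackageRevision, len(revs))
		for _, pr := range revs {
			r.revisions[pr.Key] = pr
		}
		r.loaded = true
	}
	// Cache populated: copy the values out without touching Git.
	out := make([]PackageRevision, 0, len(r.revisions))
	for _, pr := range r.revisions {
		out = append(out, pr)
	}
	return out, nil
}
```

Nothing is fetched until the first List call, which matches the "no upfront cost for unused repositories" benefit.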

Benefits:

No upfront cost for unused repositories
Memory allocated only for accessed repositories
Faster startup time for Porch server

Version-Based Refresh

Operation Request
        ↓
  Check Cache Version
        ↓
  Fetch Git Version
        ↓
  Versions Match? ──Yes──> Serve from Cache
        │
        No
        ↓
  Fetch from Git
        ↓
  Update Cache
        ↓
  Update Version
        ↓
  Serve Data

Version tracking:

Repository version (Git commit SHA) cached after each fetch
Version compared before serving data
If version unchanged, skip Git fetch (cache hit)
If version changed, refresh cache (cache miss)
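A minimal sketch of the version comparison, assuming a cheap call that returns the current commit SHA (headSHA) and an expensive full fetch (fetchAll). Both hooks and the versionedCache type are hypothetical names for illustration.

```go
package main

import "sync"

// versionedCache is a hypothetical sketch of version-based refresh.
type versionedCache struct {
	mu          sync.RWMutex
	lastVersion string                   // last known Git commit SHA
	packages    []string                 // cached data
	headSHA     func() (string, error)   // cheap: read the current commit SHA
	fetchAll    func() ([]string, error) // expensive: full fetch and parse
}

// Get serves cached data when the SHA is unchanged and refreshes otherwise.
func (c *versionedCache) Get() ([]string, error) {
	current, err := c.headSHA()
	if err != nil {
		return nil, err
	}

	c.mu.RLock()
	hit := current == c.lastVersion
	cached := c.packages
	c.mu.RUnlock()
	if hit {
		return cached, nil // cache hit: skip the full fetch
	}

	// Cache miss: fetch, then update data and version together.
	pkgs, err := c.fetchAll()
	if err != nil {
		return nil, err
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	c.packages = pkgs
	c.lastVersion = current
	return pkgs, nil
}
```

Data and version are written under the same write lock, so a reader never pairs new data with an old version. If Git moves between headSHA and fetchAll, the recorded SHA is older than the data, which only causes one extra refresh on the next call.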

Optimization:

Avoids expensive Git operations when repository unchanged
Ensures cache reflects current Git state
Balances freshness with performance

Force Refresh

Explicit refresh:

Operations can request force refresh (bypass version check)
Triggers immediate fetch from Git
Updates cache with latest state
Used when stale data suspected or after errors
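Force refresh only changes the condition that guards the fetch. In this hypothetical sketch (single goroutine, no locking for brevity; the refresher type and its hooks are illustrative), the force flag skips the version comparison:

```go
package main

// refresher is a hypothetical sketch in which force bypasses the version check.
type refresher struct {
	lastVersion string
	data        string
	headSHA     func() string // hypothetical: current Git commit SHA
	fetch       func() string // hypothetical: full fetch from Git
}

// Refresh fetches when the version changed or when force is true.
// It reports whether a Git fetch actually happened.
func (r *refresher) Refresh(force bool) bool {
	current := r.headSHA()
	if !force && current == r.lastVersion {
		return false // version unchanged and not forced: keep cached data
	}
	r.data = r.fetch()
	r.lastVersion = current
	return true
}
```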

Refresh triggers:

User-driven one-time sync using porchctl repo sync or the Repository CR field spec.sync.runOnceAt
Background sync operations
Version mismatch detection
Recovery from sync errors

Cache Structure

The cache maintains structured data for fast lookups and efficient operations:

CR Cache Structure

Cached Repository
    │
    ├─ Repository Metadata
    │   ├─ Key (namespace, name)
    │   ├─ Spec (Repository CR)
    │   └─ Last Version (Git SHA)
    │
    ├─ Package Revisions Map
    │   └─ PackageRevisionKey → CachedPackageRevision
    │       ├─ PackageRevision object
    │       ├─ Metadata store reference
    │       └─ isLatestRevision flag
    │
    ├─ Packages Map
    │   └─ PackageKey → CachedPackage
    │       ├─ Package object
    │       └─ Latest revision reference
    │
    └─ Concurrency Control
        └─ Read-Write Mutex

Data structures:

Package revisions map: PackageRevisionKey → CachedPackageRevision
Packages map: PackageKey → CachedPackage
Repository version: Last known Git commit SHA
Latest revision flags: Boolean per package revision
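The CR Cache layout above might be expressed with Go types like the following. Field and type names are illustrative assumptions rather than Porch source, and the metadata store reference is omitted. Struct keys work as map keys because all their fields are comparable.

```go
package main

import "sync"

// Hypothetical key and object types; names are illustrative only.
type RepositoryKey struct{ Namespace, Name string }

type PackageKey struct {
	Repo    RepositoryKey
	Package string
}

type PackageRevisionKey struct {
	Pkg      PackageKey
	Revision int
}

type CachedPackageRevision struct {
	Key              PackageRevisionKey
	Lifecycle        string
	isLatestRevision bool
}

type CachedPackage struct {
	Key    PackageKey
	Latest *CachedPackageRevision // latest revision reference
}

type cachedRepository struct {
	key         RepositoryKey
	lastVersion string // last known Git commit SHA

	mu               sync.RWMutex // guards the two maps below
	packageRevisions map[PackageRevisionKey]*CachedPackageRevision
	packages         map[PackageKey]*CachedPackage
}

// lookup is an O(1) map access under the read lock.
func (r *cachedRepository) lookup(k PackageRevisionKey) (*CachedPackageRevision, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	pr, ok := r.packageRevisions[k]
	return pr, ok
}
```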

Memory characteristics:

Grows with number of package revisions
Full repository content cached in memory
No automatic eviction (persists until repository closed)
Suitable for hundreds of repositories, thousands of revisions

DB Cache Structure

PostgreSQL Database
    │
    ├─ repositories table
    │   └─ Repository metadata (JSON)
    │
    ├─ packages table
    │   └─ Package metadata (JSON)
    │
    ├─ package_revisions table
    │   ├─ Metadata (JSON)
    │   ├─ Lifecycle (column)
    │   └─ Latest flag (boolean)
    │
    └─ package_revision_resources table
        └─ KRM resources (JSON)

Data structures:

Relational tables: Repositories → Packages → Revisions → Resources
Foreign keys: Enforce referential integrity
Indexes: Optimize queries on namespace, name, lifecycle, latest
JSON columns: Store flexible metadata and specs

Memory characteristics:

Minimal in-memory footprint
Data retrieved from database on demand
Suitable for thousands of repositories, tens of thousands of revisions
Limited only by database capacity

Concurrency Control

CR Cache locking:

Read Operation          Write Operation
      ↓                       ↓
  RLock()                 Lock()
      ↓                       ↓
  Read Data              Modify Data
      ↓                       ↓
  RUnlock()              Unlock()

Locking strategy:

Read-write mutex protects cache maps
Read operations acquire read lock (concurrent reads allowed)
Write operations acquire write lock (exclusive access)
Cache version checked without a lock; the read lock is still held while reading the maps
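A small sketch of the read/write locking pattern with Go's sync.RWMutex; the rwCache type is hypothetical. Get holds the shared read lock, so any number of readers proceed in parallel, while Put holds the exclusive write lock.

```go
package main

import "sync"

// rwCache is a hypothetical cache guarded by a read-write mutex.
type rwCache struct {
	mu   sync.RWMutex
	data map[string]string
}

func newRWCache() *rwCache {
	return &rwCache{data: make(map[string]string)}
}

// Get takes the shared read lock: concurrent readers do not block each other.
func (c *rwCache) Get(key string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.data[key]
	return v, ok
}

// Put takes the exclusive write lock: it waits for readers to finish.
func (c *rwCache) Put(key, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.data[key] = value
}
```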

DB Cache locking:

Per-repository mutex prevents simultaneous syncs
Database transactions ensure atomic updates
TryLock pattern fails fast if operation already in progress
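The per-repository TryLock behavior can be sketched with sync.Mutex.TryLock (available since Go 1.18). The repoSyncer type, the "namespace/name" key format, and errSyncInProgress are hypothetical.

```go
package main

import (
	"errors"
	"sync"
)

// errSyncInProgress is a hypothetical sentinel error for this sketch.
var errSyncInProgress = errors.New("sync already in progress")

// repoSyncer keeps one mutex per repository.
type repoSyncer struct {
	mu    sync.Mutex             // guards the locks map itself
	locks map[string]*sync.Mutex // one lock per repository key
}

func (s *repoSyncer) lockFor(repo string) *sync.Mutex {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.locks == nil {
		s.locks = make(map[string]*sync.Mutex)
	}
	l, ok := s.locks[repo]
	if !ok {
		l = &sync.Mutex{}
		s.locks[repo] = l
	}
	return l
}

// Sync runs doSync unless a sync of the same repository is already running,
// in which case it fails fast instead of waiting.
func (s *repoSyncer) Sync(repo string, doSync func() error) error {
	l := s.lockFor(repo)
	if !l.TryLock() {
		return errSyncInProgress
	}
	defer l.Unlock()
	return doSync()
}
```

Syncs of different repositories never contend, because each key has its own mutex.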

Cache Consistency

The cache maintains consistency with external Git repositories through multiple mechanisms:

Version-Based Consistency

Cache State              Git Repository
    ↓                          ↓
Version: abc123          Version: abc123
    ↓                          ↓
    └────── Compare ───────────┘
              ↓
         Match Found
              ↓
      Serve from Cache
      (No Git Access)

Consistency mechanism:

Repository version checked before operations
Cache refreshed when version mismatch detected
Ensures cache reflects current Git state
Prevents serving stale data

Version update triggers:

Background sync operations
Explicit refresh requests
Package revision creation/update/delete
Repository reconnection after errors

Optimistic Locking

Client Update Request
        ↓
  Resource Version: v1
        ↓
  Cache Check
        ↓
  Current Version: v1? ──No──> Conflict Error
        │
       Yes
        ↓
  Apply Update
        ↓
  Increment Version: v2
        ↓
  Return Success

Locking mechanism:

Package revisions include Kubernetes resource version
Updates require matching resource version
Prevents lost updates from concurrent modifications
Client must re-read and retry on conflict

Conflict resolution:

Client receives conflict error
Client re-reads latest version
Client reapplies changes
Client retries update with new version
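The optimistic locking check and the client retry loop can be sketched as below. The store type and errConflict are hypothetical; versions are modeled as increasing integers for readability, while Kubernetes treats resourceVersion as an opaque string that clients should only compare for equality.

```go
package main

import (
	"errors"
	"strconv"
)

var errConflict = errors.New("conflict: resource version mismatch")

// store is a hypothetical single-object store with optimistic concurrency.
type store struct {
	value           string
	resourceVersion string
	counter         int
}

func (s *store) Get() (string, string) { return s.value, s.resourceVersion }

// Update applies newValue only if rv matches the current resource version.
func (s *store) Update(rv, newValue string) (string, error) {
	if rv != s.resourceVersion {
		return "", errConflict
	}
	s.counter++
	s.value = newValue
	s.resourceVersion = strconv.Itoa(s.counter)
	return s.resourceVersion, nil
}

// updateWithRetry re-reads, reapplies mutate, and retries on conflict.
func updateWithRetry(s *store, mutate func(string) string, attempts int) error {
	for i := 0; i < attempts; i++ {
		cur, rv := s.Get()
		_, err := s.Update(rv, mutate(cur))
		if err == nil {
			return nil
		}
		if !errors.Is(err, errConflict) {
			return err
		}
	}
	return errConflict
}
```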

Metadata Synchronization

CR Cache metadata:

PackageRev CRs store Kubernetes metadata (labels, annotations, finalizers)
Metadata kept in sync with package revisions
Orphaned metadata cleaned up during sync
Missing metadata created during sync
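Reconciling metadata during sync amounts to a set difference in each direction. A minimal sketch, assuming package revisions and their metadata objects can be matched by a hypothetical name string; reconcileMetadata is an illustrative helper, not a Porch function.

```go
package main

import "sort"

// reconcileMetadata returns the metadata objects to create (revision exists,
// metadata missing) and to delete (metadata exists, revision gone).
func reconcileMetadata(revisions, metadata []string) (toCreate, toDelete []string) {
	inRepo := make(map[string]bool, len(revisions))
	for _, r := range revisions {
		inRepo[r] = true
	}
	hasMeta := make(map[string]bool, len(metadata))
	for _, m := range metadata {
		hasMeta[m] = true
		if !inRepo[m] {
			toDelete = append(toDelete, m) // orphaned metadata
		}
	}
	for _, r := range revisions {
		if !hasMeta[r] {
			toCreate = append(toCreate, r) // missing metadata
		}
	}
	sort.Strings(toCreate)
	sort.Strings(toDelete)
	return toCreate, toDelete
}
```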

DB Cache metadata:

Database records store metadata as JSON
Metadata updated atomically with package revisions
Foreign key constraints prevent orphaned records
Database transactions ensure consistency

Error Handling

Sync error behavior:

Sync Operation
      ↓
   Error? ──No──> Update Cache
      │
     Yes
      ↓
  Log Error
      ↓
  Update Condition
      ↓
  Keep Stale Cache
      ↓
  Retry Next Cycle

Error handling strategy:

Sync errors stored and reported in Repository condition
Failed syncs retried on next sync interval
Cache remains available with stale data during failures
Operations continue with warning about staleness
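A sketch of a single sync cycle that keeps stale data on failure. The syncState type, the condition strings, and errGitUnreachable are hypothetical; a real implementation would set a status condition on the Repository object rather than a string field.

```go
package main

import (
	"errors"
	"log"
	"time"
)

// errGitUnreachable is a hypothetical error used to illustrate a failed sync.
var errGitUnreachable = errors.New("git remote unreachable")

// syncState is a hypothetical holder for cached data plus sync status.
type syncState struct {
	data      []string
	lastSync  time.Time
	condition string // stands in for the Repository status condition
	lastErr   error
}

// runSync performs one sync cycle. On failure it logs and records the error
// but leaves data untouched, so reads keep working with stale content until
// a later cycle succeeds.
func (s *syncState) runSync(fetch func() ([]string, error)) {
	fresh, err := fetch()
	if err != nil {
		log.Printf("sync failed, serving stale cache: %v", err)
		s.lastErr = err
		s.condition = "SyncFailed: " + err.Error()
		return
	}
	s.data = fresh
	s.lastSync = time.Now()
	s.lastErr = nil
	s.condition = "Ready"
}
```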

Performance Optimization

The cache employs several strategies to optimize performance:

Lock-Free Reads

Read optimization:

Cache version checked without lock
If version matches, serve data without Git access
Read lock acquired only when accessing cache maps
Multiple concurrent reads allowed
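One way to check the version without a lock, sketched with sync/atomic: the version lives in an atomic.Value, while the maps stay behind a sync.RWMutex. The fastCache type and its methods are hypothetical.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// fastCache is a hypothetical cache whose version check needs no mutex.
type fastCache struct {
	version atomic.Value // holds a string: last known Git SHA
	mu      sync.RWMutex // guards items
	items   map[string]string
}

// UpToDate compares the cached version with headSHA using an atomic load.
func (c *fastCache) UpToDate(headSHA string) bool {
	v, _ := c.version.Load().(string) // "" if nothing stored yet
	return v == headSHA
}

// Get reads one item under the shared read lock.
func (c *fastCache) Get(key string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.items[key]
	return v, ok
}

// Replace installs new data first, then publishes the new version.
func (c *fastCache) Replace(items map[string]string, sha string) {
	c.mu.Lock()
	c.items = items
	c.mu.Unlock()
	c.version.Store(sha)
}
```

Because the version is published only after the new data is in place, a reader that observes the new version also finds the matching data.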

Performance impact:

Eliminates Git latency for cache hits
Enables high read throughput
Scales with number of concurrent clients

Lazy Loading

Loading strategy:

Repositories loaded on first access
Package revisions fetched from Git on first access, not at startup
No upfront cost for unused repositories
Memory allocated incrementally

Benefits:

Faster Porch server startup
Lower memory footprint for unused repositories
Scales to large numbers of repositories

Efficient Data Structures

Map-based lookups:

O(1) lookup time for package revisions by key
O(1) lookup time for packages by key
Filtering iterates over the map in a single pass
No linear scans required for lookups by key

Latest revision tracking:

Pre-computed during sync
Boolean flag for fast filtering
Avoids scanning all revisions to find latest
Updated incrementally on changes
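A sketch of pre-computing the latest flag in one pass. It assumes, for illustration, that "latest" means the published revision with the highest revision number and that drafts carry revision 0; the rev type and markLatest are hypothetical. This version recomputes all flags rather than updating them incrementally.

```go
package main

// rev is a hypothetical, minimal package revision.
type rev struct {
	Package  string
	Revision int // > 0 for published revisions, 0 for drafts (assumption)
	Latest   bool
}

// markLatest sets Latest on the highest published revision of each package,
// so later queries can filter on the flag instead of rescanning revisions.
func markLatest(revs []*rev) {
	best := make(map[string]*rev)
	for _, r := range revs {
		r.Latest = false
		if r.Revision <= 0 {
			continue // drafts are never "latest" in this sketch
		}
		if b, ok := best[r.Package]; !ok || r.Revision > b.Revision {
			best[r.Package] = r
		}
	}
	for _, r := range best {
		r.Latest = true
	}
}
```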

Background Sync

Async synchronization:

Foreground Operations    Background Sync
        ↓                       ↓
  Serve from Cache        Periodic Sync
        ↓                       ↓
  No Blocking            Update Cache
        ↓                       ↓
  Fast Response          Notify Changes

Benefits:

Operations don’t block on sync
Cache updated asynchronously
Clients notified of changes via watch
Balances freshness with responsiveness

Database Query Optimization (DB Cache)

Query strategies:

Indexes on frequently queried columns (namespace, name, lifecycle, latest)
SQL joins to retrieve related data in single query
Filtering at database level reduces data transfer
Resources fetched separately only when needed

Performance characteristics:

Fast metadata queries (indexed columns)
Efficient filtering (database-level WHERE clauses)
Reduced network overhead (single query for related data)
Scalable to large package counts

Cache Lifecycle

Repository Opening

OpenRepository Request
        ↓
  Check if Cached
        ↓
  Already Open? ──Yes──> Return Cached
        │
        No
        ↓
  Create Adapter
        ↓
  Wrap in Cache
        ↓
  Start SyncManager
        ↓
  Store in Cache
        ↓
  Return Repository
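A minimal sketch of the open path as a get-or-create under a single mutex, so two concurrent OpenRepository calls for the same repository cannot both create an adapter. The Repo and repoCache types are hypothetical, and the adapter and SyncManager steps are reduced to placeholder assignments.

```go
package main

import "sync"

// Repo is a hypothetical cached repository wrapper.
type Repo struct {
	Name    string
	syncing bool
}

// repoCache is a hypothetical registry of open repositories.
type repoCache struct {
	mu    sync.Mutex
	repos map[string]*Repo
	opens int // adapters created, for illustration
}

// OpenRepository returns the cached wrapper or creates, starts, and stores one.
func (c *repoCache) OpenRepository(name string) *Repo {
	c.mu.Lock()
	defer c.mu.Unlock()
	if r, ok := c.repos[name]; ok {
		return r // already open
	}
	if c.repos == nil {
		c.repos = make(map[string]*Repo)
	}
	c.opens++              // placeholder for "create adapter"
	r := &Repo{Name: name} // placeholder for "wrap in cache"
	r.syncing = true       // placeholder for "start SyncManager"
	c.repos[name] = r
	return r
}
```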

Repository Closing

CloseRepository Request
        ↓
  Stop SyncManager
        ↓
  Delete Metadata
        ↓
  Send Delete Events
        ↓
  Close Adapter
        ↓
  Remove from Cache
        ↓
  Complete

Cleanup process:

SyncManager stopped (goroutines cancelled)
Metadata resources deleted (PackageRev CRs or DB records)
Delete notifications sent to watchers
Underlying repository adapter closed
Cache entry removed from map