Running E2E Tests on a Masked Production DB — Building a Playwright + Neon Branching Pipeline
You've probably heard the saying, "The test data was too clean to catch the bug." I've lost count of how many times tests passed on pristine data generated by seed scripts, only for things to blow up in production. Users with 200-character emails, records with newline characters in the address field, order data with five levels of foreign key relationships — things that seed scripts can never reproduce. Eventually, you arrive at the conclusion that you need to use production data, but then privacy concerns immediately become a roadblock.
This article covers the entire process of building a pipeline that branches a production DB using Copy-on-Write, masks PII, and then automatically runs Playwright E2E tests on it for every PR. With a serverless DB platform like Neon, everything from branching to masking is handled in a near one-stop fashion, and we'll also explore the self-hosted route without Neon. As prerequisites, a basic understanding of GitHub Actions YAML syntax, PostgreSQL fundamentals, and CI/CD pipeline concepts should be sufficient.
Estimated reading time: ~15 minutes. Table of contents: Core Concepts → Hands-on Implementation (Neon Path / Greenmask Path) → Test Writing Patterns → Pros and Cons → Wrap-up
Core Concepts
Database Branching — Cloning a DB Like Git
Database branching is literally the concept of cloning a database the same way you create a Git branch. The key is the Copy-on-Write (CoW) approach, which shares storage instead of physically copying all data, and only writes new data for the parts that change. Even a production DB of hundreds of gigabytes can be branched in seconds, and additional storage costs are incurred only for the changed portions.
Copy-on-Write (CoW): A technique that shares only references to the original data without copying it immediately, and clones only the affected pages when a write occurs. The same principle is used in file systems (Btrfs, ZFS) and container layers.
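The mechanics can be sketched as a toy data structure. This is purely illustrative (it is not how Neon's storage engine works), but it shows why creating a branch is O(page count) regardless of data size, and why only written pages cost extra storage:

```typescript
// Toy illustration of Copy-on-Write semantics: a branch starts by sharing
// every "page" with its parent by reference, and copies a page only when
// that page is first written to.
type Page = { data: string };

class CowBranch {
  private pages: Map<number, Page>;
  private ownedPages = new Set<number>(); // pages this branch has privately copied

  constructor(parentPages: Map<number, Page>) {
    // Copies only the references, not the page contents.
    this.pages = new Map(parentPages);
  }

  read(pageId: number): string | undefined {
    return this.pages.get(pageId)?.data;
  }

  write(pageId: number, data: string): void {
    if (!this.ownedPages.has(pageId)) {
      // First write to this page: give the branch its own copy
      // so the parent's page stays untouched.
      this.pages.set(pageId, { data });
      this.ownedPages.add(pageId);
    } else {
      this.pages.get(pageId)!.data = data;
    }
  }
}
```

A branch over a million shared pages is created as cheaply as one over ten; masking then only "dirties" the pages holding PII columns.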
Starting in 2024, platforms like Neon and Xata began pushing this feature aggressively, and the "Branch-per-PR" workflow has been gaining traction rapidly. During code review, you check the frontend with Vercel Preview and isolate the DB with branches — this pattern has now become a natural flow.
| Platform | Branching Method | Supported DB | Features |
|---|---|---|---|
| Neon | Instant CoW-based branching | PostgreSQL | Built-in postgresql_anonymizer, official GitHub Actions provided |
| Xata | CoW-based branching | PostgreSQL | Similar approach to Neon |
| PlanetScale | Schema branching | MySQL / PostgreSQL | Data requires separate seeding, powerful schema diff feature. PostgreSQL branching support has started but is still in early stages |
Data Masking — Preserving Structure While Replacing Sensitive Information
The essence of masking is replacing only the sensitive values with fake ones while keeping the shape of the data intact. Changing "John Doe" to "Jane Smith," user@company.com to fake_7xk@example.com, while leaving foreign key relationships and constraints untouched.
There's an important point here. Since a CoW-branched DB retains the original's foreign key relationships, referential integrity is naturally maintained as long as you don't mask the PK/FK columns themselves. If you don't mask users.id and only mask users.name or users.email, the orders.user_id → users.id relationship is completely unaffected. However, if you have denormalized designs where emails or phone numbers are referenced like FKs, you'll need to apply deterministic masking to ensure that the masked values for those fields match each other.
In the PostgreSQL ecosystem, postgresql_anonymizer (abbreviated as anon) is the de facto standard. What makes this extension appealing is that you can declare masking rules directly in the schema using SQL's SECURITY LABEL syntax.
SECURITY LABEL: This is a standard PostgreSQL SQL syntax, but using it with `FOR anon` operates through the label provider registered by the `postgresql_anonymizer` extension. In other words, the `anon` extension must be installed to use this syntax.
```sql
SECURITY LABEL FOR anon ON COLUMN users.name
  IS 'MASKED WITH FUNCTION anon.fake_last_name()';
SECURITY LABEL FOR anon ON COLUMN users.email
  IS 'MASKED WITH FUNCTION anon.fake_email()';
SECURITY LABEL FOR anon ON COLUMN users.phone
  IS 'MASKED WITH FUNCTION anon.partial(phone, 3, $$***$$, 0)';
```

Since masking policies are managed as code, they can be version-controlled alongside schema migrations.
There's one more thing worth covering here: deterministic masking. If the same input produces a different result every time, tests become non-reproducible. In postgresql_anonymizer, using pseudo_* family functions like anon.pseudo_first_name(seed) guarantees the same input always produces the same output based on a seed. For external tools, there's @snaplet/copycat maintained by the Supabase community (originally a Snaplet project, now managed in the supabase-community/seed repository after Snaplet shut down).
Deterministic Masking: A method that always returns the same masked result for the same original value. If a particular user's masked email changes with every test run, it becomes impossible to write assertions.
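The principle behind `anon.pseudo_*()` and seed-based transformers can be sketched in a few lines. The HMAC construction below is an illustration of the idea only, not the actual implementation of either tool:

```typescript
import { createHmac } from "node:crypto";

// Deterministic masking as a pure function of (secret seed, original value):
// the same original always maps to the same fake value, across tables and
// across test runs. FAKE_NAMES and the fake_XXXX email shape are made up
// for this sketch.
const FAKE_NAMES = ["Smith", "Jones", "Taylor", "Brown", "Wilson"];

function pseudoEmail(original: string, seed: string): string {
  const digest = createHmac("sha256", seed).update(original).digest("hex");
  return `fake_${digest.slice(0, 8)}@example.com`;
}

function pseudoName(original: string, seed: string): string {
  const digest = createHmac("sha256", seed).update(original).digest();
  return FAKE_NAMES[digest[0] % FAKE_NAMES.length];
}
```

Because the mapping depends only on the seed and the original, a denormalized column like a hypothetical `orders.contact_email` masked with the same seed stays consistent with `users.email`, which is exactly the property the FK-like reference case above requires.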
Hands-on Implementation
When a PR is opened → create a CoW branch from the production DB and mask PII → apply schema migrations to the masked branch DB → run Playwright E2E tests in a Preview environment → automatically delete the branch DB when the PR is closed. Let's implement this flow using two different approaches.
Example 1: Neon + Playwright + GitHub Actions — The Fastest Starting Point
When using Neon, official GitHub Actions are provided, making the setup quite concise. I started with this combination as well, and I was impressed that the entire lifecycle could be managed with a single workflow file.
To be honest though, I did hit one snag during the initial setup. After applying the masking rules, all payment-related tests started failing. It turned out that I had applied partial masking to payments.card_number, but a separate validation step was performing a Luhn check, and the masked card numbers couldn't pass the Luhn algorithm. Tracking that down took about half a day — it pays to identify these business logic integration points in advance.
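One way around that Luhn failure is to generate replacement card numbers whose check digit is recomputed, so downstream validation still passes. The helper below is a hypothetical sketch, not part of postgresql_anonymizer:

```typescript
// Compute the check digit that makes `partial + digit` Luhn-valid.
function luhnCheckDigit(partial: string): number {
  let sum = 0;
  const digits = partial.split("").reverse().map(Number);
  for (let i = 0; i < digits.length; i++) {
    let d = digits[i];
    if (i % 2 === 0) {
      // In the finished number these positions are the ones Luhn doubles.
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return (10 - (sum % 10)) % 10;
}

// Keep the issuer prefix (BIN), replace the middle digits with a fixed fake
// value, then append a freshly computed check digit. Hypothetical helper.
function maskCardNumber(original: string, fakeMiddle = "000000"): string {
  const bin = original.slice(0, 6);
  const body = bin + fakeMiddle + original.slice(12, 15); // 15 digits
  return body + luhnCheckDigit(body);                     // 16th digit
}
```

The same idea applies to any field with a checksum or format validator: mask the payload, then regenerate whatever the validator checks.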
```yaml
name: E2E Tests on Masked Preview DB
on:
  pull_request:
    types: [opened, synchronize, reopened, closed]
jobs:
  e2e-test:
    if: github.event.action != 'closed'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Step 1: Create a Neon branch (CoW clone of the production DB)
      - name: Create Neon Branch
        id: create-branch
        uses: neondatabase/create-branch-action@v5
        with:
          project_id: ${{ vars.NEON_PROJECT_ID }}
          api_key: ${{ secrets.NEON_API_KEY }}
          branch_name: preview/pr-${{ github.event.number }}
      # Step 2: Apply masking rules, then run anonymization
      - name: Anonymize PII
        run: |
          psql ${{ steps.create-branch.outputs.db_url }} -f ./sql/masking-rules.sql
          psql ${{ steps.create-branch.outputs.db_url }} -c "SELECT anon.anonymize_database();"
      # Step 3: Schema migration (apply this PR's changes)
      - name: Apply Migrations
        run: npx prisma migrate deploy
        env:
          DATABASE_URL: ${{ steps.create-branch.outputs.db_url }}
      # Step 4: Deploy the Preview environment
      - name: Deploy Preview
        id: deploy-preview
        uses: amondnet/vercel-action@v25
        with:
          vercel-token: ${{ secrets.VERCEL_TOKEN }}
          vercel-org-id: ${{ secrets.VERCEL_ORG_ID }}
          vercel-project-id: ${{ secrets.VERCEL_PROJECT_ID }}
          scope: ${{ secrets.VERCEL_ORG_ID }}
        env:
          DATABASE_URL: ${{ steps.create-branch.outputs.db_url }}
      # Step 5: Install Playwright and run the E2E tests
      - name: Install Playwright
        run: npx playwright install --with-deps chromium
      - name: Run E2E Tests
        run: npx playwright test
        env:
          DATABASE_URL: ${{ steps.create-branch.outputs.db_url }}
          BASE_URL: ${{ steps.deploy-preview.outputs.preview-url }}
      # Step 6: Post the schema diff as a PR comment
      - name: Schema Diff
        if: always()
        uses: neondatabase/schema-diff-action@v1
        with:
          project_id: ${{ vars.NEON_PROJECT_ID }}
          api_key: ${{ secrets.NEON_API_KEY }}
      - uses: actions/upload-artifact@v4
        if: failure()
        with:
          name: playwright-report
          path: playwright-report/
  # Automatically clean up the branch when the PR is closed
  cleanup:
    if: github.event.action == 'closed'
    runs-on: ubuntu-latest
    steps:
      - uses: neondatabase/delete-branch-action@v3
        with:
          project_id: ${{ vars.NEON_PROJECT_ID }}
          api_key: ${{ secrets.NEON_API_KEY }}
          branch: preview/pr-${{ github.event.number }}
```

If you're using Netlify or another Preview deployment platform instead of Vercel, simply replace that step with the platform's GitHub Action. The key point is that a Preview deployment step must exist so that a `BASE_URL` can be passed to the tests.
| Step | Role | Approximate Duration |
|---|---|---|
| `create-branch-action` | Clone production DB via CoW | ~3 seconds |
| Apply masking rules | Run `anon.anonymize_database()` | Proportional to DB size (~30 seconds for 1 GB, based on internal team testing) |
| `prisma migrate deploy` | Apply schema changes from the current PR | ~5 seconds |
| Preview deployment | Provision Preview environment on Vercel/Netlify, etc. | ~30 seconds to 1 minute |
| `playwright test` | Run the full E2E suite | Varies by number of tests |
| `delete-branch-action` | Clean up branch when PR is closed | ~1 second |
It's recommended to manage the masking rules file (`sql/masking-rules.sql`) separately and update it alongside schema changes:
```sql
-- sql/masking-rules.sql
CREATE EXTENSION IF NOT EXISTS anon CASCADE;
SELECT anon.init();

-- users table
SECURITY LABEL FOR anon ON COLUMN users.name
  IS 'MASKED WITH FUNCTION anon.fake_last_name()';
SECURITY LABEL FOR anon ON COLUMN users.email
  IS 'MASKED WITH FUNCTION anon.fake_email()';
SECURITY LABEL FOR anon ON COLUMN users.phone
  IS 'MASKED WITH FUNCTION anon.partial(phone, 3, $$***$$, 0)';

-- orders table — mask shipping addresses
SECURITY LABEL FOR anon ON COLUMN orders.shipping_address
  IS 'MASKED WITH FUNCTION anon.fake_city() || $$ $$ || anon.fake_last_name() || $$ Street$$';

-- payments table — partially mask card numbers
SECURITY LABEL FOR anon ON COLUMN payments.card_number
  IS 'MASKED WITH FUNCTION anon.partial(card_number, 0, $$****-****-****-$$, 4)';
```

Example 2: Greenmask + Docker — Building It Yourself Without Neon
If your team isn't using Neon, you can build a similar pipeline yourself using the open-source Greenmask and Docker combination. I also tried this approach on a project using an existing RDS, and while it lacks the instant branching capability, it was practical enough.
One thing I'll mention upfront: the Greenmask approach ran painfully long the first time I used it in CI. The production DB was about 8 GB, and while the full dump + masking + restore stayed under the default GitHub Actions job timeout (6 hours), it ran long enough that the runner costs were non-negligible. It only became practical after I switched to selectively dumping only the tables needed for testing.
```yaml
name: E2E Tests with Greenmask
on:
  pull_request:
    types: [opened, synchronize, reopened]
jobs:
  e2e-test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_DB: test_db
          POSTGRES_USER: test_user
          POSTGRES_PASSWORD: test_pass
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v4
      # Step 0: Install Greenmask
      - name: Install Greenmask
        run: |
          curl -sSL https://github.com/GreenmaskIO/greenmask/releases/latest/download/greenmask-linux-amd64 \
            -o /usr/local/bin/greenmask
          chmod +x /usr/local/bin/greenmask
      # Step 1: Dump from the production read replica and mask in one pass
      - name: Dump & Mask Production Data
        run: |
          greenmask dump \
            --source "host=${{ secrets.PROD_REPLICA_HOST }} dbname=prod_db user=readonly" \
            --config greenmask-config.yml \
            --output masked-dump.sql
      # Step 2: Restore the masked data into the test container
      - name: Restore Masked Data
        run: |
          psql postgresql://test_user:test_pass@localhost:5432/test_db \
            < masked-dump.sql
      # Step 3: Migrations + tests
      - name: Apply Migrations & Run Tests
        run: |
          npx prisma migrate deploy
          npx playwright install --with-deps chromium
          npx playwright test
        env:
          DATABASE_URL: postgresql://test_user:test_pass@localhost:5432/test_db
```

Security Note: In the workflow above, `--source` points to a read replica, not the production primary. Connecting directly to the production primary from a CI environment is risky from both a security and a performance standpoint. It's recommended to use a read replica and restrict network access to that replica via VPN or IP whitelisting.
Greenmask's configuration file allows you to specify deterministic transformers:
```yaml
# greenmask-config.yml
tables:
  - name: users
    columns:
      - name: name
        transformer:
          name: RandomPerson
          params:
            seed: 42
      - name: email
        transformer:
          name: RandomEmail
          params:
            seed: 42
      - name: phone
        transformer:
          name: Mask
          params:
            characters_to_reveal: 3
            mask_char: "*"
  - name: orders
    columns:
      - name: shipping_address
        transformer:
          name: RandomAddress
          params:
            seed: 42
```

| Comparison Item | Neon Branching Approach | Greenmask + Docker Approach |
|---|---|---|
| Clone Speed | Seconds (CoW) | Minutes to tens of minutes (dump/restore) |
| Additional Infrastructure | Not required (built into Neon) | PostgreSQL container + Greenmask binary |
| Cost | Neon branch billing | Only CI runner time consumed |
| DB Platform Constraints | Neon (PostgreSQL) only | Any PostgreSQL-compatible DB |
| Large-scale DB | Fast regardless of size | Slows down proportionally to dump size |
| Masking Timing | Separate execution after branching | Simultaneous processing during dump |
Test Writing Patterns — Playwright on Masked Data
When writing tests on a masked DB, you need to distinguish between two patterns.
Pattern 1: Shape-based Validation — When you don't know what the masked values are, validate the existence and format of the data:
```typescript
// playwright/e2e/user-list.spec.ts
import { test, expect } from '@playwright/test';

test.describe('User list page', () => {
  test('paginates production-scale data correctly', async ({ page }) => {
    await page.goto('/admin/users');
    const totalCount = page.locator('[data-testid="total-count"]');
    await expect(totalCount).toHaveText(/\d{1,3}(,\d{3})*/);
    // Even with the email column masked, the email format should hold
    const firstEmail = page.locator('table tbody tr:first-child td.email');
    await expect(firstEmail).toContainText('@');
    // Pagination should keep working on large data
    await page.click('[data-testid="next-page"]');
    await expect(page.locator('table tbody tr')).toHaveCount(20);
  });
});
```

Pattern 2: Leveraging Deterministic Masked Values — When using seed-based masking, you can predict what value a specific piece of original data will be masked to. In this case, using the masked value in assertions is actually the correct approach:
```typescript
test('search works on masked data', async ({ page }) => {
  await page.goto('/admin/users');
  // Deterministic masking: with seed 42, the original '홍길동' is always
  // transformed to 'Smith'. Stable because both the masking seed and the
  // original data are fixed.
  await page.fill('[data-testid="search-input"]', 'Smith');
  await page.click('[data-testid="search-button"]');
  const results = page.locator('table tbody tr');
  await expect(results).not.toHaveCount(0);
});
```

Key Distinction: If you're using non-deterministic masking, use only Pattern 1 (shape-based). If you're using deterministic masking, you can also leverage Pattern 2 (specific value validation). When mixing both patterns, it helps maintainability to note at the top of each test file which masking strategy it assumes.
Pros and Cons Analysis
Pros
| Item | Description |
|---|---|
| Discovering realistic edge cases | Since the production data's structure, volume, and relationships are preserved, you can catch bugs from NULLs, long strings, and complex relationships that would never appear in seed scripts |
| Regulatory compliance | Testing with masked PII enables leveraging production-shaped data without violating GDPR/CCPA |
| Isolated environments | Using an independent DB branch per PR eliminates data contamination between tests at the source |
| Pre-validating schema migrations | Because migrations are run "on real data," you can catch failures that work fine on an empty DB but break in production |
| Fast provisioning | CoW-based branching completes within seconds, creating no practical bottleneck in CI pipelines |
Cons and Caveats
| Item | Description | Mitigation |
|---|---|---|
| Risk of masking omissions | Forgetting to add masking rules when adding new columns can expose PII | Include masking rules in the migration PR review checklist, and consider regular scanning with PII detection tools like Microsoft Presidio |
| Business logic breakage | Email domain-based routing, phone number area code logic, etc. may behave unexpectedly after masking | Apply format-preserving masking for fields used in business logic, or exclude those fields from masking |
| Cost increase | For teams with many PRs, branch DB compute/storage costs can accumulate | Auto-deletion policy is a must. Setting up a cron job to clean up branches inactive for more than 7 days provides additional safety |
| Platform lock-in | Deep dependency on Neon branching makes DB migration difficult | Manage masking rules in standard SQL (SECURITY LABEL) and abstract the branching logic in CI workflows to reduce switching costs |
| Large DB bottleneck | For DBs over hundreds of GB, `anonymize_database()` may exceed CI timeouts | A two-stage strategy is effective: periodically refresh a masked baseline branch (e.g., nightly), and create PR branches from this baseline |
Format-Preserving Masking: A technique that changes only the values while maintaining the format (length, character types, pattern) of the original data. For example, changing "010-1234-5678" to "010-9876-5432" preserves the phone number format, so validation logic won't break.
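A minimal sketch of the idea for phone numbers: keep every separator and the leading digits, and deterministically replace the rest. This is an illustration of the technique only, not how `anon.partial()` or any particular tool implements it:

```typescript
import { createHmac } from "node:crypto";

// Format-preserving, deterministic phone masking: non-digit characters and
// digit count are kept, the first `keepDigits` digits survive (e.g. the
// "010" prefix), and the remaining digits are derived from an HMAC so the
// same (seed, original) pair always yields the same masked value.
function maskPhoneFormatPreserving(original: string, seed: string, keepDigits = 3): string {
  const digest = createHmac("sha256", seed).update(original).digest();
  let seen = 0;
  return original.replace(/\d/g, (d) => {
    seen++;
    if (seen <= keepDigits) return d; // preserve the prefix digits
    return String(digest[seen % digest.length] % 10);
  });
}
```

Because the separators and digit pattern survive, existing validation regexes like `^\d{3}-\d{4}-\d{4}$` keep passing on the masked data.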
Most Common Mistakes in Practice
- Managing masking rules separately from schema migrations — This actually happened on our team: we added an `address_detail` column to the `users` table while building an address search feature, but forgot to update the masking rules. Actual addresses were exposed in the test environment for two weeks. Fortunately, it was an internal QA environment so it didn't lead to a major incident, but after that we added a "Verify masking rules" checkbox to the migration PR template.
- Using non-deterministic masking functions — Using functions like `anon.random_string()` that produce different results each time makes tests non-deterministic. When debugging, you can't tell whether it's a code issue or the data that changed. Using `anon.pseudo_*` family functions or seed-based transformers is much more stable.
- Forgetting to delete branches when PRs are closed — Deploying without a cleanup job lets branches pile up and costs snowball. Always include cleanup in the workflow, and as a safety net against potential omissions, it's recommended to also set up a cron job that automatically deletes branches inactive for more than 7 days.
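The cron safety net from that last point could look roughly like the sketch below. The Neon REST API host, paths, and branch field names follow Neon's public v2 API as I understand it, but treat them as assumptions to verify against the current API docs:

```typescript
// Sketch of a "delete preview branches inactive for 7+ days" cron job.
// The `preview/pr-` prefix matches the branch naming used in the workflow
// above; the API shapes are assumptions to double-check against Neon's docs.
type NeonBranch = {
  id: string;
  name: string;
  created_at: string;
  last_active_time?: string; // assumed field name
};

const STALE_MS = 7 * 24 * 60 * 60 * 1000;

function isStalePreviewBranch(branch: NeonBranch, now: Date): boolean {
  // Only ever touch PR preview branches, never main or other long-lived ones.
  if (!branch.name.startsWith("preview/pr-")) return false;
  const lastActive = new Date(branch.last_active_time ?? branch.created_at);
  return now.getTime() - lastActive.getTime() > STALE_MS;
}

async function cleanupStaleBranches(projectId: string, apiKey: string): Promise<void> {
  const base = `https://console.neon.tech/api/v2/projects/${projectId}/branches`;
  const headers = { Authorization: `Bearer ${apiKey}` };
  const res = await fetch(base, { headers });
  const { branches } = (await res.json()) as { branches: NeonBranch[] };
  for (const br of branches.filter((b) => isStalePreviewBranch(b, new Date()))) {
    await fetch(`${base}/${br.id}`, { method: "DELETE", headers });
  }
}
```

Run on a nightly schedule (e.g. a `schedule:` workflow in GitHub Actions), this catches branches whose `closed` event never fired.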
Conclusion
Applying this workflow adds production data shape-based E2E validation to every PR. With Neon CoW branching, DB provisioning takes ~3 seconds and masking takes tens of seconds, allowing integration into existing CI pipelines with no practical bottleneck. If you've been feeling the limitations of seed scripts, this is a way to level up with minimal effort.
Three steps to get started right now:
- Start by defining masking rules — Go through your production DB tables, identify columns containing PII, and write masking rules using `SECURITY LABEL` syntax. Starting with a single `sql/masking-rules.sql` file is all you need. This step alone is a great opportunity to understand where PII lives in your data.
- Validate locally first — If you're using Neon, you can create a branch with `neonctl branches create --name test-mask` via the CLI, apply the masking rules, and run Playwright tests locally. Even without Neon, you can spin up a PostgreSQL container with Docker and install the `postgresql_anonymizer` extension for the same experiment.
- Deploy to CI — Once local validation is complete, add the GitHub Actions workflow from the examples above as `.github/workflows/e2e-preview.yml`. Start by writing tests for just 2–3 core user scenarios, then gradually expand coverage — a solid strategy for adopting the workflow without missteps.
Next post: How to automatically verify the completeness of masking rules — Building a pipeline that catches masking omissions before deployment by integrating PII detection tools with CI
References
Core Resources
- E2E Testing with Playwright + Neon Branching | Neon
- Setting Up Environments with Masked Production Data | Neon Blog
- PostgreSQL Anonymizer Official Documentation
- Greenmask Official Website
- Playwright Official Best Practices
- Neon Playwright Example Repository | GitHub
Further Reading
- PII Anonymization and Branching | Neon Blog
- Neon Data Anonymization Docs | Neon Docs
- postgresql_anonymizer Extension Docs | Neon Docs
- PostgreSQL Anonymizer Declaring Masking Rules | Official Docs
- Greenmask GitHub Repository
- Neosync GitHub Repository
- Handling PII in Staging Databases | Neon Blog
- Playwright CI Setup Guide
- Data Masking In-Place vs In-Flight Comparison | Synthesized
- Supabase Seed (Snaplet Successor) | GitHub