Running E2E Tests on a Masked Production DB — Building a Playwright + Neon Branching Pipeline
You've probably heard the saying, "The test data was too clean to catch the bug." I've lost count of how many times tests passed on pristine data generated by seed scripts, only for things to blow up in production. Users with 200-character emails, records with newline characters in the address field, order data with five levels of foreign key relationships — things that seed scripts can never reproduce. Eventually, you arrive at the conclusion that you need to use production data, but then privacy concerns immediately become a roadblock.
This article covers the entire process of building a pipeline that branches a production DB using Copy-on-Write, masks PII, and then automatically runs Playwright E2E tests on it for every PR. With a serverless DB platform like Neon, everything from branching to masking is handled in a near one-stop fashion, and we'll also explore the self-hosted route without Neon. As prerequisites, a basic understanding of GitHub Actions YAML syntax, PostgreSQL fundamentals, and CI/CD pipeline concepts should be sufficient.
Estimated reading time: ~15 minutes. Table of contents: Core Concepts → Hands-on Implementation (Neon Path / Greenmask Path) → Test Writing Patterns → Pros and Cons → Wrap-up
Core Concepts
Database Branching — Cloning a DB Like Git
Database branching is literally the concept of cloning a database the same way you create a Git branch. The key is the Copy-on-Write (CoW) approach, which shares storage instead of physically copying all data, and only writes new data for the parts that change. Even a production DB of hundreds of gigabytes can be branched in seconds, and additional storage costs are incurred only for the changed portions.
Copy-on-Write (CoW): A technique that shares only references to the original data without copying it immediately, and clones only the affected pages when a write occurs. The same principle is used in file systems (Btrfs, ZFS) and container layers.
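The mechanics can be sketched as a toy data structure. This is purely illustrative (it is not how Neon's storage engine works), but it shows why creating a branch is O(page count) regardless of data size, and why only written pages cost extra storage:

```typescript
// Toy illustration of Copy-on-Write semantics: a branch starts by sharing
// every "page" with its parent by reference, and copies a page only when
// that page is first written to.
type Page = { data: string };

class CowBranch {
  private pages: Map<number, Page>;
  private ownedPages = new Set<number>(); // pages this branch has privately copied

  constructor(parentPages: Map<number, Page>) {
    // Copies only the references, not the page contents.
    this.pages = new Map(parentPages);
  }

  read(pageId: number): string | undefined {
    return this.pages.get(pageId)?.data;
  }

  write(pageId: number, data: string): void {
    if (!this.ownedPages.has(pageId)) {
      // First write to this page: give the branch its own copy
      // so the parent's page stays untouched.
      this.pages.set(pageId, { data });
      this.ownedPages.add(pageId);
    } else {
      this.pages.get(pageId)!.data = data;
    }
  }
}
```

A branch over a million shared pages is created as cheaply as one over ten; masking then only "dirties" the pages holding PII columns.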
Starting in 2024, platforms like Neon and Xata began pushing this feature aggressively, and the "Branch-per-PR" workflow has been gaining traction rapidly. During code review, you check the frontend with Vercel Preview and isolate the DB with branches — this pattern has now become a natural flow.
| Platform | Branching Method | Supported DB | Features |
|---|---|---|---|
| Neon | Instant CoW-based branching | PostgreSQL | Built-in postgresql_anonymizer, official GitHub Actions provided |
| Xata | CoW-based branching | PostgreSQL | Similar approach to Neon |
| PlanetScale | Schema branching | MySQL / PostgreSQL | Data requires separate seeding, powerful schema diff feature. PostgreSQL branching support has started but is still in early stages |
Data Masking — Preserving Structure While Replacing Sensitive Information
The essence of masking is replacing only the sensitive values with fake ones while keeping the shape of the data intact. Changing "John Doe" to "Jane Smith," user@company.com to fake_7xk@example.com, while leaving foreign key relationships and constraints untouched.
There's an important point here. Since a CoW-branched DB retains the original's foreign key relationships, referential integrity is naturally maintained as long as you don't mask the PK/FK columns themselves. If you don't mask users.id and only mask users.name or users.email, the orders.user_id → users.id relationship is completely unaffected. However, if you have denormalized designs where emails or phone numbers are referenced like FKs, you'll need to apply deterministic masking to ensure that the masked values for those fields match each other.
In the PostgreSQL ecosystem, postgresql_anonymizer (abbreviated as anon) is the de facto standard. What makes this extension appealing is that you can declare masking rules directly in the schema using SQL's SECURITY LABEL syntax.
SECURITY LABEL: This is a standard PostgreSQL SQL syntax, but using it with `FOR anon` operates through the label provider registered by the `postgresql_anonymizer` extension. In other words, the `anon` extension must be installed to use this syntax.
```sql
SECURITY LABEL FOR anon ON COLUMN users.name
  IS 'MASKED WITH FUNCTION anon.fake_last_name()';
SECURITY LABEL FOR anon ON COLUMN users.email
  IS 'MASKED WITH FUNCTION anon.fake_email()';
SECURITY LABEL FOR anon ON COLUMN users.phone
  IS 'MASKED WITH FUNCTION anon.partial(phone, 3, $$***$$, 0)';
```

Since masking policies are managed as code, they can be version-controlled alongside schema migrations.
There's one more thing worth covering here: deterministic masking. If the same input produces a different result every time, tests become non-reproducible. In postgresql_anonymizer, using pseudo_* family functions like anon.pseudo_first_name(seed) guarantees the same input always produces the same output based on a seed. For external tools, there's @snaplet/copycat maintained by the Supabase community (originally a Snaplet project, now managed in the supabase-community/seed repository after Snaplet shut down).
Deterministic Masking: A method that always returns the same masked result for the same original value. If a particular user's masked email changes with every test run, it becomes impossible to write assertions.
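The principle behind `anon.pseudo_*()` and seed-based transformers can be sketched in a few lines. The HMAC construction below is an illustration of the idea only, not the actual implementation of either tool:

```typescript
import { createHmac } from "node:crypto";

// Deterministic masking as a pure function of (secret seed, original value):
// the same original always maps to the same fake value, across tables and
// across test runs. FAKE_NAMES and the fake_XXXX email shape are made up
// for this sketch.
const FAKE_NAMES = ["Smith", "Jones", "Taylor", "Brown", "Wilson"];

function pseudoEmail(original: string, seed: string): string {
  const digest = createHmac("sha256", seed).update(original).digest("hex");
  return `fake_${digest.slice(0, 8)}@example.com`;
}

function pseudoName(original: string, seed: string): string {
  const digest = createHmac("sha256", seed).update(original).digest();
  return FAKE_NAMES[digest[0] % FAKE_NAMES.length];
}
```

Because the mapping depends only on the seed and the original, a denormalized column like a hypothetical `orders.contact_email` masked with the same seed stays consistent with `users.email`, which is exactly the property the FK-like reference case above requires.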
Hands-on Implementation
When a PR is opened → create a CoW branch from the production DB and mask PII → apply schema migrations to the masked branch DB → run Playwright E2E tests in a Preview environment → automatically delete the branch DB when the PR is closed. Let's implement this flow using two different approaches.
Example 1: Neon + Playwright + GitHub Actions — The Fastest Starting Point
When using Neon, official GitHub Actions are provided, making the setup quite concise. I started with this combination as well, and I was impressed that the entire lifecycle could be managed with a single workflow file.
To be honest though, I did hit one snag during the initial setup. After applying the masking rules, all payment-related tests started failing. It turned out that I had applied partial masking to payments.card_number, but a separate validation step was performing a Luhn check, and the masked card numbers couldn't pass the Luhn algorithm. Tracking that down took about half a day — it pays to identify these business logic integration points in advance.
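One way around that Luhn failure is to generate replacement card numbers whose check digit is recomputed, so downstream validation still passes. The helper below is a hypothetical sketch, not part of postgresql_anonymizer:

```typescript
// Compute the check digit that makes `partial + digit` Luhn-valid.
function luhnCheckDigit(partial: string): number {
  let sum = 0;
  const digits = partial.split("").reverse().map(Number);
  for (let i = 0; i < digits.length; i++) {
    let d = digits[i];
    if (i % 2 === 0) {
      // In the finished number these positions are the ones Luhn doubles.
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return (10 - (sum % 10)) % 10;
}

// Keep the issuer prefix (BIN), replace the middle digits with a fixed fake
// value, then append a freshly computed check digit. Hypothetical helper.
function maskCardNumber(original: string, fakeMiddle = "000000"): string {
  const bin = original.slice(0, 6);
  const body = bin + fakeMiddle + original.slice(12, 15); // 15 digits
  return body + luhnCheckDigit(body);                     // 16th digit
}
```

The same idea applies to any field with a checksum or format validator: mask the payload, then regenerate whatever the validator checks.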
```yaml
name: E2E Tests on Masked Preview DB
on:
  pull_request:
    types: [opened, synchronize, reopened, closed]
jobs:
  e2e-test:
    if: github.event.action != 'closed'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Step 1: Create a Neon branch (CoW clone of the production DB)
      - name: Create Neon Branch
        id: create-branch
        uses: neondatabase/create-branch-action@v5
        with:
          project_id: ${{ vars.NEON_PROJECT_ID }}
          api_key: ${{ secrets.NEON_API_KEY }}
          branch_name: preview/pr-${{ github.event.number }}
      # Step 2: Apply masking rules, then run anonymization
      - name: Anonymize PII
        run: |
          psql ${{ steps.create-branch.outputs.db_url }} -f ./sql/masking-rules.sql
          psql ${{ steps.create-branch.outputs.db_url }} -c "SELECT anon.anonymize_database();"
      # Step 3: Schema migration (apply this PR's changes)
      - name: Apply Migrations
        run: npx prisma migrate deploy
        env:
          DATABASE_URL: ${{ steps.create-branch.outputs.db_url }}
      # Step 4: Deploy the Preview environment
      - name: Deploy Preview
        id: deploy-preview
        uses: amondnet/vercel-action@v25
        with:
          vercel-token: ${{ secrets.VERCEL_TOKEN }}
          vercel-org-id: ${{ secrets.VERCEL_ORG_ID }}
          vercel-project-id: ${{ secrets.VERCEL_PROJECT_ID }}
          scope: ${{ secrets.VERCEL_ORG_ID }}
        env:
          DATABASE_URL: ${{ steps.create-branch.outputs.db_url }}
      # Step 5: Install Playwright and run the E2E tests
      - name: Install Playwright
        run: npx playwright install --with-deps chromium
      - name: Run E2E Tests
        run: npx playwright test
        env:
          DATABASE_URL: ${{ steps.create-branch.outputs.db_url }}
          BASE_URL: ${{ steps.deploy-preview.outputs.preview-url }}
      # Step 6: Post the schema diff as a PR comment
      - name: Schema Diff
        if: always()
        uses: neondatabase/schema-diff-action@v1
        with:
          project_id: ${{ vars.NEON_PROJECT_ID }}
          api_key: ${{ secrets.NEON_API_KEY }}
      - uses: actions/upload-artifact@v4
        if: failure()
        with:
          name: playwright-report
          path: playwright-report/
  # Automatically clean up the branch when the PR is closed
  cleanup:
    if: github.event.action == 'closed'
    runs-on: ubuntu-latest
    steps:
      - uses: neondatabase/delete-branch-action@v3
        with:
          project_id: ${{ vars.NEON_PROJECT_ID }}
          api_key: ${{ secrets.NEON_API_KEY }}
          branch: preview/pr-${{ github.event.number }}
```

If you're using Netlify or another Preview deployment platform instead of Vercel, simply replace that step with the platform's GitHub Action. The key point is that a Preview deployment step must exist so that a `BASE_URL` can be passed to the tests.
| Step | Role | Approximate Duration |
|---|---|---|
| `create-branch-action` | Clone production DB via CoW | ~3 seconds |
| Apply masking rules | Run `anon.anonymize_database()` | Proportional to DB size (~30 seconds for 1 GB, based on internal team testing) |
| `prisma migrate deploy` | Apply schema changes from the current PR | ~5 seconds |
| Preview deployment | Provision Preview environment on Vercel/Netlify, etc. | ~30 seconds to 1 minute |
| `playwright test` | Run the full E2E suite | Varies by number of tests |
| `delete-branch-action` | Clean up branch when PR is closed | ~1 second |
It's recommended to manage the masking rules file (`sql/masking-rules.sql`) separately and update it alongside schema changes:
```sql
-- sql/masking-rules.sql
CREATE EXTENSION IF NOT EXISTS anon CASCADE;
SELECT anon.init();

-- users table
SECURITY LABEL FOR anon ON COLUMN users.name
  IS 'MASKED WITH FUNCTION anon.fake_last_name()';
SECURITY LABEL FOR anon ON COLUMN users.email
  IS 'MASKED WITH FUNCTION anon.fake_email()';
SECURITY LABEL FOR anon ON COLUMN users.phone
  IS 'MASKED WITH FUNCTION anon.partial(phone, 3, $$***$$, 0)';

-- orders table — mask shipping addresses
SECURITY LABEL FOR anon ON COLUMN orders.shipping_address
  IS 'MASKED WITH FUNCTION anon.fake_city() || $$ $$ || anon.fake_last_name() || $$ Street$$';

-- payments table — partially mask card numbers
SECURITY LABEL FOR anon ON COLUMN payments.card_number
  IS 'MASKED WITH FUNCTION anon.partial(card_number, 0, $$****-****-****-$$, 4)';
```

Example 2: Greenmask + Docker — Building It Yourself Without Neon
If your team isn't using Neon, you can build a similar pipeline yourself using the open-source Greenmask and Docker combination. I also tried this approach on a project using an existing RDS, and while it lacks the instant branching capability, it was practical enough.
One thing I'll mention upfront: the Greenmask approach ran painfully long the first time I used it in CI. The production DB was about 8 GB, and while the full dump + masking + restore stayed under the default GitHub Actions job timeout (6 hours), it ran long enough that the runner costs were non-negligible. It only became practical after I switched to selectively dumping only the tables needed for testing.
```yaml
name: E2E Tests with Greenmask
on:
  pull_request:
    types: [opened, synchronize, reopened]
jobs:
  e2e-test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_DB: test_db
          POSTGRES_USER: test_user
          POSTGRES_PASSWORD: test_pass
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v4
      # Step 0: Install Greenmask
      - name: Install Greenmask
        run: |
          curl -sSL https://github.com/GreenmaskIO/greenmask/releases/latest/download/greenmask-linux-amd64 \
            -o /usr/local/bin/greenmask
          chmod +x /usr/local/bin/greenmask
      # Step 1: Dump from the production read replica and mask in one pass
      - name: Dump & Mask Production Data
        run: |
          greenmask dump \
            --source "host=${{ secrets.PROD_REPLICA_HOST }} dbname=prod_db user=readonly" \
            --config greenmask-config.yml \
            --output masked-dump.sql
      # Step 2: Restore the masked data into the test container
      - name: Restore Masked Data
        run: |
          psql postgresql://test_user:test_pass@localhost:5432/test_db \
            < masked-dump.sql
      # Step 3: Migrations + tests
      - name: Apply Migrations & Run Tests
        run: |
          npx prisma migrate deploy
          npx playwright install --with-deps chromium
          npx playwright test
        env:
          DATABASE_URL: postgresql://test_user:test_pass@localhost:5432/test_db
```

Security Note: In the workflow above, `--source` points to a read replica, not the production primary. Connecting directly to the production primary from a CI environment is risky from both a security and a performance standpoint. It's recommended to use a read replica and restrict network access to that replica via VPN or IP whitelisting.
Greenmask's configuration file allows you to specify deterministic transformers:
```yaml
# greenmask-config.yml
tables:
  - name: users
    columns:
      - name: name
        transformer:
          name: RandomPerson
          params:
            seed: 42
      - name: email
        transformer:
          name: RandomEmail
          params:
            seed: 42
      - name: phone
        transformer:
          name: Mask
          params:
            characters_to_reveal: 3
            mask_char: "*"
  - name: orders
    columns:
      - name: shipping_address
        transformer:
          name: RandomAddress
          params:
            seed: 42
```

| Comparison Item | Neon Branching Approach | Greenmask + Docker Approach |
|---|---|---|
| Clone Speed | Seconds (CoW) | Minutes to tens of minutes (dump/restore) |
| Additional Infrastructure | Not required (built into Neon) | PostgreSQL container + Greenmask binary |
| Cost | Neon branch billing | Only CI runner time consumed |
| DB Platform Constraints | Neon (PostgreSQL) only | Any PostgreSQL-compatible DB |
| Large-scale DB | Fast regardless of size | Slows down proportionally to dump size |
| Masking Timing | Separate execution after branching | Simultaneous processing during dump |
Test Writing Patterns — Playwright on Masked Data
When writing tests on a masked DB, you need to distinguish between two patterns.
Pattern 1: Shape-based Validation — When you don't know what the masked values are, validate the existence and format of the data:
```typescript
// playwright/e2e/user-list.spec.ts
import { test, expect } from '@playwright/test';

test.describe('User list page', () => {
  test('paginates production-scale data correctly', async ({ page }) => {
    await page.goto('/admin/users');
    const totalCount = page.locator('[data-testid="total-count"]');
    await expect(totalCount).toHaveText(/\d{1,3}(,\d{3})*/);
    // Even with the email column masked, the email format should hold
    const firstEmail = page.locator('table tbody tr:first-child td.email');
    await expect(firstEmail).toContainText('@');
    // Pagination should keep working on large data
    await page.click('[data-testid="next-page"]');
    await expect(page.locator('table tbody tr')).toHaveCount(20);
  });
});
```

Pattern 2: Leveraging Deterministic Masked Values — When using seed-based masking, you can predict what value a specific piece of original data will be masked to. In this case, using the masked value in assertions is actually the correct approach:
```typescript
test('search works on masked data', async ({ page }) => {
  await page.goto('/admin/users');
  // Deterministic masking: with seed 42, the original '홍길동' is always
  // transformed to 'Smith'. Stable because both the masking seed and the
  // original data are fixed.
  await page.fill('[data-testid="search-input"]', 'Smith');
  await page.click('[data-testid="search-button"]');
  const results = page.locator('table tbody tr');
  await expect(results).not.toHaveCount(0);
});
```

Key Distinction: If you're using non-deterministic masking, use only Pattern 1 (shape-based). If you're using deterministic masking, you can also leverage Pattern 2 (specific value validation). When mixing both patterns, it helps maintainability to note at the top of each test file which masking strategy it assumes.
Pros and Cons Analysis
Pros
| Item | Description |
|---|---|
| Discovering realistic edge cases | Since the production data's structure, volume, and relationships are preserved, you can catch bugs from NULLs, long strings, and complex relationships that would never appear in seed scripts |
| Regulatory compliance | Testing with masked PII enables leveraging production-shaped data without violating GDPR/CCPA |
| Isolated environments | Using an independent DB branch per PR eliminates data contamination between tests at the source |
| Pre-validating schema migrations | Because migrations are run "on real data," you can catch failures that work fine on an empty DB but break in production |
| Fast provisioning | CoW-based branching completes within seconds, creating no practical bottleneck in CI pipelines |
Cons and Caveats
| Item | Description | Mitigation |
|---|---|---|
| Risk of masking omissions | Forgetting to add masking rules when adding new columns can expose PII | Include masking rules in the migration PR review checklist, and consider regular scanning with PII detection tools like Microsoft Presidio |
| Business logic breakage | Email domain-based routing, phone number area code logic, etc. may behave unexpectedly after masking | Apply format-preserving masking for fields used in business logic, or exclude those fields from masking |
| Cost increase | For teams with many PRs, branch DB compute/storage costs can accumulate | Auto-deletion policy is a must. Setting up a cron job to clean up branches inactive for more than 7 days provides additional safety |
| Platform lock-in | Deep dependency on Neon branching makes DB migration difficult | Manage masking rules in standard SQL (SECURITY LABEL) and abstract the branching logic in CI workflows to reduce switching costs |
| Large DB bottleneck | For DBs over hundreds of GB, `anonymize_database()` may exceed CI timeouts | A two-stage strategy is effective: periodically refresh a masked baseline branch (e.g., nightly), and create PR branches from this baseline |
Format-Preserving Masking: A technique that changes only the values while maintaining the format (length, character types, pattern) of the original data. For example, changing "010-1234-5678" to "010-9876-5432" preserves the phone number format, so validation logic won't break.
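A minimal sketch of the idea for phone numbers: keep every separator and the leading digits, and deterministically replace the rest. This is an illustration of the technique only, not how `anon.partial()` or any particular tool implements it:

```typescript
import { createHmac } from "node:crypto";

// Format-preserving, deterministic phone masking: non-digit characters and
// digit count are kept, the first `keepDigits` digits survive (e.g. the
// "010" prefix), and the remaining digits are derived from an HMAC so the
// same (seed, original) pair always yields the same masked value.
function maskPhoneFormatPreserving(original: string, seed: string, keepDigits = 3): string {
  const digest = createHmac("sha256", seed).update(original).digest();
  let seen = 0;
  return original.replace(/\d/g, (d) => {
    seen++;
    if (seen <= keepDigits) return d; // preserve the prefix digits
    return String(digest[seen % digest.length] % 10);
  });
}
```

Because the separators and digit pattern survive, existing validation regexes like `^\d{3}-\d{4}-\d{4}$` keep passing on the masked data.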
Most Common Mistakes in Practice
- Managing masking rules separately from schema migrations — This actually happened on our team: we added an `address_detail` column to the `users` table while building an address search feature, but forgot to update the masking rules. Actual addresses were exposed in the test environment for two weeks. Fortunately, it was an internal QA environment so it didn't lead to a major incident, but after that we added a "Verify masking rules" checkbox to the migration PR template.
- Using non-deterministic masking functions — Using functions like `anon.random_string()` that produce different results each time makes tests non-deterministic. When debugging, you can't tell whether it's a code issue or the data that changed. Using `anon.pseudo_*` family functions or seed-based transformers is much more stable.
- Forgetting to delete branches when PRs are closed — Deploying without a cleanup job lets branches pile up and costs snowball. Always include cleanup in the workflow, and as a safety net against potential omissions, it's recommended to also set up a cron job that automatically deletes branches inactive for more than 7 days.
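The cron safety net from that last point could look roughly like the sketch below. The Neon REST API host, paths, and branch field names follow Neon's public v2 API as I understand it, but treat them as assumptions to verify against the current API docs:

```typescript
// Sketch of a "delete preview branches inactive for 7+ days" cron job.
// The `preview/pr-` prefix matches the branch naming used in the workflow
// above; the API shapes are assumptions to double-check against Neon's docs.
type NeonBranch = {
  id: string;
  name: string;
  created_at: string;
  last_active_time?: string; // assumed field name
};

const STALE_MS = 7 * 24 * 60 * 60 * 1000;

function isStalePreviewBranch(branch: NeonBranch, now: Date): boolean {
  // Only ever touch PR preview branches, never main or other long-lived ones.
  if (!branch.name.startsWith("preview/pr-")) return false;
  const lastActive = new Date(branch.last_active_time ?? branch.created_at);
  return now.getTime() - lastActive.getTime() > STALE_MS;
}

async function cleanupStaleBranches(projectId: string, apiKey: string): Promise<void> {
  const base = `https://console.neon.tech/api/v2/projects/${projectId}/branches`;
  const headers = { Authorization: `Bearer ${apiKey}` };
  const res = await fetch(base, { headers });
  const { branches } = (await res.json()) as { branches: NeonBranch[] };
  for (const br of branches.filter((b) => isStalePreviewBranch(b, new Date()))) {
    await fetch(`${base}/${br.id}`, { method: "DELETE", headers });
  }
}
```

Run on a nightly schedule (e.g. a `schedule:` workflow in GitHub Actions), this catches branches whose `closed` event never fired.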
Conclusion
Applying this workflow adds production data shape-based E2E validation to every PR. With Neon CoW branching, DB provisioning takes ~3 seconds and masking takes tens of seconds, allowing integration into existing CI pipelines with no practical bottleneck. If you've been feeling the limitations of seed scripts, this is a way to level up with minimal effort.
Three steps to get started right now:
- Start by defining masking rules — Go through your production DB tables, identify columns containing PII, and write masking rules using `SECURITY LABEL` syntax. Starting with a single `sql/masking-rules.sql` file is all you need. This step alone is a great opportunity to understand where PII lives in your data.
- Validate locally first — If you're using Neon, you can create a branch with `neonctl branches create --name test-mask` via the CLI, apply the masking rules, and run Playwright tests locally. Even without Neon, you can spin up a PostgreSQL container with Docker and install the `postgresql_anonymizer` extension for the same experiment.
- Deploy to CI — Once local validation is complete, add the GitHub Actions workflow from the examples above as `.github/workflows/e2e-preview.yml`. Start by writing tests for just 2–3 core user scenarios, then gradually expand coverage — a solid strategy for adopting the workflow without missteps.
Next post: How to automatically verify the completeness of masking rules — Building a pipeline that catches masking omissions before deployment by integrating PII detection tools with CI
References
Core Resources
- E2E Testing with Playwright + Neon Branching | Neon
- Setting Up Environments with Masked Production Data | Neon Blog
- PostgreSQL Anonymizer Official Documentation
- Greenmask Official Website
- Playwright Official Best Practices
- Neon Playwright Example Repository | GitHub
Further Reading
- PII Anonymization and Branching | Neon Blog
- Neon Data Anonymization Docs | Neon Docs
- postgresql_anonymizer Extension Docs | Neon Docs
- PostgreSQL Anonymizer Declaring Masking Rules | Official Docs
- Greenmask GitHub Repository
- Neosync GitHub Repository
- Handling PII in Staging Databases | Neon Blog
- Playwright CI Setup Guide
- Data Masking In-Place vs In-Flight Comparison | Synthesized
- Supabase Seed (Snaplet Successor) | GitHub