
Common issues and their solutions for the TendSocial Campaign Architecture v2.0.

AI Generation Issues

Issue: "No AI configuration found for task"

Symptom: Error when attempting to generate content

Cause: Missing AIModelConfig entry for the task

Solution:

  1. Check the database for the task config:
sql
SELECT * FROM "AIModelConfig" WHERE task = 'your_task';
  2. If missing, seed the default config:
bash
cd apps/backend
pnpm run db:seed
  3. Or create it manually via the admin UI or API:
typescript
POST /api/admin/ai-config
{
  "task": "social_posts",
  "displayName": "Social Posts",
  "provider": "anthropic",
  "model": "claude-3-5-sonnet-20241022",
  "maxTokens": 4096,
  "temperature": 0.7,
  "inputCostPer1M": 3.0,
  "outputCostPer1M": 15.0
}
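
The two cost fields in the example above read as prices per one million tokens. As a rough sketch of how a per-request cost could be estimated from them, assuming the prices are in US dollars (the guide does not state the currency) and using hypothetical names `ModelPricing` and `estimateCostUsd`:

```typescript
// Hypothetical helper, not part of the TendSocial codebase.
// Assumes inputCostPer1M / outputCostPer1M are USD per one million tokens.
interface ModelPricing {
  inputCostPer1M: number;
  outputCostPer1M: number;
}

function estimateCostUsd(
  pricing: ModelPricing,
  inputTokens: number,
  outputTokens: number,
): number {
  const inputCost = (inputTokens / 1_000_000) * pricing.inputCostPer1M;
  const outputCost = (outputTokens / 1_000_000) * pricing.outputCostPer1M;
  return inputCost + outputCost;
}

// Pricing values copied from the example config above.
const sonnetPricing: ModelPricing = { inputCostPer1M: 3.0, outputCostPer1M: 15.0 };
console.log(estimateCostUsd(sonnetPricing, 2_000, 1_000));
```

Comparing this estimate against the values recorded in AIUsageLog can help confirm that a seeded config carries the intended prices.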

Issue: Generated content quality is poor

Possible Causes:

  1. Insufficient context (missing brand profile, examples)
  2. Wrong model for the task
  3. Prompt needs improvement

Solutions:

Check context:

sql
-- Ensure brand profile exists
SELECT * FROM "BrandProfile" WHERE "companyId" = '...';

-- Ensure recent posts exist (for examples)
SELECT COUNT(*) FROM "Post"
WHERE "companyId" = '...' AND "published" = true;

Try a better model:

typescript
PUT /api/admin/ai-config/social_posts
{
  "model": "claude-3-opus-20240229",  // Higher quality
  "inputCostPer1M": 15.0,
  "outputCostPer1M": 75.0
}

Improve prompts:

  • Add more examples to User.exampleContent
  • Provide a more detailed campaign brief
  • Add more context to Campaign.context

Issue: API rate limit errors

Symptom: 429 Too Many Requests or RateLimitError

Solutions:

  1. Check provider quotas:

    • Anthropic: Check your tier at console.anthropic.com
    • Google: Check quotas in Cloud Console
    • OpenAI: Check usage at platform.openai.com
  2. Rely on the built-in exponential backoff: The gateway already retries with exponential backoff. Check the logs for retry attempts.

  3. Distribute load:

    • Use multiple API keys and rotate
    • Implement BYOK for high-volume companies
  4. Use cheaper models:

    • Haiku instead of Sonnet for simple tasks
    • Reduce maxTokens to stay under limits
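
The gateway's retry code is not shown in this guide. For orientation when reading retry attempts in the logs, here is a minimal sketch of exponential backoff with jitter. All names (`RetryOptions`, `backoffDelayMs`, `withRetry`) are hypothetical, and the real gateway may use different delays and limits.

```typescript
// Illustrative sketch only, not the gateway's actual implementation.
interface RetryOptions {
  maxAttempts: number; // total tries, including the first one
  baseDelayMs: number; // delay ceiling before the first retry
  maxDelayMs: number;  // upper bound on any single delay
}

// Delay ceiling before retry number `retry` (0-based): base * 2^retry, capped.
function backoffDelayMs(retry: number, opts: RetryOptions): number {
  return Math.min(opts.baseDelayMs * 2 ** retry, opts.maxDelayMs);
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Retries `fn` while `isRetryable(err)` returns true, e.g. for HTTP 429 errors.
async function withRetry<T>(
  fn: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  opts: RetryOptions,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const isLastAttempt = attempt >= opts.maxAttempts - 1;
      if (isLastAttempt || !isRetryable(err)) throw err;
      // Full jitter: wait a random time up to the computed ceiling.
      await sleep(Math.random() * backoffDelayMs(attempt, opts));
    }
  }
}
```

With a 500 ms base and a 4000 ms cap, the delay ceilings in this sketch grow as 500, 1000, 2000, 4000, 4000 ms. The jitter spreads retries out so that many clients hitting the same limit do not retry in lockstep.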

Configuration Problems

Issue: Config changes not taking effect

Symptom: Model changes don't apply to new requests

Cause: Configuration is cached

Solution:

typescript
POST /api/admin/gateway/clear-cache

The cache TTL is 60 seconds by default. Changes take effect automatically within a minute, but clearing the cache forces an immediate update.
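
To make the TTL behavior concrete, here is a minimal sketch of a time-to-live cache. The class name `TtlCache` and its methods are hypothetical, and the gateway's real cache may be structured differently. It only illustrates why a changed config can remain visible for up to one TTL period unless the cache is cleared.

```typescript
// Hypothetical sketch of a TTL cache, not the gateway's actual code.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  // `now` is injectable so the expiry logic can be tested without waiting.
  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: caller must reload from the database
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Similar in spirit to calling the clear-cache endpoint.
  clear(): void {
    this.entries.clear();
  }
}
```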


Issue: Company override not working

Symptom: Company-specific config is ignored

Debugging:

  1. Verify the override exists:
sql
SELECT * FROM "CompanyAIConfig" 
WHERE "companyId" = '...' AND "task" = '...';
  2. Check that isEnabled is true, and enable the override if it is not:
sql
UPDATE "CompanyAIConfig" 
SET "isEnabled" = true 
WHERE "companyId" = '...' AND "task" = '...';
  3. Check partial overrides: Company configs can be partial. Missing fields fall back to global defaults. This is expected behavior.

  4. Clear the cache:

typescript
POST /api/admin/gateway/clear-cache
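
The fallback rule for partial overrides can be sketched as follows. The interfaces and `resolveConfig` are hypothetical; the field names mirror the example config in this guide, and the real resolution logic may differ. The sketch assumes a disabled override is ignored entirely.

```typescript
// Hypothetical sketch of override resolution, not the gateway's actual code.
interface AIConfig {
  provider: string;
  model: string;
  maxTokens: number;
  temperature: number;
}

// A company override may set any subset of the global fields.
interface CompanyOverride extends Partial<AIConfig> {
  isEnabled: boolean;
}

function resolveConfig(globalConfig: AIConfig, override?: CompanyOverride): AIConfig {
  // Assumption: a missing or disabled override means "use global defaults".
  if (!override || !override.isEnabled) return globalConfig;
  return {
    provider: override.provider ?? globalConfig.provider,
    model: override.model ?? globalConfig.model,
    maxTokens: override.maxTokens ?? globalConfig.maxTokens,
    // `??` keeps an explicit temperature of 0 instead of falling back.
    temperature: override.temperature ?? globalConfig.temperature,
  };
}
```

An override that sets only `model` therefore still uses the global `maxTokens` and `temperature`, which can look as if the override were being ignored.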

A/B Testing Issues

Issue: Users not being assigned to test

Debugging Checklist:

  1. Is the test active?
sql
SELECT "isActive" FROM "AIABTest" WHERE id = '...';
  2. Is it within its date range?
sql
SELECT "startsAt", "endsAt" FROM "AIABTest" WHERE id = '...';
  3. Does the user match targeting?
sql
SELECT "targetCompanyIds", "targetUserIds" FROM "AIABTest" WHERE id = '...';
  4. Clear the cache:
typescript
POST /api/admin/gateway/clear-cache
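
The first three checks in the checklist can be combined into one function that reports the first reason a user would be skipped. This is a sketch under assumptions: the field names follow the AIABTest columns queried above, but the name `whyNotAssigned` and the rule that an empty target list matches everyone are hypothetical.

```typescript
// Hypothetical sketch; the real assignment logic may apply different rules.
interface ABTestRow {
  isActive: boolean;
  startsAt: Date | null;
  endsAt: Date | null;
  targetCompanyIds: string[]; // assumed: empty list means all companies
  targetUserIds: string[];    // assumed: empty list means all users
}

// Returns the first reason the user is not eligible, or null if eligible.
function whyNotAssigned(
  test: ABTestRow,
  userId: string,
  companyId: string,
  now: Date = new Date(),
): string | null {
  if (!test.isActive) return "test is not active";
  if (test.startsAt && now.getTime() < test.startsAt.getTime()) {
    return "test has not started yet";
  }
  if (test.endsAt && now.getTime() > test.endsAt.getTime()) {
    return "test has already ended";
  }
  if (test.targetCompanyIds.length > 0 && !test.targetCompanyIds.includes(companyId)) {
    return "company is not targeted";
  }
  if (test.targetUserIds.length > 0 && !test.targetUserIds.includes(userId)) {
    return "user is not targeted";
  }
  return null;
}
```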

Issue: Uneven variant distribution

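Symptom: 100 users but a 60/40 split instead of 50/50

Explanation: This is normal with small sample sizes. Random assignment approaches the target weights as the sample size increases. A split as extreme as 90/10 with 100 users is not explained by chance; in that case, check the variant weights and targeting.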

Solutions:

  • Run the test longer (more users)
  • With 100 users, expect roughly ±10-15 percentage points of deviation
  • With 1000+ users, expect less than 5 percentage points of deviation
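
The sample-size figures above can be sanity-checked with the normal approximation to the binomial distribution. The sketch below assumes each user is assigned independently at random with the given weight. The function name `splitMarginOfError` is hypothetical.

```typescript
// Approximate 95% margin of error for the observed share of one variant,
// assuming independent random assignment with probability `weight`.
function splitMarginOfError(users: number, weight: number = 0.5): number {
  const standardError = Math.sqrt((weight * (1 - weight)) / users);
  return 1.96 * standardError; // 1.96 is the z-value for a 95% interval
}

console.log(splitMarginOfError(100));  // about 0.098, i.e. roughly ±10 points
console.log(splitMarginOfError(1000)); // about 0.031, i.e. roughly ±3 points
```

Under these assumptions, a 50/50 test with 100 users lands between about 40/60 and 60/40 in 95% of runs, which is consistent with the rough figures above.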

Issue: Can't change variant after assignment

Symptom: User stuck with same variant

Explanation: This is by design. Consistent assignment ensures valid A/B test results.

If you need to reset:

sql
DELETE FROM "AIABAssignment" 
WHERE "testId" = '...' AND "userId" = '...';

Then clear the cache. The user will get a new assignment on the next request.


Performance Problems

Issue: Slow generation requests

Diagnostic Steps:

  1. Check latency logs:
sql
SELECT 
  AVG("latencyMs") as avg_latency,
  MAX("latencyMs") as max_latency,
  model
FROM "AIUsageLog"
WHERE "createdAt" > NOW() - INTERVAL '1 hour'
GROUP BY model;
  2. Check token counts:
sql
SELECT 
  AVG("inputTokens") as avg_input,
  MAX("inputTokens") as max_input
FROM "AIUsageLog"
WHERE "createdAt" > NOW() - INTERVAL '1 hour';

Solutions:

  • High input tokens: Reduce context, trim examples
  • High latency: Use a faster model (Haiku instead of Opus)
  • Provider issues: Check status pages
  • Network issues: Test with curl to provider API directly

Issue: High database query times

Diagnostic:

sql
-- Enable slow query logging
ALTER SYSTEM SET log_min_duration_statement = 1000; -- Log queries > 1s
SELECT pg_reload_conf();

-- Review column statistics to spot index candidates (pg_stats does not list missing indexes directly)
SELECT schemaname, tablename, attname, n_distinct, correlation
FROM pg_stats
WHERE schemaname = 'public'
ORDER BY n_distinct DESC;

Solutions:

  • Add indexes on frequently queried columns
  • Use EXPLAIN ANALYZE to identify bottlenecks
  • Consider materialized views for complex aggregations

Database Issues

Issue: Migration fails

Common Causes:

  1. Existing data conflicts:

    • Adding NOT NULL column to table with data
    • Unique constraint on existing duplicates
  2. Incomplete rollback:

    • Previous migration partially applied

Solutions:

  1. Check migration status:
bash
pnpm prisma migrate status
  2. Resolve manually (this marks the migration as applied without running it):
bash
pnpm prisma migrate resolve --applied 20250101000000_migration_name
  3. Reset (development only!):
bash
pnpm prisma migrate reset

Issue: RLS (Row-Level Security) violations

Symptom: "No Prisma Client model called X" or missing data

Cause: Trying to query tenant-scoped data without using getTenantPrisma()

Solution:

typescript
// ❌ WRONG
const posts = await prisma.post.findMany({ where: { companyId } });

// ✅ CORRECT
const tenantPrisma = getTenantPrisma(companyId);
const posts = await tenantPrisma.post.findMany();

Job Execution Issues

Issue: Cron jobs not running

Debugging:

  1. Check if jobs are enabled:
bash
echo $JOBS_ENABLED
  2. Check cron schedules:
bash
echo $JOB_PROFILE_ANALYSIS  # Should be "0 3 * * *"
  3. Check for errors:
sql
SELECT * FROM "ProfileAnalysisJob"
WHERE status = 'failed'
ORDER BY "createdAt" DESC
LIMIT 10;
  4. Check logs:
bash
# For running jobs
tail -f logs/cron.log

# For completed jobs
grep "ProfileAnalysisJob" logs/app.log

Issue: Job stuck in "running" status

Cause: Job crashed without updating status

Solution:

  1. Check for zombie jobs:
sql
SELECT * FROM "ProfileAnalysisJob"
WHERE status = 'running'
  AND "startedAt" < NOW() - INTERVAL '2 hours';
  2. Reset stuck jobs:
sql
UPDATE "ProfileAnalysisJob"
SET status = 'failed',
    "errorMessage" = 'Job timed out'
WHERE status = 'running'
  AND "startedAt" < NOW() - INTERVAL '2 hours';
  3. Check job locking: Job locking was already implemented in Phase 8. If jobs still get stuck, check for duplicate job execution.
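
The same two-hour rule used in the SQL above can be applied in application code, for example in a periodic health check. This is a sketch with hypothetical names (`JobRow`, `findStuckJobs`). Only the 'running' and 'failed' statuses appear in this guide; the other status values are assumptions.

```typescript
// Hypothetical sketch; mirrors the zombie-job query above in TypeScript.
type JobStatus = "pending" | "running" | "completed" | "failed";

interface JobRow {
  id: string;
  status: JobStatus;
  startedAt: Date | null;
}

const TWO_HOURS_MS = 2 * 60 * 60 * 1000;

// Returns jobs that have been in "running" status for longer than the timeout.
function findStuckJobs(
  jobs: JobRow[],
  now: Date,
  timeoutMs: number = TWO_HOURS_MS,
): JobRow[] {
  const cutoff = now.getTime() - timeoutMs;
  return jobs.filter(
    (job) =>
      job.status === "running" &&
      job.startedAt !== null &&
      job.startedAt.getTime() < cutoff,
  );
}
```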

Cache Issues

Issue: Stale data from cache

Symptoms:

  • Old config appearing in responses
  • A/B test changes not applying
  • Updated company settings not reflected

Solutions:

Manual cache clear:

typescript
POST /api/admin/gateway/clear-cache

Check cache TTL:

bash
echo $AI_CONFIG_CACHE_TTL    # Should be 60 (seconds)
echo $AI_GATEWAY_CACHE_TTL   # Should be 300 (seconds)

Decrease TTL (not recommended):

bash
AI_CONFIG_CACHE_TTL=30  # 30 seconds (more DB load)
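
A mistyped TTL value (for example an empty string or a non-numeric value) can cause surprising cache behavior. How the application actually reads these variables is not shown in this guide. The sketch below shows one defensive way to parse such a setting, with a hypothetical helper `parseTtlSeconds` that falls back to a default.

```typescript
// Hypothetical helper, not part of the TendSocial codebase.
// Returns the TTL in seconds, or the fallback if the raw value is unusable.
function parseTtlSeconds(raw: string | undefined, fallbackSeconds: number): number {
  if (raw === undefined || raw.trim() === "") return fallbackSeconds;
  const parsed = Number(raw);
  // Reject NaN, fractions, zero, and negative values.
  if (!Number.isInteger(parsed) || parsed <= 0) return fallbackSeconds;
  return parsed;
}

// In a Node.js process this could be called as:
//   parseTtlSeconds(process.env.AI_CONFIG_CACHE_TTL, 60)
console.log(parseTtlSeconds("30", 60));
console.log(parseTtlSeconds("not-a-number", 60));
```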

Logging and Debugging

Enable verbose logging

bash
LOG_LEVEL=debug pnpm dev

Query logs efficiently

sql
-- Recent errors
SELECT * FROM "AIUsageLog"
WHERE success = false
  AND "createdAt" > NOW() - INTERVAL '1 hour'
ORDER BY "createdAt" DESC;

-- Expensive requests
SELECT 
  "contentType",
  "contentId",
  "totalCostCents",
  "totalTokens"
FROM "AIUsageLog"
WHERE "totalCostCents" > 50  -- More than 50 cents
ORDER BY "totalCostCents" DESC
LIMIT 20;

-- Slow requests
SELECT 
  model,
  AVG("latencyMs") as avg_latency
FROM "AIUsageLog"
WHERE "createdAt" > NOW() - INTERVAL '24 hours'
GROUP BY model
HAVING AVG("latencyMs") > 5000;  -- Slower than 5s

Getting Help

If issues persist:

  1. Check logs: Look for stack traces and error messages
  2. Search docs: Review architecture plan and implementation checklist
  3. Check provider status: anthropic.com/status, cloud.google.com/status
  4. Create issue: Include logs, config, and steps to reproduce
  5. Contact support: support@tendsocial.com

Useful Commands

bash
# API
pnpm dev                    # Start dev server
pnpm build                  # Build production
pnpm test                   # Run tests
pnpm lint                   # Type checking

# Database
pnpm prisma studio          # GUI for database
pnpm prisma migrate dev     # Create and apply migrations (development)
pnpm prisma generate        # Regenerate client
pnpm run db:seed            # Seed default data

# Jobs
pnpm run worker             # Start job worker

# Cache
curl -X POST http://localhost:4000/api/admin/gateway/clear-cache \
  -H "Authorization: Bearer $ADMIN_TOKEN"
