3. Retries¶
Beginner - 7 minutes. Retry counts, delays, backoff, and the dead-letter queue.
Soniq retries failed jobs automatically. Every job gets 3 retries by default (4 total attempts). You control the retry count, delay strategy, and backoff at the decorator level.
A note on terms. Soniq guarantees at-least-once delivery: a job will run at least once, and may run more than once if a worker crashes mid-handler. Idempotent means "safe to run more than once with the same end result." The two ideas come together in this chapter - retries are the most common reason a job runs twice, so handlers need to be idempotent. The Idempotency section at the bottom shows how.
Configuration¶
@app.job(
    max_retries=5,        # retry up to 5 times (6 total attempts)
    retry_delay=10,       # wait 10 seconds between retries
    retry_backoff=True,   # exponential backoff
    retry_max_delay=300,  # never wait more than 5 minutes
)
async def call_payment_api(invoice_id: str):
    ...
| Parameter | Type | Default | Description |
|---|---|---|---|
| max_retries | int | 3 | Number of retry attempts after the first failure |
| retry_delay | int \| list[int] | 0 | Seconds between retries |
| retry_backoff | bool | False | Multiply the delay exponentially on each attempt |
| retry_max_delay | int \| None | None | Upper bound on the computed delay |
Fixed delay¶
A plain integer retry_delay gives the same wait between every retry. With max_retries=5 and retry_delay=10, retries happen at: 10s, 10s, 10s, 10s, 10s.
Per-attempt delays¶
Provide a list to set exact delays for each retry:
@app.job(max_retries=3, retry_delay=[1, 5, 30])
async def send_verification_email(user_id: int):
    ...
Retries at: 1s, 5s, 30s. If there are more retries than list entries, the last value is reused.
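The lookup rule can be sketched as a small helper (hypothetical names; this is an illustration of the documented behavior, not Soniq's internals):

```python
def delay_for_attempt(retry_delay: list[int], attempt: int) -> int:
    """Return the delay in seconds before retry number `attempt` (1-based).

    When there are more retries than list entries, the last value is reused.
    """
    index = min(attempt - 1, len(retry_delay) - 1)
    return retry_delay[index]

# With retry_delay=[1, 5, 30]:
# attempt 1 -> 1s, attempt 2 -> 5s, attempt 3 -> 30s, attempts 4+ -> 30s
```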
Exponential backoff¶
Set retry_backoff=True to double the delay on each attempt:
@app.job(max_retries=5, retry_delay=10, retry_backoff=True)
async def call_payment_api(invoice_id: str):
    ...
Retries at: 10s, 20s, 40s, 80s, 160s.
If retry_delay is 0, a base of 1 second is used: 1s, 2s, 4s, 8s, 16s.
Max delay cap¶
Prevent delays from growing unbounded:
@app.job(max_retries=8, retry_delay=5, retry_backoff=True, retry_max_delay=300)
async def fetch_external_report(report_id: str):
    ...
Retries at: 5s, 10s, 20s, 40s, 80s, 160s, 300s, 300s. The delay never exceeds retry_max_delay.
After max retries: the dead-letter queue¶
When a job exhausts all retries, its row is moved out of soniq_jobs and into the dead-letter queue: the soniq_dead_letter_jobs table, a holding area for jobs that failed every retry. The original soniq_jobs row is deleted in the same transaction.
Dead-letter jobs do not run on their own. You inspect them, fix the underlying problem, and replay the ones you want re-run. See Dead-letter queue for the full inspect / replay workflow.
Examples¶
API call with backoff¶
External APIs may rate-limit or have intermittent outages. Exponential backoff gives them time to recover.
@app.job(max_retries=5, retry_delay=2, retry_backoff=True, retry_max_delay=120)
async def sync_user_to_crm(user_id: int):
    user = await get_user(user_id)
    async with httpx.AsyncClient() as client:
        resp = await client.post("https://crm.example.com/api/users", json=user.dict())
        resp.raise_for_status()
Email sending with fixed delay¶
Email servers sometimes reject connections temporarily. A short fixed delay usually works.
@app.job(max_retries=3, retry_delay=5)
async def send_order_confirmation(order_id: str):
    order = await get_order(order_id)
    await smtp_send(to=order.email, subject=f"Order {order_id} confirmed", body=...)
File processing with escalating delays¶
File imports may fail due to locks or temporary storage issues. Escalate the delay to avoid hammering the system.
@app.job(max_retries=3, retry_delay=[5, 30, 120])
async def import_csv(file_path: str):
    async with aiofiles.open(file_path) as f:
        rows = await parse_csv(f)
    await bulk_insert(rows)
Idempotency¶
Soniq provides at-least-once delivery: a job will run at least once, and may run more than once. Retries are the obvious source of repeats, but a worker that crashes mid-handler will also cause the job to be picked up by another worker and re-run. Every job queue has this property - it is not a Soniq bug, it is a consequence of "do not lose work when machines fail".
The fix is to make your handler idempotent - safe to run more than once with the same end result:
- Use INSERT ... ON CONFLICT DO UPDATE instead of plain inserts. The second run updates a row that already exists instead of creating a duplicate.
- Store an idempotency key (often a hash of the job arguments, or a unique field on the request) and check it before performing the side effect. If the key already shows the work as done, return.
- Check current state before acting. Before sending an email, look at whether you have already recorded sending it. Before charging a card, look at whether the charge already exists in your payments table.
The pattern is "make the second run a no-op", not "guarantee the second run never happens".
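As a concrete sketch of the first pattern, here is an upsert that stays a no-op on the second run. It uses SQLite for self-containment; the table and column names are illustrative, not part of Soniq:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE crm_sync (
        user_id INTEGER PRIMARY KEY,
        email   TEXT NOT NULL
    )
""")

def sync_user(user_id: int, email: str) -> None:
    # Running this twice updates the existing row instead of inserting
    # a duplicate, so a retried or re-run job has the same end result.
    conn.execute(
        """
        INSERT INTO crm_sync (user_id, email) VALUES (?, ?)
        ON CONFLICT(user_id) DO UPDATE SET email = excluded.email
        """,
        (user_id, email),
    )
    conn.commit()

sync_user(42, "a@example.com")
sync_user(42, "a@example.com")  # retry: still exactly one row
```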
Job timeouts¶
Every job has a default execution timeout of 300 seconds (5 minutes). If a job exceeds its timeout, it is treated as a failure and follows normal retry logic.
@app.job(max_retries=3, timeout=60)
async def quick_health_check(service_url: str):
    ...

@app.job(timeout=None)  # no timeout
async def long_running_migration():
    ...
Change the global default with SONIQ_JOB_TIMEOUT. Set to 0 to disable timeouts globally.