Protect the web tier
Image and feed jobs stop competing with PHP-FPM, Horizon, and request/response traffic on the main fleet.
A Fargate queue strategy for AME by Zentra, built from the supplied overview, architecture, scaling, runbook, and troubleshooting notes and styled to match the release strategy example.
Goal: move heavy Laravel queue work off the main app hosts and onto dedicated ECS Fargate workers. Keep default separate so user-facing background work never gets mixed into heavy draining.
The supplied strategy separates heavy queue load from the web tier so queue pressure can grow or shrink without destabilising production app hosts.
Raise image without widening sync or bulk. Pay for extra capacity only where backlog proves it.
ECS services can be scaled down and host consumers kept available during rollout until full ownership is proven.
When heavy queue work runs next to request/response traffic, tuning decisions and failures spill across unrelated parts of production.
The fix is not “more shared workers”. The fix is a separate runtime layer for heavy queues.
The platform keeps normal Laravel queue semantics, but moves heavy execution onto ECS services that can scale by workload class.
sync: owns heavy feed imports, sync fan-out, and manual property save finalisation. Long-running sync work stays clear of image draining.
image: owns image fetch, create, backfill, and media processing. It is usually the main backlog driver and the first scaling candidate.
bulk: owns cache warmers, cache rebuilds, and non-urgent heavy fan-out. It stays low priority and is cheap to run in burst/drain mode.
Splitting ownership by queue means each workload can be observed, sized, and rolled back separately instead of treating the whole worker fleet as one pool.
ECS services: propertyfeedtemplate-sync, propertyfeedtemplate-image, propertyfeedtemplate-bulk
Ports: 6379 (Redis), 3306 (database)
Redis queue keys: queues:sync, queues:image, queues:bulk
The strategy is intentionally conservative: prove correctness first, then widen only the queue that genuinely needs more throughput.
Baseline desired counts: sync=1, image=1, bulk=1
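The ownership split above can be written down as a small routing table. This is a minimal sketch, not the project's actual configuration: the service names, Redis keys, and baseline counts come from the strategy, while the `WorkerService` class and `service_for` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerService:
    queue: str        # Laravel queue name
    ecs_service: str  # ECS service that owns the queue
    redis_key: str    # Redis key holding ready jobs for the queue
    baseline: int     # desired count in the safe baseline


# One entry per heavy queue. "default" is deliberately absent so that
# user-facing background work is never routed to the heavy workers.
SERVICES = {
    "sync": WorkerService("sync", "propertyfeedtemplate-sync", "queues:sync", 1),
    "image": WorkerService("image", "propertyfeedtemplate-image", "queues:image", 1),
    "bulk": WorkerService("bulk", "propertyfeedtemplate-bulk", "queues:bulk", 1),
}


def service_for(queue: str) -> WorkerService:
    """Return the ECS service that owns a heavy queue, or raise if none does."""
    if queue not in SERVICES:
        raise KeyError(f"queue {queue!r} is not owned by a heavy ECS worker")
    return SERVICES[queue]
```

Keeping the mapping explicit means a lookup for `default` fails loudly instead of silently landing on a heavy worker.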
Confirm image build, DB, Redis, S3, and queue routing.
Do not disable host consumers until ECS is healthy.
Increase only one service at a time.
Raise image from 1 to 2 first.
Observe logs and Sentry after steady state.
Move the next queue only after the first is proven.
Host consumers for sync, image, and bulk should not compete with ECS.
Scale is the last step, not the first.
More workers are not automatically better. Extra capacity helps only after routing, connectivity, and job behaviour are already correct.
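The "correctness before capacity" rule can be sketched as a gate in front of any scaling decision. This is an illustration under stated assumptions: the `ServiceHealth` fields, the `backlog_threshold` parameter, and the function name are hypothetical, and how each signal is collected is left open.

```python
from dataclasses import dataclass


@dataclass
class ServiceHealth:
    routing_ok: bool       # jobs are landing on the intended queue
    dependencies_ok: bool  # DB, Redis, and S3 are reachable from the task
    restart_loop: bool     # tasks are repeatedly stopping and restarting
    backlog: int           # number of ready jobs waiting


def next_desired_count(current: int, maximum: int,
                       health: ServiceHealth, backlog_threshold: int) -> int:
    """Return the desired count for ONE service, stepping up by at most one."""
    # Fix correctness first: never scale a failing queue wider.
    if not (health.routing_ok and health.dependencies_ok) or health.restart_loop:
        return current
    # Widen only when backlog justifies it and the ceiling allows it.
    if health.backlog > backlog_threshold and current < maximum:
        return current + 1
    return current
```

For example, a healthy `image` service at 1 with a large backlog and a ceiling of 2 would move to 2, while the same backlog on a service that cannot reach Redis would stay at 1.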
Desired-count ranges: 0..2 (2 only after stability is proven, 1 in the safe baseline); 0..1; 0..1
Healthy operations require watching service state, connectivity, and error behaviour together rather than treating backlog as the only signal.
Scale to 0 only when ready, reserved, and delayed work are all empty.

aws ecs describe-services \
  --cluster propertyfeedtemplate-production \
  --services propertyfeedtemplate-sync propertyfeedtemplate-image propertyfeedtemplate-bulk

aws ecs update-service \
  --cluster propertyfeedtemplate-production \
  --service propertyfeedtemplate-image \
  --desired-count 2
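The scale-to-zero condition (ready, reserved, and delayed work all empty) can be checked before setting a desired count of 0. The sketch below assumes the key layout of Laravel's Redis queue driver: a list at `queues:<name>` for ready jobs and sorted sets at `queues:<name>:reserved` and `queues:<name>:delayed`. The `client` argument is any object exposing `llen` and `zcard` the way redis-py does; the `prefix` parameter and both function names are hypothetical.

```python
def queue_depths(client, queue: str, prefix: str = "") -> dict:
    """Count ready, reserved, and delayed jobs for one queue."""
    # Assumed key layout; a Redis key prefix, if configured, goes in `prefix`.
    base = f"{prefix}queues:{queue}"
    return {
        "ready": client.llen(base),                     # list of waiting jobs
        "reserved": client.zcard(f"{base}:reserved"),   # jobs being processed
        "delayed": client.zcard(f"{base}:delayed"),     # jobs scheduled for later
    }


def safe_to_scale_to_zero(depths: dict) -> bool:
    """True only when no ready, reserved, or delayed work remains."""
    return all(count == 0 for count in depths.values())
```

Checking reserved and delayed as well as ready matters because a queue can look empty while jobs are still in flight or scheduled to reappear.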
Most queue incidents are not solved by wider scale. First prove that each worker can start, reach dependencies, and consume the correct backlog.
Most likely: bad DB host, missing Aurora or Redis ingress, or stale SSM runtime values.
Most likely: bad container image, missing Laravel runtime directories, or bad bootstrap at startup.
Workers may not be consuming, jobs may be failing and requeueing, or the queue may point at the wrong Redis.
DB timeout can mean DNS improved but the path is still blocked. Redis timeout means the broker path is not healthy.
Ports to check: 3306 (database), 6379 (Redis)
Operator rule of thumb: do not scale a failing queue wider first.
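To tell a DNS problem apart from a blocked network path, a worker can attempt a plain TCP connection to each dependency. This is a minimal sketch using only the Python standard library; the `probe` function name and its return labels are hypothetical, and the hosts to pass in are whatever the environment's DB and Redis settings point at.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt to host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.gaierror:
        return "dns_failure"   # the hostname did not resolve
    except socket.timeout:
        return "timeout"       # resolved, but no response within the timeout
    except ConnectionRefusedError:
        return "refused"       # host reachable, nothing accepting on the port
    except OSError:
        return "unreachable"   # any other network-level failure
```

A `timeout` on port 3306 or 6379 after the name resolves is consistent with the path still being blocked, for example by missing ingress rules, whereas `dns_failure` points back at the configured host value.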
The safest posture is explicit routing, clear networking, and monitoring that combines backlog with restart and dependency signals.
Keep default separate.
.env and SSM
The end state is a background execution layer that is safer for the web tier, easier to scale, and cheaper to keep mostly idle.
The website keeps serving customers at the front counter while a dedicated crew in the back handles the pallets, images, and bulk work. When there is more freight, you add warehouse workers, not more people at the tills.
Do not mix default into heavy ECS workers.
This strategy gives AME by Zentra a background execution model that is safer, cheaper when idle, and much easier to scale and roll back intentionally.