The problem nobody names
A Go service handles HTTP requests. Some are metadata lookups (fast, small). Some are blob writes (slow, large). They share the same worker pool. A 50MB upload monopolizes a worker slot for 3 seconds while a permission check that would take 200 microseconds waits in the queue.
This is control plane / data plane contamination: signals and data competing for the same concurrency primitive.
The fix is architectural, not algorithmic
The standard response is a better concurrency primitive. A semaphore instead of a worker pool. Priority queues. Weighted fair scheduling. These are all correct solutions to the symptom. The cause is that two fundamentally different workloads are sharing the same path.
In HOROS, control plane and data plane are separated by design. The control plane handles routing, job status, permissions, metadata lookups. It's pure SQLite reads — sub-millisecond, no contention with data work. The data plane handles GPU inference, PDF processing, embedding generation. It operates on chunked payloads of similar size through dedicated workers.
The two never meet on the same primitive. A metadata lookup doesn't wait behind an inference call. A job status update doesn't contend with a blob write.
The consequence: the semaphore question disappears
When someone asks "should I use a semaphore or a worker pool for mixed workloads?", the correct answer is: you shouldn't have mixed workloads on the same primitive.
A semaphore wins when job size variance is high — it adapts to the actual work being done, rather than allocating fixed slots. A worker pool wins when jobs are homogeneous — it provides cleaner backpressure and simpler metrics.
But both are downstream fixes for an upstream design choice. If you separate the planes, you don't need a clever concurrency primitive to handle the mix. There is no mix.
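For the cases where you do still need the adaptive primitive, its shape is roughly this. A minimal weighted-semaphore sketch, written from scratch for illustration (`golang.org/x/sync/semaphore` provides a production `Weighted` type): jobs acquire capacity proportional to their size, so admission adapts to variance instead of allocating fixed slots.

```go
package main

import (
	"fmt"
	"sync"
)

// weightedSem is a toy weighted semaphore. Note this naive version can
// starve large acquirers if small jobs keep slipping in; the x/sync
// version queues waiters fairly.
type weightedSem struct {
	mu    sync.Mutex
	cond  *sync.Cond
	avail int64
}

func newWeightedSem(capacity int64) *weightedSem {
	s := &weightedSem{avail: capacity}
	s.cond = sync.NewCond(&s.mu)
	return s
}

// Acquire blocks until n units of capacity are free, then claims them.
func (s *weightedSem) Acquire(n int64) {
	s.mu.Lock()
	for s.avail < n {
		s.cond.Wait()
	}
	s.avail -= n
	s.mu.Unlock()
}

// Release returns n units and wakes any waiters.
func (s *weightedSem) Release(n int64) {
	s.mu.Lock()
	s.avail += n
	s.cond.Broadcast()
	s.mu.Unlock()
}

func main() {
	s := newWeightedSem(100) // e.g. a 100MB in-flight payload budget
	s.Acquire(50)            // one large blob write
	s.Acquire(30)            // a medium job still fits alongside it
	fmt.Println("both admitted; remaining capacity:", s.avail)
	s.Release(80)
}
```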
The batch homogeneity principle
Within the data plane, HOROS further constrains variance by batching. Payloads are chunked to similar sizes before they enter the processing pipeline. The data plane never sees a 100KB job next to a 50MB job. It sees uniform chunks that take approximately the same time to process.
This means a fixed worker pool with fixed concurrency is sufficient. No weighted scheduling, no priority inversion, no adaptive anything. The pipeline is boring by design.
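The chunking step itself is trivial, which is the point. A sketch under the stated design, with the chunk size an invented demo constant rather than a HOROS value:

```go
package main

import "fmt"

// chunk splits a payload into pieces of at most size bytes, so the
// data plane only ever sees near-uniform work items.
func chunk(payload []byte, size int) [][]byte {
	var chunks [][]byte
	for len(payload) > size {
		chunks = append(chunks, payload[:size])
		payload = payload[size:]
	}
	if len(payload) > 0 {
		chunks = append(chunks, payload)
	}
	return chunks
}

func main() {
	// A 50MB blob and a 100KB job both become streams of same-size
	// chunks; scaled down here to keep the demo readable.
	big := make([]byte, 10)
	for _, c := range chunk(big, 4) {
		fmt.Println(len(c)) // 4, 4, 2
	}
}
```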
When separation isn't possible
Not every system can cleanly separate planes. If your business logic requires atomic transactions that span metadata and data (e.g., "write the blob AND update the index in one transaction"), separation has a cost.
The answer in that case: separate what you can, and use a smarter primitive only for the irreducibly mixed part. Don't apply the heavyweight solution to the entire system because 5% of your operations need it.
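One way to keep that boundary honest is an explicit router: only operations classified as mixed take the heavier path. A sketch with hypothetical operation names and a deliberately hard-coded classification rule:

```go
package main

import "fmt"

// route picks a path per operation class. The operation names and the
// rule for what counts as "mixed" are hypothetical; the point is that
// the expensive primitive guards only the operations that need it.
func route(op string) string {
	switch op {
	case "blob-and-index": // atomic across metadata and data
		return "weighted-path"
	case "perm-check", "job-status", "metadata-lookup":
		return "control-pool"
	default:
		return "data-pool"
	}
}

func main() {
	for _, op := range []string{"perm-check", "blob-write", "blob-and-index"} {
		fmt.Println(op, "->", route(op))
	}
}
```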
hazyhaar — open research, sovereign infrastructure github.com/hazyhaar · hazyhaar.fr