Memories of a different life
In 1996 I graduated from Brown University to go work at SGI on what was a rather interesting technical problem: how to make hybrid scheduling work.
Just to recap, in the early ’90s the UNIX vendors were adding threading to their kernels, and there was intense debate over what the right model was:
In the 1 on 1 model, each user thread had its own kernel thread. In the N on 1 model, all of the user threads were multiplexed on one kernel thread. Finally, in the M on N model, many user-level threads were multiplexed on a smaller number of kernel threads.
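To make the three mappings concrete, here is a minimal sketch in Python. The function name, the model labels, and the static round-robin placement for M on N are all my own illustrative assumptions; a real M on N system would remap user threads dynamically rather than fix them in place.

```python
def assign_kernel_threads(model, user_threads, kernel_thread_count=2):
    """Return {user_thread: kernel_thread_index} for a threading model.

    Hypothetical helper for illustration only; not any vendor's API.
    """
    if model == "1 on 1":
        # Every user thread gets its own kernel thread.
        return {u: i for i, u in enumerate(user_threads)}
    if model == "N on 1":
        # All user threads share kernel thread 0.
        return {u: 0 for u in user_threads}
    if model == "M on N":
        # Many user threads spread over a smaller pool of kernel threads.
        # Static round-robin is a simplification of what a user-level
        # scheduler would do dynamically.
        return {u: i % kernel_thread_count for i, u in enumerate(user_threads)}
    raise ValueError(f"unknown model: {model}")


if __name__ == "__main__":
    threads = ["a", "b", "c", "d"]
    for model in ("1 on 1", "N on 1", "M on N"):
        print(model, assign_kernel_threads(model, threads))
```

The number of distinct kernel threads in each result is the point: four for 1 on 1, one for N on 1, and two for M on N in this example.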
The N on 1 model was the “simplest” to implement, but did not allow the application to exploit multiprocessors. One property of the N on 1 model was that synchronization between different threads did not require a context switch.
Although 1 on 1 was simple enough to implement, it had what seemed at the time to be serious issues. The two most discussed were the memory needed to maintain the kernel data structures and the performance cost of making every synchronization operation a kernel operation.
So the M on N model was promoted as a compromise between the N on 1 model and the resource-intensive 1 on 1 model. To implement the M on N model, two schedulers were required: a kernel scheduler and a user-level thread scheduler. The central notion was that the kernel scheduler would schedule processors and the user-level thread scheduler would assign threads to the available processors represented by the kernel threads. The nice property of this model is that if synchronization was required between user-level threads, the performance would be as good as the N on 1 model.
The challenge of this model was that the thing scheduling the physical resources, the kernel scheduler, had incomplete information and was prone to making mistakes. And the cost of mistakes was idle processors and therefore worse performance.
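A toy simulation gives a feel for the kind of mistake I mean. Everything in it is an assumption made for illustration: three kernel threads share two processors, kernel thread 0 runs the user thread that holds a user-level lock, and the other two run user threads spinning on that lock. Processor-ticks spent spinning stand in for idle processors. It is a sketch, not a model of any real kernel.

```python
def simulate(pick, work_needed=3, processors=2, kernel_threads=3):
    """Run until the lock holder (kernel thread 0) finishes its critical section.

    `pick` is the kernel scheduling policy: it returns the kernel threads
    to run in a given tick. Returns (ticks, wasted), where `wasted` counts
    processor-ticks spent on threads that can only spin on the lock.
    """
    remaining = work_needed
    tick = 0
    wasted = 0
    while remaining > 0:
        for kt in pick(tick, processors, kernel_threads):
            if kt == 0:
                remaining -= 1   # the holder makes progress
            else:
                wasted += 1      # a waiter burns its time slice
        tick += 1
    return tick, wasted


def blind_round_robin(tick, processors, kernel_threads):
    """Kernel policy with no idea which thread holds the user-level lock."""
    start = (tick * processors) % kernel_threads
    return [(start + i) % kernel_threads for i in range(processors)]


def holder_aware(tick, processors, kernel_threads):
    """Hypothetical policy that always keeps the lock holder on a processor."""
    return [0] + [1 + i for i in range(processors - 1)]


if __name__ == "__main__":
    print("blind:", simulate(blind_round_robin))
    print("aware:", simulate(holder_aware))
```

In this setup the blind policy eventually spends a whole tick running only waiters, so the critical section takes longer to finish and more processor time is thrown away than under the policy that knows who holds the lock.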
Ultimately, a variety of attempts were made to fix the problem, including scheduler activations, scheduler-aware synchronization, and nano-threads, but none was good enough to stop the ultimate triumph of the 1 on 1 model.
It was the need to co-ordinate between the two schedulers that eliminated the advantages of the M on N model.
One could almost argue that this was a computer science boondoggle.
So what does this have to do with storage?
At the end of the day, the lesson I learned from the whole experience is that having multiple layered resource schedulers scheduling the same resources is difficult to make work.
If you look at the storage stack, what you’ll see is multiple distinct storage layers, each managing virtualized pools of storage that they have no visibility into. And what’s worse, the thing that actually does the physical resource management has no visibility into what the resources are being used for.
So to me it looks as though the storage stack has evolved over the last 20 or so years to look a lot like the hybrid thread scheduling model.
And that is enough to give me reason to pause and think.




