I was a bit off in the details, but the gist is: with a Bayer filter, you're using four sensor elements to come up with (for example) 24 bits of information: 8 bits of luminance (intensity) and 16 bits of color (hue and saturation, or G-B and G-R, for example). Because dynamic range is determined by the luminance range, you've got 8 bits (256 levels) of dynamic range. Another way of looking at it: color information is relatively expensive.
A B&W sensor could use those same four elements to come up with 24 bits of information that is purely luminance, so 24 bits of dynamic range (16 million levels). I *think*. I'm not 100% sure, but you could certainly get a significant increase in dynamic range; I'm just thinking through how to do it*.
Understanding Digital Camera Sensors
(* Using geeky terminology: using ND filters instead of color filters would let you produce something more like a 10-bit floating-point output (8 bits of mantissa and 2 bits of exponent) rather than a real 24-bit integer output. More research is needed.)
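To make the footnote's idea a bit more concrete, here is a rough Python sketch of how four 8-bit elements behind ND filters could behave like a mantissa-plus-exponent reading. Everything specific in it is my own assumption for illustration: the filter strengths (each element gets 1/4 of the light of the previous one), the function names, and the rule of treating a reading of 255 as saturated. It is not how any real sensor works.

```python
# Hypothetical ND attenuation factors for the four elements (an assumption):
# element 0 is unfiltered, each following element sees 1/4 of the light.
ATTENUATIONS = [1, 4, 16, 64]
MAX_CODE = 255  # largest value an 8-bit element can report


def read_elements(light):
    """Simulate four 8-bit elements behind ND filters; each clips at 255."""
    return [min(MAX_CODE, light // a) for a in ATTENUATIONS]


def decode(readings):
    """Return (estimated light, exponent index).

    Uses the least-attenuated element that is not saturated as the 8-bit
    "mantissa"; the index of that element plays the role of a 2-bit
    "exponent". A reading of exactly 255 is treated as saturated.
    """
    for exponent, (code, a) in enumerate(zip(readings, ATTENUATIONS)):
        if code < MAX_CODE:
            return code * a, exponent
    # Every element is saturated: report the largest representable value.
    return MAX_CODE * ATTENUATIONS[-1], len(ATTENUATIONS) - 1


if __name__ == "__main__":
    for light in (100, 1000, 10000, 20000):
        print(light, read_elements(light), decode(read_elements(light)))
```

With these assumed filter strengths the largest value that can be reported is 255 × 64 = 16320, so roughly 14 bits of range at 8 bits of precision. That is a lot more than 8 bits, but well short of a true 24-bit integer, which is about what the footnote is getting at.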