This is due to the amount of probable phrase sequences increases, and the designs that advise results become weaker. By weighting phrases in a nonlinear, distributed way, this model can "learn" to approximate terms and never be misled by any unknown values. Its "knowledge" of a given term is not as tightly tethered into the fast surrounding words a