It is said that the cross-entropy method is good for finding rarely used rules and improve their selection chance, but in a massive ruleset which is already converged, it can be difficult to chance upon them.

The algorithm can be changed to modify the probabilities of individual rules being selected by taking into account how many times a rule has been previously used. So given a distribution of rules D, where each rule has probability P(R) and each rule records how many times it has been used U(R) (this includes times ruls are used but never fired), the probabilities of D can be updated at each step using the following formula:

D’ = forall(R): P'(R) = if (U(R) < #(R in D) * alpha) ? P(R) * (#(R in D) + 1 - U(R)) * beta : P(R)
This states, if a rule's uses are less than the number of rules in the distribution * alpha (a coefficient), then modify the probability of being selected by the probability * the number of rules in the distribution + 1 - the number of uses * beta (another coefficient).
This would probably work well enough with alpha and beta both set to 1, but some modifications may be needed to properly test each little used rule.
An example distribution:
Rule A: 0.4, U = 5 => 0.4

Rule B: 0.17, U = 0 => 0.17 * 5 = 0.85

Rule C: 0.26, U = 1 => 0.26 * 4 = 1.04

Rule D: 0.17, U = 0 => 0.17 * 5 = 0.85

Normalised:

Rule A: 0.13

Rule B: 0.27

Rule C: 0.33

Rule D: 0.27

This system attempts to maintain the probabilities of the original distribution but offsets them by the number of uses, which will eventually decrease to the threshold, resulting in a fair and stable distribution again.

I suppose I could ignore existing probabilities and just set the distribution to reflect the inverse uses (until all thresholds are met). But this could result in unfair balancing.