Speed up coherence computation

Achieved using:
- processing combinations in chunks (reduces memory usage)
- component indices (avoids masking and unused combinations)
- precomputation of constants
- option to compute in np.float32 or np.float64
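
A minimal sketch of how these four ideas can fit together. It is not the code from this change. It assumes that "coherence" means magnitude-squared coherence between pairs of channels, and the function name `pairwise_coherence`, its parameters, and the input layout `(n_channels, n_segments, n_freqs)` are all hypothetical:

```python
import numpy as np


def pairwise_coherence(spectra, chunk_size=1024, dtype=np.float32):
    """Magnitude-squared coherence for every channel pair i < j.

    Hypothetical helper: `spectra` is assumed to be a complex array of
    shape (n_channels, n_segments, n_freqs).
    """
    n_channels, _, n_freqs = spectra.shape

    # Selectable precision: float32 halves memory compared to float64.
    cdtype = np.complex64 if dtype == np.float32 else np.complex128
    spectra = spectra.astype(cdtype, copy=False)

    # Precomputed constants: the auto-spectra are needed by every pair,
    # so compute them once outside the loop.
    power = (np.abs(spectra) ** 2).mean(axis=1)  # (n_channels, n_freqs)

    # Component indices: enumerate only the pairs that are needed
    # instead of building the full matrix and masking it afterwards.
    rows, cols = np.triu_indices(n_channels, k=1)

    out = np.empty((rows.size, n_freqs), dtype=dtype)

    # Chunks of combinations: the temporary cross-spectrum array holds
    # at most `chunk_size` pairs at a time.
    for start in range(0, rows.size, chunk_size):
        i = rows[start:start + chunk_size]
        j = cols[start:start + chunk_size]
        cross = (spectra[i] * np.conj(spectra[j])).mean(axis=1)
        out[start:start + chunk_size] = np.abs(cross) ** 2 / (power[i] * power[j])

    return rows, cols, out
```

In this sketch `chunk_size` trades peak memory against the number of loop iterations, and the result does not depend on it. Passing `dtype=np.float64` gives higher precision at roughly twice the memory cost.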
