4-5state difference of the to multimodal text