We use a co-attention mapping between whole-slide images (WSIs) and genomic features to capture multimodal interactions between histology images and genes for predicting patient survival. By adapting Transformer layers as a general encoder backbone in multiple instance learning (MIL), we consistently outperform the current state of the art (SOTA) for survival prediction across 5 different cancer datasets (4,730 WSIs, 67 million patches).
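
To make the co-attention idea concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the implementation behind the reported results: it assumes scaled dot-product attention in which each genomic embedding acts as a query over the patch embeddings of one slide, and it omits any learned projection matrices. The names `co_attention`, `gene_emb`, and `patch_emb` are hypothetical.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def co_attention(gene_emb, patch_emb):
    """Attend from each genomic embedding (query) to all patch embeddings.

    Assumption: patch embeddings serve as both keys and values, with no
    learned projections. Returns (attn, pooled), where attn has shape
    N x M (one weight per gene-patch pair) and pooled has shape N x d
    (one gene-guided summary of the slide per genomic embedding).
    """
    dim = len(patch_emb[0])
    scale = math.sqrt(dim)
    attn = []
    for g in gene_emb:
        # Scaled dot product between one genomic query and every patch.
        scores = [sum(gi * pi for gi, pi in zip(g, p)) / scale for p in patch_emb]
        attn.append(softmax(scores))
    # Weighted average of the patch embeddings for each genomic query.
    pooled = [
        [sum(w * p[k] for w, p in zip(weights, patch_emb)) for k in range(dim)]
        for weights in attn
    ]
    return attn, pooled


# Toy data (random, for shape checking only): 50 patches, 6 genomic embeddings.
random.seed(0)
DIM, N_PATCHES, N_GENE_GROUPS = 8, 50, 6
patches = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_PATCHES)]
genes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_GENE_GROUPS)]

attn, pooled = co_attention(genes, patches)
print(len(attn), len(attn[0]))      # 6 50
print(len(pooled), len(pooled[0]))  # 6 8
```

In this sketch each row of `attn` is a distribution over patches, so it can be read as a map of which image regions a given genomic embedding attends to. The pooled output has one vector per genomic embedding rather than one per patch, which would give a much shorter sequence to pass to a downstream Transformer encoder.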