Built a multilingual image captioning system for English, Arabic, French, and German using a pretrained CLIP image encoder and a GPT-2 language model. Trained the system on 1.5M images and 4.5M captions from the Multi30K dataset, with four prefix adapters, one per language. The system generated captions in all four languages with high accuracy.
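A minimal sketch of the per-language prefix-adapter idea, assuming a ClipCap-style design in which a small MLP maps each CLIP image embedding to a sequence of prefix embeddings fed into GPT-2. All dimensions and the `PrefixAdapter` class are illustrative, not the project's actual code (512-d for a CLIP ViT-B/32 feature, 768-d for GPT-2 token embeddings):

```python
import torch
import torch.nn as nn

class PrefixAdapter(nn.Module):
    """Illustrative adapter: maps one CLIP image embedding to a
    sequence of GPT-2 prefix embeddings (dimensions are assumptions)."""

    def __init__(self, clip_dim=512, gpt_dim=768, prefix_len=10):
        super().__init__()
        self.prefix_len = prefix_len
        self.gpt_dim = gpt_dim
        # Two-layer MLP expanding one image vector into prefix_len token slots.
        self.mlp = nn.Sequential(
            nn.Linear(clip_dim, gpt_dim * prefix_len),
            nn.Tanh(),
            nn.Linear(gpt_dim * prefix_len, gpt_dim * prefix_len),
        )

    def forward(self, clip_features):
        # (batch, clip_dim) -> (batch, prefix_len, gpt_dim)
        batch = clip_features.shape[0]
        return self.mlp(clip_features).view(batch, self.prefix_len, self.gpt_dim)

# One adapter per target language; the pretrained CLIP encoder and GPT-2
# decoder would be shared (and typically frozen) across all four.
adapters = {lang: PrefixAdapter() for lang in ["en", "ar", "fr", "de"]}

image_features = torch.randn(2, 512)  # stand-in for CLIP encoder output
prefix = adapters["fr"](image_features)
print(tuple(prefix.shape))  # (2, 10, 768)
```

The prefix tensor would be concatenated with the caption's token embeddings before being passed to GPT-2, so only the small adapter needs training per language.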