SemGAN: Text to Image Synthesis from Text Semantics using Attentional Generative Adversarial Networks

Image taken from the paper.

Abstract

Text-to-image synthesis is the task of automatically creating a realistic image from a given text description. It has numerous innovative and practical applications, including image processing and computer-aided design. Combining Generative Adversarial Networks (GANs) with the attention mechanism has recently led to large improvements. The fine-grained attention mechanism, although powerful, does not preserve the overall description well in the generator, since it attends to the text description only at the word level (fine-grained). We propose incorporating whole-sentence semantics when generating images from captions to enhance the attention mechanism's outputs. Experiments show that our model produces more robust images with a better semantic layout. We run experiments with both models on the Caltech-UCSD Birds dataset to validate the effectiveness of our proposal. Our model improves the original AttnGAN Inception Score by +4.13% and the Fréchet Inception Distance by +13.93%. Moreover, we carry out an empirical analysis of the objective and subjective measures to (i) address and overcome the limitations of these metrics, and (ii) verify that the performance improvements are due to fundamental algorithmic changes rather than to initialization and fine-tuning, as is common with GAN models.
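The paper's implementation is not reproduced on this page, but the core idea can be sketched in a few lines. The following PyTorch snippet is a minimal illustration, not the authors' code: `WordAttention` approximates AttnGAN-style word-level attention, and `SentenceFusion` is a hypothetical module showing how a whole-sentence embedding can be broadcast over image regions and fused with the word-level context (all names, shapes, and layer choices are assumptions for illustration).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class WordAttention(nn.Module):
    """Fine-grained (word-level) attention in the AttnGAN style:
    every image region attends over the word embeddings."""

    def __init__(self, img_dim, word_dim):
        super().__init__()
        # Project word embeddings into the image feature space.
        self.project = nn.Conv1d(word_dim, img_dim, kernel_size=1)

    def forward(self, h, words):
        # h: (B, C, H, W) image features; words: (B, D, T) word embeddings
        B, C, H, W = h.shape
        regions = h.view(B, C, H * W)                      # (B, C, N)
        keys = self.project(words)                         # (B, C, T)
        attn = torch.bmm(regions.transpose(1, 2), keys)    # (B, N, T)
        attn = F.softmax(attn, dim=-1)                     # attend over words
        context = torch.bmm(keys, attn.transpose(1, 2))    # (B, C, N)
        return context.view(B, C, H, W)


class SentenceFusion(nn.Module):
    """Hypothetical sentence-semantics branch: broadcast the sentence
    embedding over all regions and fuse it with the word-level context,
    so the generator also sees the global description."""

    def __init__(self, img_dim, sent_dim):
        super().__init__()
        self.project = nn.Linear(sent_dim, img_dim)
        self.fuse = nn.Conv2d(3 * img_dim, img_dim, kernel_size=3, padding=1)

    def forward(self, h, word_context, sent):
        B, C, H, W = h.shape
        s = self.project(sent).view(B, C, 1, 1).expand(B, C, H, W)
        return self.fuse(torch.cat([h, word_context, s], dim=1))


if __name__ == "__main__":
    # Toy shapes; random tensors stand in for real encoder outputs.
    B, C, D, S, H, W, T = 2, 32, 256, 256, 16, 16, 12
    h = torch.randn(B, C, H, W)      # image features from the previous stage
    words = torch.randn(B, D, T)     # word embeddings from the text encoder
    sent = torch.randn(B, S)         # whole-sentence embedding
    attn = WordAttention(C, D)
    fusion = SentenceFusion(C, S)
    out = fusion(h, attn(h, words), sent)
    print(out.shape)                 # torch.Size([2, 32, 16, 16])
```

For reference, the Fréchet Inception Distance reported above compares the mean $\mu$ and covariance $\Sigma$ of Inception features for real ($r$) and generated ($g$) images; lower is better:

$$\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \mathrm{Tr}\!\left(\Sigma_r + \Sigma_g - 2\left(\Sigma_r \Sigma_g\right)^{1/2}\right)$$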

Publication
2020 International Conference on Computer, Control, Electrical, and Electronics Engineering (ICCCEEE)
