Paper summaries

Generative Image to Text - with Single Image Encoder & Single Image Decoder

June 18, 2023

Dreamsim- Ensemble of Embeddings as Metric

June 18, 2023

For Image similarity.

TryONDiffusion Paper summary

June 17, 2023

A Tale of 2 U-Nets with Cross-Attention Communication

MarkupLM- LayoutLM for HTML Pages

June 17, 2023

Multimodal pretraining with text, image, and layout made progress in recent times. One uses 2D positional information i.e. BBoxes to identify the position of text in the layout. For the Large no of digital documents with consistent changes in layouts, existing models are not working.

Fully Sharded Data Paralell

June 01, 2023

Fit large model on small GPU

ImageBIND- 6 Modalities in One Embedding space with Image as BinD

June 01, 2023

6 modalities embeddings into a common Embedding space. But all these 6 are binded with Image embeddings using image paired data for every single modality as a single shared representation space. hence the name ImageBind. Till now self supervised models like CLIP has only a bi modal space for Text and Image embeddings.

CLIP for Segmentation CLIPSEG

May 25, 2023

Segmentation using CLIPs Multimodality Embedding space & a Conditinal Decoder

Dino V2- DIstillation with NO knowledge

May 12, 2023

Paper here

Contrastive Language Image Pretraining (CLIP)

May 10, 2023

Image & Text as single Multimodality Embedding space.

Attention is All You Need - Birth Of Modern AI

May 03, 2023

This is more of my understanding of Transformers rather than the paper summary.

Purnasai G

Paper summaries