6 modalities' embeddings into a common embedding space. Every other modality is bound to the image embeddings using data paired with images, so that all six share a single representation space; hence the name ImageBind. Until now, contrastive models such as CLIP have had only a bimodal embedding space for text and images.
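To make the binding step concrete, here is a minimal sketch of a symmetric InfoNCE-style contrastive loss that pulls each non-image embedding toward its paired image embedding and away from the other images in the batch. It is plain Python for readability; the function names, the toy vectors, and the temperature of 0.07 are illustrative assumptions, not ImageBind's actual code or hyperparameters.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(image_embs, other_embs, temperature=0.07):
    """Symmetric InfoNCE over a batch: pair i (image_embs[i], other_embs[i]) is the positive.

    temperature=0.07 is an illustrative assumption, not a value taken from ImageBind.
    """
    q = [l2_normalize(v) for v in image_embs]
    k = [l2_normalize(v) for v in other_embs]
    # logits[i][j] = similarity between image i and other-modality sample j.
    logits = [[sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k] for qi in q]

    def cross_entropy_rows(mat):
        # Mean cross-entropy where the correct "class" for row i is column i.
        total = 0.0
        for i, row in enumerate(mat):
            m = max(row)  # subtract the max for a numerically stable log-sum-exp
            log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_sum_exp - row[i]
        return total / len(mat)

    transposed = [list(col) for col in zip(*logits)]
    # Average the image->other and other->image directions.
    return 0.5 * (cross_entropy_rows(logits) + cross_entropy_rows(transposed))


images = [[1.0, 0.0], [0.0, 1.0]]
audio_aligned = [[0.9, 0.1], [0.1, 0.9]]
audio_swapped = [[0.1, 0.9], [0.9, 0.1]]
print(info_nce(images, audio_aligned) < info_nce(images, audio_swapped))  # True
```

Because every modality is trained against the same image embeddings, two non-image modalities (for example, audio and text) can end up aligned with each other without any directly paired data between them; the ImageBind paper describes this as emergent alignment.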

Fully Sharded Data Parallel

Published: June 01, 2023

Fitting a large model on small GPUs
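The core idea behind fitting a large model this way can be sketched in a few lines: flatten a unit's parameters, pad them so they divide evenly across ranks, keep only one shard per rank, and all-gather the full tensor just before that unit's forward or backward pass. The plain-Python simulation below (no real GPUs or communication) is an illustrative assumption, not PyTorch's FSDP implementation; the function names and the 16-bytes-per-parameter figure (a common estimate for mixed-precision Adam) are assumptions to adapt.

```python
import math


def shard_flat_params(flat, world_size):
    """Pad a flat parameter list so it splits evenly, then give each rank one contiguous shard."""
    shard_size = math.ceil(len(flat) / world_size)
    padded = flat + [0.0] * (shard_size * world_size - len(flat))
    return [padded[r * shard_size:(r + 1) * shard_size] for r in range(world_size)]


def all_gather(shards, numel):
    """Simulated all-gather: concatenate every rank's shard and drop the padding."""
    full = [x for shard in shards for x in shard]
    return full[:numel]


def per_rank_bytes(num_params, world_size, bytes_per_param=16):
    """Rough per-GPU memory for params + grads + optimizer state when all are fully sharded.

    bytes_per_param=16 assumes mixed-precision Adam: fp16 params and grads (2 + 2 bytes)
    plus fp32 master params, momentum, and variance (4 + 4 + 4 bytes).
    Activations and the temporarily gathered unit are NOT included.
    """
    return num_params * bytes_per_param / world_size


params = [float(i) for i in range(10)]
shards = shard_flat_params(params, world_size=4)  # each rank stores only 3 values
full = all_gather(shards, numel=len(params))      # rebuilt just before compute, then freed
print(full == params)                              # True
```

For example, a 7-billion-parameter model under the 16-bytes-per-parameter assumption needs about 112 GB for these states on a single GPU, but about 14 GB per GPU when fully sharded over 8 GPUs; activations and the transiently gathered unit come on top of that.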

MarkupLM: LayoutLM for HTML Pages

Published: June 17, 2023

Multimodal pretraining with text, image, and layout has made progress in recent years. These models use 2D positional information, i.e., bounding boxes (BBoxes), to locate text within the layout. However, for the large number of digital documents whose layouts change constantly, such as HTML pages, existing models do not work well.
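Instead of bounding boxes, MarkupLM locates each text node by the XPath of its element in the DOM tree, which does not depend on how the page happens to be rendered. The sketch below uses only Python's standard-library `html.parser` to extract (text, XPath) pairs of the kind MarkupLM takes as input. The helper names and the always-indexed XPath format (`/div[1]/p[2]`) are simplifying assumptions; MarkupLM's own preprocessing differs in details such as when subscripts are emitted.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag; they cannot contain text, so we skip them.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class XPathTextExtractor(HTMLParser):
    """Collect (text, xpath) pairs; hypothetical helper, not MarkupLM's extractor."""

    def __init__(self):
        super().__init__()
        self.stack = []           # open elements as (tag, 1-based index among same-tag siblings)
        self.child_counts = [{}]  # per open element (plus root): tag -> children seen so far
        self.nodes = []           # output: (text, xpath)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        counts = self.child_counts[-1]
        counts[tag] = counts.get(tag, 0) + 1
        self.stack.append((tag, counts[tag]))
        self.child_counts.append({})

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        # Pop back to the matching open tag; tolerates unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                del self.child_counts[i + 1:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            xpath = "".join(f"/{t}[{i}]" for t, i in self.stack)
            self.nodes.append((text, xpath))


def extract_text_xpaths(html):
    parser = XPathTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.nodes


page = "<html><body><div><p>Price</p><p>$10</p></div><div><span>Buy</span></div></body></html>"
for text, xpath in extract_text_xpaths(page):
    print(xpath, text)
```

Two pages with the same DOM structure produce the same XPaths even if CSS moves the elements around on screen, which is why this representation suits HTML better than coordinates.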

teaching

Teaching experience 1

Undergraduate course, University 1, Department, 2014

This is a description of a teaching experience. You can use markdown like any other post.

Teaching experience 2

Workshop, University 1, Department, 2015

This is a description of a teaching experience. You can use markdown like any other post.

Purnasai G

Posts by Collection

portfolio

Portfolio item number 1

Fully Sharded Data Parallel 01-06-2023

publications

Paper Title Number 1

Paper Title Number 2

Paper Title Number 3

talks

Attention is All You Need - Birth Of Modern AI

Contrastive Language Image Pretraining (CLIP)

DINOv2: Self-DIstillation with NO Labels

CLIPSeg: CLIP for Segmentation

ImageBind: 6 Modalities in One Embedding Space with Image as the Bind

Fully Sharded Data Parallel

MarkupLM: LayoutLM for HTML Pages

TryOnDiffusion Paper Summary

DreamSim: Ensemble of Embeddings as a Metric

Generative Image-to-Text (GIT): a Single Image Encoder and a Single Text Decoder

teaching

Teaching experience 1

Teaching experience 2