
Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs (Study log day 6, 240416)

by RiverWon 2024. 4. 17.

 

 

The model builds on the PaLI-3 architecture and pre-training recipe, which consists of two backbones: a ViT-2B vision encoder and a UL2-3B text encoder-decoder.

 

Pre-training: Chart2Table Mixture

This stage is run with the ViT unfrozen.

Pre-training uses a mixture of several chart-to-table datasets.
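
To make the chart-to-table objective concrete, here is a minimal sketch of how a table might be flattened into a text target for a chart image. The notes do not say which serialization is used, so the function name `linearize_table`, the ` | ` column separator, the newline row separator, and the toy values are all assumptions for illustration.

```python
def linearize_table(header, rows, col_sep=" | ", row_sep="\n"):
    """Flatten a table into a single string that can serve as a text target.

    The separators are hypothetical choices, not taken from the paper.
    """
    lines = [col_sep.join(str(cell) for cell in header)]
    for row in rows:
        if len(row) != len(header):
            raise ValueError("row length does not match header length")
        lines.append(col_sep.join(str(cell) for cell in row))
    return row_sep.join(lines)


# Toy data, made up for illustration only.
header = ["Year", "Sales"]
rows = [[2020, 10], [2021, 15]]

# The model would see the chart image as input and this string as the target.
target = linearize_table(header, rows)
print(target)
```

With this kind of target, the vision encoder has to read values off the chart, which is presumably why the ViT is left unfrozen during this stage.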

 

Fine-tuning: Multi-task Loss

There are two ways of incorporating the rationales available in the extended dataset.

 

1. Single-Task setup

The target is changed from the answer alone to the rationale together with the answer.

 

2. Multi-Task setup

The answer and the rationale are treated as independent tasks.
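
The difference between the two setups can be sketched as a difference in how training examples are built from one (question, rationale, answer) triple. This is only an illustration under stated assumptions: the function names, the `answer:` / `rationale:` task prefixes, the `Answer:` marker in the single-task target, and the toy triple are hypothetical and not taken from the paper.

```python
def single_task_examples(question, rationale, answer):
    """Single-task: one example whose target holds the rationale and the answer."""
    return [{"input": question, "target": f"{rationale} Answer: {answer}"}]


def multi_task_examples(question, rationale, answer):
    """Multi-task: two independent examples, told apart by a task prefix."""
    return [
        {"input": f"answer: {question}", "target": answer},
        {"input": f"rationale: {question}", "target": rationale},
    ]


# Toy triple, made up for illustration only.
question = "How much did sales grow from 2020 to 2021?"
rationale = "Sales were 10 in 2020 and 15 in 2021, so the growth is 15 - 10 = 5."
answer = "5"

single = single_task_examples(question, rationale, answer)
multi = multi_task_examples(question, rationale, answer)

print(len(single), len(multi))
```

In the multi-task variant, each example contributes its own loss term, so the model can be asked for a short answer at inference time without having to generate the rationale first.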

 

 

Result

 

Single-task vs. Multi-task

Compared with the human dataset, the QA pairs in the augmented dataset are somewhat more monotonous.

 

Learning with the augmented dataset