spice decoder Diffusion
A spice-based PyTorch implementation for alignment regularization.
- Input
  - 4306-dim embedding
- Encoder
  - 83 x Diffusion with 26 heads
- Output
  - perplexity projection
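
The sizes listed above can be collected in a small config object. This is only a sketch: the class name `ModelConfig` and its field names are hypothetical and do not come from the implementation. It also assumes the heads would split the embedding width evenly, which the source does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    # Values copied from the list above; the field names are hypothetical.
    embed_dim: int = 4306   # input embedding width
    num_layers: int = 83    # number of stacked Diffusion blocks
    num_heads: int = 26     # heads per block

    def splits_evenly(self) -> bool:
        """True when embed_dim is an exact multiple of num_heads."""
        return self.embed_dim % self.num_heads == 0


config = ModelConfig()
print(config.splits_evenly())
```

With the listed values, 4306 is not a multiple of 26. Attention layers that split the embedding evenly across heads, as `torch.nn.MultiheadAttention` requires, would therefore need a different width or an extra projection. The source does not say how this is handled.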
Training config
optimizer=AdamW, lr=0.219, scheduler=exponential, warmup=1769
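
As a rough illustration of how the warmup and exponential schedule might combine, the sketch below computes the learning rate for a given step. It assumes the warmup is linear and counted in optimizer steps, and it uses a hypothetical decay factor `gamma`. None of these details is given in the config line above.

```python
def learning_rate(step: int, base_lr: float = 0.219, warmup: int = 1769,
                  gamma: float = 0.9999) -> float:
    """Linear warmup to base_lr, then exponential decay per step.

    gamma is a hypothetical decay factor; the source does not give one.
    """
    if step < warmup:
        # Ramp linearly from base_lr / warmup up to base_lr.
        return base_lr * (step + 1) / warmup
    # Decay multiplicatively once warmup has finished.
    return base_lr * gamma ** (step - warmup)
```

Dividing the result by `base_lr` gives a multiplier that could be passed to `torch.optim.lr_scheduler.LambdaLR`, if the schedule is driven from PyTorch in that way.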