"Chunky" Post-Training
Chunky Post-Training: Data Driven Failures of Generalization

Seoirse Murray, Allison Qi, Timothy Qian, John Schulman, Collin Burns, Sara Price

Paper (arXiv) | Code (SURF) | Results Explorer

Overview

Post-training transforms a base language model into a useful assistant by teaching it a range of behaviors. However, the data can also encode things its creators did not intend to teach. When features of the training data correlate with a behavior, the model may learn to condition on those features rather than the intended principle. ...