dxd-log
🤖 AI/ML

papers | Reducing Activation Recomputation in Large Transformer Models