Training Overview
LangSlice training code is public, but raw data, generated corpora, caches, checkpoints, and private run logs are local-only.
Layout
- models/langslice-gemma-4/training/sft/ - SFT trainer and data contract.
- models/training-core/langslice_training/rl/single_turn/ - active single-turn RL trainer.
- models/training-core/langslice_training/ - shared reusable training code.
- models/langslice-traces/langslice_traces/ - trace generation and rendering primitives.
- models/training-core/langslice_training/corpus/ - synthetic trace-corpus generation and atlas region-description.
- models/data/langslice_data/ - public manifest/QC tooling and fixtures.
Entrypoints
The public harness CLI remains:
Training entrypoints live under the model hub and are exposed through small launchers:
When invoked with --help, these launchers only validate imports and arguments;
they do not start training unless full training arguments are provided.
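The launcher behavior described above can be illustrated with a minimal sketch. It is not the project's launcher code: the program name, the `--config` and `--output-dir` flags, and the `required_modules` default are all hypothetical, chosen only to show the pattern of checking imports first, letting `--help` exit early, and staying idle unless a full argument set is given.

```python
import argparse
import importlib.util


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical flags; the real launchers' arguments are not listed here.
    parser = argparse.ArgumentParser(prog="train-launcher-sketch")
    parser.add_argument("--config", help="training config path (hypothetical)")
    parser.add_argument("--output-dir", help="local-only output directory (hypothetical)")
    return parser


def missing_modules(names):
    """Return the module names that cannot be located, without importing them."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing


def main(argv=None, required_modules=("json",)):
    parser = build_parser()
    # Import validation runs before argument parsing, so it also runs for --help.
    missing = missing_modules(required_modules)
    if missing:
        parser.error("missing modules: " + ", ".join(missing))
    # argparse prints usage and raises SystemExit(0) here when --help is passed.
    args = parser.parse_args(argv)
    if not (args.config and args.output_dir):
        # Incomplete training arguments: do nothing beyond validation.
        return "idle"
    # A real launcher would hand off to the trainer at this point.
    return "train"
```

In this sketch, `main([])` returns `"idle"`, and only a call that supplies both hypothetical flags reaches the `"train"` branch.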
iSFT is retired as a public product/pipeline and no longer has a public launcher.
Data Policy
Tracked files may include package code, README files, small fixtures, and model metadata. Keep the following out of the public repo:
- raw datasets and manifest rows
- generated SFT/RL corpora
- atlas/query image caches
- model checkpoints and adapters
- QC thumbnails, debug traces, and local training logs
The .gitignore rules reserve local paths for those materials.
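As a complement to the .gitignore rules, a small check can flag tracked paths that look like local-only material. The sketch below is an assumption-laden illustration, not part of the repository: the glob patterns are hypothetical stand-ins for the categories listed above, and the actual ignore rules may differ.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns standing in for the local-only categories;
# they are not copied from the project's .gitignore.
LOCAL_ONLY_PATTERNS = (
    "*.safetensors",    # model checkpoints and adapters
    "*.ckpt",
    "*/checkpoints/*",
    "*/cache/*",        # atlas/query image caches
    "*/logs/*",         # local training logs
)


def policy_violations(tracked_paths, patterns=LOCAL_ONLY_PATTERNS):
    """Return the tracked paths that match any local-only pattern, in input order."""
    return [
        path
        for path in tracked_paths
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    ]
```

The list of tracked paths could come from `git ls-files`, for example via `subprocess.run(["git", "ls-files"], capture_output=True, text=True, check=True).stdout.splitlines()`. Note that in `fnmatchcase` the `*` wildcard also matches `/`, so a pattern such as `*/logs/*` matches at any depth below the repository root.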