Training Overview

LangSlice training code is public, but raw data, generated corpora, caches, checkpoints, and private run logs are local-only.

Layout

models/langslice-gemma-4/training/sft/ - SFT trainer and data contract.
models/training-core/langslice_training/rl/single_turn/ - active single-turn RL trainer.
models/training-core/langslice_training/ - shared reusable training code.
models/langslice-traces/langslice_traces/ - trace generation and rendering primitives.
models/training-core/langslice_training/corpus/ - synthetic trace-corpus generation and atlas region-description.
models/data/langslice_data/ - public manifest/QC tooling and fixtures.

Entrypoints

The public harness CLI remains:

langslice version

Training entrypoints live under the model hub and are exposed through small launchers:

langslice-gemma-sft --help
langslice-gemma-rl --help

When invoked with --help, these launchers only validate imports and arguments; they start training only when full training arguments are provided.
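The launcher behavior described above can be sketched as follows. This is a minimal illustration, not the actual LangSlice launcher: the flags --config and --output-dir, the run_training placeholder, and the module names passed to check_imports are all assumptions made for the example.

```python
import argparse
import importlib.util
from typing import List, Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical flags; the real launcher's arguments are not documented here.
    parser = argparse.ArgumentParser(prog="langslice-gemma-sft")
    parser.add_argument("--config", required=True,
                        help="path to a training config (hypothetical)")
    parser.add_argument("--output-dir", required=True,
                        help="local-only directory for checkpoints (hypothetical)")
    return parser


def check_imports(modules: Sequence[str]) -> List[str]:
    # Return the top-level modules that cannot be located on this machine.
    return [name for name in modules if importlib.util.find_spec(name) is None]


def run_training(args: argparse.Namespace) -> int:
    # Placeholder for the real trainer; it only reports what it would do.
    print(f"would train with {args.config}, writing to {args.output_dir}")
    return 0


def main(argv: Optional[Sequence[str]] = None) -> int:
    # "argparse" and "json" stand in for the trainer's real dependencies.
    missing = check_imports(["argparse", "json"])
    if missing:
        print("missing modules: " + ", ".join(missing))
        return 1
    # parse_args exits on --help (status 0) and on missing required
    # arguments (status 2), so run_training is never reached in those cases.
    args = build_parser().parse_args(argv)
    return run_training(args)
```

With this shape, main(["--help"]) prints usage and exits before any training code runs, and main([]) fails argument validation. Only a call that supplies every required argument reaches run_training.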

iSFT is retired as a public product/pipeline and no longer has a public launcher.

Data Policy

Tracked files may include package code, README files, small fixtures, and model metadata. Keep the following out of the public repo:

raw datasets and manifest rows
generated SFT/RL corpora
atlas/query image caches
model checkpoints and adapters
QC thumbnails, debug traces, and local training logs

The repository's .gitignore rules reserve local paths for these materials.
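As a complement to .gitignore, a small check can flag tracked paths that look like local-only material. The sketch below is illustrative only: the patterns in LOCAL_ONLY_PATTERNS and the function name find_policy_violations are assumptions and do not reflect the repository's actual ignore rules or directory names.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

# Hypothetical patterns, loosely following the categories listed above.
# In fnmatch-style matching, "*" also matches "/" characters.
LOCAL_ONLY_PATTERNS = [
    "data/raw/*",      # raw datasets and manifest rows
    "*/corpora/*",     # generated SFT/RL corpora
    "*/cache/*",       # atlas/query image caches
    "*.safetensors",   # model checkpoints and adapters
    "*.ckpt",
    "*/logs/*",        # local training logs
]


def find_policy_violations(tracked_paths: Iterable[str]) -> List[str]:
    # Return, in input order, every path that matches a local-only pattern.
    return [
        path
        for path in tracked_paths
        if any(fnmatchcase(path, pattern) for pattern in LOCAL_ONLY_PATTERNS)
    ]
```

One way to use it is to pass in the output of git ls-files, split on newlines, and fail a pre-commit or CI step when the returned list is non-empty.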