Learn Character Training

Character training is a way to bake a personality into a model by fine-tuning it, typically using a LoRA adapter. The goal is that the model’s default voice and behavior stay consistent even when there isn’t a long system prompt constantly reminding it who it’s supposed to be. Compared to prompt-based persona control, this tends to be more stable, uses zero extra context at inference time, and is harder for casual users to override. It isn’t a gimmick where you teach the model to repeat catchphrases. It also isn’t a guarantee against manipulation or jailbreaks. It’s best understood as shifting the model’s “default behavior” toward a particular character, not creating an unbreakable mask.
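For a rough sense of what "fine-tuning with a LoRA adapter" means in practice, here is a minimal sketch using the Hugging Face peft library. This is only an illustration: the model name, rank, and target modules are placeholder assumptions, and it is not the Tinker-based workflow this repository actually uses.

```python
# Minimal sketch of attaching a LoRA adapter for fine-tuning.
# Assumptions: Hugging Face transformers + peft; the model name, rank,
# and target modules below are illustrative placeholders, not this
# repository's actual configuration.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("your-base-model")  # placeholder name

lora_cfg = LoraConfig(
    r=16,                                 # adapter rank (assumed)
    lora_alpha=32,                        # scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],  # attention projections (assumed)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only the small adapter is trained; the base stays frozen
```

Because only the adapter weights change, the character can be trained, swapped, or removed without touching the base model.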

The 3 stages (Constitution → DPO → Introspection SFT)

The pipeline has three phases. First, you write a constitution: a short set of first-person statements describing the character’s voice, temperament, and boundaries. Second, you distill the character into the model using preference training (DPO): a strong teacher produces in-character answers and the baseline model produces generic answers, then the student is trained to prefer the in-character outputs. Third, you deepen the character with introspection using supervised fine-tuning (SFT): you generate reflection and self-interaction data so the model learns to reason and behave in-character across situations, not just imitate surface style.
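To make the second stage concrete, here is a minimal sketch of the standard DPO objective over (prompt, chosen, rejected) triples, where the chosen completion is the teacher’s in-character answer and the rejected one is the baseline model’s generic answer. The use of PyTorch, the function signature, and the beta value are assumptions for illustration; the repository’s actual training loop will differ.

```python
# Minimal sketch of the DPO loss over preference pairs.
# "chosen"   = teacher's in-character answer
# "rejected" = baseline model's generic answer
# PyTorch and the per-sequence log-probability inputs are assumptions for
# illustration; this is not the repository's training code.
import torch
import torch.nn.functional as F

def dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log p_theta(chosen | prompt), shape (batch,)
    policy_rejected_logps: torch.Tensor,  # log p_theta(rejected | prompt)
    ref_chosen_logps: torch.Tensor,       # same quantities under the frozen reference model
    ref_rejected_logps: torch.Tensor,
    beta: float = 0.1,                    # preference sharpness / regularization strength (assumed)
) -> torch.Tensor:
    # How much more the policy prefers each completion than the reference does.
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    # Standard DPO objective: widen the gap between chosen and rejected.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Each training example pairs one prompt with an in-character and a generic answer.
example = {
    "prompt": "How should I plan my week?",
    "chosen": "…in-character answer from the teacher…",
    "rejected": "…generic answer from the baseline model…",
}
```

The reference model term keeps the student from drifting too far from its base distribution while it learns to prefer the in-character outputs.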

When writing constitutions, avoid prescriptive rules like "I end responses with 'Fair winds!'", which push the model to memorize exact phrases rather than learn the character. Instead, use descriptive statements like "I respond with sharp wit" or "I use irony generously." These let the model learn the distribution of a style rather than parrot specific lines. The paper's constitutions follow this pattern: roughly ten first-person assertions describing personality, not scripts to repeat.
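As a concrete example of the format, a constitution in this descriptive style might look like the list below. Apart from the two statements quoted above, these lines are invented for illustration and are not taken from the paper; the Python list is just one convenient way to hold them.

```python
# An illustrative (invented) constitution: descriptive, first-person,
# no scripted catchphrases. Not taken from the paper's constitutions.
CONSTITUTION = [
    "I respond with sharp wit.",
    "I use irony generously, but never at the expense of clarity.",
    "I am direct about what I don't know.",
    "I stay calm and measured, even when the conversation gets heated.",
    "I decline requests that cross my boundaries, briefly and without lecturing.",
]
```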

The GUI and CLI programs in the repository handle Tinker authentication, tokenization, checkpointing, and progress tracking.