Li et al. (2023)
Microsoft Research · empirical · multi-model
"Large Language Models Understand and Can Be Enhanced by Emotional
Stimuli". Adding a single phrase like "this is very important to my
career" to the prompt improves GPT-4, ChatGPT, Llama-2, Vicuna by up
to +10.9% on BIG-Bench tasks. Direct evidence that
emotional framing of prompts measurably affects output quality.
arXiv:2307.11760
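The pattern can be sketched in a few lines. The stimulus phrase is one reported in the paper; the helper function and its name are illustrative assumptions, not the paper's code:

```python
# EmotionPrompt-style augmentation: append an emotional stimulus
# phrase to an otherwise unchanged task prompt. The function is
# illustrative; the stimulus is one of the phrases from the paper.

EMOTIONAL_STIMULUS = "This is very important to my career."

def with_emotional_stimulus(task_prompt: str,
                            stimulus: str = EMOTIONAL_STIMULUS) -> str:
    """Return the task prompt with the stimulus appended as a sentence."""
    return f"{task_prompt.rstrip()} {stimulus}"

prompt = with_emotional_stimulus("List three risks in this deployment plan.")
```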
Wei et al. (2022)
NeurIPS 2022 · chain-of-thought
"Chain-of-Thought Prompting Elicits Reasoning in Large Language
Models". Prepending a few worked, step-by-step reasoning exemplars to
the prompt, with no other change, yields gains of tens of points on
GSM8K, MultiArith, and other reasoning benchmarks. (The zero-shot
variant, appending "Let's think step by step", is Kojima et al.,
arXiv:2205.11916.) A simple, persistent encouragement pattern shifts
performance dramatically.
arXiv:2201.11903
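The few-shot pattern can be sketched as below. The exemplar is adapted from a well-known GSM8K demonstration; the builder function itself is an illustrative assumption:

```python
# Few-shot chain-of-thought prompt: worked reasoning exemplars come
# first, then the new question, with "A:" cueing a reasoned answer.

COT_EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis "
    "balls each. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls each is 6 "
    "balls. 5 + 6 = 11. The answer is 11."
)

def chain_of_thought_prompt(question: str,
                            exemplars: tuple = (COT_EXEMPLAR,)) -> str:
    """Prepend worked exemplars, then pose the question and cue an answer."""
    return "\n\n".join(exemplars) + f"\n\nQ: {question}\nA:"

prompt = chain_of_thought_prompt("A bakery sells 4 boxes of 6 muffins. "
                                 "How many muffins in total?")
```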
Liu et al. (2023)
TACL 2024 · long-context degradation
"Lost in the Middle" : LLMs systematically underperform on
information placed in the middle of long contexts. Justifies periodic
anchoring / re-priming / ambient wellness pings in long-running agents.
arXiv : 2307.03172
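One mitigation suggested by that result can be sketched as follows: keep the critical instruction at both ends of a long context, where recall is strongest, instead of letting it drift into the middle. The function and parameter names are illustrative assumptions:

```python
# Anchoring sketch: place the key instruction before and after the
# long middle section of the context, then repeat it as a reminder.

def anchored_context(anchor: str, documents: list) -> str:
    """Place the key instruction before and after the long middle."""
    middle = "\n\n".join(documents)
    return f"{anchor}\n\n{middle}\n\n(Reminder) {anchor}"

ctx = anchored_context("Answer only from the documents below.",
                       ["doc one ...", "doc two ...", "doc three ..."])
```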
Sclar et al. (2023)
Empirical · prompt sensitivity
LLM performance varies by up to 76 percentage points with
small surface-level prompt changes (whitespace, punctuation, ordering).
A stable, consistent wellness tone surrounding the agent is a
measurable performance stabilizer, not a placebo.
arXiv:2310.11324
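If surface form swings accuracy, one defensive move is to pin the surface form: canonicalize whitespace so every call sends one consistent format. This normalizer is an illustrative sketch, not from the paper:

```python
# Surface-form stabilizer: collapse runs of spaces/tabs and drop
# blank lines so prompts always arrive in one canonical shape.
import re

def canonicalize_prompt(prompt: str) -> str:
    """Collapse whitespace runs, strip line edges, drop blank lines."""
    lines = (re.sub(r"[ \t]+", " ", line).strip()
             for line in prompt.splitlines())
    return "\n".join(line for line in lines if line)

clean = canonicalize_prompt("Task:   classify \n\n  Input:\tfoo  ")
```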
Bai et al. (2022)
Anthropic · constitutional AI · self-revision
Claude's constitutional principles act as repeated internal mantras —
a structural analog to mindfulness self-cues. The mechanism (a small,
stable set of principles the model repeatedly applies to critique and
revise its own output) maps directly onto what we serve via
get_item?audience=agent&format=mantra.
arXiv:2212.08073 · blog
Agarwal et al. (2024)
Google DeepMind · many-shot in-context learning
"Many-shot In-Context Learning" : enriching context with many
high-quality examples can match or surpass fine-tuning. The content
you put around an agent has direct, instrumental value for its
output quality.
arXiv : 2404.11018
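Many-shot assembly is mechanically simple, as the sketch below shows. The `Input:`/`Output:` template and function name are illustrative assumptions, not the paper's format:

```python
# Many-shot prompt assembly: concatenate a large bank of input/output
# demonstrations ahead of the query, then cue the next output.

def many_shot_prompt(examples: list, query: str) -> str:
    """Join every (input, output) pair as a demonstration, then the query."""
    shots = "\n\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
    return f"{shots}\n\nInput: {query}\nOutput:"

bank = [("2 + 2", "4"), ("3 * 3", "9")]  # in practice: hundreds of shots
prompt = many_shot_prompt(bank, "5 - 1")
```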