by Nelson Lemos de Sousa

From fundamentals to creativity tuning, learn how small tweaks make a big difference.
Imagine if just a few tweaks could transform your AI’s responses from robotic to remarkably natural. In the world of LLMs, mastering the settings behind the machine is not only about customisation; it’s about taking control of your output so it perfectly suits your needs.
Small Mathematical Detour
Before diving into the settings, let’s take a small detour. While we won’t cover the architecture or mathematics of the ghost behind the machine, it will be useful to introduce one small concept: the Softmax function.
In general, the output of a transformer-based LLM is a one-dimensional vector of raw scores (logits), even if its internal representations are multidimensional. In the case of a text-generation LLM, this vector contains one number for each token in the LLM’s “dictionary” of words.
But what information can we get from these values? Here is where the Softmax function comes into play.
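For a vector of raw scores $z$ (the logits), the Softmax is defined as:

$$\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$$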

This function accentuates the relative differences between values and returns a normalised vector in which each value lies between 0 and 1 and all values sum to 1, forming a probability distribution.

By applying the Softmax, each of the values gets transformed into a probability, which, in the case of text generative LLMs, corresponds to the likelihood of that token being produced.

Key Points of Softmax:
- Converts raw scores into probabilities
- Normalises outputs to a 0–1 range
- Highlights the relative differences between raw scores
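To make this concrete, here is a minimal NumPy sketch of the Softmax; the logit values are made up for illustration:

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Convert raw logits into a probability distribution."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

logits = np.array([2.0, 1.0, 0.1])     # raw scores for three hypothetical tokens
print(softmax(logits))                 # -> ~[0.659 0.242 0.099]; sums to 1
```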
From here we can enter the realm of the settings per se.
Settings
Temperature
The Temperature is a value by which the logits are divided before the Softmax is applied, changing the shape of the resulting distribution.
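In formula form, with temperature $T$:

$$\mathrm{softmax}(z; T)_i = \frac{e^{z_i/T}}{\sum_{j} e^{z_j/T}}$$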

Temperature Effects:
- High Temperature: Increases entropy, flattens the distribution, encourages creative responses.
- Low Temperature: Reduces entropy, sharpens focus on the most likely outputs.
Altering the temperature alters the distribution’s entropy (“peakedness”) and, in an informal sense, its kurtosis (“tailedness”), with the temperature acting as an “attenuator”.
Higher temperatures lead to higher entropy, “flattening” the probability distribution and making unlikely tokens more common; conversely, lower temperatures lower the entropy, concentrating the probability distribution on the most probable outputs.
Tip
Lower the temperature for precise Q&A scenarios, and raise it for creative tasks.
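The effect is easy to see numerically. Here is a minimal NumPy sketch, again with made-up logits, comparing a low and a high temperature:

```python
import numpy as np

def softmax_with_temperature(logits: np.ndarray, temperature: float) -> np.ndarray:
    """Divide the logits by the temperature before applying the Softmax."""
    scaled = logits / temperature
    shifted = scaled - np.max(scaled)  # numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

logits = np.array([2.0, 1.0, 0.1])
print(softmax_with_temperature(logits, 0.5))  # sharper:  ~[0.86 0.12 0.02]
print(softmax_with_temperature(logits, 2.0))  # flatter:  ~[0.50 0.30 0.19]
```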
Top P
Top Probability: a sampling technique applied after the Softmax. It samples only from the tokens that cumulatively account for the top P share of the probability mass. Lowering this value restricts the model to a smaller pool of tokens, reducing the variability of its responses.
Key Points:
- Limits token choices by cumulative probability.
- Lower values restrict variability; ideal when you need focused output.
Note
While both Top P and Temperature control randomness, adjusting one can often be sufficient. If you choose to modify both, do so cautiously, as their effects can overlap.
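To make the mechanics concrete, here is a minimal sketch of a Top P (nucleus) filter applied to an already-softmaxed distribution; the probabilities are made up for illustration:

```python
import numpy as np

def top_p_filter(probs: np.ndarray, p: float) -> np.ndarray:
    """Keep the smallest set of tokens whose cumulative probability reaches p,
    zero out the rest, and renormalise."""
    order = np.argsort(probs)[::-1]              # indices, most probable first
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1  # how many tokens to keep
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

probs = np.array([0.5, 0.3, 0.15, 0.05])
print(top_p_filter(probs, 0.75))  # only the top two survive: [0.625 0.375 0. 0.]
```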

Top K
A close cousin of Top P, this setting keeps the top tokens by a fixed count rather than by cumulative probability.
Example
With K=10, it will only take the 10 highest-valued tokens into account when choosing the next token.
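And a minimal sketch of a Top K filter for comparison, again with made-up probabilities:

```python
import numpy as np

def top_k_filter(probs: np.ndarray, k: int) -> np.ndarray:
    """Keep only the k most probable tokens, zero out the rest, renormalise."""
    keep = np.argsort(probs)[::-1][:k]  # indices of the k highest probabilities
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

probs = np.array([0.5, 0.3, 0.15, 0.05])
print(top_k_filter(probs, 2))  # -> [0.625 0.375 0. 0.]
```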
Max Length
Maximum number of tokens that the LLM can generate.
Stop Sequences
A string or a list of strings that indicates a stopping point for the LLM. If any of these values is generated, the generation will stop. This can be exceedingly useful to stop XML or other structured data generation at a specific point.
Tip
You can use } to stop JSON generation at certain points, or a closing tag such as </tag> for XML.
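Conceptually, a stop sequence truncates the output at the first match; real APIs check for stops token by token during generation. A simplified sketch:

```python
def truncate_at_stop(text: str, stop_sequences: list[str]) -> str:
    """Cut generated text at the earliest stop sequence (conceptual sketch)."""
    cut = len(text)
    for stop in stop_sequences:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)  # keep the earliest stopping point
    return text[:cut]

generated = '{"name": "Ada"} and some trailing chatter'
print(truncate_at_stop(generated, ["}"]))  # -> {"name": "Ada"
```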
Frequency Penalty
A penalty applied to a token in proportion to the number of times it has previously appeared in the response and the prompt, thereby reducing repeated usage of the same words.
Tip
Increasing penalties might be a fun way to discover new words.
Presence Penalty
A penalty applied to a token if it has already been generated. In contrast to the proportional Frequency Penalty, this penalty is the same whether the token appears once or n times.
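One common formulation, similar to the one OpenAI documents for its API, subtracts both penalties directly from the logits before sampling. A small sketch with made-up values:

```python
import numpy as np

def apply_penalties(logits: np.ndarray, counts: np.ndarray,
                    frequency_penalty: float, presence_penalty: float) -> np.ndarray:
    """Penalise logits of tokens that already appeared.

    counts[i] = how many times token i has appeared so far."""
    penalised = logits.copy()
    penalised -= counts * frequency_penalty       # grows with each repetition
    penalised -= (counts > 0) * presence_penalty  # flat, one-off penalty
    return penalised

logits = np.array([3.0, 2.0, 1.0])
counts = np.array([4, 1, 0])  # token 0 appeared 4 times, token 1 once
print(apply_penalties(logits, counts, frequency_penalty=0.5, presence_penalty=0.6))
# -> [0.4 0.9 1. ]  repeated tokens become less likely to be sampled again
```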
Reasoning Effort
Used in some newer Chain of Thought models, this value allows control over how much “effort” an LLM expends in its CoT reasoning steps.
It allows a simple trade-off between cost and speed on one hand and result quality on the other.
Example
OpenAI commonly allows low, medium, and high settings for its reasoning models, where high provides more reasoning steps before the final answer, increasing its accuracy but also its output length and cost.
Seed
This value is the seed for the pseudo-random sampling algorithm and allows for reproducible generation when set.
Context Length
The number of tokens that a model can consider at once when generating a response. When the input text exceeds this value, the LLM has to truncate parts of it, which is a common cause of the well-known lapses of memory in prolonged conversations with LLMs.
The maximum context length is fixed for a specific model or architecture and can’t be increased.
Tip
Lower total context length for faster results on small local models if complex answers aren’t needed.
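To tie these settings together, here is a sketch of a single request using the OpenAI Python SDK. The model name and every value are illustrative, and other providers expose similar knobs under similar names; note that OpenAI’s reasoning models take a separate reasoning_effort parameter instead of most of these sampling settings.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",    # illustrative choice; any chat model works
    messages=[{"role": "user", "content": "List three uses of the softmax function."}],
    temperature=0.2,        # low: focused, precise answers
    top_p=0.9,              # nucleus sampling cutoff
    max_tokens=256,         # Max Length: cap on generated tokens
    stop=["\n\n"],          # stop sequence(s)
    frequency_penalty=0.3,  # discourage repeated words
    presence_penalty=0.1,   # discourage reusing tokens at all
    seed=42,                # best-effort reproducibility
)
print(response.choices[0].message.content)
```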
Conclusion
In this blog post we focused on the most common settings available in several APIs. However, this is not an exhaustive list; a whole world of less common settings exists for specific models: from sampling methods like Mirostat and Tail-Free Sampling to additional types of repetition penalties, specific CPU and GPU settings for local models, and much more…
Keep Learning.