`num_ctx` sets the context size of the model: the total number of tokens shared between what you supply to the model and what it returns. Each model is trained for a specific context size, and if you set `num_ctx` higher than what the model was trained for, you will probably get unpredictable results.
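For example, here is one way to set it per request through Ollama's REST API (a minimal sketch, assuming a local server on the default port 11434 and a model named `llama3`; swap in whatever model you have pulled):

```python
import requests

# Ask for a larger context window on a single /api/generate call.
# num_ctx goes in the "options" object alongside other runtime parameters.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",  # assumed model name; use one you have pulled
        "prompt": "Summarize the plot of Hamlet in three sentences.",
        "stream": False,
        "options": {"num_ctx": 8192},  # override the default context size
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["response"])
```

The same parameter can also be baked into a Modelfile with `PARAMETER num_ctx 8192`, or set interactively with `/set parameter num_ctx 8192` inside an `ollama run` session.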
If `num_ctx` is not set, Ollama will always set it to 2048 tokens, regardless of how the model was trained. You can find the trained maximum by interrogating the GGUF file for `[architecture].context_length`. In the case of Llama models, that key is `llama.context_length`.
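If you would rather not open the GGUF file directly, Ollama's `/api/show` endpoint reports the same GGUF metadata under `model_info`. A sketch, again assuming a local server and a model named `llama3`:

```python
import requests

# /api/show returns the model's GGUF metadata in the "model_info" field,
# keyed by the same [architecture].key names found in the GGUF file.
info = requests.post(
    "http://localhost:11434/api/show",
    json={"model": "llama3"},  # assumed model name
    timeout=30,
).json()

arch = info["model_info"]["general.architecture"]  # e.g. "llama"
max_ctx = info["model_info"][f"{arch}.context_length"]
print(f"{arch} model trained for a context of {max_ctx} tokens")
```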
`num_ctx` determines the full context the model can remember, with prompt and output sharing the same window. If `num_ctx` is 2048 and the input is 1024 tokens, then the model can only output 1024 tokens before it starts to forget the beginning of the input.
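One way to picture that budget (a hypothetical helper for illustration, not an Ollama API):

```python
def output_budget(num_ctx: int, prompt_tokens: int) -> int:
    """Tokens the model can generate before the window starts
    sliding past the beginning of the prompt."""
    return max(num_ctx - prompt_tokens, 0)

print(output_budget(2048, 1024))  # 1024, matching the example above
```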
#ollama/parameters