The Llama 3.1 8B model that I used gives me ~100 tokens/second, which is fast. Of course, you can use lighter quantized versions of the model, which should give an even faster token rate. Another tool I am trying is LM Studio, where I can adjust the GPU allocation, which also has a positive impact on the token rate.
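If you want to sanity-check the token rate on your own machine, here is a minimal Python sketch that times a chat completion against LM Studio's local server, which speaks the OpenAI API and listens on localhost:1234 by default. The model id and prompt are placeholders; use whatever name your server actually reports. Note this measures end-to-end wall time, so prompt processing is folded in and it slightly understates pure generation speed.

```python
# Minimal sketch: estimate rough tokens/second against a local
# OpenAI-compatible server (LM Studio exposes one on localhost:1234
# by default).
import time

from openai import OpenAI

# Local servers ignore the API key, but the client requires one.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

start = time.perf_counter()
response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",  # placeholder id; check your loaded model
    messages=[
        {"role": "user", "content": "Explain quantization in one paragraph."}
    ],
)
elapsed = time.perf_counter() - start

# Wall time includes prompt processing, so this is a lower bound
# on the pure generation rate.
tokens = response.usage.completion_tokens
print(f"{tokens} tokens in {elapsed:.2f}s (~{tokens / elapsed:.0f} tokens/s)")
```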
That's great. Yeah, I'm using LM Studio at the moment but will probably try it on my more powerful gaming laptop to see the performance difference.
There is another one you can try: the Msty app. It has many useful features that LM Studio does not have yet 👌
Oh, this is promising. Thanks for putting it on my radar!
How long does it take for the LLM to respond per question on your hardware?