6 Comments
Jan Ibañez:

The Llama 3.1 8B model that I used gives me ~100 tokens/second, which is fast. Of course, you can use lighter quantized versions of the model, which could provide a faster token rate. Another tool I am trying is LM Studio, where I can adjust the GPU allocation, which also has a positive impact on the token rate.
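The ~100 tokens/second figure is easy to measure yourself: divide the number of generated tokens by the wall-clock generation time. A minimal sketch (the commented-out client call is a hypothetical example; LM Studio and Ollama both expose local OpenAI-compatible servers, but the exact URL and model name depend on your setup):

```python
import time

def tokens_per_second(token_count: int, elapsed_seconds: float) -> float:
    """Throughput in tokens/second; guards against a zero or negative timer."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return token_count / elapsed_seconds

# Hypothetical usage against a local server (names are assumptions):
# start = time.perf_counter()
# reply = client.chat.completions.create(model="llama-3.1-8b", messages=msgs)
# elapsed = time.perf_counter() - start
# rate = tokens_per_second(reply.usage.completion_tokens, elapsed)

print(tokens_per_second(500, 5.0))  # 500 tokens in 5 s → 100.0 tok/s
```

Comparing this number across quantizations (e.g. Q4 vs Q8) or GPU-layer settings is what makes the "lighter quantized versions are faster" claim concrete.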

Faz:

That's great. Yeah, I'm using LM Studio at the moment, but I'll probably try it on my more powerful gaming laptop to see the performance difference.

Jan Ibañez:

There is another one you can try, the Msty app. It has many useful features that LM Studio does not have yet 👌

Faz:

Oh, this is promising. Thanks for putting it on my radar!

Faz:

How long does it take for the LLM to respond per question on your hardware?
