The Llama 3.1 8B model that I used gives me ~100 tokens/second, which is fast. Of course, you can use lighter quantized versions of the model, which should give an even faster token rate. Another tool I am trying is LM Studio, where I can adjust the GPU allocation, which also has a positive impact on the token rate.
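If you want to sanity-check the token rate on your own machine, here is a minimal Python sketch that times a chat completion against LM Studio's local server, which speaks the OpenAI API and listens on localhost:1234 by default. The model id and prompt are placeholders; use whatever name your server actually reports. Note this measures end-to-end wall time, so prompt processing is folded in and it slightly understates pure generation speed.

```python
# Minimal sketch: estimate rough tokens/second against a local
# OpenAI-compatible server (LM Studio exposes one on localhost:1234
# by default).
import time

from openai import OpenAI

# Local servers ignore the API key, but the client requires one.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

start = time.perf_counter()
response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",  # placeholder id; check your loaded model
    messages=[
        {"role": "user", "content": "Explain quantization in one paragraph."}
    ],
)
elapsed = time.perf_counter() - start

# Wall time includes prompt processing, so this is a lower bound
# on the pure generation rate.
tokens = response.usage.completion_tokens
print(f"{tokens} tokens in {elapsed:.2f}s (~{tokens / elapsed:.0f} tokens/s)")
```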
That's great. Yeah, I'm using LM Studio at the moment but will probably try it on my more powerful gaming laptop to see the performance difference.
There is another one you can try: the Msty app. It has many useful features that LM Studio does not have yet 👌
Oh, this is promising. Thanks for putting it on my radar!
How long does it take for the LLM to respond per question on your hardware?