Hardware to Local LLM

Hardware

GPU Selection

The principle is to pick a GPU with the latest architecture and the largest VRAM that fits the budget. According to Ollama: “You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.” In testing on a GTX970, 4 GB of VRAM proved sufficient for a 3.7B model (phi3).
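
Ollama's numbers can be sanity-checked with a back-of-the-envelope formula: quantized weights take roughly `bits / 8` bytes per parameter, plus some fixed overhead for the KV cache and activations. This is a rough sketch, not an exact sizing tool; `estimate_vram_gb` and the 1 GB overhead figure are assumptions.

```python
def estimate_vram_gb(n_params_billion, bits_per_weight=4, overhead_gb=1.0):
    """Rough VRAM estimate for a quantized model.

    weights: 1B params at 8 bits = 1 GB, so scale by bits/8;
    overhead: assumed flat budget for KV cache and activations.
    """
    weight_gb = n_params_billion * bits_per_weight / 8
    return weight_gb + overhead_gb

# A ~3.8B model at 4-bit quantization comes out under 3 GB,
# which is consistent with phi3 fitting on a 4 GB GTX970.
print(estimate_vram_gb(3.8))
```

For 7B at 4-bit the estimate lands around 4.5 GB, comfortably inside Ollama's 8 GB guideline.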

| model | VRAM (GB) | TDP (W) | price | comment |
| --- | --- | --- | --- | --- |
| RTX2080Ti | 22 | 250 | 2800 | modified VRAM |
| RTX3090 | 24 | 350 | 8000 | second-hand |
| RTX4060Ti | 16 | 160 | 4000 | brand new |
| RTX4070Ti | 12 | 285 | 7000 | brand new |

Considering VRAM, TDP, and price together, the winner is the RTX4060Ti.
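
The selection logic can be sketched as a small script: among the brand-new cards, prefer larger VRAM, then lower TDP and price. The field names and tie-breaking order here are my own assumptions, not part of the original comparison.

```python
gpus = [
    {"model": "RTX2080Ti", "vram_gb": 22, "tdp_w": 250, "price": 2800, "condition": "modified VRAM"},
    {"model": "RTX3090",   "vram_gb": 24, "tdp_w": 350, "price": 8000, "condition": "second-hand"},
    {"model": "RTX4060Ti", "vram_gb": 16, "tdp_w": 160, "price": 4000, "condition": "brand new"},
    {"model": "RTX4070Ti", "vram_gb": 12, "tdp_w": 285, "price": 7000, "condition": "brand new"},
]

# Filter to brand-new cards, then rank: more VRAM first, lower TDP and price as tie-breakers.
new_cards = [g for g in gpus if g["condition"] == "brand new"]
winner = max(new_cards, key=lambda g: (g["vram_gb"], -g["tdp_w"], -g["price"]))
print(winner["model"])  # → RTX4060Ti
```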

Benchmark

| task | GTX970 | RTX4060Ti |
| --- | --- | --- |
| stable-diffusion-webui | 01:53 | 00:11 |
| transcript-with-whisper | 24:34 + 24:42 | 00:49 + 02:21 |
| ollama-with-phi3 | 798 tokens / 45 s | 639 tokens / 8 s |
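
The ollama numbers are easier to compare as throughput. Dividing tokens by elapsed seconds puts the GTX970 at roughly 18 tokens/s and the RTX4060Ti near 80 tokens/s, about a 4.5× speedup:

```python
def tokens_per_second(tokens, seconds):
    """Throughput from the benchmark's raw token count and wall time."""
    return tokens / seconds

gtx970 = tokens_per_second(798, 45)     # ~17.7 tokens/s
rtx4060ti = tokens_per_second(639, 8)   # ~79.9 tokens/s
speedup = rtx4060ti / gtx970            # ~4.5x
print(round(gtx970, 1), round(rtx4060ti, 1), round(speedup, 1))
```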

Application

personal finance analysis:

I have some financial records in the `beancount` format that I'd like analyzed. Could you review the data and provide suggestions to optimize my income and expenses? Here's the information:

<records>
</records>

This analysis is very important to me, so I appreciate your thorough and detailed suggestions.
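
In practice the prompt above gets the records spliced between the `<records>` tags before being sent to the model. A minimal helper for that step might look like this; `build_finance_prompt` is a hypothetical name, and the template simply reproduces the prompt text:

```python
FINANCE_TEMPLATE = """I have some financial records in the `beancount` format that I'd like analyzed. \
Could you review the data and provide suggestions to optimize my income and expenses? Here's the information:

<records>
{records}
</records>

This analysis is very important to me, so I appreciate your thorough and detailed suggestions."""

def build_finance_prompt(records: str) -> str:
    """Fill the <records> section of the prompt with beancount ledger text."""
    return FINANCE_TEMPLATE.format(records=records)
```

The resulting string can then be piped to a local model, e.g. `ollama run phi3`.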

meeting transcript and summary:

Optimize the provided transcription from a meeting by dividing it into three sections in Simplified Chinese:
1. Summary: Extract the key information from the meeting and present it in a bullet-point list. Focus on the main topics discussed, decisions made, and any significant viewpoints expressed.
2. Conclusion: Write a concise paragraph summarizing the overall outcome and general consensus achieved in the meeting. Highlight any major conclusions or strategic decisions that were reached.
3. Todos: Identify and list all actionable items or tasks that were decided upon during the meeting. Format these as bullet points, being specific about the tasks, responsible individuals, and any deadlines associated with these actions.

Ensure the transcription is accurate and fully captures the essence of the meeting, and present it clearly in Simplified Chinese.

Following is the transcription content:

<transcription>
</transcription>
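
The whole meeting workflow can be chained locally: whisper produces the transcript, which is wrapped in the prompt above and piped to the model. This is a sketch under assumptions: the template condenses the three section instructions, `build_meeting_prompt` and `summarize` are hypothetical names, and it assumes the `ollama` CLI is installed and accepts the prompt on stdin.

```python
import subprocess

# Condensed version of the meeting prompt; see the full text above.
MEETING_TEMPLATE = """Optimize the provided transcription from a meeting by dividing it into three sections in Simplified Chinese:
1. Summary: key points as a bullet list.
2. Conclusion: a concise paragraph on the overall outcome.
3. Todos: actionable items with owners and deadlines.

Following is the transcription content:

<transcription>
{transcription}
</transcription>"""

def build_meeting_prompt(transcription: str) -> str:
    """Fill the <transcription> section with whisper's output text."""
    return MEETING_TEMPLATE.format(transcription=transcription)

def summarize(transcription: str, model: str = "phi3") -> str:
    """Pipe the assembled prompt to a local model via the ollama CLI."""
    result = subprocess.run(
        ["ollama", "run", model],
        input=build_meeting_prompt(transcription),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```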