kubecon24/02_ai_keynote.md at 76ee14e6676956a429c0534432aeabd5bde36f22

Guests

What do you use as the base for developing Ollama?
- Jeff: The concepts from Docker, Git, and Kubernetes
What is the balance between AI engineers and AI ops?
- Jeff: It's the classic dev vs. ops divide; many ML engineers don't think about operations
- Paige: Yessir
How does infra keep up with the fast pace of research?
- Paige: Well, it doesn't, but the infra teams do their best, and cloud native is cool
- Jeff: Well, we're not Google, but Kubernetes is the savior
What are the scaling constraints?
- Jeff: Currently, model sizing is still in its infancy
- Jeff: There will be more specialized hardware, and someone will have to support it
- Paige: Sizing also depends on latency needs (code autocompletion vs performance optimization)
- Paige: Optimization of smaller models
Which technologies need to be open source licensed?
- Jeff: The model, because of access and trust
- Tim: The models and base execution environment -> Vendor agnosticism
- Paige: Yes, and remixes are really important for development
Anything else?
- Jeff: How do we bring our awesome tools (monitoring, logging, security) to the new AI world?
- Paige: Currently, many people just use paid APIs to abstract away the infra, but we need this stuff to be self-hostable
- Tim: I don't want to know about the hardware; the whole infra side should be handled by the cloud native teams so that ML engineers can just be ML engineers