diff --git a/content/day2/01_opening.md b/content/day2/01_opening.md
new file mode 100644
index 0000000..e6424eb
--- /dev/null
+++ b/content/day2/01_opening.md
@@ -0,0 +1,32 @@
+---
+title: Opening Keynote
+weight: 1
+---
+
+The opening keynote started - as is the tradition with keynotes - with a "motivational" opening video.
+The keynote itself was presented by the CEO of the CNCF.
+
+## The numbers
+
+* Over 2000 attendees
+* 10 years of Kubernetes
+* 60% of large organizations expect rapid cost increases due to AI/ML (FinOps survey)
+
+## The highlights
+
+* Everyone uses cloud native
+* AI uses Kubernetes because the UX is way better than that of classic tools
+  * Especially when moving from dev to prod
+  * We need standardization
+* Open source is cool
+
+## Live demo
+
+* KIND cluster on a desktop
+* Prototype stack (develop on the client)
+  * Kubernetes with the LLM
+  * Host with LLaVA (an image description model), moondream and Ollama (the model manager/registry)
+* Prod stack (everything in Kubernetes)
+  * Kubernetes with the LLM, LLaVA, Ollama and moondream
+* Available models: llava, mistral, bakllava (LLaVA + Mistral)
+* The host takes a picture and the AI describes what is pictured (in our case: the conference audience)
diff --git a/content/day2/02_ai_keynote.md b/content/day2/02_ai_keynote.md
new file mode 100644
index 0000000..52ce208
--- /dev/null
+++ b/content/day2/02_ai_keynote.md
@@ -0,0 +1,36 @@
+---
+title: AI keynote discussion
+weight: 2
+---
+
+A podium discussion (somewhat scripted) led by Priyanka.
+
+## Guests
+
+* Tim from Mistral
+* Paige from Google AI
+* Jeff, founder of Ollama
+
+## Discussion
+
+* What do you use as the base of development for Ollama?
+  * Jeff: The concepts from Docker, Git and Kubernetes
+* How is the balance between AI engineering and AI ops?
+  * Jeff: The classic dev vs ops divide - many ML engineers don't think about ops
+  * Paige: Yessir
+* How does infrastructure keep up with the fast pace of research?
+  * Paige: Well, it doesn't - but they do their best, and cloud native is cool
+  * Jeff: Well, we're not Google, but
Kubernetes is the saviour
+* What are the scaling constraints?
+  * Jeff: Currently, the sizing of models is still in its infancy
+  * Jeff: There will be more specialized hardware, and someone will have to support it
+  * Paige: Sizing also depends on latency needs (code autocompletion vs performance optimization)
+  * Paige: Optimization of smaller models
+* What technologies need to be open source licensed?
+  * Jeff: The model, because of access and trust
+  * Tim: The models and the base execution environment -> vendor agnosticism
+  * Paige: Yes, and remixes are really important for development
+* Anything else?
+  * Jeff: How do we bring our awesome tools (monitoring, logging, security) to the new AI world?
+  * Paige: Currently many people just use paid APIs to abstract away the infrastructure, but we need this stuff to be self-hostable
+  * Tim: I don't want to know about the hardware - the whole infra side should be handled by the cloud native teams to let ML engineers just be ML engineers
\ No newline at end of file
diff --git a/content/day2/03_accelerating_ai_workloads.md b/content/day2/03_accelerating_ai_workloads.md
new file mode 100644
index 0000000..e225dfe
--- /dev/null
+++ b/content/day2/03_accelerating_ai_workloads.md
@@ -0,0 +1,50 @@
+---
+title: Accelerating AI workloads with GPUs in Kubernetes
+weight: 3
+---
+
+Kevin and Sanjay from NVIDIA.
+
+## Enabling GPUs in Kubernetes today
+
+* Host-level components: toolkit, drivers
+* Kubernetes components: device plugin, feature discovery, node selector
+* NVIDIA humbly brings you a GPU operator
+
+## GPU sharing
+
+* Time slicing: Switch between workloads over time
+* Multi-Process Service (MPS): Always run on the GPU, but share it (space-wise)
+* Multi-Instance GPU (MIG): Space-separated sharing at the hardware level
+* Virtual GPU: Virtualizes time slicing or MIG
+* CUDA streams: Run multiple kernels within a single app
+
+## Dynamic resource allocation
+
+* A new alpha feature since Kubernetes 1.26 for dynamically requesting resources
+* You just request a resource via the API and have fun
+* The sharing itself is an
implementation detail
+
+## GPU scale-out challenges
+
+* NVIDIA Picasso is a foundry for model creation, powered by Kubernetes
+* The workload is the training workload, split into batches
+* Challenge: Schedule multiple, prioritized training jobs from different users
+
+### Topology-aware placement
+
+* You need thousands of GPUs; a typical node has 8 GPUs with fast NVLink communication - beyond that, switching
+* Target: Optimize related jobs based on GPU node distance and NUMA placement
+
+### Fault tolerance and resiliency
+
+* Stuff can break, resulting in slowdowns or errors
+* Challenge: Detect faults and handle them
+* Observability, both in-band and out-of-band, that exposes node conditions in Kubernetes
+* Needed: Automated fault-tolerant scheduling
+
+### Multi-dimensional optimization
+
+* There are different KPIs: starvation, priority, occupancy, fairness
+* Challenge: What to choose (the multi-dimensional decision problem)
+* Needed: A scheduler that can balance these dimensions
\ No newline at end of file
diff --git a/content/day2/04_sponsored_ai_platform.md b/content/day2/04_sponsored_ai_platform.md
new file mode 100644
index 0000000..f935a04
--- /dev/null
+++ b/content/day2/04_sponsored_ai_platform.md
@@ -0,0 +1,22 @@
+---
+title: "Sponsored: Build an open source platform for AI/ML"
+weight: 4
+---
+
+Jorge Palma from Microsoft with a quick introduction.
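The KAITO workspace CRD discussed in this talk can be sketched roughly like this (reconstructed from memory of the project's documentation; the instance type, model preset and field names are illustrative and may not match the current API):

```yaml
# Hypothetical KAITO workspace: declare a model preset plus the GPU
# node type it needs; the operator provisions the nodes and exposes
# an inference endpoint. Field names are illustrative.
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: workspace-falcon-7b
resource:
  instanceType: "Standard_NC12s_v3"   # GPU VM size to provision
  labelSelector:
    matchLabels:
      apps: falcon-7b
inference:
  preset:
    name: "falcon-7b"                 # one of the bundled model presets
```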
+
+## Baseline
+
+* Kubernetes is cool and all
+* Challenges:
+  * Containerized models
+  * GPUs in the cluster (installation, management)
+
+## Kubernetes AI Toolchain (KAITO)
+
+* A Kubernetes operator that interacts with
+  * a node provisioner
+  * a deployment
+* A simple CRD describes the model and the infrastructure - and have fun
+* Creates an inference endpoint
+* Around 10 models are currently available (Huggingface, Llama, etc.)
\ No newline at end of file
diff --git a/content/day2/05_performance_sustainability.md b/content/day2/05_performance_sustainability.md
new file mode 100644
index 0000000..7ae0362
--- /dev/null
+++ b/content/day2/05_performance_sustainability.md
@@ -0,0 +1,16 @@
+---
+title: Optimizing performance and sustainability for AI
+weight: 5
+---
+
+A panel discussion moderated by Google, with participants from Google, Alluxio, Ampere and CERN.
+It was pretty scripted, with prepared (sponsor-specific) slides for each answered question.
+
+## Takeaways
+
+* Deploying an ML model should become the new "deploying a web app"
+* The hardware should be fully utilized -> better resource sharing and scheduling
+* Running smaller LLMs on CPUs only is pretty cost-efficient
+* Better scheduling by splitting work across storage + CPU (prepare) and GPU (run) nodes to create a just-in-time flow
+* Software acceleration is cool, but we should use more specialized hardware and models that can run on CPUs
+* We should be flexible regarding hardware, multi-cluster workloads and hybrid (on-prem, burst to cloud) workloads
\ No newline at end of file
diff --git a/content/day2/06_newsshow_ai_edition.md b/content/day2/06_newsshow_ai_edition.md
new file mode 100644
index 0000000..e8434dd
--- /dev/null
+++ b/content/day2/06_newsshow_ai_edition.md
@@ -0,0 +1,43 @@
+---
+title: Cloud native news show (AI edition)
+weight: 6
+---
+
+Nikhita presented projects that merge cloud native and AI.
Patrick Ohly joined for DRA.
+
+### The "news"
+
+* New working group: AI
+* More tools are including AI features
+* An updated "CNCF for Children", now featuring AI
+* One decade of Kubernetes
+* DRA is in alpha
+
+### DRA
+
+* A new API for resources (node-local and node-attached)
+* Sharing of resources between pods and containers
+* Vendor-specific details are abstracted by a vendor driver controller
+* The kube-scheduler can interact with the vendor parameters for scheduling and autoscaling
+
+### Cloud native AI ecosystem
+
+* Kubernetes is the seed for the AI infrastructure plant
+* Kubeflow users wanted AI registries
+* LLMs on the edge
+* OpenTelemetry brings semantics
+* All of these tools form a symbiosis
+* Topics of discussion
+
+### The working group AI
+
+* It was formed in October 2023
+* They are working on the "cloud native and AI" whitepaper, which was published on 19.03.2024
+* The "cloud native and AI" landscape is WIP and will be merged into the main CNCF landscape
+* The future focus will be on security and cost efficiency (with a hint of sustainability)
+
+### LFAI and CNCF
+
+* The director of the AI foundation talks about AI and cloud native
+* They are looking forward to more collaboration
\ No newline at end of file
diff --git a/content/day2/_index.md b/content/day2/_index.md
new file mode 100644
index 0000000..33edfa0
--- /dev/null
+++ b/content/day2/_index.md
@@ -0,0 +1,10 @@
+---
+archetype: chapter
+title: Day 2
+---
+
+Day two is also the official day one of KubeCon (day one was just CloudNativeCon).
+This is where everyone joined (over 2000 attendees).
+
+The opening keynotes were a mix of talks and panel discussions.
+The main topic was - who could have guessed - AI and ML.
\ No newline at end of file