day 2 keynotes
Parent: a03f0c0da2
Commit: 3eb8eacd1d
32
content/day2/01_opening.md
Normal file
@@ -0,0 +1,32 @@
---
title: Opening Keynote
weight: 1
---

The opening keynote started - as is the tradition with keynotes - with a "motivational" opening video.
The keynote itself was presented by the CEO of the CNCF.

## The numbers

* Over 2000 attendees
* 10 years of Kubernetes
* 60% of large organizations expect rapid cost increases due to AI/ML (FinOps survey)

## The highlights

* Everyone uses cloud native
* AI uses Kubernetes because the UX is way better than with classic tools
  * Especially when transferring from dev to prod
* We need standardization
* Open source is cool

## Live demo

* KIND cluster on the desktop
* Prototype stack (develop on the client)
  * Kubernetes with the LLM
  * Host with LLaVA (an image-description model), moondream and Ollama (the model manager/registry)
* Prod stack (everything in Kubernetes)
  * Kubernetes with the LLM, LLaVA, Ollama and moondream
* Available models: llava, mistral, bakllava (llava x mistral)
* The host takes a picture, the AI describes what is pictured (in our case the conference audience)
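The "prod stack" part of the demo could be sketched as a plain Deployment plus Service for Ollama. This is a hypothetical reconstruction, not the presenters' manifests - the namespace, labels and image tag are assumptions:

```yaml
# Sketch: Ollama serving in-cluster, as in the demo's prod stack.
# Names, namespace and image tag are illustrative assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
      - name: ollama
        image: ollama/ollama:latest
        ports:
        - containerPort: 11434   # default Ollama API port
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: demo
spec:
  selector:
    app: ollama
  ports:
  - port: 11434
```

A client (the host taking the picture) would then POST to the Ollama API with the image attached and a model such as `llava` selected.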
36
content/day2/02_ai_keynote.md
Normal file
@@ -0,0 +1,36 @@
---
title: AI Keynote discussion
weight: 2
---

A podium discussion (somewhat scripted) led by Priyanka.

## Guests

* Tim from Mistral
* Paige from Google AI
* Jeff, founder of Ollama

## Discussion

* What do you use as the basis of development for Ollama?
  * Jeff: The concepts from Docker, Git and Kubernetes
* How is the balance between AI engineering and AI ops?
  * Jeff: The classic dev vs. ops divide - many ML engineers don't think about ops
  * Paige: Yessir
* How does infra keep up with the fast research?
  * Paige: Well, they don't - but they do their best, and cloud native is cool
  * Jeff: Well, we're not Google, but Kubernetes is the saviour
* What are the scaling constraints?
  * Jeff: Sizing of models is still in its infancy
  * Jeff: There will be more specialized hardware, and someone will have to support it
  * Paige: Sizing also depends on latency needs (code autocompletion vs. performance optimization)
  * Paige: Optimization of smaller models
* What technologies need to be open source licensed?
  * Jeff: The model, because of access and trust
  * Tim: The models and the base execution environment -> vendor agnosticism
  * Paige: Yes, and remixes are really important for development
* Anything else?
  * Jeff: How do we bring our awesome tools (monitoring, logging, security) to the new AI world?
  * Paige: Currently many people just use paid APIs to abstract away the infra, but we need this stuff to be self-hostable
  * Tim: I don't want to know about the hardware - the whole infra side should be handled by the cloud native teams so ML engineers can just be ML engineers
50
content/day2/03_accelerating_ai_workloads.md
Normal file
@@ -0,0 +1,50 @@
---
title: Accelerating AI workloads with GPUs in Kubernetes
weight: 3
---

Kevin and Sanjay from NVIDIA.

## Enabling GPUs in Kubernetes today

* Host-level components: toolkit, drivers
* Kubernetes components: device plugin, node feature discovery, node selector
* NVIDIA humbly brings you a GPU operator
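With drivers, toolkit and device plugin in place (or the GPU operator managing all of that), a workload requests a GPU as a plain resource limit. A minimal sketch - image and node label are illustrative:

```yaml
# Sketch: a pod requesting one NVIDIA GPU via the device plugin.
# The image tag and node label are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: cuda-test
spec:
  nodeSelector:
    nvidia.com/gpu.present: "true"   # label set by GPU feature discovery
  containers:
  - name: cuda
    image: nvcr.io/nvidia/cuda:12.3.2-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1            # counted resource exposed by the device plugin
```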
## GPU sharing

* Time slicing: switch between workloads over time
* Multi-Process Service (MPS): everything runs on the GPU concurrently and shares it (space-wise)
* Multi-Instance GPU (MIG): space-separated sharing at the hardware level
* Virtual GPU (vGPU): virtualizes time slicing or MIG
* CUDA streams: run multiple kernels within a single app
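As a concrete example, time slicing can be switched on through the device plugin's sharing config. A sketch based on NVIDIA's documented config format - the replica count is an arbitrary example:

```yaml
# Sketch: device-plugin config enabling time slicing, so each physical
# GPU is advertised as 4 shareable replicas. The value 4 is illustrative.
version: v1
sharing:
  timeSlicing:
    resources:
    - name: nvidia.com/gpu
      replicas: 4
```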
## Dynamic resource allocation

* A new alpha feature since Kubernetes 1.26 for dynamic resource requests
* You just request a resource via the API and have fun
* The sharing itself is an implementation detail
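With DRA the request becomes a claim object instead of a counted resource. Roughly, under the alpha API (object names here are hypothetical):

```yaml
# Rough sketch of the DRA alpha API: a claim template referencing a
# class served by a vendor driver, consumed by a pod. Names are made up.
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaimTemplate
metadata:
  name: gpu-claim-template
spec:
  spec:
    resourceClassName: gpu.nvidia.com   # class provided by the vendor driver
---
apiVersion: v1
kind: Pod
metadata:
  name: training-pod
spec:
  resourceClaims:
  - name: gpu
    source:
      resourceClaimTemplateName: gpu-claim-template
  containers:
  - name: train
    image: example/train:latest        # illustrative image
    resources:
      claims:
      - name: gpu                      # consume the claim defined above
```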
## GPU scale-out challenges

* NVIDIA Picasso is a foundry for model creation, powered by Kubernetes
* The workload is the training workload, split into batches
* Challenge: schedule multiple prioritized training jobs from different users

### Topology-aware placement

* You need thousands of GPUs; a typical node has 8 GPUs with fast NVLink communication - beyond that, switching
* Target: optimize placement of related jobs based on GPU node distance and NUMA placement

### Fault tolerance and resiliency

* Stuff can break, resulting in slowdowns or errors
* Challenge: detect faults and handle them
* Needed: observability, both in-band and out-of-band, that exposes node conditions in Kubernetes
* Needed: automated fault-tolerant scheduling

### Multi-dimensional optimization

* There are different KPIs: starvation, priority, occupancy, fairness
* Challenge: what to choose (a multi-dimensional decision problem)
* Needed: a scheduler that can balance the dimensions
22
content/day2/04_sponsored_ai_platform.md
Normal file
@@ -0,0 +1,22 @@
---
title: "Sponsored: Build an open source platform for AI/ML"
weight: 4
---

Jorge Palma from Microsoft with a quick introduction.

## Baseline

* Kubernetes is cool and all
* Challenges:
  * Containerizing models
  * GPUs in the cluster (installation, management)

## Kubernetes AI Toolchain (KAITO)

* A Kubernetes operator that interacts with
  * a node provisioner
  * a deployment
* A simple CRD that describes the model and infra - and have fun
* Creates an inference endpoint
* Currently around 10 models are available (Hugging Face, LLaMA, etc.)
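The CRD-driven flow could look roughly like this, modeled on KAITO's published Workspace examples - the instance type, labels and preset name are assumptions:

```yaml
# Sketch: a KAITO Workspace that provisions GPU nodes and serves a
# preset model behind an inference endpoint. Values are illustrative.
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: workspace-falcon-7b
resource:
  instanceType: Standard_NC12s_v3   # GPU VM size to provision
  labelSelector:
    matchLabels:
      apps: falcon-7b
inference:
  preset:
    name: falcon-7b                 # one of the bundled model presets
```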
16
content/day2/05_performance_sustainability.md
Normal file
@@ -0,0 +1,16 @@
---
title: Optimizing performance and sustainability for AI
weight: 5
---

A panel discussion moderated by Google, with participants from Google, Alluxio, Ampere and CERN.
It was pretty scripted, with prepared (sponsor-specific) slides for each answered question.

## Takeaways

* Deploying an ML model should become the new "deploy a web app"
* The hardware should be fully utilized -> better resource sharing and scheduling
* Smaller LLMs on CPU only are pretty cost-efficient
* Better scheduling by splitting into storage + CPU (prepare) and GPU (run) nodes to create a just-in-time flow
* Software acceleration is cool, but we should use more specialized hardware, and models that run on CPUs
* We should be flexible regarding hardware, multi-cluster workloads and hybrid (on-prem, burst to cloud) workloads
43
content/day2/06_newsshow_ai_edition.md
Normal file
@@ -0,0 +1,43 @@
---
title: Cloud native news show (AI edition)
weight: 6
---

Nikhita presented projects that merge cloud native and AI.
Patrick Ohly joined for DRA.

### The "news"

* New working group AI
* More tools are including AI features
* A new, updated "CNCF for children", featuring AI
* One decade of Kubernetes
* DRA is in alpha

### DRA

* A new API for resources (node-local and node-attached)
* Sharing of resources between pods and containers
* Vendor-specific details are abstracted away by a vendor driver/controller
* The kube-scheduler can interact with the vendor parameters for scheduling and autoscaling
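The vendor-abstraction bullet maps onto the alpha API's class objects: the vendor driver publishes a class whose parameters the scheduler can consult. A rough sketch with hypothetical names:

```yaml
# Sketch: a vendor driver exposes a ResourceClass; parametersRef points
# at a vendor-specific CRD holding scheduling-relevant parameters.
# All names here are illustrative assumptions.
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClass
metadata:
  name: gpu.nvidia.com
driverName: gpu.resource.nvidia.com   # the vendor driver handling claims
parametersRef:
  apiGroup: gpu.resource.nvidia.com
  kind: GpuClassParameters
  name: default
```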
### Cloud native AI ecosystem

* Kubernetes is the seed for the AI infra plant
* Kubeflow users wanted AI registries
* LLMs on the edge
* OpenTelemetry brings semantics
* All of these tools form a symbiosis
* Topics of discussion

### The working group AI

* It was formed in October 2023
* They are working on the "cloud native and AI" whitepaper, which was published on 19.03.2024
* The "cloud native and AI" landscape is WIP and will be merged into the main CNCF landscape
* The future focus will be on security and cost efficiency (with a hint of sustainability)

### LFAI and CNCF

* The director of the AI foundation talks about AI and cloud native
* They are looking forward to more collaboration
10
content/day2/_index.md
Normal file
@@ -0,0 +1,10 @@
---
archetype: chapter
title: Day 2
---

Day two is also the official day one of KubeCon (day one was just CloudNativeCon).
This is where all of the people joined (over 2000).

The opening keynotes were a mix of talks and panel discussions.
The main topic was - who could have guessed - AI and ML.