Episode 4: Geek Out with Brian Durden - Part 2: Tactics & Implementation: Kubernetes at the Tactical Edge
Welcome to the fourth episode of our new podcast, Geek Out. Pete Tseronis, our host, discusses how to implement Kubernetes at the Tactical Edge with Brian Durden, Staff Solutions Architect at Rancher Government.
Episode Transcript
Pete Tseronis:
Hi, this is Pete Tseronis with part two of my conversation with Brian Durden, Staff Solutions Architect at Rancher Government. In part one, Brian helped us understand Kubernetes versus containers, their role at the tactical edge, and the connection to edge computing. Today we tackle tactics and implementation, including AI and machine learning ops, as Brian explains what it takes to implement Kubernetes at the tactical edge. Bringing it back to maybe the biggest buzzword in most parts of the world, artificial intelligence: anybody who knows, knows it's here, and the integration of it is what matters. Can you speak to why that integration is important and critical, and how Kubernetes, and specifically Rancher's solutions, can support it?
Brian Durden:
Yes, absolutely, and AI is definitely one of my favorite subjects. There are several parts to that. The beauty of Kubernetes, and I'm a Kubernetes zealot, as everyone who knows me will agree, is that it sees things like GPUs and memory and compute all as the same thing: just another resource. Using a combination of labels and resource counts, Kubernetes can intelligently figure out where workloads need to run. So say you create a Kubernetes cluster comprised of 10 nodes, and two of those nodes have GPUs attached that can do advanced AI modeling, running through TensorFlow and PyTorch and things like that to create models. When you run workloads on this cluster, not all of them may want to consume those GPUs. They may just be regular old services; maybe it's Prometheus exporting metrics and that kind of thing. But if you are running GPU workloads, you want the containers that need those resources to be mapped to the nodes that have GPUs available locally. Kubernetes provides this capability in a natural way, as a pattern.

And because of the way Rancher is set up, we ensure that the missions of our customers in the US government are the first-class citizens that drive our priorities. We know AI and ML and MLOps are right at the top of that priority list, so things like that are baked into the way things work. As an example, RKE2, which is our Kubernetes distribution, was formerly called RKE Government; we developed it in collaboration with Platform One. In normal Kubernetes, setting up the NVIDIA drivers and the operator and everything needed to map GPUs to Kubernetes nodes involves, I wouldn't call it a Rube Goldberg machine, but there are some steps in there. RKE2 makes it exceedingly simple. All you have to do is have RKE2 up and running on top of a node with a GPU, install the NVIDIA operator, and bam, you're done. It automatically handles all the plumbing under the hood, making all the containerd configuration changes that have to happen so that it uses the NVIDIA runtime instead of the normal containerd runtime. That's all automatic, it's baked into RKE2, and you get it for free; it's just a feature that comes with it.

That's an example of where we're going. To go further on that, I'm in routine conversation with groups, I don't want to call them think tanks, but they're out on the forefront and they know MLOps really well. And I have learned over the last six months to a year that "MLOps" is another one of those buzzword-bingo words that gets tossed around and that a lot of folks really don't understand. They think MLOps is just running a model and getting an output, and it's a lot more complicated than that. The best analogy I can give is CI/CD pipelines for applications that need to get built: the apps are built, they're tested, maybe one breaks the test and has to repeat the build, then it feeds through the pipeline, goes into a bucket or a waiting lane or something like that, and gets pulled into a cluster to run, maybe in staging or production. Pipelines can get very complicated.
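For readers who want to see what that looks like concretely, here is a minimal sketch of a pod spec that requests a GPU. The `nvidia.com/gpu` resource name is the one the NVIDIA operator advertises on GPU nodes; the image and node label below are hypothetical placeholders.

```yaml
# Minimal sketch: Kubernetes will only schedule this pod onto nodes
# that advertise the nvidia.com/gpu resource (the two GPU nodes in
# the ten-node example above); the other eight nodes are never considered.
apiVersion: v1
kind: Pod
metadata:
  name: pytorch-trainer
spec:
  containers:
    - name: trainer
      image: registry.example.com/pytorch-trainer:latest  # hypothetical image
      resources:
        limits:
          nvidia.com/gpu: 1  # extended resource exposed by the NVIDIA operator
  # Optional belt-and-suspenders: pin to nodes carrying a GPU label,
  # illustrating the label-based placement mentioned above.
  nodeSelector:
    gpu-enabled: "true"  # hypothetical label
```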
MLOps, in the same way, can get equally or even more complicated, because these processes are fed all kinds of sensor data. If we look at JADC2 and the way that's going to work, they're pulling sensor data from all over the place and bringing it down into their Kubernetes clusters. They're crunching it through GPUs and creating model data based on whatever they're analyzing. Those models are not static; they may evolve over time, and the models themselves can feed into other models. So now you have not only parallel model creation but a many-to-many mapping where models feed into models and create certain types of output that gets fed into whatever is consuming the information, whether that's general-purpose AI or very specific filtered information. That's MLOps. That's the kind of thing we're trying to embrace and enable with the Rancher stack, and we have partners already running technology and applications that can do these types of things. Great question; I very much enjoy this topic.
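To make the "models feeding models" picture concrete, here is one way such a pipeline can be expressed as a DAG on Kubernetes. This sketch uses Argo Workflows purely as an illustration; the conversation does not name a specific pipeline engine, and every task name and image below is hypothetical.

```yaml
# Illustrative sketch: a three-stage MLOps DAG in which one model's
# output feeds the training of the next, expressed as an Argo Workflow.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: sensor-mlops-
spec:
  entrypoint: pipeline
  templates:
    - name: pipeline
      dag:
        tasks:
          - name: ingest-sensor-data          # pull raw sensor feeds
            template: step
            arguments:
              parameters: [{name: image, value: registry.example.com/ingest:latest}]
          - name: train-detection-model       # crunch data on GPU nodes
            dependencies: [ingest-sensor-data]
            template: step
            arguments:
              parameters: [{name: image, value: registry.example.com/train-detect:latest}]
          - name: train-fusion-model          # a model feeding another model
            dependencies: [train-detection-model]
            template: step
            arguments:
              parameters: [{name: image, value: registry.example.com/train-fuse:latest}]
    - name: step
      inputs:
        parameters: [{name: image}]
      container:
        image: "{{inputs.parameters.image}}"
```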
Pete Tseronis:
Well, the passion is obvious and it's infectious, my friend, and you taught me a couple of things there that sent ten questions through my brain. We don't have enough time to answer them all, but let's take AI and the promise and the vision, which is clearly in your DNA and Rancher's as well: where is the puck going, to use a sports analogy? I talked about this recently with some folks I was with in government around the Internet of Battlefield Things. You hit on sensors; you have talked about analytics. I think of decisions that need to be made in milliseconds. Think of the power grid. Think of our warfighter, of course. Think of transportation systems, planes, so they don't intersect in the air. The use of IoT to connect and integrate various military systems, sensors, and devices to provide real-time data and enhanced situational awareness, and I'm just reading from some of my notes here, really speaks to me as a benefit of all the architecture you've described. And that's where I want to go with the question: how hard is this to implement? I mean, Brian, you've got to have a partner; it's not just plug and play and you're good to go. What are the top tips, three, four, five, one, two? If you're thinking about this as a federal agency or a defense industrial base entity, what do you have to think about to implement Kubernetes and take advantage of all the benefits you just described?
Brian Durden:
That's a great question. The very first thing I would say is that you really need to scope your problem, because if you walk into a situation saying, "I want to solve everything," you're not going to get anywhere. Outside of OpenAI, the company, and we all know ChatGPT, they're really the only ones that can do this AI thing at that level and just go in and do everything and anything. If you're building something specific, you've got to scope the problem first. Once you scope it, you can define the outcomes and values of the thing you want to build. From there, you can begin defining the kind of compute and GPU resources you're going to need to deliver it, and slice it up into, say, hardware requirements versus software requirements. And as we break this apart further: where are you going to get that data from? GPU-based workloads need good data. Is it just a data lake sitting in the cloud that you're going to consume? Then you may not need an edge. But if you are out on the edge, the very far edge, in disconnected environments, you need something that's self-sustaining, such as the HCI Edge solution we've talked about a few times. Being able to use GPUs to pre-crunch and pre-filter data out on the edge, prior to feeding it back to HQ when you're connected to the mothership, or whatever you want to call it, can become very important, because the volume of data generated out on the edge by all of these sensors, everything incoming at once that needs to be fed into these AI models, has to be filtered down. Honestly, we just don't have that kind of bandwidth. So it's really about scoping the problem and defining the values and the outcomes, so you stay in your lane as a producer of products and solutions, and then deciding where the data is pulled from, where it's going, and how it's going to be consumed. To circle back, it's a scoping problem first, because if you walk in and try to solve everything at once, you're just going to spin your wheels. And no one's really a Tony Stark here; that's a mythical beast. We're all better together. As long as we focus on what we're good at, we can all deliver better products.
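One small, concrete way that kind of scoping can show up in cluster configuration: once you know how many GPUs a given effort should consume, you can cap it declaratively. A minimal sketch, assuming a hypothetical `edge-analytics` namespace; `requests.nvidia.com/gpu` is the standard Kubernetes ResourceQuota key for extended resources.

```yaml
# Minimal sketch: cap total GPU consumption for one scoped effort
# so it cannot crowd out other workloads on the cluster.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: edge-analytics   # hypothetical namespace for one scoped project
spec:
  hard:
    requests.nvidia.com/gpu: "2"  # at most two GPUs across the namespace
```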
Pete Tseronis:
I appreciate that. As a former federal executive, I always hoped an industry partner would come in and lead not with "Here's what we can do for you," but with "What problem are you trying to solve?" There is no cookie cutter. So thanks for letting me put on my old hat. What you're saying, which is very much a lot of awesomeness, and I appreciate it, is: scope the problem, the mission is what matters, and we at Rancher can help you with a custom solution, if you will, to meet your specific requirement. Before we do what I love, that parting shot, close us out. We could talk for hours, but I just want to emphasize to the audience again, and feel free to chime in or riff off this, Brian: you talked about scoping, and our military, our forces, from communications to networking, intelligence, surveillance, cyber warfare, electronic warfare, to autonomous systems, folks, these are the types of capabilities this technology can underpin, and it can help save lives. Our defense industrial base is a critical infrastructure sector, and everything Brian has shared today in terms of capability, working with the right partner, scoping the problem, and coming up with a solution to consider is something that, I can clearly tell, Brian, you and your colleagues at Rancher do seven days a week and twice on Sunday. That was my long-winded way of letting you think about your parting shot. But any comment on that before I ask you to close us out here?
Brian Durden:
Alright, there's a lot there. One of the problems I always ran into as an engineer, and I was talking with one of my fellow engineers about this just earlier: as a senior engineer, you're used to being the person who solves these problems on your own, because you have to be the guy or girl who is self-reliant and self-sustaining, because you have no backup. If something goes wrong or something's not working, you can't go talk to anybody; you've got Stack Overflow, and maybe ChatGPT now, but that's really it. When you're the senior person on staff, you're the one people go to as the backup to solve the problem, and when you don't have one, you've just got to figure it out. But it doesn't have to be that way. There's so much specialized knowledge in this industry, and even within Kubernetes, as niche as it is, there are so many different lanes of work and so many talented folks out there, and not just at Rancher. Take MLOps, for instance. MLOps can be a very complex thing that some companies want to own themselves, but they're not in the AI space yet, so they don't understand a lot of the best practices around it, and they may not be developing the best product. There are several great companies out there that can do this kind of stuff, and they are much better at it than I am. In fact, they're far more eloquent at describing the problems and the solutions than I can be, because they know the topic better, and that makes sense; you can only learn so much. Nobody's a Tony Stark in this industry, as I said before. I'm sure there are some folks who think they are, but you've got to lean on your friends. That's my takeaway.
Pete Tseronis:
Go, Iron Man.
Brian Durden:
So bring your partners into this. Let's all go solve the problem together. At Rancher, we have a great product stack; there are a lot of different pieces of the layer cake that we bring to the table, but we are very cognizant that we're a small cog in a larger machine, a bigger enterprise with bigger solutions, and we are not going to solve this problem alone. We need to lean on each other to build the mission solution for the customer, because the mission is what matters most at the end of the day. Nobody has to go do this alone; that's the point we're trying to make.
Pete Tseronis:
Wonderful. It takes a village. And by the way, your humility precedes you; Tony Stark was not a one-man band, we know that. But Brian, I really appreciate that, and I appreciate your knowledge and your wisdom. For me, my parting shot is simply this: if I'm listening to this or watching this, I'm thinking open government reigns supreme at Rancher Government, being participatory, collaborative, and transparent. Public-private partnership works, and you can't go it alone. So I want to thank you, whether it was intentional or not, for expressing that, because it resonated with me. Again, I was a former CTO in the federal government at two cabinet-level agencies, and this really speaks to your mantra of being a partner. So do you have any parting shot, a closeout comment, something you want to leave with the audience?
Brian Durden:
Yes, just a quick one. When we're talking about our HCI Edge solution, and if you've not read the white paper, I highly recommend it, especially if you've had to solve these problems before: in a nutshell, what we're able to do is go one layer deeper than what typical automation allows for these environments, because of what Harvester is and where it is positioned. We can now deploy an entire infrastructure and application layer from a single YAML file. You can describe an entire environment, as a whole, as code, which is exceptionally powerful for a lot of different reasons; it would take an hour just to talk through all the benefits that delivers. But imagine being able to define your entire environment from code. I don't mean just the typical cloud stuff you might throw into AWS, like "here are my virtual machines, here are some services I'm spinning up." I'm talking about everything from the bare metal up, in a data center or out on an edge device. That's a very compelling thing, I think, and that's the kind of idea we're trying to leverage and enable. If you want to see it demonstrated, please talk to me. I would love to talk your ear off about this topic. I'm very passionate about the technology; I love diving in deep and showing everybody everything. I'm a big open source and open standards proponent, so I'm all about cooperation and learning new things, and I'm sure there are all kinds of things people can teach me as well. I'm very excited about the topic, so please reach out.
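As a taste of what "environment as code" means at this layer: Harvester builds on KubeVirt, so even virtual machines are declared as Kubernetes YAML right alongside application manifests. The sketch below is a simplified, hypothetical example (the VM name, sizes, and disk image are placeholders); see the white paper for the full picture.

```yaml
# Minimal sketch: in Harvester (built on KubeVirt), a virtual machine
# is just another declarative Kubernetes object, so infrastructure can
# live in the same YAML as the application layer deployed onto it.
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: edge-worker-vm            # hypothetical name
spec:
  running: true                   # reconcile the VM to a powered-on state
  template:
    spec:
      domain:
        cpu:
          cores: 4
        resources:
          requests:
            memory: 8Gi
        devices:
          disks:
            - name: rootdisk
              disk:
                bus: virtio
      volumes:
        - name: rootdisk
          containerDisk:
            image: registry.example.com/edge-os:latest  # hypothetical image
```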
Pete Tseronis:
Thank you. Brian Durden, Staff Solutions Architect, Rancher Government. I can't wait to talk to you again soon, my friend. |
Brian Durden:
Thanks, Pete. I really appreciate it.