Podcast: Break Things on Purpose | Ep. 10: Kelsey Hightower, Principal Developer Advocate at Google

Break Things on Purpose is a podcast for all things Chaos Engineering. Check out our latest episode below.

In this episode, we speak with Kelsey Hightower, Principal Developer Advocate at Google.

Transcript of Today's Episode

Rich Burroughs: Hi. I'm Rich Burroughs, and I'm the Community Manager at Gremlin.

Jacob Plicque: And I'm Jacob Plicque, a Solutions Architect at Gremlin, and welcome to Break Things on Purpose. A podcast about Chaos Engineering.

Rich Burroughs: Welcome to episode 10. In this episode, we speak with someone you might have heard of, Kelsey Hightower from Google. We had a great conversation with Kelsey about Kubernetes and Chaos Engineering. Jacob, what stood out to you from our chat with Kelsey?

Jacob Plicque: So first of all, 10 episodes is amazing, so thanks everyone for listening. And secondly, I quite enjoyed it. I expected us to get super deep into Kubernetes, which we did, but we ended up talking a lot about empathy, which I find to be super important in what we do. Then a bit about diversity and inclusion, which I really appreciated. What about you?

Rich Burroughs: Kelsey's a real infrastructure geek, and I'm always really happy to nerd out with him. I loved his take on serverless computing and also his thoughts about failure injection. He's a very well-rounded person when it comes to infrastructure, and I think that's due to his background as a systems administrator and developer. It was also great to hear his thoughts about how Kubernetes has evolved.

Jacob Plicque: Awesome. So, just as a reminder, you can subscribe to our podcasts on Apple Podcasts, Spotify, Stitcher, and other podcast players. Just search for the name Break Things on Purpose.

Rich Burroughs: Great. Let's go to the interview.

Rich Burroughs: Today we're speaking with Kelsey Hightower. Kelsey's a Principal Developer Advocate at Google. Welcome.

Kelsey Hightower: Yo, I'm happy to be here, looking forward to this podcasting with friends.

Jacob Plicque: Yeah, absolutely. Super excited to have you on. So, before you were at Google, you spent some time at CoreOS, and Puppet before that. Can you talk us through that journey, both beginning and kind of into tech that led you to where you are at Google?

Kelsey Hightower: Yeah. So, both of those companies, I probably grew the most in my career leading up to this point. So, if we start with Puppet Labs, that's kind of what brought me to Portland. Working for Puppet, I kind of worked on the open source dev team and we were working on some of the core components of Puppet. Helped start what is called the Forge now. So working on things like the Puppet module tool. So for all those people that have experience with the Puppet configuration management tool, that was kind of the tool we used to make it easy to download and discover Puppet modules.

Kelsey Hightower: The thing that was very interesting about Puppet is when I got introduced to this open source community, what it's like to build something that both your customer and your community are one and the same, and how do you manage that relationship between the two. And if we zoom a little forward in the future I think there's about three years apart between Puppet Labs and CoreOS, and at CoreOS, I really spent a lot of time learning all about this thing called distributed systems, right, etcd, a lot of this containerization, and I really got this crash course on distributed systems, which later helped serve me in the Kubernetes community.

Rich Burroughs: So you and I both worked at Puppet but at very different times. You were there early on when it was a lot more cutting edge and I was there much later. I wonder what you took away from your time working with those configuration management tools that maybe applies to things today.

Kelsey Hightower: Yeah, so Promise Theory, so this idea of desired state and converging on that desired state, and I was really introduced to that even before joining Puppet. I was a customer of Puppet before joining and that whole Promise Theory is really the underpinning for a lot of other tools like Terraform, CloudFormation, Kubernetes, Istio, all of these things where you give a declared config, and allow the platform to do the heavy lifting.
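
To make the desired-state idea concrete, here is a minimal Python sketch of convergence. It is an illustration only, not how Puppet, Terraform, or Kubernetes is implemented; the `converge` function and the resource names are hypothetical.

```python
from typing import Dict, List

# Hypothetical resource model: a resource name mapped to the state we want.
State = Dict[str, str]


def converge(desired: State, actual: State) -> List[str]:
    """Change `actual` in place until it matches `desired`.

    Returns the list of actions taken, so a second run on an already
    converged system returns an empty list (the operation is idempotent).
    """
    actions: List[str] = []
    for name, want in sorted(desired.items()):
        have = actual.get(name)
        if have is None:
            actual[name] = want
            actions.append(f"create {name}")
        elif have != want:
            actual[name] = want
            actions.append(f"update {name}")
    # Anything present but not declared is removed.
    for name in sorted(set(actual) - set(desired)):
        del actual[name]
        actions.append(f"delete {name}")
    return actions
```

The caller only declares what should exist; the function works out the create, update, and delete steps. Running it twice in a row does nothing the second time, which is the property that lets a platform re-run convergence on a schedule.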

Rich Burroughs: Yeah, I've talked about the similarity to Terraform a lot. Terraform is basically the same exact model except it doesn't try to be cross system in the way that Puppet does. So Puppet as you know, has this ability to declare a resource that applies across different operating systems and stuff, and Terraform, it's all just specific to whatever cloud provider or system you're working with.

Kelsey Hightower: I would say also looking back though, now that I've had a chance to reflect on the whole configuration management space, Puppet was less about abstractions than it was about automation, right. It really leaned into infrastructure as code, and looking back, I think it was a big step up from writing bash scripts, but at the same time it didn't really introduce any true abstractions because we were still talking about files, and packages, and servers, and we never really thought of some other higher level set of objects like you saw with Docker, right? Docker kind of changed the way we think about application packaging, and it was the first time I'd seen the application start to separate itself from the machine in the way that we didn't have to give machines a dedicated role. All machines were kind of created equal.

Rich Burroughs: Interesting. So I think you were at CoreOS when I first saw you speak about Kubernetes, this was in 2015 at this little conference called AutomaCon, if I'm pronouncing it right. It was here in Portland. It was a small conference. And you gave that talk where you used the analogy of Tetris, and you were actually playing Tetris during the talk and you were talking about the fact that in this new world with Kubernetes, the hosts were suddenly going to just become blocks or resources, right? So it's just a bunch of compute, and a bunch of memory, and we don't care about these individual hosts anymore. So now we're like four years later, and I'm wondering what you think about how Kubernetes has delivered on that.

Kelsey Hightower: Yeah, I think for certain workloads, that is the case that most people experience, right? This resource scheduler, more people are getting comfortable with this idea that you can use these declarative, most people think about them as YAML files, but now people are really kind of comfortable with like, here's my workload, run it for me. And I think where we are now is how many of those workloads can we run that way, right? Kubeflow is attempting to do this for machine learning workloads. We have things like Istio that are trying to build a networking control plane. So I think the real test now is now that we're comfortable with that, what other workloads fit that model?

Rich Burroughs: So there definitely are some that don't, right? Because we do see people ending up doing things like having to pin workloads to specific nodes, things like that.

Kelsey Hightower: I would say in that case, I caution people. Running MySQL on Kubernetes doesn't turn it into RDS.

Jacob Plicque: That's awesome.

Kelsey Hightower: So I think people have to be real clear that Kubernetes alone isn't the thing that just makes everything magical. The key here, though, is once you make a decision, right? For example, the best way to run some of these databases is to pin them to specific nodes, use local stores for performance, back it up and ensure nothing else runs on those dedicated machines, because you want to make sure you have a little bit of head room in case there's any spikes or bursts.

Kelsey Hightower: Now in the Kubernetes world, you can articulate that by using things like a stateful set. Maybe you hard code the node name instead of letting the scheduler figure it out. You can use anti-affinity policies to make sure nothing else runs there, and you know what, that's okay. Everything doesn't have to have the ability to float to any machine. So I kind of look at Kubernetes as a way of saying once you make your own independent decision for a set of applications, then you can use Kubernetes to describe that, and enforce that.
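
As a rough illustration of hard-coding the node, the Python sketch below builds a single-replica StatefulSet manifest as a plain dictionary and sets `nodeName` in the pod template so the scheduler is not asked to choose a machine. The function name, the node name `db-node-1`, and the image tag are assumptions made for the example. Keeping other workloads off that node needs a separate rule, such as an anti-affinity policy or a node taint, which is not shown here.

```python
import json


def pinned_statefulset(name: str, node_name: str, image: str) -> dict:
    """Build a one-replica StatefulSet whose pod is pinned to a named node."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Hard-code the node instead of letting the scheduler pick one.
                    "nodeName": node_name,
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical values; print the manifest as JSON for inspection.
    print(json.dumps(pinned_statefulset("mysql", "db-node-1", "mysql:8.0"), indent=2))
```

The point of the sketch is that the decision (this database lives on that machine) is made by a person first, and the manifest only records and enforces it.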

Jacob Plicque: That's actually really interesting because I'm fairly new to Kubernetes as a matter of fact. So I've been at Gremlin for a little under a year and a half, and during the interview process, even during the job description, it was kind of a small bullet point about Kubernetes, I was kind of like, "Okay, I've never built one before, so let me just go and do that." And then, so this was, of course before I really dove into, okay, Amazon can, or Google, or Azure can kind of take care of the control plane for me, and I'll focus on my application, and the nodes, and scheduling of pods, and stuff like that.

Jacob Plicque: And I remember going through just a big kind of deep dive, and I was like, "Man, it's not that it's hard, it's just a lot." And so I'm curious as to other kind of not maybe pain points, but as folks kind of move towards Kubernetes what you've seen because I think there's definitely a lot of folks that I think are not only just interested, but are migrating to Kubernetes maybe just to say that they are on it, to kind of catch up for lack of a better term. So I'm curious as to what you see out in the field.

Kelsey Hightower: Yeah. So let's talk about the thing being hard first, right? Like riding a bike is hard. Have you ever seen someone who's never ridden a bike before?

Jacob Plicque: Not great. It doesn't look, it looks pretty-

Kelsey Hightower: They literally just crash and burn. They don't understand the balance required. They don't understand how the bike is actually set up so that they get on and in their brain they do remember watching someone else ride a bike, how hard could it be? So they hop on, they wobble, and then they fall over. And let's think about someone riding a motorcycle for the first time. And I can remember a buddy of mine had a really nice motorcycle and he was like, "Hey, you want to try it?" Now here's the thing, I don't know anything about this manual transmission, clutches, and none of this stuff. So I get on like, "Okay, let's try it." So I get on this bike and I go around the block. But here's the thing, I don't know how to stop.

Kelsey Hightower: So I'm on this death machine. I am going to die on this motorcycle while people are watching me going in circles. And then I managed to finally slow down, and people helped me get off. But it's hard the first time as well. So I think when most people look at Kubernetes, we have to separate how hard it is from just being simply unfamiliar, right? Because I'm fairly familiar with all of the components in Kubernetes, and I don't necessarily find it hard, but I do find that a lot of the complexity we've moved from undefined areas. The thing is whether people know it or not, you have all of the same complexity in your current system. It's just undefined. Some of the state of what should be running just lives in your head, or God forbid, a spreadsheet.

Kelsey Hightower: The fact that when one machine dies, because guess what? A machine is going to die. Something or someone is responsible for moving it to another machine. So all of the inherent complexity is scattered all over the place. Now what Kubernetes does-

Rich Burroughs: We're the schedulers, right?

Kelsey Hightower: Yeah, we're the meat cloud, right? The human schedulers that are literally just trying to make sense of all this. But here's the thing, what Kubernetes does is just makes all that complexity fairly explicit, and it's that explicit nature of the complexity that makes people face it for the first time head on, and now they have something to talk about, because they can see it.

Rich Burroughs: So, but you have done the thing at least a couple times, I've seen you talk about it where you have some of the Kubernetes developers actually go through and set up a cluster from scratch just like a normal end user would do. Not using any of the scripts or tools that they might have. What kinds of things have you found from doing that exercise?

Kelsey Hightower: Yeah, so what you're describing is what I call empathy sessions. And it's interesting, empathy sessions is what I learned at Puppet Labs. Nigel, I think acting CIO, or he was one of the leaders there at Puppet, and I remember there was a thing where we were hiring so many new people at Puppet. None of them had any experience with Puppet, and we had to just call this like emergency kind of session where we wanted people to start using Puppet, it was right around the holidays. I remember every engineer was tasked with using Puppet for real because we needed some empathy with our users, right? Because people were making decisions really out of scope of what an average system administrator, the target user had in mind. So when I got to Google, I recalled some of those, and I've done some of those in the past and modified it my own way.

Kelsey Hightower: So what the empathy sessions are is, "Hey, you believe that kube-up, that kube-up.sh, is all that people need, and we just keep adding more flags to that. But I think there's more to the story." So it's not just about installing Kubernetes, because there's so many options and configs, but it's also about managing it post-install. So years ago, when we did this for the first time, a lot of people found out that it was very challenging just to even get Docker installed, and get all the right versions installed. So kubeadm was born, a command line tool to try to make it easy. And I also contended that it wasn't just automation, right? It's nice to have a tool that you can just call to bring things up. But I thought it was equally as important to write Kubernetes the Hard Way so people can learn how everything fits together.

Rich Burroughs: So Nigel, who you mentioned, is Nigel Kersten, who's a fantastic individual. I just want to take a second and say that he's somebody who's had a really big influence on my career, gave me a lot of great advice. Now you've said in the past, I've seen you in several of your talks that you've given, talk about the fact that Kubernetes isn't the end goal, that we would build things on top of it, and that those were the things that you were the most interested in seeing. I've been sort of interested in the fact that that doesn't really seem to have come to being as much as I thought it would, that people are still very focused on Kubernetes itself as the thing that's delivering value.

Kelsey Hightower: And I think that's okay, because I do have a lot of empathy for where people are coming from. People were coming from scripts and configuration management tools where people are running these commands to deploy stuff. And in most people's current environment, that's all those tools actually do. They kind of deal with initial deployment, maybe some basics around lifecycle management, and then when they see Kubernetes kind of fill in some of those gaps, that's when they start to really get excited and say, "You know what? This is so much more than I had before. Right now I have a common way of packaging applications and have a common way of describing their deployment, their dependencies, load balancers and discovery. This is all I need, right? I've never seen this before, so let's just give it to everyone as fast as possible." And then they start to realize that once you start handing out kubectl, it reminds me of the world where we used to give everyone SSH to the server.

Jacob Plicque: Right.

Kelsey Hightower: Now people start having way too many permissions. They believe that the command line tool is how you do everything, and here's the thing that's a good place to get. If you can get to the point where you can describe everything that needs to happen, you should definitely celebrate that. That's a good checkpoint to get to. But it isn't the end, because what we want now is to start to think about the workflow. So the workflows around security, policy management, configuration management.

Kelsey Hightower: Some people think about GitOps, I like to call it, infrastructure as data. But what we really want to do is hide Kubernetes behind the ultimate workflow. So if your development team's checking in some code, maybe a little bit of metadata, then what you should do is have your maybe CI/CD tool of choice orchestrate and implement the workflows accepted by the company, and Kubernetes just becomes a last mile, not necessarily the focus of attention.

Jacob Plicque: So at the back end it’s the true hero, but maybe the developers they don't necessarily care as long as it's running the way that they expect it to.

Kelsey Hightower: Yeah. I think of Kubernetes as being like the mech suit for Operations, right? If I want to have good logging, visibility, metrics, audit logging, all of these things, Kubernetes just makes it so much easier for me to attempt to build that, and focus on those areas in the low level infrastructure. And in turn when I have all that stuff in place, that means everybody else gets to focus on building their apps, and checking them in.

Rich Burroughs: I remember when I saw that first talk of yours, actually the Tetris one, the thing that leapt out to me the most is you spent some time talking about the service primitive, and the things like the health checking and all of that. And what really resonated with me is that those were patterns that teams that I was on had already been using, right? We were already doing that same kind of health checking, but we had to invent it ourselves, and with Kubernetes it was just baked in the platform, and there wasn't going to be an argument about how it was going to work. It's just there and you get it for free.

Kelsey Hightower: That is the biggest benefit of Kubernetes over the last five years, is that now instead of everyone rediscovering all of these things and techniques on their own, we finally have a way to talk about distributed systems, these health check patterns, all of this stuff with very common vocabulary. So no matter where people are coming from, they can show up with their experience and say, "Yeah, we do health checks this way, but here's how I get value out of health checks." And I think that is the key takeaway.

Rich Burroughs: Yeah. We used to actually reinvent those wheels multiple times, right? So I was at a shop where we were doing Java, and we had a Java library that did all the health checking, and suddenly an engineer shows up who wants to build it in Erlang, and there's not a spec. And so they're just reinventing the wheel, and we're having to work with them for weeks, trying to get the health checking working correctly and to me, just being able to avoid those kinds of disagreements and re-litigating something that you've already built, and just have it be there, and codified into the platform is a really big advantage.

Jacob Plicque: So I guess from there, maybe longterm, or the next two, three, even four years where, what do you think is the next steps for the future of Kubernetes?

Kelsey Hightower: As Kubernetes gets even easier to use people will start to build, and they already have started building platforms on top of it. So I expect a world where... Let's look at one of the VMware announcements in the last year. And VMware wants to replatform their control plane on top of Kubernetes, and offer Kubernetes-style APIs. So that means not just creating containers but now you'll be able to create virtual machines, probably VLANs, and storage blocks, and allotments all using this kind of declarative config that looks a lot like the normal Kubernetes way of doing things.

Kelsey Hightower: So when you start to think about Kubernetes as this uber control plane layer, it's like the Ruby on Rails for control planes. It's very opinionated in its style and in its approach, you get a lot of machinery for free. You get API clients and command line tools, and then you get to just focus on building your own operators, aka control loops, that then parse that Kubernetes configuration and make things happen. So if we think about such a short timeline, four or five years, my expectation is a lot of us are going to leverage Kubernetes control plane components to start building very powerful infrastructure management tools using this new style.

Jacob Plicque: So is it kind of taking it out of the control plane and kind of putting it a little closer to the last mile, if you will?

Kelsey Hightower: So think of it this way. If you're going to build CloudFormation from scratch.

Jacob Plicque: Ooh, that does not sound fun.

Kelsey Hightower: Yeah, it doesn't sound fun. But you would need to understand all the backend resources, right? How to create virtual machines, and load balancers, and all of these things. And then you need to have a configuration language on the front, right? So a lot of people write CloudFormation in JSON or YAML, and then you have to have something that can parse that and then verify permissions to see if the caller can even make those kinds of changes. And then once you have them, you need to store that configuration somewhere, right? Because that's the desired state.

Kelsey Hightower: And once it's stored, you need a way to coordinate access to that information as the various components behind the scenes converge to that desired state, and keep everything up to date. And then you have to go build out command line tools, maybe even extensions, templating engines, and so forth. So if you know it takes all of that to build any control plane and everything we just described is very typical of any control plane component, why start from scratch when you can just take the Kubernetes API server, etcd, and now you're like 50, 60% there to having most of the plane sorted out. And all you have to do, then, is use a custom resource definition to describe your API, and then implement your control loop that reads that. And now people find something very familiar and boom, you just built a new control plane.
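
The pieces Kelsey lists (parse the config, verify permissions, store the desired state, run a control loop that converges on it) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Kubernetes API server or etcd: the `MiniControlPlane` class, its in-memory dictionaries, and the `Database` resource kind are all hypothetical stand-ins.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class MiniControlPlane:
    """Toy control plane: validate, authorize, store, then reconcile."""

    allowed_callers: Set[str]
    desired: Dict[str, dict] = field(default_factory=dict)  # stand-in for a store like etcd
    actual: Dict[str, dict] = field(default_factory=dict)   # stand-in for real backend resources

    def apply(self, caller: str, document: str) -> None:
        """Parse a JSON resource, check permissions, and record it as desired state."""
        resource = json.loads(document)
        for key in ("kind", "name", "spec"):
            if key not in resource:
                raise ValueError(f"missing field: {key}")
        if caller not in self.allowed_callers:
            raise PermissionError(f"{caller} may not modify resources")
        self.desired[f'{resource["kind"]}/{resource["name"]}'] = resource["spec"]

    def reconcile(self) -> int:
        """Run one pass of the control loop; return how many resources changed."""
        changed = 0
        for key, spec in self.desired.items():
            if self.actual.get(key) != spec:
                self.actual[key] = dict(spec)
                changed += 1
        for key in list(self.actual):
            if key not in self.desired:
                del self.actual[key]
                changed += 1
        return changed
```

Everything except the body of `reconcile` is generic, which is the argument for reusing an existing API server and store: only the resource description and the control loop are specific to the thing being managed.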

Jacob Plicque: Nice. So I guess we'll have to see in 2023 on episode 3005 where things are.

Kelsey Hightower: Yeah, well we'll probably do something like kube apply podcast, and then it's up and running.

Jacob Plicque: Yeah, exactly. So tying into, of course we're titled Break Things on Purpose. So super curious from your work at Google, which is known for injecting big failures, like taking entire data centers offline. So what kind of benefits have you seen, or even observed, from that type of situation?

Kelsey Hightower: So one thing, so I used to work at a Google data center way back, right? And there were so many machines. So this is when I was in Atlanta, there were so many machines that something was always broken, right? There was never a day where a hard drive wasn't broken, or sometimes a disk array, a controller would be on fire, literally smoke and fire. And you had enough failure in the system for us to continuously learn from those, use that data to improve the system. And I think the nice thing about Gremlin, that software stack and that whole framework, is this idea that you can learn from failure. Now some people don't have enough opportunities to learn from failure. Meaning if you only have four or five machines or VMs on a cloud provider, and you may be lucky, right? Those things just stay up.

Kelsey Hightower: You have no major incidents, you're all in one zone, you're just lucky. But here's the thing, you're not learning anything, either. And that means when the crash does come at the worst time, you're going to be scrambling to figure it out. So I think Google kind of naturally has those opportunities to improve the system. Being able to on purpose interject that. So what we're seeing now is in various projects we implement failure as part of the integration test. So if you're going to deploy something, what you might want to do is not just test the happy path, but what happens to the software if half of the machines lose network connectivity? Does it fail in expected ways? So I think that's super interesting.
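
A test along those lines might look like the Python sketch below. It uses a simulated cluster rather than real machines or any Gremlin API; the `Cluster`, `Replica`, and `partition` names and the majority-write rule are assumptions made for the example. The failure test checks both that the write is refused with a specific error and that no partial write was left behind.

```python
from typing import Dict, List


class QuorumUnavailable(Exception):
    """The expected, explicit failure when too few replicas are reachable."""


class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.reachable = True  # set to False to simulate lost network connectivity
        self.data: Dict[str, str] = {}


class Cluster:
    def __init__(self, size: int) -> None:
        self.replicas: List[Replica] = [Replica(f"node-{i}") for i in range(size)]

    def write(self, key: str, value: str) -> int:
        """Write to every reachable replica, but only if a strict majority is reachable."""
        reachable = [r for r in self.replicas if r.reachable]
        if len(reachable) <= len(self.replicas) // 2:
            raise QuorumUnavailable(
                f"only {len(reachable)} of {len(self.replicas)} replicas reachable"
            )
        for replica in reachable:
            replica.data[key] = value
        return len(reachable)


def partition(cluster: Cluster, count: int) -> None:
    """Failure injection: cut connectivity to the first `count` replicas."""
    for replica in cluster.replicas[:count]:
        replica.reachable = False


def test_happy_path() -> None:
    assert Cluster(4).write("k", "v") == 4


def test_half_the_machines_lose_network() -> None:
    cluster = Cluster(4)
    partition(cluster, 2)
    try:
        cluster.write("k", "v")
    except QuorumUnavailable:
        pass  # failing in the expected way
    else:
        raise AssertionError("write should be refused without a majority")
    # The refused write must not leave partial data behind.
    assert all(replica.data == {} for replica in cluster.replicas)
```

The second test is the one that only exists if failure is treated as a first-class scenario: it asserts on how the system fails, not just on whether it works.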

Rich Burroughs: Yeah. We've talked to, on the show a couple of different people who have done chaos experiments on Kubernetes itself, or the components. They'll cause failures in an etcd cluster to see how it responds, and I think that that sort of thing is super interesting because you can read the docs, and read the explanation of how it's going to work, but to actually see it in action is a whole different thing.

Kelsey Hightower: I also think something interesting is going to happen from this. Right now when you read most documentation, they always assume things are going to go right. If you're lucky, you'll find a troubleshooting section at the bottom that attempts to capture common things, but wouldn't it be great if there was just an ongoing set of documentation that says, "When you run this kind of experiment, here's what we see 50% of the time." And that's the kind of stuff that I think people want to be able to study in the system, right.

Kelsey Hightower: I can imagine back in, if I would've had tools like this when I was kind of a full time system administrator, I could imagine if I had a safe place to operate with them, being able to study the system as part of the onboarding experience. Hey, welcome to the team. You're going to break some stuff, and then you're going to tell us what happens when it breaks. And if you're up for it, how should we correct and protect that going forward.

Rich Burroughs: And then the extra great benefit of that is in the course of that exercise you can potentially learn how your team does alerting, and incident response, and all of those things as well.

Jacob Plicque: It kind of circles right back into the empathy point, I think, really easily. And the fact that we're relying on these systems to do what they need to do so that the company's lights stay on and the bills get paid. But then also at the same time in some of those cases when things are going wrong, that engineer's getting woken up at four in the morning to turn the light back on, so to speak, right. So then it ties right back into the fact that, let's figure out what's going on at two o'clock in the afternoon. That way we're not waking anyone up at four in the morning.

Kelsey Hightower: Yeah. And I think as our industry matures, I think about cars. Cars go through a set of very sophisticated crash tests and they get crash test ratings to help educate the public on their safety. And when they're below a certain threshold, the car's not even allowed to be produced. At some point in the future, I believe some people are going to ask very similar questions like, “What happens when we lose the data for etcd?” Instead of saying, "Well, I don't know." I think what you're going to have to do now is be able to show like, "Hey we've gone through a set of tests, and yes we pass, and we continue to pass on an ongoing basis."

Rich Burroughs: Yeah. I think Jepsen is something that I think of in a similar kind of light, and I think that ability to actually see the failures happen is pretty huge as opposed to just talking about it.

Jacob Plicque: And I think it kind of ties directly into going back to your point Kelsey, around just man I wish I had this as a, back in my sysadmin days, because I think that that kind of ties into them. I think the way that our industry is maturing too, in the fact that we didn't even think about failure back then, seven, eight, however long years ago it was. I can't even imagine even having a tool like any, not even just Gremlin, or just in general, because I was so busy looking at dashboards and, “All right, everything's green.”

Kelsey Hightower: And then maybe, I think what helps is the fact that Kubernetes is so declarative, the API contracts are so clear, you almost now know where to inject the failure, right? So without that, you're just flying blind, right? You just go into VM and maybe you reboot it, maybe you cut the networking, but now you could be a little bit more targeted in your experiments, right? You can now say, "You know what, just disconnect connectivity for these two applications. I don't care where they run."

Rich Burroughs: Yeah, there's two experiments that are my favorites to do. And those are messing around with latency because I think you can always learn really interesting things in distributed systems when you start injecting some latency, and we have an attack that we call the blackhole attack, which is one where you cut off connectivity for a specific service. And I think that you can find dependencies, hidden dependencies, and things like that when you suddenly take a service and say, "Hey, you can't connect to the internet, or to the other hosts anymore."

Kelsey Hightower: And I think for people, I'm pretty sure there's going to be people new to Gremlin and this whole idea of Chaos Engineering, if you're kind of struggling with the concept, some people I know have very limited exposure, and it's not necessarily the same thing, but I remember security scans used to be the same level of chaos, right? You would go and try to exercise known vulnerabilities, and literally take down production, and then you push the button like, "All right, stop. Okay. Obviously we can't handle that." So let's start there, and you print out the PDF, you attach it to a ticket, and then people go off and learn from it.

Kelsey Hightower: So to me that was an early form of this Chaos Engineering where we would go and do these proactive scans to see where we stand. Now what we're doing is saying, "You know what? What if we introduced something that had a control plane that can do all kinds of things built right into it?" So now in the Kubernetes world, you can take that agent and stick it next to one application, or you can stick it right next to one server if you want to be able to control this at the server level, and then exposing that as a set of APIs that are repeatable, I think that's where like this whole new power comes from.

Rich Burroughs: Yeah. Agreed. Hey, speaking of empathy, you gave a really amazing talk at KubeCon this year in San Diego, and I saw several people tweeting about this. I was one of the folks who was like literally getting teary eyed in the audience, and you talked a lot about inclusivity and cooperation in the community, and I'm wondering if you can talk a little bit about how you see those kind of values manifested in the Kubernetes community.

Kelsey Hightower: Yeah, so that one's a very interesting topic, mainly because I am officially from an underrepresented group in technology, right? Just from my appearance, if you didn't know who I was, you would probably just slate me into the group of underrepresented people. At most tech conferences that I go to, a lot of people happen to know my name, they know my work and I get this automatic set of credibility when I walk into the room. And I don't necessarily suffer the effects of being in an underrepresented group.

Kelsey Hightower: And the key part there was using the example, so if you didn't see the keynote, I gave a story about a woman who was a speaker at a different conference and the speaker dinner was upstairs, and she's wheelchair bound, or she uses a wheelchair, and she wanted to attend the speaker dinner just like everyone else, but guess what? She couldn't get up there on her own power, and at that point that really shows you what the impact feels like when you're not necessarily included in the planning, or the execution of an event and again, in that particular scenario, I don't think anyone was being malicious. I don't think anyone chose that place on purpose.

Rich Burroughs: Oh, of course not. Yeah.

Kelsey Hightower: But that's what it feels like for someone to maybe forget that you're also attending, and they should think about how you're going to interact with that environment.

Rich Burroughs: I have to tell you, just to interrupt for a second. When you told that story, it blew me away because I did some code of conduct training with the DevOpsDays that I help organize here in Portland. And one of the examples that came up in the training was almost that same exact one except it was a speaker in a wheelchair who couldn't get on the stage to do their talk. And this was also based on a real incident. And so yeah, it's that lack of, I guess, consideration for people that are different than you.

Kelsey Hightower: Yeah. So that inclusion piece, I think that's the opportunity. Diversity is good because the more people in general that we get involved we get better backgrounds, we get different backgrounds, we get different analogies, we get different stories. It's just better for everything. And the inclusion piece, what happens when people show up, and that's what I wanted people to really think about at that conference, and any other conference that you're going to be attending because I think that's all of our jobs to make sure that people feel welcome when they do show up.

Rich Burroughs: Yeah. I mean what, maybe half the folks of the 12,000 people that were there were new to KubeCon?

Kelsey Hightower: Oh yeah. I think, hey I think that's right. I think someone asked earlier in one of their talks about how many people were new for the first time, and that's about the ratio you expect. You expect people to kind of roll in and roll off, maybe come back every other year. But we're starting to see now that we're getting this mass adoption.

Rich Burroughs: Yeah.

Jacob Plicque: Yeah. Actually, and one of the takeaways from you, about that talk, actually I talked with Rich and Anna, who also works at Gremlin, a little bit about inclusivity because I kind of forgot that as an African American, I want to make sure that I'm kind of... The inclusivity and that cooperation is top of mind when I'm going out and doing talks, because, oh yeah, I forgot, I have a voice too, and my voice is different than say somebody else's. And so it was really great. I think the fact that you were telling the story about that young gentleman and his kid that kind of came up to you and said, "Hey, I didn't even think about the fact that I was looking for someone to look up to, much less someone that looks like me." And so I super appreciated that, too.

Kelsey Hightower: Yeah. And this stuff is hard. So one thing I like to, I don't talk about it very much, but it's very hard to think about these issues, or any problem that you don't personally have. So a lot of people struggle with this because if you don't have the problem, it's really hard to have the empathy and think clearly about it, right? So the way I like to think about this is if you're on a team of people trying to end world hunger, right. So you all go out to eat and you order these very big juicy hamburgers with fries and soda. I mean, you're just chewing and guzzling it down, and then you stopped to talk and you say, "Hey, what should we do about world hunger?"

Kelsey Hightower: And the thing is you're not in the right mindset to really understand the impact. Because a lot of people may say, "Well, we should just donate some money." Some people might say, "we should just buy all the people laptops, and then they will create an economy to feed themselves." What are you talking about? Those things may be part of the solution, but since you're not hungry, you may not be thinking in the same context of the people you're trying to help.

Rich Burroughs: Yeah.

Jacob Plicque: It's actually a really good analogy. I actually had something weirdly similar happen where we were out at dinner one time, and I ordered too much food, and I got my doggy bag as it were. And there was this homeless gentleman outside, super nice guy, and he was just like, "Hey man, like are you going to eat that?" And I was like, "As a matter of fact, I'm not. All yours man. Merry Christmas." It was just this duh move, but what I realized as he was walking away is that there were people all around us that had doggy bags that he clearly probably talked to beforehand and that didn't. And I was like, "That's so backwards."

Kelsey Hightower: But the thing I like about stories like that is, that is what being humane is about. Throwing away food when people are hungry. I really struggle with that, personally, right? Forget what everybody else is doing, but me personally, I hate to scrape off food into the trashcan knowing that someone else isn't eating, and I get it. The world will always revolve, and et cetera, et cetera. But what that person was doing, having the humility to say, "Hey, I'm hungry, can I get that from you?" And you being able to share it with him. And I think that is just, that is human in so many ways.

Rich Burroughs: So you talk about sharing, and from what I observed at KubeCon, there's a lot of that in the Kubernetes community. It seemed like there was a lot of cooperation between the different vendors and things. And I wondered if you could talk about that some, because I feel like as someone from the outside, my concern is, is this thing going to turn into another OpenStack? Because there's all these big vendors involved and they've obviously got their own agendas. What do you see in terms of that cooperation?

Kelsey Hightower: I think the thing about Kubernetes is that it doesn't need the big vendors to survive. I think that's one of the core differences here, because even though people look at Kubernetes as this big complex project, it really isn't when you think about how it actually works. So meaning most people can use Kubernetes with three or four machines. Some people have little Raspberry Pi clusters. For most people, Kubernetes is almost a configuration management tool more than anything else. And in their world there are enough people contributing from all walks of life, individual contributors, consultants, people who work at various companies that are purely customers that happen to contribute, and the vendors.

Kelsey Hightower: So I think that since we don't lean on the vendors so hard, just like we do with Linux, right, we have distributions. Vendors can add value, but the Linux kernel is for everyone that wants to pick it up and use it. I think Kubernetes is at that point, it grew a life of its own without the vendors. Remember vendors came very late to the Kubernetes party, right? The first KubeCon, there were very few sponsors that were not also contributors. Red Hat and Google took a very organic approach to developing the project, and we didn't see the traditional vendors that you would think of until way later. And what they're doing is trying to participate, not control.

Rich Burroughs: That's awesome. So I wondered if you, I know you've not been a part of the Chaos Engineering community, specifically, but it's a smaller community. It's a lot newer than the Kubernetes community. We've got a Slack with like 3,000 people in it, but a lot of them are pretty new, and have a lot to learn. And I'm wondering if you have any advice based on your participation in these other communities that you could give those folks?

Kelsey Hightower: I mean, I think you have to just approach this like any other thing, right? Like the first time you logged into a Linux server and you felt lost, and you had to be patient to get comfortable with it. Think about it. There are hundreds of command line utilities on a Unix system, and for the most part, none of them output things in a format that the other one can consume, right? You're damn near writing regular expressions to parse the output from one tool to another, that's hard. But you learned it, and once you learned it, it became second nature.

Kelsey Hightower: You have to give yourself the same amount of time to learn how Kubernetes works. So if you approach this as a how fast and how far can I get in 10 minutes? Okay, that's not the end game, but that might give you a bit of a feel for this system, so maybe you start with something like Minikube, or a fully managed Kubernetes offering. But if you really want to kind of mess with this, if you feel that it's worth investing your time, then I would probably say slow all the way down. Take your time, and then start to think about it. I think most people who have any system administration experience with more than a handful of machines already have about 85% of the fundamentals required to really leverage Kubernetes, and start to understand it.

Jacob Plicque: Yeah, I was just going to say guilty as charged. I was joking earlier that, my first Kubernetes cluster was during my interview process at Gremlin, and now a year and a half later, now that I've broken in a whole heck of a lot of... I feel much more confident than I did during my interview for sure. I think it's just about putting the pedal to the metal, and figuring out what works, or how it works, I should say.

Rich Burroughs: All right, so Serverless. So you've been talking a lot more about Serverless in the last year. I saw a talk you did at a Go meetup about it, and you've been tweeting about it some, and I'm wondering if you can tell me about your journey with Serverless, and what about it excites you?

Kelsey Hightower: I treat Serverless as a North Star, meaning if we can get to the point where people can consume infrastructure without first having to build it as an option, that's the operative word here now, option. Not the only way, not that everything else should go away, but as a North Star for the systems that we're building. So if you're on premise, if you can get to a point where people don't even know that the servers are there, man, that's a good place to be. And years ago, I remember teaching my daughter HTML, and she's like, "Dad, I want to make a webpage." I was like, "Okay." So we got some HTML going. I showed her some JavaScript and CSS. I came back and it's looking like Geocities in there. Right, stuff's blinking and she's like, "Dad, look at this." I'm like, "Okay, that's a little... Neon is not necessarily required for every tag. But I see where you're going with this." And she said, "I was trying to show one of my friends, but they can't see it." I was like, "What'd you do?" And she texts her 127.0.0.1.

Jacob Plicque: Awesome.

Kelsey Hightower: And she can't see it. And I was like, "Now it's time to have that discussion. Do I teach my daughter Docker and Kubernetes?" No, she is not interested in becoming a professional system administrator. All she wants to do is launch her website. So I took her Chromebook and installed the Firebase command line tool, and she ran "firebase deploy." It spit back a URL. She copied and pasted it, and put it in her browser, and then she texted it to her friend, her friend was able to pull it up and that was it. That to me is what the serverless idea North Star is all about.

Rich Burroughs: I've seen a number of engineers have negative reactions to Kubernetes, and usually it's something along the lines of, “I just want to write my app, right? I don't want to have to learn how this cluster works, because I've got a job, and my job is writing an application that delivers business value.”

Kelsey Hightower: See, here's the thing, I've had all the jobs. I've had the roles where I'm in Ops and thinking Ops is what everyone should learn. I had the job as a developer thinking that everyone should learn how to do some development. But here's the thing, in my normal life, I don't want to have to be an expert at the thing behind the scenes that I'm using, like the ISP, for example. I just want to sign up online, buy a modem from a Best Buy, screw it into the wall. That's it. And if they said, "No. Well Kelsey, hold on. I'm going to need you to go out into the street, and pick up the special key to unlock this box. And what you're going to do is you're going to strip some coax." And I'm like, "Whoa, what?"

Jacob Plicque: That's amazing.

Kelsey Hightower: No. I'm not trying to go that far. So if I was a developer and I was hired to be a specialist, there is nothing wrong with specialization. Now if someone wants to learn the other side of the aisle, if someone wants to gain an additional set of responsibilities, hey, I'm all for it. Never lock people out of learning and contributing. But at the same time, are we really going to say that people have to become an expert at infrastructure before they can use it?

Rich Burroughs: Yeah. Kelsey, I think that's about all the time that we have. I want to thank you so much for coming to talk with us. It was super fun to get to hear your thoughts on some of these things. Before we go, is there anything that you want to mention to folks? Any talks you have coming up, or do you want to plug your Twitter, or anything like that?

Kelsey Hightower: No, I want people to just enjoy the holiday, so you'll probably be listening to this in the new year. Enjoy your family, enjoy the people around you. This technology stuff is always going to be here, but make sure you benefit from all the work that you put in.

Rich Burroughs: That's awesome. Yeah, this'll be coming out in January, so we're recording a little bit early. But yeah, thanks so much for coming on Kelsey.

Jacob Plicque: Yeah, absolutely.

Kelsey Hightower: Awesome. Had an amazing time.

Jacob Plicque: Same here, mega educational and super insightful. Thank you.

Rich Burroughs: Our music is from Komiku. The song is titled Battle of Pogs. For more of Komiku's music, visit loyaltyfreakmusic.com, or click the link in the show notes. For more information about our Chaos Engineering community, visit gremlin.com/community. Thanks for listening, and join us next month for another episode.
