Podcast: Break Things on Purpose | Dan Isla: Astronomical Reliability

It’s time to shoot for the stars with Dan Isla, VP of Product at itopia, to talk about everything from the astronomical importance of reliability to time zones on Mars. Dan’s trajectory has been a propulsion of jobs bordering on science fiction, with a history at NASA, modernizing cloud computing for them, and loads more. Dan discusses the finite room for risk and failure in space travel with an anecdote from his work on Curiosity. Dan talks about his major takeaways from working at Google, his “baby” Selkies, his work at itopia, and the crazy math involved with accounting for time on Mars!

In this episode, we cover:

Introduction (00:00)
Dan’s work at JPL (01:58)
Razor thin margins for risk (05:40)
Transition to Google (09:08)
Selkies and itopia (13:20)
Building a reliability community (16:20)
What itopia is doing (20:20)
Learning, building a “toolbox,” and teams (22:30)
Clockdrift (27:36)

Links Referenced:

itopia: https://itopia.com/
Selkies: https://github.com/danisla/selkies
selkies.io: https://selkies.io
Twitter: https://twitter.com/danisla
LinkedIn: https://www.linkedin.com/in/danisla/

Transcript

Dan: I mean, at JPL we had an issue adding a leap second to our system planning software, and that was a fully coordinated, many months of planning, for one second. [laugh]. Because when you’re traveling at 15,000 miles per hour, one second off in your guidance algorithms means you missed the planet, right? [laugh]. So, we were very careful. Yeah, our navigation parameters had, like, 15 decimal places, it was crazy.

Julie: Welcome to Break Things on Purpose, a podcast about reliability, building things with purpose, and embracing learning. In this episode, we talked to Dan Isla, VP of Product at itopia about the importance of reliability, astronomical units, and time zones on Mars.

Jason: Welcome to the show, Dan.

Dan: Thanks for having me, Jason and Julie.

Jason: Awesome. Also, yeah, Julie is here. [laugh].

Julie: Yeah. Hi, Dan.

Jason: Julie’s having internet latency issues. I swear we are not running a Gremlin latency attack on her. Although she might be running one on herself. Have you checked in the Gremlin control panel?

Julie: You know, let me go ahead and do that while you two talk. [laugh]. But no, hi and I hope it’s not too problematic here. But I’m really excited to have Dan with us here today because Dan is a Boise native, which is where I’m from as well. So Dan, thanks for being here and chatting with us today about all the things.

Dan: You’re very welcome. It’s great to be here to chat on the podcast.

Jason: So, Dan has mentioned working at a few places and I think they’re all fascinating and interesting. But probably the most fascinating—being a science and technology nerd—Dan, you worked at JPL.

Dan: I did. I was at the NASA Jet Propulsion Lab in Pasadena, California, right after graduating from Boise State, from 2009 to around 2017. So, it was quite the adventure, got to work on some, literally, out-of-this-world projects. And it was like drinking from a firehose, being kind of fresh out to some degree. I was an intern before that so I had some experience, but working on a Mars rover mission was kind of my primary task. And the Mars rover Curiosity was what I worked on as a systems engineer and flight software test engineer, doing launch operations, and surface operations, pretty much the whole, like, lifecycle of the spacecraft I got to experience. And had some long days and some problems we had to solve, and it was a lot of fun. I learned a lot at JPL, a lot about how government, like, agencies are run, a lot about how spacecraft are built, and then towards the end a lot about how you can modernize systems with cloud computing. That led to my exit [laugh] from there.

Jason: I’m curious if you could dive into that, the modernization, right? Because I think that’s fascinating. When I went to college, I initially thought I was going to be an aerospace engineer. And so, because of that, they were like, “By the way, you should learn Fortran because everything’s written in Fortran and nothing gets updated.” Which I was a little bit dubious about, so correct folks that are potentially looking into jobs in engineering with NASA. Is it all Fortran, or… what [laugh] what do things look like?

Dan: That’s an interesting observation. Believe it or not, Fortran is still used. Fortran 77 and Fortran—what is it, 95. But it’s mostly in the science community. So, a lot of data processing algorithms and things for actually computing science, written by PhDs and postdocs is still in use today, mostly because those were algorithms that, like, people built their entire dissertation around, and to change them added so much risk to the integrity of the science, even just changing the language where you go to a language with different levels of precision or computing repeatability, introduced risk to the integrity of the science. So, we just, like, reused the [laugh] same algorithms for decades. It was pretty amazing, yeah.

Jason: So, you mentioned modernizing; then how do you modernize with systems like that? You just take that codebase, stuff it in a VM or a container and pretend it’s okay?

Dan: Yeah, so a lot of it is done very carefully. It goes kind of beyond the language down to even some of the hardware that you run on, you know? Hardware computing has different endianness, which means the order of bits in your data structures, as well as different levels of precision, whether it’s a RISC system or an AMD64 system. And so, just putting the software in a container and making it run wasn’t enough. You had to actually compute it, compare it against the study that was done and the papers that were written on it to make sure you got the same result. So, it was pretty—we had to be very careful when we were containerizing some of these applications in the software.
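To make the portability concern concrete, here is a minimal Python sketch of the two effects Dan names. It is only an illustration, not JPL's validation process: the sample values are made up, and endianness is shown in its usual sense of byte order within a multi-byte value.

```python
import struct

value = 0x0A0B0C0D  # arbitrary sample value, chosen so each byte is distinct

# The same 32-bit integer serialized in two byte orders.
big = struct.pack(">I", value)     # most significant byte first
little = struct.pack("<I", value)  # least significant byte first
print(big.hex(), little.hex())

# Reading big-endian bytes as if they were little-endian does not fail;
# it silently produces a different number.
misread = struct.unpack("<I", big)[0]
print(hex(value), "read back as", hex(misread))

# Precision: round-trip a 64-bit float through a 32-bit float.
x = 0.1
as_float32 = struct.unpack("<f", struct.pack("<f", x))[0]
print(x, "became", as_float32)
```

Neither case raises an error, which is why comparing the ported output against the published results, as Dan describes, is the only way to notice the difference.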

Julie: You know, Dan, one thing that I remember from one of the very first talks I heard of yours back in, I think, 2015 was you actually talked about how we say within DevOps, embrace failure and embrace risk, but when you’re talking about space travel, that becomes something that has a completely different connotation. And I’m kind of curious, like, how do you work around that?

Dan: Yeah, so failing fast is not really an option when you only have one thing [laugh] that you have built or can build. And so yeah, there’s definitely a lot of aversion to failing. And what happens is it becomes a focus on testing, stress testing—we call it robustness testing—and being able to observe failures and automate repairs. So, one of the test programs I was involved with at JPL was, during the descent part of the rover’s approach to Mars, there was a powered descent phase where the rover actually had a rocket-propelled jetpack and it would descend to the surface autonomously and deliver the rover to the surface. And during that phase it’s moving so fast that we couldn’t actually remote control it, so it had to do everything by itself.

And there were two flight computers that are online, pretty much redundant, everything hardware-wise, and so it’s kind of up to the software to recover itself. And so, that’s called entry, descent, and landing, and one of my jobs towards the end of the development phase was to ensure that we tested all of the possible breakage points. So, we would do kind of evil Gremlin-like things. We actually—the people in the testbed, we actually call Gremlins. And [laugh] we would—we—they inject faults during the simulation.

So, we had copies of the hardware running on a desk, the software was running, and then we’d have Gremlins go and say like, “Hey, flight computer one just went out. You know, what’s going to happen?” And you watch the software, kind of, take over and either do the right thing or simulate a crash landing. And we’d find bugs in the software this way, we’d find, like, hangs in the control loops for recovery, and we had to fix those before we made it to Mars, just in case that ever happened. So, that was like how we, like, really stress tested the hardware. We did the same thing with situational awareness and operations, we had to simulate things that would happen, like, during launch or during the transit from Earth to Mars, and then see how the team itself reacted to those. You know, do our playbooks work? Can we run these in enough time and recover the spacecraft? So, it was a lot of fun. That’s, I guess, about as close to, like, actually breaking something as I can claim to. [laugh].
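The fault-injection loop Dan describes can be sketched as a toy simulation. Everything here is hypothetical and vastly simpler than a flight testbed: the `FlightComputer` and `DescentSimulation` classes, the computer names, and the failover rule are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class FlightComputer:
    name: str
    healthy: bool = True


class DescentSimulation:
    """Toy model: two redundant computers, one in control at a time."""

    def __init__(self) -> None:
        self.computers = [FlightComputer("A"), FlightComputer("B")]
        self.active = 0  # index of the computer currently in control

    def inject_fault(self, name: str) -> None:
        # The "gremlin": mark one computer as failed mid-simulation.
        for computer in self.computers:
            if computer.name == name:
                computer.healthy = False

    def step(self) -> str:
        # Recovery logic under test: fail over to any healthy computer.
        if self.computers[self.active].healthy:
            return f"nominal on {self.computers[self.active].name}"
        for index, computer in enumerate(self.computers):
            if computer.healthy:
                self.active = index
                return f"failover to {computer.name}"
        return "simulated crash landing"


sim = DescentSimulation()
results = [sim.step()]
sim.inject_fault("A")       # "flight computer one just went out"
results.append(sim.step())
results.append(sim.step())
sim.inject_fault("B")       # lose the backup as well
results.append(sim.step())
print(results)
```

The value of the exercise is in the assertions you write around `step()`: each injected fault either ends in a clean takeover or exposes a path that ends in the simulated crash.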

Julie: Well, I have to say, you’ve done a good job because according to Wikipedia—which we all know is a very reliable source—as of May 9th, 2022, Curiosity has been active on Mars for 3468 sols or 3563 days, and is still active. Which is really amazing because I don’t—was it ever intended to actually be operational that long?
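The two figures Julie quotes are the same duration in different units, which a short check makes visible. This sketch assumes the commonly cited mean length of a Martian solar day, 24 hours 39 minutes 35.244 seconds; the function name is made up for the example.

```python
SOL_SECONDS = 24 * 3600 + 39 * 60 + 35.244  # mean Martian solar day
DAY_SECONDS = 24 * 3600                     # Earth day


def sols_to_earth_days(sols: float) -> float:
    """Convert a count of Martian sols to Earth days."""
    return sols * SOL_SECONDS / DAY_SECONDS


earth_days = sols_to_earth_days(3468)
print(round(earth_days))
```

Rounded to whole days, 3468 sols comes out to the 3563 days quoted above.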

Dan: Not really. [laugh]. The hardware was built to last for a very long time, but you know, as with most missions that are funded, they only have a certain number of years that they can be operated, to fund the team, to fund the development and all that. And so, the prime mission was only, like, two years. And so, it just keeps getting extended. As long as the spacecraft is healthy, and, like, doing science and showing results, we usually extend the missions until they just fall apart or die, or are intentionally decommissioned, kind of like the Cassini project. But yeah.

Julie: Well, you’ve heard it here first, folks. In order to keep funding, you just need to be, quote, “Doing science.” [laugh]. But Dan, after JPL, that’s when you went over to Google, right?

Dan: Yeah, yeah. So, it was kind of that transition from learning how to modernize with cloud. I’d been doing a lot with data, a lot with Amazon’s government cloud, which is the only cloud we could use at JPL, and falling in love with these APIs and ways to work with data that were not possible before, and saw this as a great way to, you know, move the needle forward in terms of modernization. Cloud is a safe place to prototype, a safe place to get things done quick. And I always wanted to work for a big tech company as well, so that was always another thing I was itching to scratch.

And so Google, I interviewed there and finally made it in. It was not easy. I definitely failed my first interview. [laugh]. But then I tried it again a few years later, and I came in as a cloud solution architect to help customers adopt cloud more quickly, get through roadblocks.

My manager used to say the solution architects were the Navy SEALs of cloud, they would drop in, drop a bunch of knowledge bombs, and then, like, get out, [laugh] and go to the next customer. It was a lot of fun. I got to build some cool technology and I learned a lot about what it’s like working in a big public company.

Julie: Well, one of my favorite resources is the Google SRE book, which, as much as I talk about it, I’m just going to admit it here now, to everybody that I have not read the entire thing.

Dan: It’s okay.

Julie: Okay, thank you.

Dan: Most people probably haven’t.

Julie: I also haven’t read all of Lord of the Rings either. But that said, you know, when you talk about the learnings, how much of that did you find that you practiced day-to-day at Google?

Dan: In cloud—I’ve mostly worked in cloud sales, so we were kind of post-sales, the experts from the technology side, kind of a bridge to engineering and sales. So, I didn’t get to, like, interact with the SREs directly, but we have been definitely encouraged, I had to learn the principles so that we could share them with our customers. And so, like, everyone wanted to do things like Google did, you know? Oh, these SREs are there, and they’re to the rescue, and they have amazing skills. And they did, and they were very special at Google to operate Google’s what I would call alien technology.

And so, you know, from a principles point of view, it actually kind of reminded me a lot of what I learned at JPL, you know, from redundant systems and automating everything, having the correct level of monitoring. The tools that I encountered at Google were incredible. The level of detail you could get very quickly, everything was kind of at your fingertips. So, I saw the SREs being very productive. When there was an outage, things were communicated really well and everyone just kind of knew what they were doing.

And that was really inspiring, for one, just to see, like, how everything came together. That’s kind of what the best part of working at Google was: seeing how the sausage was made, you know? I was like, “Oh, this is kind of interesting.” [laugh]. And it still had some of its big company problems; it wasn’t all roses. But yeah, it was definitely a very interesting adventure.

Jason: So, you went from Google, and did you go directly to the company that you helped start, right now?

Dan: I did. I did. I made the jump directly. So, while I was at Google, you know, not only seeing how SRE worked, but seeing how software was built in general and by our customers, and by Google, really inspired me to build a new solution around remote productivity. And I’ve always been a big fan of containers since the birth of Docker and Kubernetes.

And I built a solution that let you run, kind of, per-user workloads on Kubernetes and containers. And this proved to be interesting because you could, you know, stand up your own little data processing system and scale it out to your team, as well as, like, build remote code editors, or remote desktop experiences from containers. And I was very excited about this solution. The customers were really starting to adopt it. And as a solution architect, the stuff we built, we always open-sourced it.

So, I put it on GitHub as a project called Selkies. And so, Selkies is the Kubernetes components and there’s also the high performance streaming to a web browser with WebRTC on GitHub. And a small company, itopia, I met at a Google conference, they saw my talk and they loved the technology. They were looking for something like that, to help some of their product line, and they brought me in as VP of Product.

So, they said, “We wanted to productize this.” And I’m like, “Well, you’re not doing that without me.” [laugh]. Right? So, through the pandemic and work from home and everything, I was like, you know, now is probably a good time to go try something new.

This is going to be—and I get to keep working on my baby, which is Selkies. So yeah, I’ve been at itopia since the beginning of 2021, building a remote desktop, really just remote developer environments and other remote productivity tools for itopia.

Julie: Well and, Dan, that’s pretty exciting because you actually talked a little bit about that at DevOpsDays Boise, which if that video is posted by the time of publication of this podcast, we’ll put a link to that in the show notes. But you’re also giving a talk about this at SCaLE 19x in July, right?

Dan: Yeah, that’s right. Yeah, so SCaLE is the Southern California Linux Expo, and it’s a conference I really enjoy going to; I get to see people from Southern California and others from out of town. A lot of JPLers usually go as well and present. And so, it’s a good time to reconnect with folks. But yeah, so SCaLE, you know, they usually want to talk more about Linux and some of the technologies and open-source. And so yeah, really looking forward to sharing more about Selkies and kind of how it came to be, how containers can be used for more than just web servers and microservices, but also, you know, maybe, like, streaming video games that have your container with the GPU attached. The DevOpsDays Boise had a little demo of that, so hopefully, that video gets attached. But yeah, I’m looking forward to that talk at the end of July.

Jason: Now, I’m really disappointed that I missed your talk at DevOpsDays Boise. So Julie, since that’s your domain, please get those videos online quickly.

Julie: I am working on it. But Dan, one of the things that you know you talk about is that you are the primary maintainer on this and that you’re looking to grow and improve with input from the community. So, tell us, how can the community get involved with this?

Dan: Yeah, so Selkies is on GitHub. You can also get to it from selkies.io. And basically, we’re looking for people to try it out, run it, to find problems, you know, battle test it. [laugh]. We’ve been running it in production at itopia, it’s powering the products they’re building now.

So, we are the primary maintainers. I only have a few others, but, you know, we’re just trying to build more of an open-source community and level up the, you know, the number of contributors and folks that are using it and making it better. I think it’s an interesting technology that has a lot of potential.

Jason: I think as we talk about reliability, one of the things that we haven’t covered, and maybe it’s time for us to actually dive into that with you is reliability around open-source. And particularly, I think one of the problems that always happens with open-source projects like this is, you’re the sole maintainer, right? And how do you actually build a reliable community and start to grow this out? Like, what happens if Dan suddenly just decides to rage quit tech and ups and leaves and lives on his own little private island somewhere? What happens to Selkies?

Do you have any advice for people who’ve really done this, right? They have a pet project, they put it on GitHub, it starts to gain some traction, but ultimately, it’s still sort of their project. Do you have any advice for how people can take that project and actually build a reliable, growing, thriving community around it?

Dan: Honestly, I’m still trying to figure that out [laugh] myself. It’s not easy. Having the right people on your team helps a lot. Like, having a developer advocate, developer relations to showcase what it’s capable of in order to create interest around the project, I think is a big component of that. The license that you choose is also pretty important to that.

You know, there’s some software licenses that kind of force the open-sourcing of any derivative of what you build, and so that can kind of keep it open, as well as, you know, move it forward a little bit. So, I think that’s a component. And then, you know, just, especially with conferences being not a thing in the last couple of years, it’s been really hard to get the word out and generate buzz about some of these newer open-source technologies. One of the things I kind of like really hope comes out of a two-year heads-down time for developers is that we’re going to see some, like, crazy, amazing tech on the other side. So, I’m really looking forward to the conferences later this year as they’re opening up more to see what people have been building. Yeah, very interested in that.

Jason: I think the conversation around open-source licenses is one that’s particularly interesting, just because there’s a lot involved there. And there’s been some controversy over the past couple of years as very popular open-source projects have decided to change licenses, thinking of things like Elastic and MongoDB and some other things.

Dan: Yeah. Totally.

Jason: You chose, for Selkies, it looks like it’s Apache v2.

Dan: Yep. That was mostly from a Google legal point of view. When I was open-sourcing it, everything had to be—you know, had to have the right license, and Apache was the one that we published things under. You know, open-source projects change their license frequently. You saw that, like what you said, with Elastic and Mongo.

And that’s a delicate thing, you know, because you got to make sure you preserve the community. You can definitely alienate a lot of your community if you do it wrong. So, you got to be careful, but you also, you know, as companies build this tech and they’re proud of it and they want to turn it into a product, you want to—it’s a very delicate process, trying to productize open-source. It can be really helpful because it can give confidence to your customers, meaning that, like, “Hey, you’re building this thing; if it goes away, it’s okay. There’s this open-source piece of it.”

So, it instills a little bit of confidence there, but it also gets a little tricky, you know? Like, what features are we adding that add value that people will still pay for versus what they can get for free? Because free is great, but you know, it’s a community, and I think there are things that private companies can add. My philosophy is basically around packaging, right? If you can package up an open-source product to make it easier to consume, easier to deploy, easier to observe and manage, then you know, that’s a lot of value that the rest of the free community may not necessarily need.

If they’re just kind of kicking the tires, or if they have a very experienced Kubernetes team on-site, they can run this thing by themselves, go for it, you know? But for those, the majority that may not have that, you know, companies can come in and repackage things to make it easier to run open-source. I think there’s a lot of value there.

Jason: So, speaking of companies repackaging things, you mentioned that itopia had really sort of acquired you in order to really build on top of Selkies. What are the folks at itopia doing and how are they leveraging the software?

Dan: That’s a good question. So, itopia’s mission is to radically improve work-from-anywhere. And we do that by building software to orchestrate and automate access to remote computing. And that orchestration and automation is a key component to this, like, SaaS-like model for cloud computing.

And so, Selkies is a core piece of that technology. It’s designed for orchestrating per-user workloads, like, remote environments that you would need to stand up. And so, you know, we’re adding on things that make it more consumable for an enterprise, things like VPN peering and single-sign-on, a lot of these things that enterprises need from day one in order to check all the boxes with their security teams. And at the heart of that is really just increasing the amount of the productivity you have through onboarding.

Basically, you know, setting up a developer environment can take days or weeks to get all the dependencies set up. And the point of itopia—Spaces is the product I’m working on—is to reduce that amount of time as much as possible. And, you know, this can increase risk. If you have a product that needs to get shipped and you’re trying to grow or scale your company and team and they can’t do that, you can slip deadlines and introduce problems, and having an environment that’s not consistent introduces reliability problems, right, because now you have developers that, “Hey, works on my machine.” But you know, they may have—they don’t have the same machine, same environment as everyone else, and now when it comes to reproducing bugs or even fixing them, you can introduce more problems to the software supply chain.

Julie: I mean, that sounds like a great problem to solve and I’m glad you’re working on it. With your background being varied, starting as an intern to now where you personally are being acquired by organizations. What’s something that you’ve really learned or taken from that? Because one thing that you said was that you failed your first Google interview badly? And—

Dan: Yes. [laugh].

Julie: I find that interesting because that sounds like you know, you’ve taken that learning from failure, you’ve embraced the fact that you failed it. Actually, I just kind of want to go back. Tell us, do you know what you did?

Dan: It was definitely a failure. I don’t know how spectacular it was, but, like, [laugh] Google interviews are hard. I mean—and that’s just how it is, and it’s been—it’s notorious for that. And I didn’t have enough of the software, core software experience at the time to pass the interview. These are, like, five interviews for a software engineer.

And I made it through, like, four of them. The last one was, like, just really, really, really hard and I could not figure it out. You know, because this is, like, back in the day—and I think they still do this, like, where you’re, like, coding on a whiteboard, right? Like, okay, write this C code on a whiteboard, and it has to work. You know, the dude is, like, right there compiling it, right? Like, “Okay, [unintelligible 00:23:29], boy.” [laugh].

So, not only is it high stress, but it has to be right as well. [laugh]. And so, like, it was just a very difficult experience. And what I learned from that was basically, “Okay, I need to, one, get more experience in this style and this domain of programming, as well as, you know, get more comfortable speaking and being in front of people I don’t know.” [laugh].

So yeah, there’s definitely components there of personal growth as well as technical growth. From a technical point of view, like, my philosophy as an engineer in general, and software developer, is have a really big toolbox and use the tools that are appropriate for the job. This is, like, one of my core philosophies. Like, people ask, you know, “What language do you use?” And I’m like, “Whatever language you need to solve the problem.”

Like, if you’re writing software, in a—with libraries that are all written in C, then don’t try to do that in, like, Java or something, in some other language that doesn’t have those language bindings. Don’t reinvent the language bindings. You follow the problem and you follow the tech. What language, what tool will best solve this problem? And I’m always working backwards from the problem and then bringing in the right tools to solve it.

And that’s something that has paid off in dividends because it’s very—problem-solving is fun and it’s something I always had a passion for, but when you have a toolbox that is full of interesting gadgets and things you can use, you get excited every time you get to use that tool. Like, just like power tools here, I have a—I don’t know, but it’s like, “Yeah, I get to use the miter saw for this thing. Awesome. I don’t have one? Okay, I’m going to go buy one.” [laugh].

Julie: That’s actually—that’s a really good point. One of the talks that I gave was, “You Can’t Buy DevOps.” And it was really all about letting developers be part of the process in choosing the tools that they’re going to use. Because sometimes I think organizations put too many constraints around that and force you to use these tools that might not be the best for what you’re trying to accomplish. So, I like that you bring up having the ability to be excited about your toolbox, or your miter saw. For me, it would be my Dremel. Right? But what tool is going to—

Dan: [crosstalk 00:25:39] cool.

Julie: Yeah, I mean, they really are—what tool is going to be best for the job that you are trying to accomplish? And I think that that’s, that’s a big thing. So, when you look to bring people onto your team, what kind of questions do you ask them? What are you looking for?

Dan: Well, we’re just now starting to really grow the company and try and scale it up. And so we’re, you know, we’re starting to get into more and more interview stuff. I try to tell myself, I don’t want to put someone through the Google experience again. And part of that is just because it wasn’t pleasant, but also, like, I don’t know if it was really that useful [laugh] at the end of the day. And so, you know, there’s a lot about culture fit that is really important. People have to be able to communicate and feel comfortable with your team and the pace that your team is working at. And so, that’s really important.

But you know, technically, you know, I like to see a lot of, you know—you got to be able to show me that you can solve problems. And that can be from, you know, just work that you’ve done in open-source, you know, having a good resume of projects you’ve worked on is really important because then we can just talk about tech and the story of how you solved the problem. I don’t have to—I don’t need you to go to the whiteboard and code me something because you have, like, 30 repos on GitHub or something, right? And so, the questions are much more around problem-solving: you know, how would you solve this problem? What technology choices would you use, and why?

Sometimes I’ll get the fundamentals, like, do you understand how this database works at its core or not? You know, or why is it… why is that good or bad? And so, looking for people who can really think within the toolbox they have—it doesn’t have to be a big one, but do they know how to use the tools that they’ve acquired so far, and really, just really, really critically think through your problems? So, to me, that’s a better skill to have than just, you know, being able to write code on the whiteboard.

Julie: Thanks for that, Dan. And earlier, before we started the official recording here, you were talking a little bit about time drift. Do you want to fill everybody in on what you were talking about because I don’t think it was Doctor Strange and the Multiverse of Madness?

Dan: No. [laugh]. I think there were some—we were talking about um…clocks?

Julie: Clock skew.

Dan: Daylight savings time?

Julie: Yeah.

Dan: Clock skew, clock drift. There was a time at JPL when we were inserting a leap second to the time. This actually happened all throughout the world, where periodically the clocks will drift far enough because the orbits and the rotation of the planet are not, like, perfectly aligned to 365 days in a year and 24 hours in a day. And so, every so often, you have to insert these leap seconds in order to catch up and make time more precise. Well, space travel, when you’re planning, you have to—you’re planning to the position of the stars and the planets and the orbital bodies, and those measurements are done at such a large scale that you have—your precision goes, like, way out, you know, many, many decimal places in order to properly plan to the bodies up big.

And with the Mars Rover, one of these leap seconds happened to come in, like, right before we launched. And it was like, oh my gosh, this is going to be to—change all of our ephemeris files—the data that you use to track positions—and we had to do it, like, synchronize it all, like, right when the leap second was going in. And we tested this extensively because if you get it wrong when your spacecraft is traveling, like, 15,000 miles an hour towards Mars, and a one-second pointing error from Earth means, like, you missed the whole planet, you won’t even get there. [laugh]. We’re not talking about, like, missing the landing site by, like, a few kilometers. No, it’s like thousands of kilometers in pointing error.

So yeah, things are astronomical [laugh] in units. Actually, that’s why they’re called AU, astronomical units, when you’re measuring the distance from the Sun. So yeah, it was a pretty fun time. A little bit nerve-wracking just because of the number of systems that had to be updated and changed at the same time. It’s kind of like doing a rolling update on a piece of software that just had to go out all at the same time. Yeah.
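A back-of-envelope calculation shows why one second can plausibly translate into thousands of kilometers. This is only one possible reading of "pointing error from Earth", not JPL's navigation math: it assumes the error shows up as one second of Earth's rotation and uses a round 1 AU as the range, whereas the real Earth-to-Mars distance varies. All constant names are chosen for this example.

```python
import math

MPH_TO_KM_PER_S = 1.609344 / 3600.0
AU_KM = 149_597_870.7        # one astronomical unit
SIDEREAL_DAY_S = 86_164.0905  # one full rotation of Earth
MARS_DIAMETER_KM = 6_779.0    # approximate mean diameter

timing_error_s = 1.0

# Along-track: distance the spacecraft covers during the error interval.
along_track_km = 15_000 * MPH_TO_KM_PER_S * timing_error_s

# Pointing: angle Earth rotates through in that interval, projected to 1 AU.
earth_rate_rad_per_s = 2 * math.pi / SIDEREAL_DAY_S
angle_rad = earth_rate_rad_per_s * timing_error_s
cross_range_km = angle_rad * AU_KM

print(f"{along_track_km:.1f} km travelled by the spacecraft")
print(f"{math.degrees(angle_rad) * 3600:.1f} arcseconds of Earth rotation")
print(f"{cross_range_km:,.0f} km offset at 1 AU")
print("wider than Mars:", cross_range_km > MARS_DIAMETER_KM)
```

Under these assumptions the spacecraft itself moves only a few kilometers in one second, while the roughly 15 arcseconds of rotation works out to on the order of ten thousand kilometers at 1 AU, which is larger than the planet.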

Jason: I think that’s really interesting, particularly because, you know, for most of us, I think, as we build things whether that’s locally or in the cloud or wherever our servers are at, we’re so used to things like NTP, right, where things just automatically sync and I don’t have to really think about it and I don’t really have to worry about the accuracy because NTP stays pretty tight. Usually, generally.

Dan: Mm-hm.

Jason: Yeah. So, I’m imagining, obviously, like, on a spacecraft flying 15,000 miles a second or whatever, no NTP out there.

Dan: [laugh]. Yeah, no NTP and no GPS. Like, all the things you take for granted, on Mars are just not there. And Mars even has a different time system altogether. Like the days on Mars are about 40 minutes longer because the planet spins slower.

And my first 90 sols—or days on Mars—of the mission, the entire planning team on Earth that I was a part of, we lived on Mars time. So, we had to synchronize our Earth schedule with what the rover was doing so that when the rover was asleep, we were planning the next day’s activities. And when it woke up, it was ready to go and do work during the day. [laugh]. So, we did this Mars time thing for 90 days. That was mostly inherited from the Mars Exploration Rovers, Spirit and Opportunity, because they were only designed to live for, like, 90 days.

So, the whole team shifted. And we—and now it’s kind of done in the spirit of that mission. [laugh]. Our rover, we knew it was going to last a bit longer, but just in case, let’s shift everyone to Mars time and see what happened. And it was not good. We had to [laugh] we had to end that after 90 days. People—your brain just gets completely fried after that. But it was bizarre.

And there’s no time. You have to invent your own time system for Mars. Like, there’s no, it was called LMST, or Local Mars Standard Time, local mean solar time. But it was all, like, relative to, you know, the equator and where you were on the planet. And so, Mars had its own Mars time that counted at a different rate per second.

And so, it was funny, we had these clocks in the Mission Control Room that—there was this giant TV screen that had, like, four different time clocks running. It had, like, Pasadena time, UTC time, Mars time, and, like, whatever time it was at the Space Network. And I was like, “Oh, my gosh.” And so, we were always doing these, like, time conversions in our heads. It was mental. [laugh]. So, can’t we just all be on UTC time? [laugh].
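
[Editor's sketch: one way to picture a clock that "counts at a different rate per second" is to stretch a 24-hour dial over one Martian sol. The sol length of 88,775.244 Earth seconds is the standard mean value. The function name and the choice of "Earth seconds since a local Mars midnight" as the input are assumptions for illustration, not the mission's actual time system.]

```python
SOL_SECONDS = 88_775.244  # mean Martian solar day, in Earth (SI) seconds
DAY_SECONDS = 86_400.0    # ticks on a 24-hour clock face


def mars_clock(earth_seconds_since_mars_midnight):
    """Return (sol_index, 'HH:MM:SS') on a 24-hour Mars clock.

    Each Mars clock second lasts SOL_SECONDS / DAY_SECONDS
    (about 1.0275) Earth seconds.
    """
    if earth_seconds_since_mars_midnight < 0:
        raise ValueError("expected a non-negative elapsed time")
    sols = earth_seconds_since_mars_midnight / SOL_SECONDS
    sol_index = int(sols)
    mars_seconds = int((sols - sol_index) * DAY_SECONDS)
    hours, remainder = divmod(mars_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return sol_index, f"{hours:02d}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    # One full Earth day after Mars midnight, the Mars clock
    # has not yet reached its next midnight.
    print(mars_clock(86_400.0))
    print(mars_clock(SOL_SECONDS / 2))
```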

Jason: So, I’m curious, with that time shift of being on Mars time and 40 minutes longer, that inherently means that by the end of that 90 days, like, suddenly, your 8 a.m. Mars local time is, like, shifted, and is now, like, hours off, right? You’re waking—

Dan: Yeah.

Jason: Up in the middle of the night?

Dan: Totally, yeah.

Jason: Wow.

Dan: Yeah, within, like, two weeks, your schedule will be, like, upside down. It’s like, every day, you’re coming in 40 minutes later. And yeah, it was… it was brutal. [laugh]. Humans are not supposed to do that.

If you’re actually living on Mars, you’re probably okay, but like, [laugh] trying to synchronize those schedules. If you thought going from East Coast to West Coast time, working remote, was hard, like, [laugh] that’s really remote.
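
[Editor's sketch: the schedule drift from "coming in 40 minutes later" every day is simple arithmetic. This assumes the mean sol length of 88,775.244 Earth seconds, which makes the daily slip a little under 40 minutes. The function names are hypothetical.]

```python
SOL_SECONDS = 88_775.244  # mean Martian solar day, in Earth (SI) seconds
DAY_SECONDS = 86_400.0    # one Earth day


def drift_hours(sols_elapsed):
    """Total hours a Mars-time shift start has slid against Earth clocks."""
    return (SOL_SECONDS - DAY_SECONDS) * sols_elapsed / 3600.0


def clock_shift_hours(sols_elapsed):
    """Same drift, folded onto a 24-hour Earth clock face."""
    return drift_hours(sols_elapsed) % 24.0


if __name__ == "__main__":
    for sols in (1, 14, 18, 90):
        print(f"after {sols:>2} sols: total {drift_hours(sols):6.2f} h, "
              f"clock shifted by {clock_shift_hours(sols):5.2f} h")
```

[With these numbers the shift passes nine hours at two weeks and reaches a full twelve hours, day and night swapped, after about 18 sols. Over 90 sols the team's schedule slides all the way around the clock more than twice.]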

Julie: Dan, that’s just astronomical.

Dan: [laugh].

Julie: I’m so sorry. I had to do it. But with that—[laugh].

Jason: [laugh].

Dan: [laugh]. [unintelligible 00:33:15].

Julie: With that, Dan, I really just want to thank you for your time on Break Things on Purpose with us today. And as promised, if I can find the links to Dan’s talks, if they’re available before this episode posts, we will put those in the show notes. Otherwise, we’ll put the link to the YouTube channel in the show notes to check for updates. And with that, I just want to thank you, Dan, and wish you a wonderful day.

Jason: Before we go, Dan, do you have anything that you’d like to plug? Any projects that people should check out, where they can find you on the internet, stuff like that?

Dan: Yeah, thank you guys very much for having me. It was a great conversation. Really enjoyed it. Please check out our new product, itopia Spaces, remote developer environments delivered, powered by Selkies. We launched it last fall and we’re really trying to ramp that up.

And then check out the open-source Selkies project, selkies.io will get you there. And yeah, we’re looking for contributors. Beyond that, you can also find me on Twitter, I’m @danisla, or on LinkedIn.

Jason: Awesome. Well, thanks again for being a part of the show. It’s been fantastic.

Dan: You’re very welcome. Thanks for having me.

Jason: For links to all the information mentioned, visit our website at gremlin.com/podcast. If you liked this episode, subscribe to the Break Things on Purpose podcast on Spotify, Apple Podcasts, or your favorite podcast platform. Our theme song is called Battle of Pogs by Komiku and is available on loyaltyfreakmusic.com.
