Predicting Out Of Memory Kill events with Machine Learning (Ep. 203)

Publish Date:
20 September, 2022
Category:
Information Technology
Video License
Standard License
Imported From:
Youtube

Source:
https://www.podbean.com/eau/pb-uvzjg-129eb0b

Sometimes applications crash, and sometimes they crash because memory is exhausted. Such issues are caused by bugs in the code or by heavy memory usage that was not anticipated during design and implementation. Can we use machine learning to predict, and eventually detect, out of memory kills by the operating system?
Apparently, the Netflix app many of us use on a daily basis leverages ML and time series analysis to prevent OOM-kills.
Enjoy the show!
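To make the idea of treating memory usage as a time series a bit more concrete, here is a minimal Python sketch. It is not the method used at Netflix, which is not described in these notes. It simply assumes that memory usage is sampled periodically, fits a straight line to the samples, and extrapolates to a memory limit. The function names `seconds_until_limit` and `at_risk`, the 300-second horizon, and the sample values are all hypothetical.

```python
from typing import Optional, Sequence


def seconds_until_limit(
    timestamps: Sequence[float],
    memory_mb: Sequence[float],
    limit_mb: float,
) -> Optional[float]:
    """Fit a least-squares line to memory samples and extrapolate to the limit.

    Returns the estimated number of seconds after the last sample until
    usage reaches limit_mb, or None when usage is flat or decreasing.
    Illustrative only: a real predictor would need richer features.
    """
    n = len(timestamps)
    if n < 2 or n != len(memory_mb):
        raise ValueError("need at least two paired samples")

    mean_t = sum(timestamps) / n
    mean_m = sum(memory_mb) / n
    var_t = sum((t - mean_t) ** 2 for t in timestamps)
    if var_t == 0:
        raise ValueError("timestamps must not all be identical")

    # Ordinary least squares for a single feature (time).
    cov = sum((t - mean_t) * (m - mean_m) for t, m in zip(timestamps, memory_mb))
    slope = cov / var_t  # MB per second
    intercept = mean_m - slope * mean_t

    if slope <= 0:
        return None  # no upward trend, so no predicted exhaustion

    t_hit = (limit_mb - intercept) / slope
    return max(0.0, t_hit - timestamps[-1])


def at_risk(
    timestamps: Sequence[float],
    memory_mb: Sequence[float],
    limit_mb: float,
    horizon_s: float = 300.0,  # hypothetical alerting horizon
) -> bool:
    """Flag the process when the trend reaches the limit within horizon_s."""
    eta = seconds_until_limit(timestamps, memory_mb, limit_mb)
    return eta is not None and eta <= horizon_s


if __name__ == "__main__":
    # Hypothetical samples: usage grows by 20 MB every 10 seconds.
    ts = [0.0, 10.0, 20.0, 30.0]
    mem = [100.0, 120.0, 140.0, 160.0]
    print(seconds_until_limit(ts, mem, limit_mb=200.0))
    print(at_risk(ts, mem, limit_mb=200.0))
```

A linear trend is about the simplest possible model, and it would miss sudden allocation spikes. It is shown only because it keeps the focus on which data to collect rather than on model complexity.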
Our Sponsors
Explore the Complex World of Regulations. Compliance can be overwhelming. Multiple frameworks. Overlapping requirements. Let Arctic Wolf be your guide. Check it out at https://arcticwolf.com/datascience
 
Amethix works to create and maximize the impact of the world’s leading corporations and startups, so they can create a better future for everyone they serve. We provide solutions in AI/ML, Fintech, Healthcare/RWE, and Predictive maintenance.
 
Transcript
1
00:00:04,150 --> 00:00:09,034
And here we are again with season four of the Data Science at Home podcast.

2
00:00:09,142 --> 00:00:19,170
This time we have something for you: if you want to help us shape the data science leaders of the future, we have created the Data Science at Home Ambassador program.

3
00:00:19,340 --> 00:00:28,378
Ambassadors are volunteers who are passionate about data science and want to give back to our growing community of data science professionals and enthusiasts.

4
00:00:28,534 --> 00:00:37,558
You will be instrumental in helping us achieve our goal of raising awareness about the critical role of data science in cutting-edge technologies.

5
00:00:37,714 --> 00:00:45,740
If you want to learn more about this program, visit the Ambassadors page on our website at datascienceathome.com.

6
00:00:46,430 --> 00:00:49,234
Welcome back to another episode of the Data Science at Home podcast.

7
00:00:49,282 --> 00:00:55,426
I'm Francesco, podcasting from the regular office of Amethix Technologies, based in Belgium.

8
00:00:55,618 --> 00:01:02,914
In this episode, I want to speak about a machine learning problem that has been formulated at Netflix.

9
00:01:03,022 --> 00:01:22,038
And for the record, Netflix is not sponsoring this episode, though I still believe that this problem is a very well-known problem, a very common one across sectors, which is how to predict an out of memory kill in an application and formulate this problem as a machine learning problem.

10
00:01:22,184 --> 00:01:39,142
So this is something that, as I said, is very interesting, not just because of Netflix, but because it allows me to explain a few points that, as I said, are kind of invariant across sectors.

11
00:01:39,226 --> 00:01:56,218
Regardless of whether your application is a video streaming application or any other communication type of application, or a fintech application, or energy, or whatever, this memory kill, out of memory kill, still occurs.

12
00:01:56,314 --> 00:02:05,622
And what is an out of memory kill? Well, it's essentially the extreme event in which the machine doesn't have any more memory left.

13
00:02:05,756 --> 00:02:16,678
And so usually the operating system can eventually start swapping, which means using the SSD or the hard drive as a source of memory.

14
00:02:16,834 --> 00:02:19,100
But that, of course, will slow things down a lot.

15
00:02:19,430 --> 00:02:45,210
And eventually, when there is a bug or a memory leak, or if there are other applications running on the same machine, of course there is some kind of limiting factor that essentially kills the application, something that most of the time comes from the operating system and kills the application in order to prevent it from monopolizing the entire machine, the hardware of the machine.

16
00:02:45,710 --> 00:02:48,500
And so this is a very important problem.

17
00:02:49,070 --> 00:03:03,306
Also, it is important to have an episode about this because there are some strategies that they've used at Netflix that are pretty much in line with what I believe machine learning should be about.

18
00:03:03,368 --> 00:03:25,062
And usually people would go for the fancy solution there, like these extremely accurate predictors or machine learning models that have a massive number of parameters and that try to figure out whatever is happening on that machine that is running that application.

19
00:03:25,256 --> 00:03:29,466
While the solution at Netflix is pretty straightforward, it's pretty simple.

20
00:03:29,588 --> 00:03:33,654
And so one would say, then why make an episode about this? Well,

21
00:03:33,692 --> 00:03:45,730
because I think that we need more sobriety when it comes to machine learning, and I believe we still need to spend a lot of time thinking about what data to collect,

22
00:03:45,910 --> 00:03:59,730
reasoning about what is the problem at hand and what is the data that can actually tickle the particular ma

