Finding and fixing memory leaks in a Hyper application or 'How I Learned to Stop Worrying and Love the Allocator'


I was doing some initial load testing of the next version of our application, so that we could track performance regressions, when I noticed something.

Something bad.

Something scary.

After only a few seconds of throwing wrk at it, our backend was using 1.3GB of memory, growing at around 50MB/s. Yikes.

Initial steps

One of the biggest takeaways from trying to debug a memory leak in Rust is that memory leakage is not memory unsafety. So while lashing out at the borrow checker may have been my first reaction, it’s not its fault and it did nothing wrong. Poor lil’ borrow checker.
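To make that distinction concrete, here’s a little illustration of my own (nothing to do with our app): safe Rust will happily leak memory if you ask it to, or if you build a reference cycle by accident.

use std::cell::RefCell;
use std::rc::Rc;

// A node that can point at another node, which makes reference cycles possible.
struct Node {
    next: RefCell<Option<Rc<Node>>>,
}

fn main() {
    // 1. Explicitly leaking is a safe operation: the Vec's buffer is never freed.
    std::mem::forget(vec![0u8; 1024 * 1024]);

    // 2. An Rc cycle leaks too, without a single line of unsafe code:
    //    a and b keep each other alive after both handles go out of scope.
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(Some(Rc::clone(&a))) });
    *a.next.borrow_mut() = Some(Rc::clone(&b));
}

No borrow checker rule is violated here; the allocator simply never gets the memory back.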

In the words of Capt. Louis Renault, “Round up the usual suspects!”; I started with heaptrack.

What?! Just over 8KB allocated?

The issue is that heaptrack isn’t playing nice with jemalloc, and the situation isn’t better with any Valgrind tools. We need to use the system allocator, which can be done by adding the following to our main.rs:

use std::alloc::System;

// switch from the default allocator (jemalloc, at the time) to the system
// allocator, so heaptrack and the Valgrind tools can intercept our allocations
#[global_allocator]
static A: System = System;

Now we’re cooking with gas! Our graph looks much better:

What is heaptrack telling us? A bit, but nothing super useful, as it isn’t parsing the debug symbols quite correctly.

Valgrind's massif led us on a similar wild goose chase; it was time for a change in tactics. Enter the…

Reduced test case

If we can’t find out what is leaking, only whether the program as a whole is or not, our next best option is to start removing parts of the program until the leak goes away. Isn’t there a tool that does the hard work for you? Yes, C-Reduce! Will it work for Rust? No! It only works at a syntactic level, can’t deal with macros (cargo-expand doesn’t help), and testing is really slow; this was truly one of my worst ideas.

Luckily, my colleague Jonas was thinking sensibly and actually worked to find a minimal example, seen here:

#![feature(alloc_system)]
#![feature(allocator_api)]
#[global_allocator]
static GLOBAL: std::alloc::System = std::alloc::System;

#[macro_use]
extern crate failure;
extern crate future_utils;
extern crate hyper;
extern crate tokio;

use failure::Error;
use future_utils::{BoxSendFuture, FutureExt};
use hyper::service::Service;
use hyper::Body;
use std::time::Duration;
use tokio::prelude::*;

fn main() {
    let server = hyper::Server::bind(&"127.0.0.1:31337".parse().unwrap())
        .serve(|| Ok::<_, Box<std::error::Error + Send + Sync>>(Server));
    let mut rt = tokio::runtime::Runtime::new().unwrap();
    rt.spawn(server.map_err(move |_err| {}));
    println!("server started");
    ::std::thread::sleep(Duration::from_secs(10));
    println!("shutting down");
    rt.shutdown_now().wait().unwrap();
}

#[derive(Debug, Fail)]
#[fail(display = "myfail")]
pub struct MyFail;

pub struct Server;
impl Service for Server {
    type ReqBody = Body;
    type ResBody = Body;
    type Error = hyper::Error;
    type Future = BoxSendFuture<hyper::Response<Self::ResBody>, Self::Error>;

    fn call(&mut self, _request: hyper::Request<Self::ReqBody>) -> Self::Future {
        Err(Error::from(::MyFail))
            .into_future()
            .or_else(|err: Error| Ok(err.downcast::<::MyFail>().unwrap()).into_future()) // removing this gets rid of the leak
            .map(|_| ::hyper::Response::new(Body::empty()))
            .into_send_boxed()
    }
}
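Incidentally, you don’t even need wrk to watch this repro misbehave. A throwaway client like the one below (my own sketch, not part of Jonas’s repro) run while the server is in its ten-second sleep is enough: every request that reaches the handler leaks a little memory, so the server’s usage climbs for as long as the loop runs.

use std::io::{Read, Write};
use std::net::TcpStream;

fn main() {
    // Fire plain HTTP/1.1 requests at the minimal server; each handled
    // request leaks in the handler, so the server's memory keeps growing.
    for _ in 0..100_000 {
        let mut stream = match TcpStream::connect("127.0.0.1:31337") {
            Ok(stream) => stream,
            Err(_) => break, // the server has shut down after its 10s sleep
        };
        stream
            .write_all(b"GET / HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n")
            .unwrap();
        let mut response = Vec::new();
        let _ = stream.read_to_end(&mut response);
    }
}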

The leak occurs when the error is downcast (and turned back into a future) inside the or_else closure. Maybe a quick use of cargo-geiger will show us a single unsafe function that’s causing the leak?

“My God, it’s full of unsafe code!” I exclaimed while pointing it at our monolithic application. Digging through all of that is unlikely to lead us to the culprit.

Issue and fix

Philipp, another colleague, had a hunch, and looking into the source of the downcast method in failure revealed a tricky mem::forget.

Jonas explains it well: “This happens because Error::downcast does not drop the Box<Inner<Fail>> contained within the Error, it only drops the backtrace contained in that box and moves out the error type itself.”

pub(crate) fn downcast<T: Fail>(self) -> Result<T, ErrorImpl> {
    let ret: Option<T> = self.failure().downcast_ref().map(|fail| {
        unsafe {
            // drop the backtrace
            let _ = ptr::read(&self.inner.backtrace as *const Backtrace);
            // read out the fail type
            ptr::read(fail as *const T)
        }
    });
    match ret {
        Some(ret) => {
            // forget self (backtrace is dropped, failure is moved out)
            mem::forget(self);
            Ok(ret)
        }
        _ => Err(self)
    }
}
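To see why that leaks without dragging in any of failure’s internals, here’s a stripped-down sketch of my own with stand-in types: the interesting fields are read out of the box with ptr::read, self is then forgotten, and the Box’s own heap allocation is never handed back to the allocator.

use std::{mem, ptr};

struct Backtrace(Vec<u8>);   // stand-in for failure's Backtrace
struct Inner<T> {            // stand-in for failure's Inner<F>
    backtrace: Backtrace,
    failure: T,
}
struct MyError {
    inner: Box<Inner<String>>,
}

impl MyError {
    fn leaky_downcast(self) -> String {
        let failure = unsafe {
            // drop the backtrace in place (its Vec buffer is freed here)
            let _ = ptr::read(&self.inner.backtrace as *const Backtrace);
            // bitwise-copy the payload out of the box
            ptr::read(&self.inner.failure as *const String)
        };
        // prevents double-drops of the fields we just read out, but also
        // prevents the Box from being dropped, so its allocation is leaked
        mem::forget(self);
        failure
    }
}

fn main() {
    for _ in 0..1_000 {
        let err = MyError {
            inner: Box::new(Inner {
                backtrace: Backtrace(vec![0; 1024]),
                failure: "the actual fail type".to_string(),
            }),
        };
        let _ = err.leaky_downcast(); // each call leaks one Inner-sized heap allocation
    }
}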

The initial fix by mitsuhiko was the following:

// deallocate the box without dropping the inner parts
#[cfg(has_global_alloc)] {
    use std::alloc::{dealloc, Layout};
    unsafe {
        let layout = Layout::for_value(&*self.inner);
        let ptr = Box::into_raw(self.inner);
        dealloc(ptr as *mut u8, layout);
    }
}
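In isolation, the pattern is: drop or move out everything inside the box first, then return the box’s own memory to the allocator by hand, without running Drop a second time. A small sketch of my own, not failure’s code:

use std::alloc::{dealloc, Layout};
use std::ptr;

fn main() {
    let boxed: Box<(String, Vec<u8>)> = Box::new(("payload".to_string(), vec![0u8; 64]));
    let payload = unsafe {
        // move one field out and drop the other in place...
        let payload = ptr::read(&boxed.0 as *const String);
        let _ = ptr::read(&boxed.1 as *const Vec<u8>);
        // ...then free the box's own allocation without dropping its contents again
        let layout = Layout::for_value(&*boxed);
        let ptr = Box::into_raw(boxed);
        dealloc(ptr as *mut u8, layout);
        payload
    };
    println!("{}", payload);
}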

This still leaks on older Rust versions where the has_global_alloc cfg isn’t set (std::alloc only became stable in 1.28), but Philipp had an alternative solution that works on all versions and has since been merged:

pub(crate) fn downcast<T: Fail>(self) -> Result<T, ErrorImpl> {
    if self.failure().__private_get_type_id__() == TypeId::of::<T>() {
        let ErrorImpl { inner } = self;
        let casted = unsafe { Box::from_raw(Box::into_raw(inner) as *mut Inner<T>) };
        let Inner { backtrace: _, failure } = *casted;
        Ok(failure)
    } else {
        Err(self)
    }
}
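The unsafe part of this version only re-types the box from Inner&lt;Fail&gt; to Inner&lt;T&gt;; what actually avoids the leak is destructuring *casted, which moves the failure out while the backtrace and the box’s allocation are freed normally. A simplified sketch of my own, with stand-in types and without the type erasure:

struct Backtrace(Vec<u8>);   // stand-in types, as before
struct Inner<T> {
    backtrace: Backtrace,
    failure: T,
}

// Moving out of `*boxed` frees the Box's allocation, and the `_` pattern
// drops the backtrace, so only the failure value survives; nothing is
// forgotten and nothing leaks.
fn take_failure(boxed: Box<Inner<String>>) -> String {
    let Inner { backtrace: _, failure } = *boxed;
    failure
}

fn main() {
    let boxed = Box::new(Inner {
        backtrace: Backtrace(vec![0; 1024]),
        failure: "the actual fail type".to_string(),
    });
    println!("{}", take_failure(boxed));
}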

Summary

wrk with 1000 connections was turned on and off repeatedly, and the graph above shows the leak is fixed: a sharp climb in memory usage that then levels off.

What did we learn? Well, apart from the fact that a task is easier if your colleagues do all the hard parts, not much that we didn’t know before. Leaving the warm embrace of safe Rust for the cold, unforgiving world of unsafe will sometimes result in bad things happening. That doesn’t mean it isn’t necessary, but it also doesn’t mean it should be done lightly.

You may have noticed I neglected to return to why none of the memory profiling tools “saw” 100% of the memory actually being allocated. Well done if you did.