Inlining

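Entry to and exit from hot, uninlined functions often accounts for a non-trivial fraction of execution time. Inlining these functions can provide small but easy speed wins.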

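There are four ways to set inlining behavior on a Rust function:

  • None - The compiler decides for itself whether to inline the function, based on the optimization level, the function's size, and so on. Without link-time optimization, non-generic functions that carry no inline attribute are generally not inlined across crates.
  • #[inline] - Suggests that the function should be inlined, including across crate boundaries.
  • #[inline(always)] - Strongly suggests that the function should be inlined, including across crate boundaries.
  • #[inline(never)] - Strongly indicates that the function should not be inlined.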

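Inline attributes do not guarantee that a function is or is not inlined, but in practice #[inline(always)] causes inlining in all but the most exceptional cases.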
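As a minimal sketch of the four forms described above, the snippet below applies each one to a trivial function. The function names are hypothetical and chosen only for illustration. Whether any given call is actually inlined still depends on the compiler and the build profile.

```rust
// Hypothetical helper names, used only to illustrate the attribute syntax.

// No attribute: the compiler applies its own heuristics.
fn add_default(x: u32, y: u32) -> u32 {
    x + y
}

// Suggests inlining, including across crate boundaries.
#[inline]
pub fn add_hint(x: u32, y: u32) -> u32 {
    x + y
}

// Strongly suggests inlining, including across crate boundaries.
#[inline(always)]
pub fn add_always(x: u32, y: u32) -> u32 {
    x + y
}

// Strongly indicates that the function should not be inlined.
#[inline(never)]
pub fn add_never(x: u32, y: u32) -> u32 {
    x + y
}

fn main() {
    let total = add_default(1, 2) + add_hint(3, 4) + add_always(5, 6) + add_never(7, 8);
    println!("total = {}", total);
}
```

The attributes only change how the calls are compiled, so all four functions return the same results.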

Simple Cases

The best candidates for inlining are (a) functions that are very small, or (b) functions that have a single call site. The compiler will often inline these functions itself even without an inline attribute. But the compiler cannot always make the best choices, so attributes are sometimes needed. Example 1, Example 2, Example 3, Example 4, Example 5.

Cachegrind is a good profiler for determining if a function is inlined. When looking at Cachegrind’s output, you can tell that a function has been inlined if (and only if) its first and last lines are not marked with event counts. For example:

      .  #[inline(always)]
      .  fn inlined(x: u32, y: u32) -> u32 {
700,000      eprintln!("inlined: {} + {}", x, y);
200,000      x + y
      .  }
      .  
      .  #[inline(never)]
400,000  fn not_inlined(x: u32, y: u32) -> u32 {
700,000      eprintln!("not_inlined: {} + {}", x, y);
200,000      x + y
200,000  }

You should measure again after adding inline attributes, because the effects can be unpredictable. Sometimes it has no effect because a nearby function that was previously inlined no longer is. Sometimes it slows the code down. Inlining can also affect compile times, especially cross-crate inlining which involves duplicating internal representations of the functions.
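One rough way to re-measure is a small timing harness. This is only a sketch under stated assumptions, not a substitute for a profiler such as Cachegrind or a proper benchmark framework. The names `sum_inlined`, `sum_not_inlined`, and `time_it` are hypothetical. The helper is generic over the callee, rather than taking a function pointer, so that the call stays direct and the inline attribute can take effect. `std::hint::black_box` is used to discourage the optimizer from folding the loop away. Build with optimizations (for example `cargo run --release`), and expect noisy timings.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Hypothetical functions with identical bodies but opposite inline attributes.
#[inline(always)]
fn sum_inlined(x: u64, y: u64) -> u64 {
    x.wrapping_add(y)
}

#[inline(never)]
fn sum_not_inlined(x: u64, y: u64) -> u64 {
    x.wrapping_add(y)
}

// Calls `f` once per iteration and returns the accumulated result together
// with the elapsed wall-clock time.
fn time_it<F: Fn(u64, u64) -> u64>(f: F, iterations: u64) -> (u64, Duration) {
    let start = Instant::now();
    let mut acc = 0u64;
    for i in 0..iterations {
        // black_box hides the inputs from the optimizer.
        acc = acc.wrapping_add(f(black_box(i), black_box(1)));
    }
    (acc, start.elapsed())
}

fn main() {
    let iterations = 10_000_000;
    let (a, inlined_time) = time_it(sum_inlined, iterations);
    let (b, not_inlined_time) = time_it(sum_not_inlined, iterations);
    // Both variants must compute the same value; only the timing may differ.
    assert_eq!(a, b);
    println!("inlined:     {:?}", inlined_time);
    println!("not inlined: {:?}", not_inlined_time);
}
```

Run it several times and compare the trend, not a single reading. Differences of a few percent are often within noise.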

Harder Cases

Sometimes you have a function that is large and has multiple call sites, but only one call site is hot. You would like to inline the hot call site for speed, but not inline the cold call sites to avoid unnecessary code bloat. The way to handle this is to split the function into always-inlined and never-inlined variants, with the latter calling the former.

For example, this function:

// Placeholder helpers so the snippet compiles on its own.
fn one() {}
fn two() {}
fn three() {}

fn my_function() {
    one();
    two();
    three();
}

Would become these two functions:

// Placeholder helpers so the snippet compiles on its own.
fn one() {}
fn two() {}
fn three() {}

// Use this at the hot call site.
#[inline(always)]
fn inlined_my_function() {
    one();
    two();
    three();
}

// Use this at the cold call sites.
#[inline(never)]
fn uninlined_my_function() {
    inlined_my_function();
}

Example 1, Example 2.
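To make the call-site choice concrete, here is a hedged sketch of how the two variants might be used. The callers `hot_loop` and `handle_rare_case` are hypothetical, and the helper functions are given simple arithmetic bodies (an assumption, not part of the original example) so that the program produces a checkable result.

```rust
// Placeholder work standing in for real logic (assumed bodies).
fn one(x: u64) -> u64 {
    x + 1
}
fn two(x: u64) -> u64 {
    x * 2
}
fn three(x: u64) -> u64 {
    x + 3
}

// Use this at the hot call site.
#[inline(always)]
fn inlined_my_function(x: u64) -> u64 {
    three(two(one(x)))
}

// Use this at the cold call sites.
#[inline(never)]
fn uninlined_my_function(x: u64) -> u64 {
    inlined_my_function(x)
}

// Hypothetical hot path: called in a tight loop, so it uses the inlined variant.
fn hot_loop(n: u64) -> u64 {
    let mut total = 0u64;
    for i in 0..n {
        total = total.wrapping_add(inlined_my_function(i));
    }
    total
}

// Hypothetical cold path: called rarely, so it uses the out-of-line variant
// and shares a single copy of the function body.
fn handle_rare_case(x: u64) -> u64 {
    uninlined_my_function(x)
}

fn main() {
    println!("hot: {}", hot_loop(1_000));
    println!("cold: {}", handle_rare_case(7));
}
```

Both variants compute the same result. The only difference is that the body is duplicated into the hot caller, while all cold callers go through one shared out-of-line copy.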