LOESS in Rust

It’s time to port the Python LOESS code to Rust.

Photo by Matt Foxx on Unsplash

More than five years ago, counting from this writing, I published my most successful article here on Medium. That article was born out of a need to filter the data from a particularly noisy sensor in a telematics data stream. Specifically, it was a torque sensor connected to the drive axle of a truck, and the noise had to be removed. LOESS was the answer, hence that article.

At the time, I was deep into Python and the project required Spark, so implementing the algorithm in Python was a no-brainer. However, times have changed: I now use Rust more often and decided to port the old code. This article describes the porting process and the choices I made when rewriting the code. You should read the original article and its reference material to learn more about the algorithm. Here, we will focus on the intricacies of writing matrix code in Rust, mirroring the earlier NumPy implementation as closely as possible.

Rust Numerical Computing

Being a firm believer in not reinventing the wheel, I searched for recommended Rust crates to replace my use of NumPy in the original Python code, and it didn’t take long to find nalgebra.

nalgebra is meant to be a general-purpose, low-dimensional, linear algebra library, with an optimized set of tools for computer graphics and physics.

Although we do not need physics or computer graphics, the low-dimensionality requirement fits us like a glove.

Differences

When converting the Python code to Rust, I ran into a few issues that took some time to resolve. With NumPy in Python, we use all the features that both the language and the library provide to improve the expressiveness and readability of the code. Rust is more verbose than Python, and at the time of writing (version 0.33.0), the nalgebra crate still lacks some features that would help improve its expressiveness. Conciseness is a challenge.

My first hurdle was indexing arrays using other arrays. NumPy allows you to index an array using another array of integers or booleans. In the first case, each element of the indexing array is an index into the source array, and the indexer can have a dimension equal to or less than the data array. In the case of boolean indexing, the indexer must be the same size as the data, and each element must indicate whether or not the corresponding data element should be included. This feature is useful when using boolean expressions to select data.

Handy as it is, I used this feature throughout the Python code:

# Python
xx = self.n_xx[min_range]

Here the variable min_range is an integer array containing the subset of indices to retrieve from the self.n_xx array.

No matter how hard I tried, I couldn’t find a solution in the Rust crate that mimicked NumPy indexing, so I had to implement one. After a few tries and benchmarks, I reached the final version. This solution was simple and effective.

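// Rust
use nalgebra::DVector;

// Picks values[i] for every i in indices, like NumPy integer-array indexing.
fn select_indices(values: &DVector<f64>,
                  indices: &DVector<usize>) -> DVector<f64> {
    indices.map(|i| values[i])
}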

The map expression is quite simple, but using the function name is more expressive. Therefore, I replaced the above Python code with the corresponding Rust code:

// Rust
let xx = select_indices(&self.xx, min_range);
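
The same idea can be tried out without nalgebra. The following is a minimal sketch on plain slices, covering both the integer-index case used above and the boolean-mask case described earlier. The names `take_by_index` and `take_by_mask` are hypothetical; they come neither from the repository nor from nalgebra.

```rust
/// Integer indexing: each entry of `indices` picks one element of `values`.
/// Panics if an index is out of bounds, as ordinary slice indexing does.
fn take_by_index(values: &[f64], indices: &[usize]) -> Vec<f64> {
    indices.iter().map(|&i| values[i]).collect()
}

/// Boolean indexing: keeps `values[k]` wherever `mask[k]` is true.
/// The mask must have the same length as the data.
fn take_by_mask(values: &[f64], mask: &[bool]) -> Vec<f64> {
    assert_eq!(values.len(), mask.len(), "mask and data must have equal length");
    values
        .iter()
        .zip(mask)
        .filter_map(|(&v, &keep)| if keep { Some(v) } else { None })
        .collect()
}
```

Note that the index list may repeat or reorder positions, while a mask can only keep or drop elements in their original order.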

nalgebra also has no built-in method to create a vector from a range of integers. While it is easy to do, the code becomes a bit long:

// Rust
range = DVector::<usize>::from_iterator(window, 0..window);

We can avoid much of this ceremony if we fix the vector and matrix sizes at compile time, but we have no such luck here because the dimensions are only known at run time. The corresponding Python code is more concise:

# Python
np.arange(0, window)
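
When a plain `Vec` is enough, the Rust standard library gets closer to that one-liner. A small sketch, where `arange` is a hypothetical helper name chosen to echo NumPy:

```rust
/// Hypothetical helper mirroring `np.arange(start, stop)` for
/// non-negative integers: yields start, start + 1, ..., stop - 1.
fn arange(start: usize, stop: usize) -> Vec<usize> {
    (start..stop).collect()
}
```

An empty vector comes back when `start >= stop`, which matches what `np.arange` does for the same arguments.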

Python's brevity extends to other areas as well, such as filling a matrix one column at a time. In Python we can do something like this:

# Python
for i in range(1, degree + 1):
    xm[:, i] = np.power(self.n_xx[min_range], i)

As of this writing, I have not found a better way to do the same thing with nalgebra than this:

// Rust
for i in 1..=degree {
    for j in 0..window {
        xm[(j, i)] = self.xx[min_range[j]].powi(i as i32);
    }
}

Perhaps something hidden in the crate is still waiting to be discovered that would make the code a bit more concise.

Finally, I found the nalgebra documentation relatively sparse. We can expect this from a relatively young Rust crate that holds a lot of promise for the future.
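
To make the intent of that double loop concrete without depending on nalgebra, here is a minimal sketch that builds the same kind of matrix as a list of columns. It assumes that column 0 holds ones (the constant term of the polynomial), which the snippets above do not show, and `power_columns` is a hypothetical name.

```rust
/// Builds the columns of a polynomial design matrix: column `i` holds
/// `x^i` for every selected `x`. Column 0 (all ones) is an assumption
/// of this sketch, not something taken from the original code.
fn power_columns(xx: &[f64], indices: &[usize], degree: usize) -> Vec<Vec<f64>> {
    (0..=degree)
        .map(|i| indices.iter().map(|&j| xx[j].powi(i as i32)).collect())
        .collect()
}
```

Each inner iterator plays the role of `np.power(self.n_xx[min_range], i)`, producing one whole column per step.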

The positive side

The best comes last: raw performance. I invite you to try both versions of the same code (the GitHub repository links are below) and compare their performance. On my 2019 MacBook Pro with a 2.6 GHz 6-core Intel Core i7, the release build of the Rust code runs in less than 200 microseconds, while the Python code runs in less than 5 milliseconds.
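
If you want to take a similar measurement yourself, a rough timing harness is enough for a first comparison. The sketch below is an assumption about how one might do it, not the method used for the numbers above; `mean_runtime` is a hypothetical helper, and the code under test should be compiled in release mode (for example with `cargo run --release`).

```rust
use std::time::{Duration, Instant};

/// Runs `f` `runs` times and returns the mean wall-clock time per run.
/// Hypothetical helper: a careful benchmark would also warm up first
/// and look at the variance, not only the mean.
fn mean_runtime<F: FnMut()>(runs: u32, mut f: F) -> Duration {
    assert!(runs > 0, "need at least one run");
    let start = Instant::now();
    for _ in 0..runs {
        f();
    }
    start.elapsed() / runs
}
```

Pass the call you want to measure as the closure and print the returned `Duration` with `{:?}` to get a human-readable value.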

Conclusion

This project was another exciting and educational Python-to-Rust port of my old code. While converting familiar Python control structures to Rust gets easier by the day, converting NumPy code to nalgebra was more of a challenge. The Rust crate shows a lot of promise but needs more documentation and online support. I would welcome a more thorough user guide.

Rust is more ceremonial than Python, but performs much better when used properly. I will continue to use Python for my day-to-day work when building prototypes and in discovery mode, but I will be using Rust for performance and memory safety when I go to production. We can even mix and match the two using crates like PyO3, so it’s a win-win.

Rust is awesome!

References

joaofig/loess-rs: An implementation of the LOESS/LOWESS algorithm in Rust. (github.com)

joaofig/pyloess: A simple implementation of the LOESS algorithm using numpy (github.com)

Credits

I used Grammarly to check the text and accepted some of the suggestions for rewriting.

JetBrains AI Assistant helped me write some of the code, and I also used it to learn Rust. It has become an integral part of my daily work with both Rust and Python. Unfortunately, its support for nalgebra is still limited.

João Paulo Figueira is a data scientist at Daimler Truck’s tb.lx in Lisbon, Portugal.


LOESS in Rust was originally published in Towards Data Science on Medium, where people continued the conversation by bookmarking and commenting on this story.