Mostly Engineering
https://maciejkula.github.io/
Recent content on Mostly Engineering
Fri, 27 Jul 2018 09:17:00 -0700

Recommending books (with Rust)
https://maciejkula.github.io/2018/07/27/recommending-books-with-rust/
Fri, 27 Jul 2018 09:17:00 -0700
In this post, we’re going to build a sequence-based recommender system in Rust: a system that accepts a person’s reading history as input, and outputs recommendations on what to read next.
Building systems like this – like much of machine learning and data science – is normally the province of Python. The combination of numpy, pandas, and other libraries that build on them makes doing data science in Python a breeze.

Building an autodifferentiation library
https://maciejkula.github.io/2018/07/18/building-an-autodifferentiation-library/
Wed, 18 Jul 2018 17:38:00 +0100
This blog post originally appeared on Medium.
Popular general-purpose auto-differentiation frameworks like PyTorch or TensorFlow are very capable, and, for the most part, there is little need for writing something more specialized.
Nevertheless, I have recently started writing my own autodiff package. This blog post describes what I’ve learned along the way. Think of this as a poor man’s version of a Julia Evans blog post.
Note that there are many blog posts describing the mechanics of autodifferentiation much better than I could, so I skip the explanations here.

Don't use explicit feedback recommenders
https://maciejkula.github.io/2018/07/19/dont-use-explicit-feedback-recommenders/
Thu, 19 Jul 2018 19:02:00 +0100
Back in January, I gave a talk at the London RecSys Meetup about why explicit feedback recommender models are inferior to implicit feedback models in the vast majority of cases.
The key argument is that what people choose to rate or not rate expresses a more fundamental preference than what the rating is. Ignoring that preference and focusing on the gradations of preference within ranked items is the wrong choice.

My approximate nearest neighbour talk at Europython 2015
https://maciejkula.github.io/2015/08/06/my-approximate-nearest-neighbour-talk-at-europython-2015/
Thu, 06 Aug 2015 00:00:00 +0000
At this year’s Europython, I presented a talk on ‘Speeding up search with locality sensitive hashing’.
In the talk, I presented the intuition behind locality sensitive hashing and discussed several Python packages for performing approximate nearest neighbour searches (including my own, rpforest).
If you are interested in the details, please see my blog post on the Lyst developers blog. The recording of the talk and slides are embedded below.

Simple MinHash implementation in Python
https://maciejkula.github.io/2015/06/01/simple-minhash-implementation-in-python/
Mon, 01 Jun 2015 00:00:00 +0000
MinHash is a simple but effective algorithm for estimating set similarity using the Jaccard index. Both the Wikipedia entry and this blog post are good explanations of how it works.
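As a rough illustration of the idea, here is a minimal MinHash sketch using only the Python standard library. It is not the implementation from the post: the function names (`minhash_signature`, `estimate_jaccard`), the use of seeded `blake2b` hashes, and the choice of 128 hash functions are all assumptions made for this example.

```python
import hashlib


def _hash(value, seed):
    # Hypothetical helper: derive a 64-bit hash of `value` for a given seed,
    # so that each seed behaves like a different hash function.
    data = f"{seed}:{value}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=128):
    # For every seeded hash function, keep only the smallest hash seen.
    # Assumes `items` is non-empty.
    return [min(_hash(item, seed) for item in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    # The fraction of positions where the minima agree estimates
    # |A & B| / |A | B|.
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


# Two overlapping sets of user ids (made-up data).
users_a = set(range(0, 1000))
users_b = set(range(500, 1500))

sig_a = minhash_signature(users_a)
sig_b = minhash_signature(users_b)

estimate = estimate_jaccard(sig_a, sig_b)
exact = len(users_a & users_b) / len(users_a | users_b)
print(f"estimated: {estimate:.3f}, exact: {exact:.3f}")
```

Once the signatures are computed, comparing two sets only touches `num_hashes` integers per set, regardless of how many elements the sets contain. More hash functions give a tighter estimate at the cost of longer signatures.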
MinHash is attractive because it allows us to decide how similar two sets are without having to enumerate all of their elements. If we want to know how many users that performed action $A$ also performed action $B$, we can compare the MinHashes of the two sets instead of keeping track of multiple sets of millions of user ids.

Incremental construction of sparse matrices
https://maciejkula.github.io/2015/02/22/incremental-construction-of-sparse-matrices/
Sun, 22 Feb 2015 00:00:00 +0000
Sparse matrices are an indispensable tool – because only non-zero entries are stored, they store information efficiently and enable (some) fast linear algebra operations.
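To make "only non-zero entries are stored" concrete, here is a small pure-Python sketch. It does not use scipy and is not the approach from the post; the helper names `coo_to_csr` and `csr_matvec` are hypothetical. Entries are appended incrementally as (row, column, value) triplets, then packed into compressed-sparse-row arrays for a matrix-vector product.

```python
from collections import defaultdict


def coo_to_csr(rows, cols, vals, num_rows):
    # Group the triplets by row, summing values that share a (row, col) slot.
    per_row = [defaultdict(float) for _ in range(num_rows)]
    for r, c, v in zip(rows, cols, vals):
        per_row[r][c] += v

    # indptr[i]:indptr[i + 1] delimits the entries belonging to row i.
    indptr, indices, data = [0], [], []
    for row in per_row:
        for c in sorted(row):
            indices.append(c)
            data.append(row[c])
        indptr.append(len(indices))
    return indptr, indices, data


def csr_matvec(indptr, indices, data, x):
    # Multiply the matrix by a dense vector, visiting only stored entries.
    return [
        sum(data[k] * x[indices[k]] for k in range(indptr[i], indptr[i + 1]))
        for i in range(len(indptr) - 1)
    ]


# Incrementally build the 3x3 matrix
# [[1, 0, 2],
#  [0, 0, 0],
#  [0, 3, 0]]
# by appending triplets in arbitrary order.
rows, cols, vals = [], [], []
for r, c, v in [(0, 0, 1.0), (2, 1, 3.0), (0, 2, 2.0)]:
    rows.append(r)
    cols.append(c)
    vals.append(v)

indptr, indices, data = coo_to_csr(rows, cols, vals, num_rows=3)
y = csr_matvec(indptr, indices, data, [1.0, 1.0, 1.0])
print(indptr, indices, data, y)
```

Appending triplets is cheap, which suits incremental construction; the row-compressed layout is what makes row-wise arithmetic fast. That is the trade-off between construction-friendly and computation-friendly formats.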
In Python, sparse matrix support is provided by scipy in scipy.sparse. Sparse matrices come in a number of flavours. Crucially, there are those that use efficient storage and/or support fast linear algebra operations (csr_matrix, csc_matrix, and coo_matrix), and those that enable efficient incremental construction and/or random element access (lil_matrix, dok_matrix).

Calling BLAS from Cython
https://maciejkula.github.io/2015/01/01/calling-blas-from-cython/
Thu, 01 Jan 2015 00:00:00 +0000
It is often useful to be able to call BLAS routines directly from Cython. Doing so avoids calling the corresponding NumPy functions (which would incur a performance penalty of running interpreted code and type and shape checking) as well as re-implementing linear algebra operations in Cython (which will likely be both incorrect and slower).
Existing Cython BLAS wrappers
Correspondingly, there are several ways of doing so.
CythonGSL provides Cython wrappers for the GNU Scientific Library.
https://maciejkula.github.io/about/
Maciej Kula
I’m a machine learning engineer, working mostly on recommender systems.
Find me on Github and Twitter.
Software
LightFM, a hybrid recommender system
Spotlight, a research package for deep recommender systems
Wyrm, a define-by-run autodifferentiation framework in Rust
sbr-rs, a lightweight recommender system library in Rust