Incredibly Fast Random Sampling in Python

Ethan Koch
4 min read · Jun 10, 2019

We need speed in random sampling. How fast can we go?

Need random sampling in Python? Generally, one can turn to the random or numpy packages for a quick solution. In fact, we solve 99% of our random sampling problems with these packages’ methods.
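
For reference, here is a minimal sketch of what that quick solution usually looks like. The population of 20 integers and the sample size of 5 are placeholder values chosen for illustration; they are not part of the problem described in this post.

```python
import random
import numpy as np

# Hypothetical population, used only for illustration
population = list(range(20))

# Standard library: 5 distinct items, each equally likely to be chosen
stdlib_sample = random.sample(population, 5)

# NumPy: 5 distinct items, equal weights by default
numpy_sample = np.random.choice(population, size=5, replace=False)

print(stdlib_sample)
print(numpy_sample)
```

Both calls return a single sample without replacement, which covers most everyday needs.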

But we recently came across a random sampling problem that we could not solve so easily. We needed:

  • A specified sample size
  • A specified number of samples
  • Sampling without replacement
  • A specified probability of each element’s inclusion in a given sample

Let’s take a look at different ways to solve this problem in Python. The setup includes a few constants and two arrays: a set of elements (represented as indices) and a list of probabilities, one per element, giving that element’s inclusion probability in a given sample. Here is our setup:

import random
import numpy as np
# constants
num_elements = 20
num_samples = 1000
sample_size = 5
elements = np.arange(num_elements)
# probabilities should sum to 1
probabilities = np.random.random(num_elements)
probabilities /= np.sum(probabilities)

Method 1 — Native Python loops
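
One way a loop-based approach might look is sketched below, reusing the setup above. It assumes that each sample is drawn with `np.random.choice` using `replace=False` and `p=probabilities`; the function name `sample_with_loop` is hypothetical, and this is an illustration of the idea rather than a definitive implementation.

```python
import numpy as np

# Setup repeated from above so the sketch runs on its own
num_elements = 20
num_samples = 1000
sample_size = 5
elements = np.arange(num_elements)
probabilities = np.random.random(num_elements)
probabilities /= np.sum(probabilities)


def sample_with_loop(elements, probabilities, num_samples, sample_size):
    """Draw num_samples weighted samples without replacement, one per iteration."""
    samples = []
    for _ in range(num_samples):
        # One call per sample; the weights in p are applied draw by draw
        sample = np.random.choice(
            elements, size=sample_size, replace=False, p=probabilities
        )
        samples.append(sample)
    return np.array(samples)


samples = sample_with_loop(elements, probabilities, num_samples, sample_size)
print(samples.shape)  # (1000, 5)
```

Each row of `samples` holds one sample of distinct elements. The Python-level loop makes one NumPy call per sample, so the per-call overhead is paid `num_samples` times.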
