Incredibly Fast Random Sampling in Python
We need speed in random sampling. How fast can we go?
Need random sampling in Python? Generally, one can turn to the random or numpy packages’ methods for a quick solution. In fact, we solve 99% of our random sampling problems using these packages’ methods.
But we recently came across a random sampling problem that we could not solve so easily. Our problem required:
- A specified sample size
- A specified number of samples
- Sampling without replacement
- A specified inclusion probability for each element in a given sample
Let’s take a look at different ways to solve this problem in Python. The setup for this problem needs two things: first, a set of elements (represented as indices); second, a list of probabilities giving each element’s inclusion probability in a given sample. Here is our setup:
import random
import numpy as np

# constants
num_elements = 20
num_samples = 1000
sample_size = 5
elements = np.arange(num_elements)

# probabilities should sum to 1
probabilities = np.random.random(num_elements)
probabilities /= np.sum(probabilities)
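As a point of reference for the setup above, here is a minimal sketch of a straightforward baseline: one call to NumPy's `Generator.choice` per sample, with `replace=False`. The helper name `draw_samples` and the fixed seed are our own assumptions for illustration, not part of the original setup. Note also that `p` acts as the draw weights here, so the realized inclusion frequencies may not match `probabilities` exactly when sampling without replacement.

```python
import numpy as np

# Seeded generator so the sketch is reproducible (the seed is an assumption).
rng = np.random.default_rng(seed=0)

# Same constants as in the setup above.
num_elements = 20
num_samples = 1000
sample_size = 5
elements = np.arange(num_elements)

# Random weights, normalized so they sum to 1.
probabilities = rng.random(num_elements)
probabilities /= probabilities.sum()


def draw_samples(elements, probabilities, num_samples, sample_size, rng):
    """Hypothetical baseline: draw each sample with its own choice() call."""
    samples = np.empty((num_samples, sample_size), dtype=elements.dtype)
    for i in range(num_samples):
        # replace=False guarantees no element repeats within a single sample.
        samples[i] = rng.choice(
            elements, size=sample_size, replace=False, p=probabilities
        )
    return samples


samples = draw_samples(elements, probabilities, num_samples, sample_size, rng)
print(samples.shape)  # (1000, 5)
```

Each row of `samples` is one sample of `sample_size` distinct elements. The Python-level loop over `num_samples` is the obvious cost in this version, which makes it a useful yardstick when comparing faster approaches.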