
Friday Fun Thread for April 19, 2024

Be advised: this thread is not for serious in-depth discussion of weighty topics (we have a link for that), this thread is not for anything Culture War related. This thread is for Fun. You got jokes? Share 'em. You got silly questions? Ask 'em.


The Fussy Suitor Problem: A Deeper Lesson on Finding Love

Inspired by the Wellness Wednesday post by @lagrangian, but mostly for Friday Fun: the fussy suitor problem (aka the secretary problem) has more to teach us about love than I initially realized.

The most common formulation of the problem deals with the rank of potential suitors. You reject the first r suitors outright, then select the first later suitor who ranks higher than everyone seen so far. Success is defined as choosing the suitor who would have been the highest ranking among the entire pool of suitors (size n). Most analyses focus on the probability of achieving this definition of success, denoted as P(r), which is straightforward to calculate. The “optimal” strategy converges on setting r = n/e (approximately 37% of n), resulting in a success rate of about 37%.
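A minimal Python sketch of the cutoff rule described above, for anyone who wants to check the 37% figure by simulation. It is an illustration rather than anyone's published code: the pool size n = 256, the trial count, and the function names are all assumptions made for the example.

```python
import math
import random


def cutoff_pick(values, r):
    """Skip the first r values, then return the index of the first later value
    that beats all of them. Falls back to the last index if none does."""
    benchmark = max(values[:r]) if r > 0 else float("-inf")
    for i in range(r, len(values)):
        if values[i] > benchmark:
            return i
    return len(values) - 1


def success_rate(n, r, trials, seed=0):
    """Estimate P(r): the fraction of trials where the rule picks the overall best."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        values = [rng.random() for _ in range(n)]  # only the ordering matters here
        best_index = values.index(max(values))
        if cutoff_pick(values, r) == best_index:
            wins += 1
    return wins / trials


n = 256  # hypothetical pool size
r = round(n / math.e)
p_hat = success_rate(n, r, trials=5000)
print(f"n={n}, r={r}: estimated P(r) = {p_hat:.3f}")
```

The estimate should land near 0.37, with some Monte Carlo noise at this trial count.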

However, I always found this counterintuitive. Even with optimal play, you end up failing more than half the time.

In her book The Mathematics of Love, Hannah Fry suggests, but does not demonstrate, that we can convert n to time, t. She also presents simulations where success is measured by quantile rather than absolute rank. For instance, if you end up with someone in the 95th percentile of compatibility, that might be considered a success. This shifts the optimal point to around 22% of t, with a success rate of 57%.

Still, I found this answer somewhat unsatisfying. It remains unclear how much less suitable it is to settle for the 95th percentile of compatibility. Additionally, I wondered if the calculation depends on the courtship process following a uniform geometric progression in time, although this assumption is common.

@lagrangian pointed out to me that the problem has a maximum expected value for payoff at r = sqrt(n), assuming uniform utility. While a more mathematically rigorous analysis exists, I decided to start by trying to build some intuition through simulation.

In this variant, we consider payoff in utilitons (u) rather than just quantile or rank information. For convenience, I assume there are 256 suitors.

The stopping point based on sqrt(n) grows much more slowly than the n/e case, so I don’t believe this significantly alters any qualitative conclusions. I’m pretty sure using the time domain here depends on the process and rate though.

I define P(miss) as the probability of missing out or accidentally exhausting the suitors, ultimately “settling” for the 256th suitor. In that case you met the one, but passed them up and settled for the last possible person. Loss is defined as the difference between the utility that would have been gained by selecting the actual best suitor and the utility of the suitor actually selected by stopping at the first suitor after r who beats everyone seen so far. Expected Shortfall (ES) is calculated at the 5th percentile.
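To make the loss and ES definitions concrete, here is a small Python sketch. It reads "ES at the 5th percentile" as the mean of the worst 5% of losses, which is an assumption about the intended definition, and the loss values in the toy list are made up purely for illustration.

```python
def loss(chosen_utility, best_utility):
    """Utility given up by picking `chosen` instead of the best suitor in the pool."""
    return best_utility - chosen_utility


def expected_shortfall(losses, tail=0.05):
    """Mean of the worst `tail` fraction of losses (assumed reading of ES_5)."""
    ordered = sorted(losses, reverse=True)
    k = max(1, int(len(ordered) * tail))  # always average at least one value
    return sum(ordered[:k]) / k


# Toy data: 20 hypothetical runs, 17 of which picked the best suitor (loss 0).
toy_losses = [0.0] * 17 + [0.5, 2.0, 3.0]
mean_loss = sum(toy_losses) / len(toy_losses)
es_5 = expected_shortfall(toy_losses, tail=0.05)
print(f"mean loss = {mean_loss:.3f}, ES_5 = {es_5:.1f}")
```

With only 20 runs the worst 5% is a single run, so ES_5 here is just the largest loss; in a real simulation it averages over many bad outcomes.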

I generate suitors from three underlying utility distributions:

  • Exponential: Represents scenarios where there are pairings that could significantly improve your life, but most people are unsuitable.
  • Normal: Assumes the suitor’s mutual utility is an average of reasonably well-behaved (mathematically) traits.
  • Uniform: Chosen because we know the optimal point.

For convenience, I’ve set the means to 0 and the standard deviations to 1. If you believe I should have set the medians of the distributions to 0 instead, add 1 - log(2) (about 0.31) utilitons to the mean(u) exponential result, since a mean-zero, unit-rate exponential has its median at log(2) - 1.
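One way to generate the three standardized distributions, sketched in Python with only the standard library. The specific parameterizations (a unit-rate exponential shifted down by 1, a uniform on [-sqrt(3), sqrt(3)]) are assumptions chosen to give mean 0 and standard deviation 1; the original code may do this differently.

```python
import math
import random
import statistics


def draw_exponential(rng):
    """Unit-rate exponential shifted to mean 0 (its sd is already 1)."""
    return rng.expovariate(1.0) - 1.0


def draw_normal(rng):
    """Standard normal: mean 0, sd 1."""
    return rng.gauss(0.0, 1.0)


def draw_uniform(rng):
    """Uniform on [-sqrt(3), sqrt(3)]: mean 0, variance (2*sqrt(3))**2 / 12 = 1."""
    half_width = math.sqrt(3.0)
    return rng.uniform(-half_width, half_width)


GENERATORS = {"exp": draw_exponential, "norm": draw_normal, "unif": draw_uniform}

rng = random.Random(1)
summary = {}
for name, draw in GENERATORS.items():
    sample = [draw(rng) for _ in range(50000)]
    summary[name] = (statistics.fmean(sample), statistics.pstdev(sample))
    print(f"{name:4s}  mean = {summary[name][0]:+.3f}  sd = {summary[name][1]:.3f}")
```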

Running simulations until convergence with the expected P(r), we obtain the following results:


| gen_dist |    r    | P(r) | P(miss) | <u> | <loss> | sd_loss | ES_5 | max_loss |
|----------|---------|------|---------|-----|--------|---------|------|----------|
|   exp    |   n/e   | 37%  |   19%   | 2.9 |  2.2   |   2.5   | 7.8  |   14.1   |
|   exp    | sqrt(n) | 17%  |   3%    | 3.0 |  2.1   |   1.8   | 6.6  |   14.8   |
|----------|---------|------|---------|-----|--------|---------|------|----------|
|   norm   |   n/e   | 37%  |   19%   | 1.7 |  1.2   |   1.5   | 4.6  |   7.0    |
|   norm   | sqrt(n) | 18%  |   3%    | 2.0 |  0.8   |   0.8   | 3.3  |   6.3    |
|----------|---------|------|---------|-----|--------|---------|------|----------|
|   unif   |   n/e   | 37%  |   19%   | 1.1 |  0.6   |   1.0   | 3.2  |   3.5    |
|   unif   | sqrt(n) | 17%  |   3%    | 1.5 |  0.2   |   0.5   | 2.1  |   3.5    |

What surprised me most is that early stopping (r = sqrt(n)) yields better results for both expected utility and downside risk. Previously, I would have assumed that since the later stopping criterion (r = n/e) is more than twice as likely to select the best suitor, its expected shortfall would be lower. However, the opposite holds true. With r = n/e you are more than 6 times as likely to have to settle (19% vs. 3%), so even if suitability is highly skewed, as in the exponential case, expected value still favors r = sqrt(n)! This is a completely different result from the r = n/e I had long accepted as optimal. The effect is even more extreme than the quantile-and-time-based result.
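A rough Python sketch for re-checking the uniform rows of the comparison by simulation. It is not the original code: the uniform is assumed to be on [-sqrt(3), sqrt(3)] (mean 0, sd 1), the cutoffs are rounded to whole suitors, and the trial count is kept small so it runs quickly.

```python
import math
import random


def pick_utility(utilities, r):
    """Utility obtained by skipping r suitors and taking the first later one
    that beats them all (or the last suitor if nobody does)."""
    benchmark = max(utilities[:r])
    for u in utilities[r:]:
        if u > benchmark:
            return u
    return utilities[-1]


def simulate(n, r, trials, seed=0):
    """Return (mean utility, mean loss versus the best suitor in the pool)."""
    rng = random.Random(seed)
    half_width = math.sqrt(3.0)  # uniform on [-sqrt(3), sqrt(3)]: mean 0, sd 1
    total_u = 0.0
    total_loss = 0.0
    for _ in range(trials):
        utilities = [rng.uniform(-half_width, half_width) for _ in range(n)]
        chosen = pick_utility(utilities, r)
        total_u += chosen
        total_loss += max(utilities) - chosen
    return total_u / trials, total_loss / trials


n = 256
u_sqrt, loss_sqrt = simulate(n, round(math.sqrt(n)), trials=4000)
u_ne, loss_ne = simulate(n, round(n / math.e), trials=4000)
print(f"r = sqrt(n): <u> = {u_sqrt:.2f}, <loss> = {loss_sqrt:.2f}")
print(f"r = n/e:     <u> = {u_ne:.2f}, <loss> = {loss_ne:.2f}")
```

Swapping the generator for an exponential or normal draw is a one-line change if you want to look at the other rows.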

All cases yield a positive expectation value. Since we set the mean of the generating distributions to 0, this implies that on average having some dating experience before deciding is beneficial. Don’t expect your first millihookup to turn into a marriage, but also don’t wait forever.

I should probably note that for low but plausible n (n <= 7), sqrt(n) is larger than n/e. Because the number of suitors is a whole number, though, the optimal r (+/-1) is still the one given in the standard tables.
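For small pools the classic success probability can be computed exactly rather than simulated. The sketch below uses the standard formula P(r) = (r/n) * sum over i from r+1 to n of 1/(i-1), with P(0) = 1/n, and prints the best whole-number r next to n/e and sqrt(n). The function names are just for this example.

```python
from fractions import Fraction
import math


def p_success(n, r):
    """Exact probability that the cutoff rule with r rejections picks the best of n."""
    if r == 0:
        return Fraction(1, n)  # taking the first suitor wins only if they are the best
    return Fraction(r, n) * sum(Fraction(1, i - 1) for i in range(r + 1, n + 1))


def best_cutoff(n):
    """Smallest r that maximises p_success(n, r)."""
    return max(range(n), key=lambda r: p_success(n, r))


for n in range(2, 11):
    r_star = best_cutoff(n)
    print(f"n={n:2d}  optimal r={r_star}  P={float(p_success(n, r_star)):.3f}  "
          f"n/e={n / math.e:.2f}  sqrt(n)={math.sqrt(n):.2f}")
```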

One curious factoid is that actuaries are an appreciable outlier in having the lowest likelihood of divorce. Do they possess insights about modeling love that the rest of us don’t? I’d be very interested if anyone has other probabilistic models of relationship success. What do they know that the rest of the life, physical, and social sciences don't? Or is it that they are just more disposed to finding a suitable "good" partner than the one?

Thanks a lot for this. Do you have a pointer to your code, or could you put it up on Github/Codeberg?

I wasn't planning on publishing the source, since my code is a bit idiosyncratic, but I guess there seems to be enough interest.

A pastebin with the code. Uhh, I guess I didn't put a license statement. Let's say BSD Zero Clause License. Do what you want, but don't blame me if it ruins your love life.

Is there a way to publish a pseudonymous/anonymous gist on Github?

Cheers, backed it up here: https://git.nunosempere.com/NunoSempere/fussy-suitor/src/branch/master/code.R

I'm not sure if there is a way to publish pseudonymously on Github. You could create a separate account (e.g., on Codeberg or on Gitlab) though.

I would like to point out that once you throw the win condition of "you have to pick the best" out of the window, and instead try to optimize the EV of a real-valued score, strategies should change.

Basically, you still want to skip the first r suitors to get some information about the underlying distribution. Once you have an idea about the distribution (possibly Pareto or normal), you want to compare the utility of the current suitor to the expected value you would get from the remainder of the queue (and keep updating your estimate of the distribution, naturally).

This means that you should pick the second-to-last suitor if they are above average utility. (Assuming that all suitors are positive utility as compared to not picking anyone, which is not the case in dating.)
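The "compare against the expected value of the remainder" idea can be sketched with backward induction in the simplest possible setting. The Python below assumes utilities are i.i.d. uniform on [0, 1], that the distribution is known in advance (so there is no learning phase), and that the last suitor must be accepted. Those are simplifying assumptions for illustration, not the setup simulated in the original post.

```python
def continuation_values(n):
    """values[k] = expected payoff of optimal play when k suitors are still unseen.

    Assumes utilities are i.i.d. uniform on [0, 1], the distribution is known,
    and the last suitor must be accepted.
    """
    values = [0.0]  # k = 0: nobody left, so continuing is worth nothing
    for _ in range(n):
        v = values[-1]
        # With threshold v, the next suitor is accepted if x > v, otherwise we
        # continue: E[max(X, v)] = v*v + (1 - v*v)/2 = (1 + v*v)/2 for X ~ U(0, 1).
        values.append((1.0 + v * v) / 2.0)
    return values


values = continuation_values(5)
for k, v in enumerate(values):
    print(f"{k} suitors left after this one: accept if utility > {v:.4f}")
```

With one suitor left after the current one, the threshold is 0.5, the mean of the distribution, which is the second-to-last-suitor rule from the comment. The threshold rises as more suitors remain.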

It might also be better not to have a hard cut at any given r, and instead have some penalty for early picking which represents "I forfeit the possibility of making a more informed choice".

If your sequence starts as -2, 5, 6e23, then there is some case to be made to marry Mr/Ms Avogadro without considering the rest of the queue. Of course, if the distribution you are sampling is a bimodal distribution which is 2/3 a boring normal distribution around zero with sigma=4 and 1/3 (1d12)^(1d100) (using roleplaying dice syntax), then you would be better served to keep sampling.

This is why you get no bitches

Ignore me, I'm mildly salty because despite having the dubious distinction of being the first to apply the Secretary Problem in the context of dating, at least on The Motte, I lack the patience or mathematical astuteness necessary for such an in depth analysis. It's highly appreciated, what else can I do but hit AAQC?

the Secretary Problem was always about dating

The secretary problem was apparently introduced in 1949 by Merrill M. Flood, who called it the fiancée problem in a lecture he gave that year.

I think it just got renamed "secretary problem" to sound more genteel and respectable. Plus maybe some 1960s wink-wink nudge-nudge understanding that most secretaries at an office would get married to one of the bosses.

Well, I guess the reduction in salt intake is good for my BP.

How'd it go again? "Great minds think alike, and fools seldom differ."

Thanks for this!

One modification I wonder if you'd be willing to simulate for us: what if we don't strictly require stopping at n? You could formalize it to something like "given a strategy that on average samples n suitors when run to its stopping point, what are the loss/etc?" p(miss) becomes 0 by definition.

That feels more like real life: if the goal is to stop dating by 35, but I happen to be dating Hitler when I hit 35, I'm probably gonna push it to 36.

It seems to me like the best way to model this would be to have some multiplicative scaling factor on utility that diminishes over time, since 50 years of your life with the second best suitor is (probably) going to be better than spending 20 years of your life with the best suitor. Perhaps a linear decay to simulate amount of lifespan remaining, so the utility of choosing the nth suitor is their actual quality multiplied by (1-0.01n). Or maybe weight it more towards the beginning to account for youth and childbearing years, like (0.99)^n or something.
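A quick Python sketch of what such a decay factor could look like inside a simulation. Everything specific here is an assumption for illustration: quality is drawn uniform on [0, 1] so that multiplying by a weight makes sense, the pool has n = 50 suitors so the linear weight stays positive, and the plain cutoff rule is used to decide when to stop.

```python
import math
import random


def linear_weight(position):
    """Remaining-lifespan style weight: 1 - 0.01 * position, floored at 0."""
    return max(0.0, 1.0 - 0.01 * position)


def geometric_weight(position):
    """Front-loaded weight: 0.99 ** position."""
    return 0.99 ** position


def mean_weighted_utility(n, r, weight, trials, seed=0):
    """Average of quality * weight(position) under the skip-r cutoff rule."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        quality = [rng.random() for _ in range(n)]
        benchmark = max(quality[:r])
        chosen = n - 1  # zero-based index; default to the last suitor
        for i in range(r, n):
            if quality[i] > benchmark:
                chosen = i
                break
        total += quality[chosen] * weight(chosen + 1)  # positions are 1-based
    return total / trials


n = 50  # hypothetical pool size
cutoffs = {"sqrt(n)": round(math.sqrt(n)), "n/e": round(n / math.e)}
weights = {"linear": linear_weight, "geometric": geometric_weight}
results = {}
for w_name, w in weights.items():
    for c_name, r in cutoffs.items():
        results[(w_name, c_name)] = mean_weighted_utility(n, r, w, trials=4000)
        print(f"{w_name:9s} decay, r = {c_name:7s}: {results[(w_name, c_name)]:.3f}")
```

Under these assumptions any decay penalizes late picks, so it pushes further toward earlier cutoffs.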

It's interesting that what you suggested is almost the opposite of the scenario @Felagund suggested. I suppose a hopeless romantic would not want to risk the potentially corrosive effect of having knowingly settled. I assume that in practice you would combine some knowledge of the current rate, the steepness of the expected falloff, some pure romantic inclination, and some fear of missing out into some heuristic.

The scenario where we keep n from above, but keep going if we still haven't found the one, is interesting, I think. If we set our benchmark at r=sqrt(n), 83% of the time you find your partner before n/e. Assuming (offset) exponentially distributed utility, the expected utility in this case is about the same as in the case where we assumed halting. I guess this is like the plethora of people who marry someone they meet in college? In about 10% of the cases you manage to find a partner before the expected window closes, and patience is rewarded with about 50% more utility (4.5 vs 3).

I then assumed some very questionable things to set the next boundaries. First, we can transpose to time as above. Second, that we care about marriage with respect to producing children. Putting geriatric maternal age at 35-40, and assuming you would just offset paternal age so we don't have to deal with an extra set of scenarios, I find a new cutoff of 320/256. I think this sort of accommodates @jeroboam's point. In that case, by not stopping but being willing to continue into the danger zone, 1.3% of the runs find the one by "40." Of course expected utility is higher at 5.2, but being willing to push age while unwilling to settle only picked up a small number of additional "successes."

In the remaining 5% of cases you eventually find your soulmate with an expected utility of 6.4. You do have to wait exponentially long though, with a median age equivalent of 67, and a mean of 343!

Setting the high water mark at n/e, but being unwilling to stop, is similar in utility. Now you've eliminated the bucket with 3 units of expected utility, and the 4.5 unit utility bucket has 63% weight. Your willingness to go into the (questionably) age equivalent 35-40 bucket also preserves 7% of the trials. By setting your benchmark so late though, 30% of the time you miss the critical window. The higher expected utility, I guess, represents it being totally worth it to find your soul mate, assuming there was no penalty for waiting past geriatric pregnancy age.
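The bucket weights quoted above can be sanity-checked without simulation. For i.i.d. continuous utilities, the best of the first m suitors is equally likely to be in any position, so the chance that nobody in positions r+1..m beats the benchmark from the first r is r/m. The Python sketch below applies that with real-valued r and bucket edges (a continuous approximation; the names and the edge list are chosen for this example).

```python
import math


def survival(r, m):
    """P(no suitor in positions r+1..m beats the best of the first r).

    For i.i.d. continuous utilities the best of the first m suitors is equally
    likely to sit in any position, so this is r/m (capped at 1 when m <= r).
    """
    return min(1.0, r / m)


def bucket_weights(r, edges):
    """Probability that the first benchmark-beating suitor arrives in each
    interval delimited by `edges`, plus the leftover mass beyond the last edge."""
    weights = []
    previous = 1.0
    for edge in edges:
        current = survival(r, edge)
        weights.append(previous - current)
        previous = current
    weights.append(previous)  # still searching after the last edge
    return weights


n = 256
edges = [n / math.e, n, 320]  # n/e, the original pool size, and the extended cutoff
for label, r in (("sqrt(n)", math.sqrt(n)), ("n/e", n / math.e)):
    w = bucket_weights(r, edges)
    print(f"benchmark r = {label}: " + ", ".join(f"{x:.3f}" for x in w))
```

For the sqrt(n) benchmark this gives roughly 0.83, 0.11, 0.0125 and 0.05, and for the n/e benchmark roughly 0, 0.63, 0.07 and 0.29, in line with the percentages in the comment.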


@self_made_human don't worry, I know these simulations are entirely irrelevant to us denizens of themotte, thus the fun thread and why I included the note on n <= 7, ಥ_ಥ

I really hope I don't have to resort to necrophilia by the time I'm 36, but either way, I'm sure the coroner will cut me some slack.

We all share that same hope, if I might dare a post that suggests consensus.

Mods, twist his nuts

(I really had to use all my willpower not to put the mod hat on for this comment, be thankful)

Or is it that they are just more disposed to finding a suitable "good" partner than the one.

I'd guess it's a few things. Pragmatic and risk-averse personalities to start. They probably know to avoid cluster-b partners. Other people in the sciences probably are less aware and get caught up in the passion and excitement.

The high divorce risk occupations seem to involve common things. Unstable hours which disrupt home life and provide opportunity for cheating. Jobs that provide a lot of opportunities to meet attractive singles.

Actuaries are probably also better at evaluating the pain of divorce and their prospects on the singles market.

Also it probably helps that they were never that exciting in their 20s. Women who married a guy in a rock band are going to have to reconcile when the man gives up on it and gets an office job. Whereas the actuary has gone from diligent student to starting actuary to better paid actuary. So she knew what she was getting into.

And who needs divorce when you can encourage your spouse into dangerous activities that steadily increase their risk of death?

We also have to factor in (for women) a declining sexual market value. So the max score of suitors goes down over time, thus favoring settling earlier.

For men, the max score of potential mates goes up at first, peaks in the 30s, and then declines.

With an average age of first marriage for men of 30.5 and women of 28.6, women are being too picky and men not picky enough. You see, it's science.

Why is it always choose them only if they're better than the best up to that point? At some point, wouldn't it become better to settle for gradually worse partners? (obvious case: in a uniform distribution, only two people left, and you get someone who's at the 75th percentile)

In the original construction, you win if you choose the absolute best partner, you lose if you do not, so the 75th percentile is a guaranteed loss, no different from the bottom 1 percentile. You only want to maximize the probability of getting the actual best, so it has to be better than anything you've seen so far or there's no chance and no point settling.

However, you are right that in this modified version trying to maximize utility this no longer applies, and a proper optimal strategy should probably be a function f(n,d) describing what percentile you're willing to settle on as a function of what time step it is (n) and what your estimate of the distribution is (d), depending on what you've seen so far and your meta knowledge.