
Wellness Wednesday for March 22, 2023

The Wednesday Wellness threads are meant to encourage users to ask for and provide advice and motivation to improve their lives. It isn't intended as a 'containment thread' and any content which could go here could instead be posted in its own thread. You could post:

  • Requests for advice and / or encouragement. On basically any topic and for any scale of problem.

  • Updates to let us know how you are doing. This provides valuable feedback on past advice / encouragement and will hopefully make people feel a little more motivated to follow through. If you want to be reminded to post your update, see the post titled 'update reminders', below.

  • Advice. This can be in response to a request for advice or just something that you think could be generally useful for many people here.

  • Encouragement. Probably best directed at specific users, but if you feel like just encouraging people in general, I don't think anyone is going to object. I don't think I really need to say this, but just to be clear: encouragement should have a generally positive tone and not shame people (if you feel that shame might be an effective motivational tool, please discuss it here so we can form a group consensus on how to use it, rather than just trying it).


A little bit of a life hack for those wanting to learn technical fields fast but only as far as the application is concerned. Might be very obvious to some, but new to me.

By this I mean: say you want to find the area under the curve bounded by f(x), and you learn there is a technique called integration. You just call integrate(f(x), lb, ub) in your programming language of choice and don't need to waste four months of your life learning how to compute integrals by hand. All you need to know is that a thing called integration exists, along with its edge cases such as negative area. And you're off to the races; you can "learn" integration in a day.
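In Python, for instance (assuming scipy is available), that one-liner really does exist:

```python
from scipy.integrate import quad

# Area under f(x) = x^2 between 0 and 1, no hand computation needed.
# quad returns (value, estimated_absolute_error).
value, err = quad(lambda x: x**2, 0, 1)
print(value)  # ~0.3333, the exact answer is 1/3
```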

Unfortunately, I haven't found any book that caters to this type of "learning" for anything. I've been trying to learn optimization, and every book I've come across is the whole dog and pony show of hand-calculating the solution. I don't need that! I have tools like PuLP, Gekko and PyMOO to do that for me. Just tell me the name of the thing that finds the area under the curve! (Gekko looks very cool to my untrained eye: the simplest syntax, and it has ML integration for objective and constraint functions.)
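For a taste of what "just call the solver" looks like, here's a tiny linear program via scipy.optimize.linprog (same spirit as PuLP or Gekko: you state the problem, the solver does the work; the toy objective and constraints are my own):

```python
from scipy.optimize import linprog

# Maximize x + y subject to x + y <= 1 and 0 <= x, y <= 1.
# linprog minimizes, so we negate the objective coefficients.
res = linprog(c=[-1, -1], A_ub=[[1, 1]], b_ub=[1], bounds=[(0, 1), (0, 1)])
best = -res.fun  # optimal objective value: 1.0
```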

Solution? Read the documentation of packages that solve the general class of problems. I'm getting decent mileage reading the docs of the packages above. Most documentation for technical problem-solving packages has some sort of theoretical introduction at the beginning.

The approach above does have some silly failure modes. I initially learned ML years back by reading the Sk-learn documentation, and not knowing the theory made me do retarded shit like applying scaling before splitting into train/test sets, or using regression metrics for classification problems, etc. I don't know how to avoid this without actually knowing the theory. (The Sk-learn documentation is actually very good; I just bungled it by trying to rush it.)
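That scaling-before-split mistake is easy to show with plain numpy (a minimal sketch with made-up data; the point is that test rows must not influence the mean/std used for standardization):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(5.0, 2.0, size=(100, 1))  # fake feature column

# Wrong: computing scaling statistics on ALL the data before splitting
# means the test set has leaked into the training statistics.
mu_leaky, sd_leaky = X.mean(), X.std()

# Right: split first, fit the scaler on the training portion only,
# then apply those same statistics to the test portion.
X_train, X_test = X[:80], X[80:]
mu, sd = X_train.mean(), X_train.std()
X_train_s = (X_train - mu) / sd
X_test_s = (X_test - mu) / sd
```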

@JhanicManifold any opinions on approaching "learning" optimization this way? Any pitfalls to watch out for?

I'd consider negative "area" to be a central case of integration, part of the basic definition, rather than an edge case ... but "edge" vs "central" is a matter of opinion.
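Concretely, with scipy's quad the "area" is signed by definition:

```python
import numpy as np
from scipy.integrate import quad

# Over a full period of sin, the region below the axis cancels the
# region above it: the signed integral is 0, not the total area (4).
full, _ = quad(np.sin, 0, 2 * np.pi)   # ~0.0
half, _ = quad(np.sin, 0, np.pi)       # ~2.0
```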

I would say the worst edge case is functions without a priori bounds (whether on the function itself, on one of its derivatives, on variance...). Show me the most advanced deterministic adaptive quadrature algorithm your package has, and allow me to pick a truly arbitrary f(x) to give to it, and I can make the error (actual_integral - integrate(f(x),lb,ub)) arbitrarily large, even if actual_integral is 1. Use a stochastic algorithm and I can't guarantee "arbitrarily large" for the error, but I can still get "arbitrarily large in all but an arbitrarily small fraction of runs".
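One classic concrete instance of this (a well-known gotcha with scipy's quad, though not specific to scipy): a narrow bump inside a huge interval can fall between the initial quadrature sample points, so the routine returns roughly 0 with a tiny error estimate and stops.

```python
import numpy as np
from scipy.integrate import quad

# The true value of the integral of exp(-x^2) over [0, 10000] is
# sqrt(pi)/2 ~ 0.886, but essentially all of the mass sits in [0, ~5],
# which the initial Gauss-Kronrod rule on [0, 10000] never samples.
est, err_est = quad(lambda x: np.exp(-x * x), 0, 10000)
truth = np.sqrt(np.pi) / 2
```

Splitting the interval by hand (e.g. integrating [0, 10] and [10, 10000] separately) fixes it, but only if you already know where the mass is.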

Your package docs will hopefully warn you about that, if you know what you're looking for. I just checked Matlab integral for an example, and a careful reader of "uses the absolute error tolerance to limit an estimate of the absolute error" in their docs will stop and say "wait, an estimate?" Looking at scipy, the quad doc says "An estimate of the absolute error", but the quad tutorial just says "an upper bound on the error", which they probably think is a fine description because it's almost always true in practice...

Sorry if this all sounds kind of nitpicky. I might be feeling pissy right now because I wrote an algorithm which is perfect in exact arithmetic and has been getting heavy use for months, becoming a core part of user workflows ... and now people are finding floating-point-arithmetic failure cases that I really should have anticipated.

I think instead of reading the documentation of libraries I'd try to read the main books on the theory, but only reading the first part of every chapter and very lightly skimming the rest, doing none of the exercises. Like, you understand what an integral is within 30 minutes of having the problem stated to you, and the best resources for being introduced to the problem are still the textbooks; they just happen to come with hundreds of pages you don't need. When I first learned ML I spent 2 days reading Murphy's 2012 book without worrying about any of the details; I just wanted a small introduction to literally every method so I could form a mental picture of the entire field.

The problem with only knowing the theory so shallowly is that you're kind of brittle to modifications and expansions of the problem statement. Like, how would you solve the following problem: you have a company with an industrial process with 10 free parameters, and they have a yield function F(x) : R^10 -> R that they want to maximize, but the exact physics of the process is unknown or very complicated. Each trial run to compute F(x) costs 1 million dollars (and of course F(x) has some unknown variance), so, so far, they only have around 30 data points {x_i, F(x_i)}. Your job is to advise them on how to pick the next point x_i to try in order to maximize their profits (which increase with yield, but decrease with additional suboptimal trials). And what if F(x) is nonstationary? Maybe the machines degrade over time, and the optimal parameters change...
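(For reference, that scenario — an expensive, noisy, black-box F with a handful of evaluations — is the textbook Bayesian-optimization setting. A toy 1-D sketch with a Gaussian-process surrogate and an upper-confidence-bound acquisition rule, pure numpy; the function names, toy objective, and the beta parameter are my own choices, not from any particular library:)

```python
import numpy as np

def rbf(a, b, ls=0.2):
    """Squared-exponential kernel between 1-D point arrays a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

def gp_posterior(X, y, grid, jitter=1e-6):
    """Posterior mean and stddev of a zero-mean GP at the grid points."""
    K = rbf(X, X) + jitter * np.eye(len(X))
    Ks = rbf(X, grid)
    mu = Ks.T @ np.linalg.solve(K, y)
    # rbf(x, x) = 1, so prior variance at each grid point is 1.
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mu, np.sqrt(np.maximum(var, 0.0))

def suggest(X, y, grid, beta=2.0):
    """UCB acquisition: favor high predicted mean plus high uncertainty."""
    mu, sd = gp_posterior(X, y, grid)
    return grid[np.argmax(mu + beta * sd)]

# Toy objective standing in for the $1M-per-trial yield function.
f = lambda x: -(x - 0.7) ** 2
grid = np.linspace(0, 1, 101)
X = np.array([0.1, 0.5, 0.9])    # the ~30 existing trials, miniaturized
y = f(X)
for _ in range(10):               # each loop iteration = one expensive trial
    x_next = suggest(X, y, grid)
    X, y = np.append(X, x_next), np.append(y, f(x_next))
best_x = X[np.argmax(y)]          # should home in on the optimum at 0.7
```

The nonstationary variant (machines degrading over time) is usually handled by down-weighting or discarding old observations, which this sketch doesn't attempt.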

Funny you say that, because in my experience YouTube ML tutorials have been mostly terrible and make those kinds of mistakes as well. There are tutorials at many abstraction levels, and the ones I'm talking about are the ones that go straight into implementing a solution. In retrospect, this problem can be avoided by NOT skipping all the PowerPoint slides that don't have any code or math on them, as they usually cover the high-level overview.