This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.
Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.
We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:
- Shaming.
- Attempting to 'build consensus' or enforce ideological conformity.
- Making sweeping generalizations to vilify a group you dislike.
- Recruiting for a cause.
- Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.
In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:
- Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
- Be as precise and charitable as you can. Don't paraphrase unflatteringly.
- Don't imply that someone said something they did not say, even if you think it follows from what they said.
- Write like everyone is reading and you want them to be included in the discussion.
On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.
Could you expand on what you mean by this? I'd think neural networks would be a local maximum.
I remember in the 90s futurists thought machine translation would replace human translators fairly soon, because the simple algorithm of looking up target language words in a hashmap was producing results so fast. BabelFish could translate "El camarero anda por la calle" in 1995. This is probably 70% as good as machine translation needed to be for many use cases. Machine translation software just needed to "tidy up" edge cases like idioms, homophones, different grammar, etc.
This didn't happen. Until Google started using deep learning in the 2010s, progress stalled, because the last 30% couldn't be done with hashmap lookup. Now we are in another period of rapid advancement. But this approach will probably also top out eventually.
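To illustrate the hashmap-lookup idea, here is a minimal sketch in Python. The dictionary is a hypothetical toy made up for this example; it is not BabelFish's data or its actual algorithm. It handles the simple sentence fine and mangles an idiom, which is the kind of edge case that lookup alone can't tidy up.

```python
# Hypothetical toy dictionary, invented for illustration only.
SPANISH_TO_ENGLISH = {
    "el": "the",
    "camarero": "waiter",
    "anda": "walks",
    "por": "along",
    "la": "the",
    "calle": "street",
    "tomar": "take",
    "pelo": "hair",
}


def translate_word_by_word(sentence, dictionary):
    """Translate by looking up each word on its own, ignoring grammar and context."""
    words = sentence.lower().split()
    # Unknown words are kept, wrapped in angle brackets.
    return " ".join(dictionary.get(word, f"<{word}>") for word in words)


literal = translate_word_by_word("El camarero anda por la calle", SPANISH_TO_ENGLISH)
idiom = translate_word_by_word("tomar el pelo", SPANISH_TO_ENGLISH)

print(literal)  # the waiter walks along the street
print(idiom)    # take the hair
```

"Tomar el pelo" is an idiom roughly meaning "to pull someone's leg", so the word-by-word output is wrong even though every individual lookup succeeded.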
Minimum or maximum, the distinction doesn't matter for understanding the metaphor.
A neural network trained through gradient descent generally tries to find the global minimum of an error function and thereby maximize prediction accuracy.
It could instead search for a global maximum of the negated error function or of another type of function, but the distinction is irrelevant here.
Gradient descent often fails to find the global minimum. Because it descends by following derivatives, it can get stuck in a local minimum, which simply means that it has reached a low point on the function curve. To get past that point it would need to go upwards, i.e. temporarily accept performing worse (a higher error rate) in the hope of finding a new descent on the curve that ends lower than the previous minimum.
Not getting stuck in local minima is the #1 thing to improve in deep learning algorithms, and while there are many optimizations towards this goal, it is not computationally feasible with current algorithms to have optimal learning, aka to reach the global minimum.
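To make the local-minimum point concrete, here is a minimal sketch of plain gradient descent on a one-dimensional toy function. The function, the learning rate and the starting points are all assumptions picked for illustration; real networks have millions of dimensions and use fancier optimizers.

```python
def f(x):
    # Toy error curve with two valleys: a shallow one on the right
    # and a deeper (global) one on the left.
    return x**4 - 3 * x**2 + x


def grad_f(x):
    # Derivative of f, computed by hand.
    return 4 * x**3 - 6 * x + 1


def gradient_descent(start, learning_rate=0.01, steps=2000):
    """Follow the negative derivative downhill from `start`."""
    x = start
    for _ in range(steps):
        x -= learning_rate * grad_f(x)
    return x


x_from_right = gradient_descent(start=2.0)   # settles in the shallow valley
x_from_left = gradient_descent(start=-2.0)   # settles in the deep valley

print(x_from_right, f(x_from_right))
print(x_from_left, f(x_from_left))
```

Started at 2.0, the descent settles near x ≈ 1.13 and stays there, because every small step away makes the error go up. Started at -2.0, it settles near x ≈ -1.30, where the error is much lower. Same algorithm, different outcome, depending only on where it began.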
So now we understand what a local minimum is; now let's understand why neural networks are one.
They are a local minimum because neural networks are fundamentally unfit for the needs of AGI.
They are just a vomit of bruteforced contingent correlates. It works surprisingly well, but they are inefficient, inherently make poor contingent amalgamations,
have no causal reasoning abilities, are stateless, and cannot do continual learning, AKA they can't learn new info in real time without so-called catastrophic forgetting.
For those reasons, they are suboptimal by design and are therefore a local minimum in which the world is stuck, in the goal of beating local minima.
No offence, but it's really striking to see that the rationalist diaspora people live in an alternate reality based on groundless hype and a fundamental lack of methodology, or should I dare say, lack of rationality.
We have been in a winter since 2019 or since the 90s, depending on what we look at.
What does the average lesswronger or redditor look at?
He looks at cool demos. Or, even more than demos, cool domain-specific disruptive applications.
That is what Stable Diffusion and ChatGPT are.
They are indeed very impressive for what they do, but at the end of the day that is irrelevant to the goal of natural language understanding.
Someone with methodology should instead look at the precise tasks required for true NLU or even AGI.
POS tagging:
https://paperswithcode.com/sota/part-of-speech-tagging-on-penn-treebank
dependency parsing:
https://paperswithcode.com/sota/dependency-parsing-on-penn-treebank
coreference resolution:
https://paperswithcode.com/sota/coreference-resolution-on-ontonotes
word sense disambiguation:
https://paperswithcode.com/sota/word-sense-disambiguation-on-supervised
named entity recognition:
https://paperswithcode.com/sota/named-entity-recognition-ner-on-conll-2003
semantic parsing:
https://paperswithcode.com/sota/semantic-parsing-on-amr-english-mrp-2020
That is only to name a few; all of them are needed concomitantly, and the list is far from exhaustive.
Once you understand that the error rate is often per word/token instead of per sentence, and that errors between those tasks have dependencies and are therefore often multiplicative, you'll understand that a 95% accuracy, while it sounds impressive, is in fact dogshit.
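The arithmetic behind that claim can be sketched in a few lines of Python. This assumes errors are independent across tokens and across tasks, which is a simplification (real errors are correlated), and the 20-token sentence length and the six-stage pipeline are illustrative numbers, not measurements.

```python
import math


def sentence_accuracy(per_token_accuracy, tokens_per_sentence):
    """Chance that every token in a sentence is right, assuming independent errors."""
    return per_token_accuracy ** tokens_per_sentence


def pipeline_accuracy(stage_accuracies):
    """Chance that every stage is right, assuming each stage's errors are independent."""
    return math.prod(stage_accuracies)


# Illustrative numbers: 95% per-token accuracy, 20-token sentences,
# and six chained tasks that each run at 95%.
one_task_per_sentence = sentence_accuracy(0.95, 20)
six_tasks_per_token = pipeline_accuracy([0.95] * 6)
six_tasks_per_sentence = sentence_accuracy(six_tasks_per_token, 20)

print(f"one task, whole sentence correct:  {one_task_per_sentence:.3f}")
print(f"six tasks, single token correct:   {six_tasks_per_token:.3f}")
print(f"six tasks, whole sentence correct: {six_tasks_per_sentence:.4f}")
```

Under these assumptions, a single 95% task gets a whole 20-token sentence right only about 36% of the time, and six such tasks chained together get a whole sentence right well under 1% of the time.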
What can you see from those SOTA results?
That we have reached a plateau of extreme and increasingly diminishing returns.
Most of the gains are from 2019, the year transformers were popularized. The rest has been a bag of tricks, and unoriginal minor optimizations.
The biggest innovation, while still mostly unknown/underappreciated by the researchers' groupthink, is XLNet, also from 2019.
There is nothing else we can do: we have maxed out the bruteforcing of statistical amalgamations. Contrary to popular belief, there is almost zero progress in SOTA results and, most importantly, there is a fundamental shortage of innovative ideas. Whether we speak of an alternative to transformers or of innovating on transformers themselves, there is nothing potent.
While it is obvious that transformers are a misdirection, I can nonetheless improve the state of the art in any NLP task, because there is additional ineptitude in the research crowd.
Firstly, almost nobody is working on improving the SOTA in most tasks, e.g. coreference resolution. Just look at the number of submissions over time to realize this.
Secondly, as in every research field, the researchers are highly dysfunctional, AKA they will invent many minor but interesting, universal and complementary/synergetic optimization ideas, and yet nobody will ever attempt to combine them concomitantly, despite it being trivial. That is because researchers are not meta-researchers, and because of potent NIH syndrome and other cognitive biases.
For starters, the worldwide SOTA in dependency parsing exists because I asked the researcher to swap BERT for XLNet, and it worked.
I plan to outperform the SOTA in coreference resolution in 2023; that will empirically strengthen my thesis on the dysfunctionality of mankind and on artificial scarcity.
I invite you to read this complementary essay on the topic: https://www.metaculus.com/notebooks/10677/substance-is-all-you-need/
VoiceOfLogic
Was that essay on metaculus written by you, and do you have a blog?
Yes I'm the author.
Have you read it?
No, I don't yet have a formal blog, but I intend to start one in the coming months, to shake the rationalist diaspora and confront them with their own limitations. A much needed endeavor.
Cool username BTW, have you tried lucid dreaming with cholinergics?
Thanks, and nope, never heard of that.
Btw, in that article, the source listed for the claim of peptides being miracle cancer drugs was written by an undergrad. Do you have a better source? I found that particular bit very interesting.
Is this Julius Branson?
Unlikely, Julius doesn't know this much about machine learning.
I don't think there's any human being like me on this timeline but I would love to find a clone.
I've never read about Julius Branson https://juliusbranson.wordpress.com/blog/
What makes this person similar to me?
What makes you think I am him?
Are you the founder of the Obsidian.md startup BTW?