I don't think so -- you are wrong on this one.
OK, but will Grok? I guess it would be pretty easy to try, but it might refuse on copyright grounds or something.
whisper_print_timings: total time = 67538.11 ms
OK? 67 seconds is not instant -- like, at all. Even 6.7s (assuming the resources assigned to this task were as you suggest) is not instant.
I'm not arguing that Grok 3.0 does in fact do all of this with the Johnny Cash song. All I'm saying is that it could.
Of course it could! But it doesn't, and the fact that it responded instantly is evidence of that. Do you really think Grok is spending resources (mostly dev time, really) to add features allowing the model to answer weird questions about song lyrics?
LLMs lie man -- we should get used to it I guess.
Twitter's not technically FAANG, but I think they need to compete with those salaries -- for which (especially in the Bay Area) $300K is nowhere near top-end.
Stock grant of that much again would also be nothing special for somebody at all in demand -- so $.5-1M TC sounds about right.
LLMs were developed as tools to automatically generate transcripts and sub-titles
Interesting assertion, but it doesn't really have any bearing on whether or not Grok can do this -- it takes text input from the user, and generates a text response. What makes you think it even has an interface to bring in audio inputs? (on the training end, they might -- given the hunger for data -- but it seems like an odd thing to include in a chatbot. Even for training, it would probably be better to do something like, oh, IDK -- run a transcripting algo on as much YouTube content as you can grab and then feed the text from that into your training set. You might even include some timestamps!)
Yes, but serving and parsing videos from youtube is not one of those things.
Warren had never once paid "on time" but had waited until the last minute and withheld the late fee.
How come they hadn't repudiated the contract if they didn't pay the late fees?
IDK, some of those Joe Biden "get in, loser" memes were pretty funny.
LLMs != AI.
Agreed!
(that means that there is no AI at all though -- and the sheer effort/$ being devoted to LLMs is if anything making it less likely that there will be anytime soon.)
Yes, and for the LLM to parse these bits, first youtube needs to locate them, then serve them to the llm. If the llm can convince youtube to serve the bits as fast as bandwidth will allow, it still needs to run those bits through some transcription algo -- which typically are borderline on lagging at 1x speed.
In the instant case, it would also need that algo to make some sort of judgement on the accent with which some of the words are being pronounced -- which is not a thing that I've seen. The fact that it goes ahead and gets this wrong (Cash pretty clearly says gam-bel-er in the video) makes it much more likely that the llm is looking at some timestamped transcript to pick up the word "gambler" in the context of country songs, and hallucinating a pronunciation.
This would be more convincing if humanoid robots existed -- or llms were able to control them. If you ask an LLM "how do you break down a chicken?" it will probably give you a pretty good description that a human could follow -- this sort of thing is well represented in its training set. If you ask it for a program to activate the servos of a hypothetical knife-wielding humanoid robot such that a chicken in front of it will be disassembled, it will give you utter trash. (if it doesn't demur)
It's a pretty good example of the difference between an intelligence and language model actually -- a language model can describe things, and AI can do things.
All that to say, if you want your chicken factory automated, waiting for a humanoid robot so you can drop it into place is not a very effective approach. Buying some machines from the Dutch would work much better.
True (and interesting about the Chrome extension; what is the usecase for 10x browser playback of youtube videos, I wonder?) but I'm quite sure Grok is not currently programmed with anything like this.
I do understand this -- just the same, 'parsing bits' from a video file does not happen instantly. Indeed, just starting a stream on youtube is typically not what I would call 'instant'.
OK, then you need an audio analysis model -- this is not a thing that is integrated into LLMs.
The Dutch company video somebody linked downthread shows it done with rotating knives and alignment guides, not robots at all -- which seems to work, and is not at all generalist.
The approach I'm imagining involves laying the carcass out flat on a cutting board, holding it with the robot hooks, and slicing off limbs based on the location of the joints as determined by AI(tm). Probably another stage for de-breasting is needed -- or the hooks could take another bite or something.
I don't really claim that this would work well; certainly not better than the machine in the video -- but it would work better than some non-existent humanoid robot attached to a non-existent AGI.
Any of this would need a pretty specialized video analysis module though, which AFAIK doesn't really exist period, much less built into Grok -- plus the ability to download the video directly rather than look at a stream of it, which Youtube doesn't really provide. So if the AI were literally accessing the video through that link, 3:00/2x is indeed the fastest it would be able to provide the transcript.
(it would not be instant in any case; downloading the video takes X seconds, analyzing it Y -- X + Y might be less than three minutes, but it's not less than one second)
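To put rough numbers on that, here is a back-of-envelope sketch. The 67.5 s figure is the whisper total quoted elsewhere in the thread; the 5 s download time is purely my assumption for illustration, and the function names are made up.

```python
# Back-of-envelope latency bounds for "transcribe a YouTube video on the fly".
# Only WHISPER_TOTAL_S and the 3:00 runtime come from the thread; the download
# time below is a hypothetical placeholder, not a measurement.

WHISPER_TOTAL_S = 67538.11 / 1000  # whisper_print_timings total, in seconds
RUNTIME_S = 3 * 60                 # three-minute music video


def stream_seconds(runtime_s: float, playback_speed: float) -> float:
    """Time to sit through a stream at a given playback speed."""
    return runtime_s / playback_speed


def download_then_transcribe_seconds(download_s: float, transcribe_s: float) -> float:
    """X + Y: fetch the whole file first, then run the transcription pass."""
    return download_s + transcribe_s


if __name__ == "__main__":
    assumed_download_s = 5.0  # assumption, not a measured value
    print("stream at 2x:", stream_seconds(RUNTIME_S, 2.0), "s")
    print(
        "download + transcribe:",
        download_then_transcribe_seconds(assumed_download_s, WHISPER_TOTAL_S),
        "s",
    )
```

Either way you land somewhere between about a minute and a minute and a half, which is under three minutes but nowhere near "instant".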
A humanoid robot is not the right tool for the job though -- what you want is a machine with sharp knives matching the number of joints on a chicken mounted to some kind of press, plus several hooks that can grab the carcass and align it appropriately. (the knives probably need to self-adjust too, depending on the size-consistency of your chickens)
Machine vision probably helps with this some, but as others have said "object segmentation" was a pretty solved problem years ago -- and there's no AI anywhere close to performing at the "I need you to cut this chicken apart at the joints, m'kay" level on the foreseeable horizon.
There's a reason why welding bots are not humanoid form -- humans are generalists, bots are not.
(timestamped subtitles followed)
Idk, it responded pretty much instantly, so it could be lying. Or maybe it has preprocessed subtitles for popular videos.
I don't see how it could possibly generate subtitles instantly on the fly for a music video with a runtime of three minutes? Also, listening to the track it seems like a pretty good example of the pronunciation that you are referring to -- so it's clearly not 'listening' to the video in any meaningful way.
"AI lies and confidently misrepresents evidence in order to advance its chosen position" is... not too surprising considering that it's been trained on decades of internet fora conversations, but probably not the kind of alignment we are looking for.
I don't have an opinion either way on that -- seems to me to require advanced Kremlinology at best, literal mindreading at worst.
At this point on the other hand, I don't see any reason for Russia to want to stop what they are doing -- all the international capital has been burnt already, and direct war losses seem pretty sustainable for them. So they will need to either be offered something significant over and above what they've already taken (Zelensky seems reluctant) or threatened -- which Trump will probably try and might work, but there's a hard cap on how much they can be threatened for MAD reasons.
Jonathan Yaniv, the guy who was making a cottage industry of discrimination suits against beauticians who only wanted to work on women and therefore wouldn't wax his (feminine) balls.
He also had (has?) some sort of tampon fetish and was spotted on forums messaging young girls about the etiquette on 'helping each other out' with their tampons. So he probably has lots of tampons!
I didn't actually figure out a joke -- maybe just <Jonathan Yaniv has entered the chat>?
Keeping up with the CW ain't easy.
I want to make a Yaniv joke, but it's probably against the rules somehow -- so let's just say that on the modern internet, not only does nobody know that you're a dog, but nobody can be sure that you aren't a dog either.
https://upload.wikimedia.org/wikipedia/en/f/f8/Internet_dog.jpg
with the benefit of 20/20 hindsight they made the right call to fight.
How do you mean? I think they could have gotten out early giving up just the disputed Donbass areas plus land access to Crimea -- it's not great, but now they are in a situation where Russia has little reason to stop nibbling away so 'current lines of control' seems like the most they can get. That + a bunch of dead people and 2 years lost rebuilding time doesn't seem worth the squeeze to me.
Yeah I can second this -- we drove quite a lot of cars in the 'lightly used' and 'new on the lot' categories a couple of years ago, and ended up with a nice Mazda. In some ways nicer than the 'budget' euro models, and something like 20-30k less money. No problems a year in, decent fuel economy, and I kind of like driving it.
Not in places with sane street grids -- even in cases where you might theoretically be able to dig up one side (ie. not any sort of service main, which doesn't reliably stick to a particular part of the street) at a time it's way more efficient to put up a "Detour" sign and get the work done ASAP. Also safer, as you don't have traffic-worker interaction all the time.
Sounds are not text though -- nothing is free, and nothing is instant.
Why don't you try it? Ask Grok to transcribe a song from a youtube link and see what it does -- preferably a song that differs from the published lyrics somehow, maybe a live version or something.