This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.
Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.
We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:
- Shaming.
- Attempting to 'build consensus' or enforce ideological conformity.
- Making sweeping generalizations to vilify a group you dislike.
- Recruiting for a cause.
- Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.
In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:
- Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
- Be as precise and charitable as you can. Don't paraphrase unflatteringly.
- Don't imply that someone said something they did not say, even if you think it follows from what they said.
- Write like everyone is reading and you want them to be included in the discussion.
On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.
Your "doesn't get the job done" link doesn't seem to go anywhere... I had to clip out everything past the "mediaplayer" portion of the URL to get to the video, where a Tesla slams into a test dummy. But it doesn't take much work to find counterexamples, and this wouldn't be the first time someone fabricated a safety hazard for attention.
I don't think LIDAR is as big a differentiator as the tech press or popular analysis makes it out to be. It's very expensive (though getting cheaper), pretty frail (though getting more durable), and suffers from a lot of the same issues as machine vision (bad in bad weather; it only tells you that landmarks have moved, not anything you can act on; false-positive object identification). And this is trite, but it remains a valid objection: human vision is sufficient to drive a car, so why do we need an additional, complex, fragile sensor operating in a human-imperceptible wavelength band to supplement cameras operating in the same band as human eyes?
Tesla's ideological stance on machine vision seems to be: if camera-based machine vision is insufficient to tackle the problem, we should improve camera-based machine vision until it can tackle the problem. This is probably the right long-term call. If they figure out how to get the kind of performance expected from a self-driving system out of camera-based machine vision, not only have they instantly shaved a thousand bucks of specialty hardware off their BOM, arguably they've developed something far more valuable that can be slapped on all manner of autonomous machines and robotics. If the fundamental limitations are in the camera, they can use their demand in automotive as leverage to encourage major camera sensor manufacturers to innovate in areas where they currently struggle (high dynamic range, ruggedness, volume manufacturability). Meanwhile, there's a whole bunch of non-Tesla people working independently on many of the hard problems on the software side of machine vision; some of the required software innovations don't necessarily need to come from Tesla. And if they do need to come from Tesla, Tesla has put enough cameras and vehicle computing out in the wild by now that it could plausibly collect a massive corpus of training data and fine-tune its models on it better than pretty much any other company outside of China.
Google, meanwhile, had a years-long head start on Tesla, a few hundred billion dollars of computers, at least one lab (possibly several) at the forefront of machine vision research, extremely deep pockets to buy out tens of billions of dollars of competitors and collaborators, limited vulnerability to competitive pressure or failure in their core revenue stream, and a side business mapping the Earth that compelled them to create a super-accurate landmark database for unrelated business ventures. I think the reason Google's self-driving vehicles work better than Tesla's is that Google held themselves to ludicrously high standards, half of which were for reasons unrelated to self-driving, and the likes of which are probably unattainable for more than a handful of tech megacorps. That they use LIDAR is immaterial - they've been using it since well before the system costs made commercial sense.
As for the rest of Tesla's competitors... when BigAutoCorp presents their risk management case to the government body authorizing the sale and usage of self-driving technology, it sounds a lot more convincing to say "cost is no obstacle to safety" as you strap a few thousand bucks of LIDAR to every machine and spend another few dozen engineering salaries every year on LIDAR R&D. A decade of pushing costs down has brought LIDAR to within an order of magnitude of the required threshold for consumer acceptance. I'll note that comparatively, camera costs were never an obstacle to Tesla's target pricing or market penetration. Solving problems with better hardware is fun, but solving problems with better software is scalable.
That's not to say Tesla's software is better, though. I can't tell whether Tesla's standards are lower than their competitors', whether their market penetration is large enough that they have a longer tail of publicized self-driving disasters to draw from, or whether there's a fundamental category of objects their current cameras or software can't properly detect. Speaking from experience, I've seen Autopilot get very confused by early-terminating lane markers, gaps in double yellow lines for left turns, etc. I think their software just kinda sucks. It's probably tough to tell good software without LIDAR apart from bad software with LIDAR; it's comparatively much easier to identify bad software without LIDAR. And it's really easy to blame the lack of LIDAR when you're the only people on Earth forgoing it.
The problem with the Tesla stance is that cameras (affordable ones, anyways) are still way behind human eyes -- it's not just dynamic range, the resolution/FOV tradeoff is extremely bad.
This guy (Clark) estimates you would need ~576MP streaming at 30FPS with a FOV of 120 degrees to get close (actual FOV is more than that; depends how many cameras you want to have I guess). Such a system would be way more expensive than a LIDAR unit, safe to say -- especially if you expect to catch up with the eye's 14-stop dynamic range, which might not even be possible with current sensors.
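As a rough check on where a figure like ~576 MP comes from, here is a minimal Python sketch. It assumes a square 120° × 120° field sampled uniformly at 0.3 arcminutes per pixel (an assumed input that happens to reproduce the 576 MP number) and 12-bit raw output (also an assumption); the function names are illustrative only.

```python
# Back-of-envelope pixel count and raw bandwidth for a uniform "eye-like" camera.
# ASSUMPTIONS: square FOV, uniform 0.3 arcmin/pixel sampling, 12-bit raw pixels.

ARCMIN_PER_DEGREE = 60.0


def megapixels(fov_deg: float, arcmin_per_pixel: float) -> float:
    """Megapixels needed to cover a fov_deg x fov_deg field at a uniform pitch."""
    pixels_per_side = fov_deg * ARCMIN_PER_DEGREE / arcmin_per_pixel
    return pixels_per_side ** 2 / 1e6


def raw_rate_gbit_s(mp: float, fps: float, bits_per_pixel: int) -> float:
    """Uncompressed sensor output in gigabits per second."""
    return mp * 1e6 * fps * bits_per_pixel / 1e9


mp = megapixels(fov_deg=120, arcmin_per_pixel=0.3)
rate = raw_rate_gbit_s(mp, fps=30, bits_per_pixel=12)
print(f"{mp:.0f} MP, ~{rate:.0f} Gbit/s raw")
```

Under those assumed settings this works out to about 576 MP and roughly 207 Gbit/s of uncompressed data per camera, which gives a sense of the scale being argued about.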
Not sure what Tesla is using for resolution, but the extra acuity is surely not wasted in terms of picking out faraway objects and even figuring out road lines -- and lacking it eats substantially into the theoretical reaction-time advantage of AVs.
This is not quite right. Eyes have a huge overall FOV, but the actual resolution of vision is a function of angular distance from the center of gaze, and there's only maybe a 5° cone of high-resolution visual acuity with the kind of detail being described. Just taking the proposed 120° cone and reducing it to 5° is more than a 99% reduction in equivalent megapixels required. And the falloff of visual acuity into peripheral vision is substantial. My napkin math, with a second-order polynomial reduction in resolution as a function of horizontal viewing angle, puts the actual requirement for megapixel-equivalent human-like visual "resolution" at maybe a tenth of the number derived by Clark. None of that really helps in designing a camera that beats the human eye at self-driving vision tasks, though, because semiconductor process constraints make it extremely challenging to build anything other than homogeneously spaced pixel arrays anyway.
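The >99% figure follows because pixel count scales with the square of the field width. Below is a minimal Python sketch of that arithmetic plus a foveated variant of the napkin math. The falloff curve in it (quadratic in eccentricity, clamped at 5% of peak, and measured radially rather than only horizontally) is a hypothetical stand-in, not the model used in the comment; the 200 pixels/degree peak simply corresponds to 0.3 arcmin per pixel.

```python
import math


def area_fraction(narrow_deg: float, wide_deg: float) -> float:
    """Fraction of pixels kept when a square FOV shrinks from wide_deg to
    narrow_deg at the same angular resolution (count scales with width**2)."""
    return (narrow_deg / wide_deg) ** 2


def foveated_megapixels(fov_deg, peak_ppd, falloff, step_deg=0.25):
    """Numerically integrate pixel density over a square FOV.

    Linear resolution (pixels per degree) at eccentricity e is
    peak_ppd * falloff(e); areal density is the square of that.
    """
    half = fov_deg / 2
    n = int(round(fov_deg / step_deg))
    total = 0.0
    for i in range(n):
        x = -half + (i + 0.5) * step_deg
        for j in range(n):
            y = -half + (j + 0.5) * step_deg
            ppd = peak_ppd * falloff(math.hypot(x, y))
            total += ppd * ppd * step_deg * step_deg
    return total / 1e6


def quadratic_falloff(e_deg):
    """HYPOTHETICAL falloff: 1 - (e/30)^2, never below 5% of peak."""
    return max(0.05, 1 - (e_deg / 30.0) ** 2)


kept = area_fraction(5, 120)
print(f"5 deg vs 120 deg: {1 - kept:.2%} fewer pixels")

uniform = foveated_megapixels(120, 200, lambda e: 1.0)
foveated = foveated_megapixels(120, 200, quadratic_falloff)
print(f"uniform {uniform:.0f} MP vs foveated {foveated:.0f} MP")
```

The uniform case recovers the 576 MP figure; the foveated result depends entirely on the assumed curve, so it is only as meaningful as the falloff model plugged into it.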
On top of that, the "30FPS" discussion is mostly misguided, and I don't actually see that number anywhere in the text; I only see a suggestion that as the eye traverses the visual field, the traversal motion (Microsaccades? Deep FOV scan? No further clarity provided) fills in additional visual details. This sounds sort of like taking multiple rapid-fire images and post-processing them together into a higher-resolution version, something commercial cell phone cameras have done for a decade now. This part could also be an allusion to the brain backfilling off-focus visual details from memory. It's unclear what was meant.
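Merging rapid-fire frames into a higher-resolution image is usually called multi-frame super-resolution. As an illustration only, here is a minimal 1D shift-and-add sketch in Python, assuming ideal point sampling, exactly known sub-pixel offsets, and no blur or noise; real phone pipelines also have to estimate motion, handle blur, and reject outliers, and the function names here are hypothetical.

```python
def capture(scene, factor, offset):
    """Simulate one low-res frame: keep every `factor`-th sample of the
    high-res scene, starting at sub-pixel `offset` (0 <= offset < factor)."""
    return scene[offset::factor]


def shift_and_add(frames, offsets, factor, out_len):
    """Place each low-res frame back on a high-res grid at its known offset,
    averaging where frames overlap; positions no frame covered stay None."""
    sums = [0.0] * out_len
    counts = [0] * out_len
    for frame, off in zip(frames, offsets):
        for i, value in enumerate(frame):
            pos = i * factor + off
            if pos < out_len:
                sums[pos] += value
                counts[pos] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


scene = [i * i for i in range(12)]  # toy high-res "scene"
factor = 3
offsets = [0, 1, 2]  # one frame per sub-pixel phase
frames = [capture(scene, factor, o) for o in offsets]

single = shift_and_add(frames[:1], offsets[:1], factor, len(scene))
merged = shift_and_add(frames, offsets, factor, len(scene))
print(single)  # gaps (None) where a single frame has no samples
print(merged)  # every high-res position recovered
```

With all three phases present the toy scene comes back exactly; with one frame, two thirds of the grid stays empty. In a real camera, small motion between exposures is what supplies those different phases.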
This is already a solved problem, and has been for at least five years. Note that in five years, we've added 20 dB of dynamic range and 30 dB of scene dynamic range, and bumped up the resolution by >6x (technically more like 4x at the same framerate, but 60FPS was overkill anyway), all at a module cost that I can't explicitly disclose but can guarantee handily beats any LIDAR pricing outside of Wei Wang's Back Alley Shenzhen Specials. And it could still come down by a factor of 2 in the next few years, provided there's enough volume!
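To compare these dB figures with the 14-stop number quoted for the eye: image sensor dynamic range in dB is conventionally 20·log10 of the max/min signal ratio, and one stop is a doubling of that ratio, so one stop is about 6.02 dB. A small Python conversion sketch:

```python
import math

# One stop = 2x signal ratio; sensor DR in dB is 20*log10(ratio).
DB_PER_STOP = 20 * math.log10(2)  # about 6.02 dB


def db_to_stops(db: float) -> float:
    return db / DB_PER_STOP


def stops_to_db(stops: float) -> float:
    return stops * DB_PER_STOP


print(f"+20 dB DR    ~ {db_to_stops(20):.1f} stops")  # 3.3
print(f"+30 dB scene ~ {db_to_stops(30):.1f} stops")  # 5.0
print(f"14 stops     ~ {stops_to_db(14):.0f} dB")     # 84
```

So the quoted gains amount to roughly three to five extra stops, and a 14-stop target sits at about 84 dB.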
In any case, remember that the bet isn't beating the human eye at being a human eye; it's beating the human eye at being the cheap, ready-today vision apparatus for a vehicle. The whole exercise of comparing human eye performance to camera performance is, and has always been, an armchair-philosopher debate. It turns out you don't need all the amazing features of the human visual system for the task of driving: they are sufficient but not necessary for a solution to the problem. You need a decent-performance, volume-scalable, low-cost imaging apparatus strapped to a massive amount of decent-performance, volume-scalable, low(ish)-cost post-processing hardware. It's a pretty safe bet that you can bring compute costs down over time, or increase your computational efficiency within the allocated budget over time. It's also a decent bet that the smartphone industry, with annual camera volumes in the hundreds of millions, is going to drive a lot of the camera manufacturing innovation you need, bringing the cost down to tens of dollars or better. Most image sensors already integrate as much of the DSP on-die as possible, in a bid to free up the post-processing hardware for more useful work, and that approach has a lot of room to grow in the wake of the advanced packaging and multi-die assembly innovations of the last ten years. All the same major advances could eventually arrive for LIDAR, but it certainly didn't look that way in 2012, and even now in 2023 it still costs me a thousand bucks to kit out an automotive LIDAR because of all the highly specialized electromechanical structures and mounting hardware, money I could be using to buy a half-dozen high-quality camera modules per car...
As far as reaction time goes, real-time image classification fell to sub-frame processing time years ago, thanks in part to some crazy chonker GPUs available in the last few years. There are a dozen schemes for doing this on video, many in real time. The real trouble now is chasing down the infinitely long tail of ways for any piece of the automotive vision sensing and processing pipeline to get confused, and weighing the software development and control-loop time cost of straying from the computational happy path to deal with whatever they find.
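To put "sub-frame processing time" in physical terms, here is a minimal Python sketch of the per-frame time budget and the distance a vehicle covers during a given latency. The 30 FPS frame rate, the 108 km/h speed, and the ~1.5 s human perception-reaction time are illustrative assumptions, not figures from the comment.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame if processing must keep up with capture."""
    return 1000.0 / fps


def distance_m(speed_kmh: float, latency_ms: float) -> float:
    """Distance travelled at constant speed during the given latency."""
    return speed_kmh / 3.6 * latency_ms / 1000.0


budget = frame_budget_ms(30)          # ~33.3 ms per frame
one_frame = distance_m(108, budget)   # about 1 m at 30 m/s
human = distance_m(108, 1500)         # ASSUMED ~1.5 s human reaction
print(f"budget {budget:.1f} ms, one frame {one_frame:.2f} m, human {human:.0f} m")
```

Under these assumptions a frame of processing latency costs about a metre of travel versus tens of metres for a human, which is why the remaining difficulty lies in the long tail of confusing inputs rather than raw speed.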
This is also why I think Tesla's software just sucks. It's not the camera hardware that's the problem any more, and the camera hardware is still getting better. There's just no way not to suck when the competition is literally a trillion-dollar gigalith of the AI industry that optimized for avoiding bad PR and waited an extra four years to release a San Francisco-only taxi service. Maybe if Google were willing to stomach a hundred angry hit pieces every time a Waymo ran into a wall with the word "tunnel" spray-painted on it, we'd have three million Waymos worldwide to usher in a driverless future early. I doubt Amazon has any such inhibitions, so I guess we'll find out soon just how much LIDAR helps cover for bad software.