Do you have a dumb question that you're kind of embarrassed to ask in the main thread? Is there something you're just not sure about?
This is your opportunity to ask questions. No question too simple or too silly.
Culture war topics are accepted, and proposals for a better intro post are appreciated.
Apologies for the naive question, but I'm largely ignorant of the nuts and bolts of AI/ML.
Many data formats in biology are just giant arrays, with each row representing a biological cell and each column representing a gene (RNA-Seq) or a parameter (flow cytometry). Sometimes rows are genetic variants and columns are various characteristics of said variant (minor allele frequency, predicted impact on protein function, etc.).
Is there a way to feed this kind of data to LLMs? It seems trivial for ChatGPT to parse 'This is an experiment looking at activation of CD8+ T cells; generate me a series of scatterplots and gates showcasing the data,' but less trivial for it to handle the giant 500,000x15 (flow) or 10,000x20,000 (scRNA-Seq) arrays. Or is there a way for LLMs to interact with existing software?
What’s the advantage over normal programming?
This really feels like a pair of 'for' loops instead of a flexible task. You could even go up a level and write a tool that lets you pick different axes.
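For illustration, here's a rough sketch of the two-loop version; the data layout (a dict of pandas DataFrames, one per sample), the column names, and the output file naming are all made up:

```python
# Rough sketch of the "pair of for loops" version. Assumes each sample
# is already loaded as a pandas DataFrame of events x parameters.
import matplotlib.pyplot as plt

def scatter_grid(samples, axis_pairs):
    for name, df in samples.items():       # outer loop: one pass per sample
        for x, y in axis_pairs:            # inner loop: one plot per axis pair
            fig, ax = plt.subplots()
            ax.scatter(df[x], df[y], s=1)  # tiny markers for ~500k events
            ax.set(xlabel=x, ylabel=y, title=f"{name}: {x} vs {y}")
            fig.savefig(f"{name}_{x}_vs_{y}.png", dpi=150)
            plt.close(fig)
```

Going up a level just means making `axis_pairs` something the user picks instead of hard-coding it.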
Why language models specifically? From a cursory Google I found a couple of papers which may make more sense to you than they do to me:
https://www.sciencedirect.com/science/article/pii/S1672022922001668
https://www.frontiersin.org/articles/10.3389/fimmu.2021.787574/full
Do you want LLMs so you can "talk to" your lab results? Otherwise it's easier to analyse masses of data without the LLM middleman.
Yeah, exactly. There's a lot of grunt work involved in flow cytometry analysis, which I was thinking of more than the scRNA-Seq. Machine learning for most basic flow cytometry is slightly overkill, because what you're doing with each gate is conceptually pretty simple. I tried to elaborate/clarify in this comment.
You should send the grunt work to CCP, where EVE denizens can do it for fractions of a cent.
You've been repeatedly warned to stop doing low effort drive-bys like this that contribute nothing.
Banned for five days this time.
Interesting. I've spent a lot of time staring at t-SNE plots (or UMAPs, which have more recently taken over), and they map pretty well onto our underlying understanding of the biology. It got a bit hairy when we asked it to split the data into too many clusters, and it was difficult to know whether we were looking at some novel, minor cell type or a hallucination.
I think I asked that question poorly and also lack the vocabulary to describe what I'm envisioning. Current software for analyzing this kind of data (flow) exists and the typical workflow is just making a series of scatterplots with 'gates,' or subsets of cells that express a given marker. Here's a basic example.
Verbally, it's all very simple: gate on singlets, then lymphocytes via forward/side scatter, exclude dead cells, gate on CD3+, and then split into CD4 and CD8 T cells. It's the kind of instruction that should be very easy for ChatGPT to parse, even with just a single sentence outlining the experiment. But how do you feed it the data? Is there a way for ChatGPT to interact with existing analysis software to draw gates/generate scatterplots...? I assume you wouldn't want to paste the raw array of cells into your prompt, although I don't know.
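For concreteness, that verbal recipe is maybe a dozen lines of code. A minimal pandas sketch, where every column name and threshold is invented (real gates get drawn per experiment):

```python
# Hedged sketch of the gating sequence above as plain pandas filters.
# Column names (FSC-A, FSC-H, SSC-A, Viability, CD3, CD4, CD8) and all
# thresholds are made up for illustration.
import pandas as pd

def gate_t_cells(df: pd.DataFrame) -> dict:
    singlets = df[df["FSC-H"] / df["FSC-A"] > 0.85]      # doublets have a low height-to-area ratio
    lymphs = singlets[singlets["FSC-A"].between(3e4, 1e5)
                      & (singlets["SSC-A"] < 5e4)]       # lymphocyte scatter gate
    live = lymphs[lymphs["Viability"] < 1e3]             # exclude dye-bright dead cells
    t_cells = live[live["CD3"] > 1e3]                    # CD3+ T cells
    return {
        "CD4 T": t_cells[(t_cells["CD4"] > 1e3) & (t_cells["CD8"] < 1e3)],
        "CD8 T": t_cells[(t_cells["CD8"] > 1e3) & (t_cells["CD4"] < 1e3)],
    }
```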
Maybe I'll back up and zoom out a bit. Most people use FlowJo to analyze flow cytometry data. It's a multibillion-dollar industry, they haven't updated the software in something like a decade (and that update made it worse than the version I was using before), and you routinely draw the same gates over and over again. Imagine you have 50 samples in your experiment and each sample has 10 gates: you're skimming over 500 scatterplots, then inputting however many readouts you have into other histogram plots to represent the data you got. It's repetitive and the software is clunky. LLMs definitely seem 'smart' enough to understand everything that's going on, but I don't have the first idea how you communicate that kind of data to them...
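And the repetitive 50-sample part is just a loop over files. A hypothetical batch sketch reusing `gate_t_cells()` from above; it uses fcsparser, a real Python FCS reader, though double-check the exact call for your version:

```python
# Batch version: apply the same gates to every sample and dump the plots,
# instead of clicking through 500 scatterplots by hand.
from pathlib import Path

import fcsparser
import matplotlib.pyplot as plt

for fcs_path in sorted(Path("experiment/").glob("*.fcs")):
    meta, events = fcsparser.parse(str(fcs_path))    # events: one row per cell
    for label, cells in gate_t_cells(events).items():
        fig, ax = plt.subplots()
        ax.scatter(cells["CD4"], cells["CD8"], s=1)
        ax.set(xlabel="CD4", ylabel="CD8", title=f"{fcs_path.stem}: {label}")
        fig.savefig(f"{fcs_path.stem}_{label}.png", dpi=150)
        plt.close(fig)
```

Whether an LLM writes this once from a one-sentence description or drives existing software to do the same thing, the repetitive part goes away.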
Sorry, I think my description of what I was thinking of was exceptionally poor. I tried to elaborate in this comment.
If you get something like this going, let me know. I'm exploring a local-LLM use case: cybersecurity packet analysis. Loading bulk data separately from the prompt engineering, etc.; all of this is complicated by a small 2k context length. Newer open models have landmark attention for 10k-30k+ context lengths, but they're less sophisticated 7B/13B-parameter models compared to the 30B ones I've been using.
I have no useful suggestion, but that's a neat idea! Great example of the kind of thing that AI could straightforwardly do and save a huge amount of man-hours of tedious, boring labor. The AI probably still won't know why anyone would care about a particular gate, but it could make it quite easy to visualize things.
I'm no expert but have some familiarity. LLMs have a limited context window (GPT-4's is 8k tokens), so they can't hold all of that data at once. Probably the easiest way to get one to chew through that much is to ask it for code to do the things you want (directing it to write some pygraph or R code or something). It could plausibly do it inline if you asked it to summarize chunks of data, then fed the previous summary in with the next chunk. The code would act as a much more auditable, and probably more accurate, tool though.
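For the inline route, the shape of it might be something like this, with `ask_llm()` as a stand-in for whatever client you'd actually use:

```python
# Sketch of the "summarize chunks, carry the summary forward" idea.
# ask_llm() is a placeholder; plug in the OpenAI API, a local model, etc.
import pandas as pd

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def rolling_summary(df: pd.DataFrame, chunk_rows: int = 200) -> str:
    summary = "No data seen yet."
    for start in range(0, len(df), chunk_rows):
        chunk = df.iloc[start:start + chunk_rows]
        prompt = (
            f"Running summary so far:\n{summary}\n\n"
            f"New rows (CSV):\n{chunk.to_csv(index=False)}\n"
            "Update the summary to incorporate the new rows."
        )
        summary = ask_llm(prompt)  # each call only needs one chunk plus the summary
    return summary
```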