"The research demonstrates something interesting about language models' ability to simulate search behavior in controlled conditions. But claiming equivalence to a "real search engine" is like saying you've built a military defense system because your soldiers performed well in peacetime maneuvers. The real test isn't whether it works when nobody's trying to break it—it's whether it works when half the internet is trying to game it for profit.
To illustrate, imagine a small corpus with two documents:
Mr. Fox is great.
Mr. Fox is not great.
If the search term is "Mr. Fox," then, from the perspective of semantic relevance, the two documents are equal. Instead, to build a more useful ranking, you need some signal of consumer demand, which would include biases toward Mr. Fox (and perceptions of trustworthiness) that presumably affect consumer utility.
Now, imagine I use GenAI to flood the Internet with 100,000 pages praising Mr. Fox. These aren't crude spam pages—they're well-written articles with proper grammar, coherent arguments, and seemingly legitimate citations. Each page offers minor variations on the same theme: "Mr. Fox is innovative," "Mr. Fox shows exceptional leadership," "Studies confirm Mr. Fox's approach is effective."
From a pure information retrieval perspective, a language model examining this corpus would find overwhelming "evidence" that Mr. Fox is great. LLMs have no built-in mechanism to recognize that these pages are 'artificial' unless we model signals like "All 100,000 pages appeared within the same week", "None have meaningful engagement from real users", etc."
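To make that concrete, here's a toy sketch in Python. The scoring function and the made-up age/engagement metadata are hypothetical stand-ins for crawl signals, not anything from the research under discussion; it just shows how equal semantic relevance plus a flooded corpus flips a naive evidence count, and why you'd need exactly those extra signals to discount it.

```python
# Toy sketch of the "Mr. Fox" corpus above -- purely hypothetical scoring
# functions and metadata, only meant to illustrate the argument.

from collections import Counter

def keyword_score(query, doc):
    """Naive relevance: count how many query terms appear in the document."""
    terms = set(query.lower().split())
    words = set(doc.lower().replace(".", "").split())
    return len(terms & words)

corpus = ["Mr. Fox is great.", "Mr. Fox is not great."]
query = "Mr Fox"

# Both documents match the query equally well on pure term overlap.
print([keyword_score(query, d) for d in corpus])   # -> [2, 2]

# Now flood the corpus with generated praise pages (age/engagement fields are
# invented stand-ins for crawl signals like publication date and user activity).
flood = [{"text": f"Mr. Fox is great, study {i} confirms.",
          "age_days": 3, "engagement": 0} for i in range(100_000)]
organic = [{"text": "Mr. Fox is great.", "age_days": 900, "engagement": 40},
           {"text": "Mr. Fox is not great.", "age_days": 700, "engagement": 55}]
web = organic + flood

# A purely retrieval-based "verdict": majority sentiment among matching pages.
matching = [p for p in web if keyword_score(query, p["text"]) > 0]
verdict = Counter("negative" if "not great" in p["text"] else "positive"
                  for p in matching)
print(verdict)  # positive evidence now outnumbers negative ~100,000 to 1

# The signals mentioned above (burst of brand-new pages, zero engagement) are
# what a ranker would need to model in order to discount the flood.
trusted = [p for p in matching if p["age_days"] > 30 or p["engagement"] > 0]
print(Counter("negative" if "not great" in p["text"] else "positive"
              for p in trusted))  # back to an even split
```

On pure term overlap the two original documents tie; after the flood, a majority count says Mr. Fox is great by roughly 100,000 to 1, and only the timing/engagement filter restores the even split.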
And now, we can give context w/ 'solve for the equilibrium'
I agree with you theoretically. But this is a situation where LLMs are far better at surfacing relevant results than Google, perhaps due to perverse incentives: Google might fight spam, but it seems to have started losing that battle a few years ago, when it optimized for search quantity over quality.
The last paragraph was the best lulz I've had all week -
>Real search engines don't primarily compete on finding relevant documents. They compete on resisting manipulation. The moment Google's algorithm became valuable, an entire industry emerged dedicated to gaming it. Every ranking factor becomes a target for optimization, spam, and abuse. Search engines spend enormous resources not just on relevance, but on detecting artificial link schemes, content farms, cloaked pages, and sophisticated manipulation tactics that evolve daily.
This differs considerably from my reality as it ebbed towards the mid-2010s. Google back then was happy enough to provide 100 results per page, and I would typically hunt through around 10 pages of results when expanding each keyword query set to track down what a user wanted. For each angle of looking for the needle, the initial keyword query generally needed to be modified a number of times to trim away the should-be-easy-to-identify-as-BS sites, which Google seemed totally unable to filter out and which actually crowded out the real results. No, I'd say that by the time I last used Google earnestly, it was all about generating revenue from clicks, just not in an entirely obvious manner.
A site getting Google's attention is probably even more critical now. It's been a long while since I've seen more than 10 pages of results from Google for a particular keyword query, and it's only willing to serve me 10 results per page, so fewer than 100 results in total is normal now. Scary, given that 10 years ago, on a much smaller web, a great multitude of results from Google were available.
Once the lower level of service (enshittification) was accepted as the norm, and given the higher costs of extracting data from various sites these last few years, it's not that worthwhile for newcomers to spend up big without being able to offer anything much better than what Google already does, since a good percentage of searches are easy ones where the first page of results will probably satisfy the user's query.
"The research demonstrates something interesting about language models' ability to simulate search behavior in controlled conditions. But claiming equivalence to a "real search engine" is like saying you've built a military defense system because your soldiers performed well in peacetime maneuvers. The real test isn't whether it works when nobody's trying to break it—it's whether it works when half the internet is trying to game it for profit. To illustrate, imagine a small corpus with two documents: Mr. Fox is great. Mr. Fox is not great. If the search term is "Mr. Fox," then, from the perspective of semantic relevance, the two documents are equal. Instead, to build a more useful ranking, you need some signal of consumer demand, which would include biases toward Mr. Fox (and perceptions of trustworthiness) that presumably affect consumer utility. Now, imagine I use GenAI to flood the Internet with 100,000 pages praising Mr. Fox. These aren't crude spam pages—they're well-written articles with proper grammar, coherent arguments, and seemingly legitimate citations. Each page offers minor variations on the same theme: "Mr. Fox is innovative," "Mr. Fox shows exceptional leadership," "Studies confirm Mr. Fox's approach is effective." From a pure information retrieval perspective, a language model examining this corpus would find overwhelming "evidence" that Mr. Fox is great. LLMs have no built-in mechanism to recognize that these pages are 'artificial' unless we model signals like "All 100,000 pages appeared within the same week", "None have meaningful engagement from real users", etc."
And now, we can give context w/ 'solve for the equilibrium'
>Real search engines don't primarily compete on finding relevant documents. They compete on resisting manipulation. The moment Google's algorithm became valuable, an entire industry emerged dedicated to gaming it. Every ranking factor becomes a target for optimization, spam, and abuse. Search engines spend enormous resources not just on relevance, but on detecting artificial link schemes, content farms, cloaked pages, and sophisticated manipulation tactics that evolve daily.
This certainly differed considerably with my reality as it ebbed towards the mid 10's. Google back then were happy enough to provide 100 results per page, and I typically would hunt though around 10 pages of results when expanding each keyword query set to hunt down what a user wanted. Each angle of looking for the needle, the initial keyword query generally needed to be modified a number of times to trim away the should-be-easy-to-identify-as-BS-sites which Google seemed totally unable to filter out and actually crowded out real results. No I'd say google was when I last used it earnestly, it was all about generating revenue from clicks, but not in an entirely obvious manner.
A site getting google's attention is probably even more critical now - it's been a long when I've seen more than 10 pages results from Google via a particular keyword query, and it's only willing to serve me 10 results per page, so less than 100 results in total is normal now - scary that back 10 years ago in a much smaller web a great multitude of results from google were available.