In the last week I've had to deal with two large-scale influxes of traffic on one particular web server in our organization.
The first involved requests from 300,000 unique IPs in a span of a few hours. I analyzed them and found that ~250,000 were from Brazil. I'm used to using ASNs to block network ranges sending this kind of traffic, but in this case they were spread thinly over 6,000+ ASNs! I ended up blocking all of Brazil (sorry).
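For what it's worth, the per-ASN analysis can be sketched in a few lines. The toy data below stands in for a real GeoIP/ASN lookup (e.g. a MaxMind database); all IPs, ASNs, and counts are made up for illustration:

```python
from collections import Counter

def summarize(ip_info):
    """ip_info: dict mapping IP -> (asn, country_code).

    Returns per-ASN and per-country counts, so you can see whether an
    ASN-level block is feasible or only a country-level one is."""
    asn_counts = Counter(asn for asn, _ in ip_info.values())
    country_counts = Counter(cc for _, cc in ip_info.values())
    return asn_counts, country_counts

# Toy data standing in for a real GeoIP/ASN database:
ips = {
    "198.51.100.1": ("AS64500", "BR"),
    "198.51.100.2": ("AS64501", "BR"),
    "203.0.113.7":  ("AS64502", "TR"),
}
asns, countries = summarize(ips)
print(countries["BR"])     # most sample IPs are Brazilian
print(max(asns.values()))  # but no single ASN dominates
```

When the second number stays tiny while the first is huge, you're in exactly the "spread thinly over 6,000+ ASNs" situation described above.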
A few days later this same web server was on fire again. I performed the same analysis on IPs and found a similar number of unique addresses, but spread across Turkey, Russia, Argentina, Algeria and many more countries. What is going on?! Eventually I think I found a pattern to identify the requests, in that they were using ancient Chrome user agents. Chrome 40, 50, 60 and up to 90, all released 5 to 15 years ago. Then, just before I could implement a block based on these user agents, the traffic stopped.
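A user-agent filter like the one almost deployed above can be sketched as follows; the cutoff of 100 is an arbitrary illustration, not a recommendation:

```python
import re

ANCIENT_CHROME = re.compile(r"Chrome/(\d+)\.")

def is_ancient_chrome(user_agent, cutoff=100):
    """Flag UAs claiming a Chrome major version below `cutoff`.

    Chrome 40-90 were current roughly 2015-2021, so a real browser
    sending one of those versions today is vanishingly rare."""
    m = ANCIENT_CHROME.search(user_agent)
    return bool(m) and int(m.group(1)) < cutoff

ua = ("Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36")
print(is_ancient_chrome(ua))                                          # True
print(is_ancient_chrome(ua.replace("Chrome/60.0", "Chrome/124.0")))   # False
```

Of course the operators can trivially rotate to fresh user agents, which is presumably why the traffic stopped before the block went in.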
In both cases the traffic from datacenter networks was limited because I already rate limit a few dozen of the larger ones.
It's a reverse proxy that presents a PoW (proof-of-work) challenge to every new visitor. It shifts the initial cost of accessing your server's resources back onto the client. Assuming your uplink can handle 300k clients requesting a single 70 kB web page, it should solve most of your problems.
I've seen a few attacks where the operators placed malicious code on high-traffic sites (e.g. some government thing, larger newspapers), and then just let browsers load your site as an img. Did you see images, CSS, JS being loaded from these IPs? If they were expecting images, they wouldn't parse the HTML and thus wouldn't load other resources.
It's a pretty effective attack because you get large numbers of individual browsers to contribute. Hosters don't care, so unless the site owners are technical enough, these attacks can stay online for quite a while.
If they work with Referrer Policy, they should be able to mask themselves fairly well - the ones I saw back then did not.
We all agree that AI crawlers are a big issue as they don't respect any established best practices, but we rarely talk about the path forward. Scraping has been around for as long as the internet, and it was mostly fine. There are many very legitimate use cases for browser automation and data extraction (I work in this space).
So what are the potential solutions? We're somehow still stuck with CAPTCHAs, a 25-year-old concept that wastes millions of human hours and billions in infra costs [0].
How can we enable beneficial automation while protecting against abusive AI crawlers?
My pet peeve is that using the term "AI crawler" for this conflates things unnecessarily. Some people are angry at it out of anti-AI bias and an unwillingness to share information, while others are more concerned about the large amount of bandwidth it consumes and the servers it overloads.
Not to mention that it's unknown if these are actually from AI companies, or from people pretending to be AI companies. You can set anything as your user agent.
It's more appropriate to mention the specific issue one has with the crawlers, like "they request things too quickly" or "they're overloading my server". From there, it is easier to come to a solution than from just "I hate AI". For example, one would realize that things like Anubis have existed forever; they are just called DDoS protection, specifically the kind using proof-of-work schemes (e.g. https://github.com/RuiSiang/PoW-Shield).
This also shifts the discussion away from something that adds to the discrimination against scraping in general, and more towards what is actually the issue: overloading servers, or in other words, DDoS.
It won't fully solve the problem, but with the problem relatively identified, you must then ask why people are engaging in this behavior. Answer: money, for the most part. Therefore, follow the money and identify the financial incentives driving this behavior. This leads you pretty quickly to a solution most people would reject out-of-hand: turn off the financial incentive that is driving the enshittification of the web. Which is to say, kill the ad-economy.
Or at least better regulate it while also levying punitive damages that are significant enough to both dissuade bad actors and encourage entities to view data breaches (or the potential therein) and "leakage[0]" as something that should actually be effectively secured against. After all, there are some upsides to the ad-economy that, without it, would present some hard challenges (e.g., how many people are willing to pay for search? what happens to the vibrant sphere of creators of all stripes that are incentivized by the ad-economy? etc.).
Personally, I can't imagine this would actually happen. Pushback from monied interests aside, most people have given up on the idea of data-privacy or personal ownership of their data, if they ever even cared in the first place. So, in the absence of willingness to do something about the incentive for this malign behavior, we're left with few good options.
The best solution I've seen is to hit everyone with a proof of work wall and whitelist the scrapers that are welcome (search engines and such).
Running SHA hash calculations for a second or so once a week is not bad for users, but since scrapers constantly start new sessions, they end up spending most of their time running useless JavaScript, slowing them down significantly.
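The asymmetry is the whole point: the client burns through thousands of hashes while the server verifies with a single one. A minimal sketch of such a scheme (not Anubis's actual implementation; difficulty kept tiny so the demo runs instantly):

```python
import hashlib, itertools

def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: brute-force a nonce (~2^difficulty hashes on average)."""
    for nonce in itertools.count():
        h = hashlib.sha256(challenge + str(nonce).encode()).digest()
        if int.from_bytes(h, "big") >> (256 - difficulty) == 0:
            return nonce

def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Server side: one hash, essentially free."""
    h = hashlib.sha256(challenge + str(nonce).encode()).digest()
    return int.from_bytes(h, "big") >> (256 - difficulty) == 0

nonce = solve(b"session-abc123", 12)
print(verify(b"session-abc123", nonce, 12))  # True
```

A legitimate user pays this once per session cookie; a scraper farm that discards cookies pays it on every page.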
The most effective alternative to proof of work calculations seems to be remote attestation. The downside is that you're getting captchas if you're one of the 0.1% who disable secure boot and run Linux, but the vast majority of web users will live a captcha free life. This same mechanism could in theory also be used to authenticate welcome scrapers rather than relying on pure IP whitelists.
> I am now of the opinion that every form of web-scraping should be considered abusive behaviour and web servers should block all of them. If you think your web-scraping is acceptable behaviour, you can thank these shady companies and the “AI” hype for moving you to the bad corner.
I imagine that e.g. Youtube would be happy to agree with this. Not that it would turn them against AI generally.
yeah, but you can't, that's the problem. Plenty of service operators would like to block every scraper that doesn't obey their robots.txt, but there's no good way to do that without blocking human traffic too (Anubis et al are okay, but they are half-measures).
On a separate note, I believe open web scraping has been a massive benefit to the internet on net, and almost entirely positive pre-2021. Web scraping & crawling enables search engines, services like Internet Archive, walled-garden-busting (like Invidious, yt-dlp, and Nitter), mashups (Spotube, IFTT, and Plaid would have been impossible to bootstrap without web scraping), and all kinds of interesting data science projects (e.g. scraping COVID-19 stats from local health departments to patch together a picture of viral spread for epidemiologists).
We should have a way to verify the user-agents of valid and useful scrapers such as the Internet Archive. Some kind of cryptographic signature of their user-agents, verifiable by any reverse proxy, seems like a good start.
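As a toy illustration of that idea, here's a symmetric (HMAC) version; a real deployment would use asymmetric signatures instead (e.g. Ed25519 with the crawler operator's public key published somewhere verifiable) so the proxy holds no secret. The key and UA strings below are entirely hypothetical:

```python
import hmac, hashlib

# Hypothetical shared secret between crawler operator and proxy.
# In practice you'd want a public-key scheme so only the crawler
# can sign and anyone can verify.
SHARED_KEY = b"demo-key-known-to-proxy-and-crawler"

def sign_ua(user_agent: str) -> str:
    return hmac.new(SHARED_KEY, user_agent.encode(), hashlib.sha256).hexdigest()

def verify_ua(user_agent: str, signature: str) -> bool:
    # constant-time comparison to avoid timing side channels
    return hmac.compare_digest(sign_ua(user_agent), signature)

ua = "archive.org_bot/1.0"                 # hypothetical UA string
sig = sign_ua(ua)
print(verify_ua(ua, sig))                  # True
print(verify_ua("evil-scraper/1.0", sig))  # False
```

The hard part isn't the crypto, it's the registry: deciding who gets to be a "valid and useful" scraper in the first place.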
> Plenty of service operators would like to block every scraper that doesn't obey their robots.txt, but there's no good way to do that without blocking human traffic too (Anubis et al are okay, but they are half-measures)
Anubis, go-away, etc are great, don't get me wrong -- but what Anubis does is impose a cost on every query. The website operator is hoping that the compute will have a rate-limiting effect on scrapers while minimally impacting the user experience. It's almost like chemotherapy, in that you're poisoning everyone in the hope that the aggressive bad actors will be more severely affected than the less aggressive good actors. Even the Anubis readme calls it a nuclear option. In practice it appears to work pretty well, which is great!
It's a half-measure because:
1. You're slowing down scrapers, not blocking them. They will still scrape your site content in violation of robots.txt.
2. Scrapers with more compute than IP proxies will not be significantly bottlenecked by this.
3. This may lead to an arms race where AI companies respond by beefing up their scraping infrastructure, necessitating more difficult PoW challenges, and so on. The end result of this hypothetical would be a more inconvenient and inefficient internet for everyone, including human users.
To be clear: I think Anubis is a great tool for website operators, and one of the best self-hostable options available today. However, it's a workaround for the core problem that we can't reliably distinguish traffic from badly behaving AI scrapers from legitimate user traffic.
Welcome scrapers (IA, maybe Google and Bing) can publish their IP addresses and get whitelisted. Websites that want to prevent being on the Internet Archive can pretty much just ask for their website to be excluded (even retroactively).
a large chunk of internet archive's snapshots are from archiveteam, where "warriors" bring their own ips (and they crawl respectfully!). save page now is important too, but you don't realise what is useful until you lose it.
They have the right to try to convince me to let them scrape me. Most of the time they're thinly veiled data traders. I haven't seen any new company try to scrape my stuff since maybe Kagi.
Kagi is welcome to scrape from their IP addresses. Other bots that behave are fine too (Huawei and various other Chinese bots don't and I've had to put an IP block on those).
It's interesting but so far there is no definitive proof it's happening.
People are jumping to conclusions a bit fast here. Yes, technically it's possible, but this kind of behavior would be relatively easy to spot, because the app would have to make direct connections to the websites it wants to scrape.
Your calculator app, for instance, connecting to CNN.com...
iOS has an App Privacy Report where one can check which connections are made by each app, how often, when the last one occurred, etc.
Android by Google doesn't have such a useful feature, of course, but you can run a third-party firewall like PCAPdroid, which I highly recommend.
macOS (Little Snitch).
Windows (Fort Firewall).
Not everyone runs these apps, obviously, only the most nerdy like myself, but we're also the kind of people who would report an app using our devices to build what is, in fact, a zombie or bot network.
I'm not saying it's necessarily false but imo it remains a theory until proven otherwise.
This is a hilariously optimistic, naive, disconnected-from-reality take. What sort of "proof" would be sufficient for you? TFA of course includes data from the author's own server logs, but it also references real SDKs and businesses selling this exact product. You can view the pricing page yourself, right next to stats on how many IPs are available for you to exploit. What else do you need to see?
> iOS have app privacy report where one can check what connections are made by app, how often, last one, etc.
Privacy reports do not include that information. They include broad areas of information the app claims to gather. There is zero connection between those claimed areas and what the app actually does unless app review notices something that doesn't match up. But none of that information is updated dynamically, and it has never actually included the domains the app connects to. You may be confusing it with the old domain declarations for less secure HTTP connections. Once the connections met the system standards you no longer needed to declare it.
I wasn't aware of this feature. But apparently it does include that information. I just enabled it and can see the domains that apps connect to. https://support.apple.com/en-us/102188
There is already a lot of proof. Just ask for a sales pitch from companies selling these data and they will gladly explain everything to you.
Go to a data conference like Neudata and you will see. You can have scraped data from user devices, real-time locations, credit card, Google analytics, etc.
Given this is a thing even in browser plugins, and that so very few people analyse their firewalls, I'd not discount it at all. Much of the world's users have no clue, and app stores are notoriously bad at reacting even to publicised malware, e.g. 'free' VPNs in the iOS App Store.
All it takes is one person to find out and raise the alarm. The average user doesn't read the source code behind openssl or whatever either, that doesn't mean there's no gains in open sourcing it.
The average user is also not reading these raised “alarms”. And if an app has a bad name, another one will show up with a different name on the same day.
You're on a tech forum; you must have seen one of the many posts about apps, on Android or iPhone, that act like spyware.
They happen from time to time. The last one was not more than two weeks ago, when it was shown that many apps were able to read the list of all other apps installed on an Android device, and that Google refused to fix it.
Do you really believe that an app used to make your device part of a bot network wouldn't be posted over here ?
"You're on a tech forum", that's exactly the point. The "average user" is not on a tech forum though, the average user opens the app store of their platform, types "calculator" and installs the first one that's free.
> So there is a (IMHO) shady market out there that gives app developers on iOS, Android, MacOS and Windows money for including a library into their apps that sells users network bandwidth
AKA "why do Cloudflare and Google make me fill out these CAPTCHAs all day"
I don't know why Play Protect/MS Defender/whatever Apple has for antivirus don't classify apps that embed such malware as such. It's ridiculous that this is allowed to go on when detection is so easy. I don't know a more obvious example of a trojan than an SDK library making a user's device part of a botnet.
The implication is that the users that are being constantly presented with CAPTCHAs are experiencing that because they are unwittingly proxying scrapers through their devices via malicious apps they've installed.
or just that they don't run Windows/macOS with Chrome like everyone else, and that's "suspicious".
I get Cloudflare CAPTCHAs all the time with Firefox on Linux... (and I'm pretty sure there's no such app on my home network!)
When a random device on your network gets infected with crap like this, your network becomes a bot egress point, and anti bot networks respond appropriately. Cloudflare, Akamai, even Google will start showing CAPTCHAs for every website they protect when your network starts hitting random servers with scrapers or DDoS attacks.
This is even worse with CG-NAT if you don't have IPv6 to solve the CG-NAT problem.
I don't think the data they collect is used to train anything these days. Cloudflare is using AI generated images for CAPTCHAs and Google's actual CAPTCHAs are easier for bots than humans at this point (it's the passive monitoring that makes it still work a little bit).
I don't know if I should be surprised about what's described in this article, given the current state of the world. Certainly I didn't know about it before, and I agree with the article's conclusion.
Personally, I think the "network sharing" software bundled with apps should fall into the category of potentially unwanted applications along with adware and spyware. All of the above "tag along" with something the user DID want to install, and quietly misuse the user's resources. Proxies like this definitely have an impact for metered/slow connections - I'm tempted to start Wireshark'ing my devices now to look for suspicious activity.
There should be a public repository of apps known to have these shady behaviours. Having done some light web scraping for archival/automation before, it's a pity that it'll become collateral damage in the anti-AI-botfarm fight.
I agree, this should be called spyware, and malware. There are many other kinds of software that also should be, but netcat and ncat (probably) aren't malware.
I agree, but the harm done to the users is only one part of the total harm. I think it's quite plausible that many users wouldn't mind some small amount of their bandwidth being used, if it meant being able to use a handy browser extension that they would otherwise have to pay actual dollars for -- but the harm done to those running the servers remains.
> Has anyone tried to compile a list of software that uses these libraries? It would be great to know what apps to avoid
I wouldn't mind reading a comprehensive report on SOTA with regard to bot-blocking.
Sure, there's Anubis (although someone elsethread called it a half-measure, and I'd like to know why), there's captcha's, there's relying on a monopoly (cloudflare, etc) who probably also wants to run their own bots at some point, but what else is there?
In the case of Android, εxodus has one[1], though I couldn't find the malware library listed in TFA. Aurora Store[2], a FOSS Google Play Store client, also integrates it.
That seems to be looking at tracking and data collection libraries, though, for things like advertising and crash reporting. I don't see any mention of the kind of 'network sharing' libraries that this article is about. Have I missed it?
No but here's the thing. Being in the industry for many years I know they are required to mention it in the TOS when using the SDKs. A crawler pulling app TOSs and parsing them could be a thing. List or not, it won't be too useful outside this tech community.
The broken thing about the web is that in order for data to remain readable, a unique sysadmin somewhere has to keep a server running in the face of an increasingly hostile environment.
If instead we had a content addressed model, we could drop the uniqueness constraint. Then these AI scrapers could be gossiping the data to one another (and incidentally serving it to the rest of us) without placing any burden on the original source.
Having other parties interested in your data should make your life easier (because other parties will host it for you), not harder (because now you need to work extra hard to host it for them).
Can you point me at what you mean? I'm not immediately finding something that indicates that it is not fit for this use case. The fact that bad actors use it to resist those who want to shut them down is, if anything, an endorsement of its durability. There's a bit of overlap between resisting the AI scrapers and resisting the FBI. You can either have a single point of control and a single point of failure, or you can have neither. If you're after something that's both reliable and reliably censorable--I don't think that's in the cards.
That's not to say that it is a ready replacement for the web as we know it. If you have hash-linked everything then you wind up with problems trying to link things together, for instance. Once two pages exist, you can't after-the-fact create a link between them because if you update them to contain that link then their hashes change so now you have to propagate the new hash to people. This makes it difficult to do things like have a comments section at the bottom of a blog post. So you've got to handle metadata like that in some kind of extra layer--a layer which isn't hash linked and which might be susceptible to all the same problems that our current web is--and then the browser can build the page from immutable pieces, but the assembly itself ends up being dynamic (and likely sensitive to the users preference, e.g. dark mode as a browser thing not a page thing).
But I still think you could move maybe 95% of the data into an immutable hash-linked world (think of these as nodes in a graph), the remaining 5% just being tuples of hashes and public keys indicating which pages are trusted by which users, which ought to be linked to which others, which are known to be the inputs and outputs of various functions, and, you know... structure stuff (these are our graph's edges).
The edges, being smaller, might be subject to different constraints than the web as we know it. I wouldn't propose that we go all the way to a blockchain where every device caches every edge, but it might be feasible for my devices to store all of the edges for the 5% of the web I care about, and your devices to store the edges for the 5% that you care about... the nodes only being summoned when we actually want to view them. The edges can be updated when our devices contact other devices (based on trust, like you know that device's owner personally) and ask "hey, what's new?"
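A toy sketch of that node/edge split: content-addressed immutable nodes, plus a small mutable edge list kept outside the hash-linked world. A real system would sign the edges with the endorsing user's key; everything here is purely illustrative:

```python
import hashlib

store = {}   # hash -> immutable node content

def put(content: bytes) -> str:
    """Nodes: content-addressed and immutable, so anyone can mirror them."""
    h = hashlib.sha256(content).hexdigest()
    store[h] = content
    return h

# Edges: small mutable tuples linking hashes, maintained separately.
# In a real design each edge would carry a public key and signature.
edges = []

post = put(b"my blog post")
comment = put(b"nice post!")
edges.append({"from": post, "to": comment, "rel": "comment"})

print(edges[0]["rel"])                  # comment
print(put(b"my blog post") == post)     # True: same content, same address
```

Note the comments-section problem from above disappears: adding the edge never changes the post's hash.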
I've sort of been freestyling on this idea in isolation, probably there's already some projects that scratch this itch. A while back I made a note to check out https://ceramic.network/ in this capacity, but I haven't gotten down to trying it out yet.
Assuming the right incentives can be found to prevent widespread leeching, a distributed content-addressed model indeed solves this problem, but introduces the problem of how to control your own content over time. How do you get rid of a piece of content? How do you modify the content at a given URL?
I know, as far as possible it's a good idea to have content-immutable URLs. But at some point, I need to make www.myexamplebusiness.com show new content. How would that work?
Except no one wants content addressed data - because if you knew what it was you wanted, then you would already have stored it. The web as we know it is an index - it's a way to discover that data is available and specifically we usually want the latest data that's available.
AI scrapers aren't trying to find things they already know exist, they're trying to discover what they didn't know existed.
Yes, for the reasons you describe, you can't be both a useful web-like protocol and also 100% immutable/hash-linked.
But there's a lot of middle ground to explore here. Loading a modern web page involves making dozens of requests to a variety of different servers, evaluating some JavaScript, and then doing it again a few times, potentially moving several MB of data. The part people want, the thing you don't already know exists, is hidden behind that rather heavy door. It doesn't have to be that way.
If you already know about one thing (by its cryptographic hash, say) and you want to find out which other hashes it's now associated with--associations that might not have existed yesterday--that's much easier than we've made it. It can be done:
- by moving kB, not MB; we're just talking about a tuple of hashes here, maybe a public key and a signature
- without placing additional burden on whoever authored the first thing, they don't even have to be the ones who published the pair of hashes that your scraper is interested in
Once you have the second hash, you can then reenter immutable-space to get whatever it references. I'm not sure if there's already a protocol for such things, but if not then we can surely make one that's more efficient and durable than what we're doing now.
> because if you knew what it was you wanted, then you would already have stored it.
"Content-addressable" has a broader meaning than what you seem to be thinking of -- roughly speaking, it applies if any function of the data is used as the "address". E.g., git commits are content-addressable by their SHA1 hashes.
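For example, git's blob addressing is just a SHA-1 over a short `blob <length>\0` header plus the content, which you can reproduce in a couple of lines:

```python
import hashlib

def git_blob_hash(data: bytes) -> str:
    """Compute the same object ID that `git hash-object` would."""
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()

print(git_blob_hash(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

The address is a deterministic function of the data, so you don't need to have the data to know its address; you just need someone to have told you the hash.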
We need a list of apps that include these libraries and any malware scanner - including Windows Defender, Play Protect and whatever Apple calls theirs - need to put infected applications into quarantine immediately.
Just because it's not directly causing damage to the device the malware is running on, that doesn't mean it's not malware.
My iPhone occasionally displays an interrupt screen to remind me that my weather app has been accessing my location in the background and to confirm continued access.
It should also do something similar for apps making chatty background requests to domains not specified at app review time. The legitimate use cases for that behaviour are few.
On the one hand, yes this could work for many cases. On the other hand, good bye p2p. Not every app is a passive client-server request-response. One needs to be really careful with designing permission systems. Apple has already killed many markets before they had a chance to even exist, such as companion apps for watches and other peripherals.
P2P was practically dead on iPhone even back in 2010. The whole "don't burn the user's battery" thing precludes mobile phones doing anything with P2P other than leeching off of it. The only exceptions are things like AirDrop; i.e. locally peer-to-peer things that are only active when in use and don't try to form an overlay or mesh network that would require the phone to become a router.
And, AFAIK, you already need special permission for anything other than HTTPS to specific domains on the public Internet. That's why apps ping you about permissions to access "local devices".
You mean, good bye using my bandwidth without my permission? That's good. And if I install a bittorrent client on my phone, I'll know to give it permission.
> such as companion apps for watches and other peripherals
That's just apple abusing their market position in phones to push their watch. What does it have to do with p2p?
It’s an example of how, when you design sandboxes/firewalls, it’s very easy to assume all apps are one big homogeneous blob doing REST calls and everything else is malicious or suspicious. You often need strange permissions to do interesting things. Apple gives themselves these perms all the time.
Maybe there could be a special entitlement that Apple's reviewers would only grant to applications that have a legitimate reason to require such connections.
Then only applications granted that permission would be able to make requests to arbitrary domains / IP addresses.
That's how it works with other permissions most applications should not have access to, like accessing user locations. (And private entitlements third party applications can't have are one way Apple makes sure nobody can compete with their apps, but that's a separate issue.)
Android is so fucking anti-privacy that they still don't have an INTERNET access revoke toggle. The one they have currently is broken and can easily be bypassed with google play services (another highly privileged process running for no reason other than to sell your soul to google). GrapheneOS has this toggle luckily. Whenever you install an app, you can revoke the INTERNET access at the install screen and there is no way that app can bypass it
Do you suggest outright forbidding TCP connections for user software? Because you can compile OpenSSL or any other TLS library and make a TCP connection to port 443 which will be opaque to the operating system. They can do wild things like kernel-level DPI on outgoing connections to find out the host, but that quickly turns into a ridiculous competition.
I think capability based security with proxy capabilities is the way to do it, and this would make it possible for the proxy capability to intercept the request and ask permission, or to do whatever else you want it to do (e.g. redirections, log any accesses, automatically allow or disallow based on a file, use or ignore the DNS cache, etc).
The system may have some such functions built in, and asking permission might be a reasonable thing to include by default.
Try actually using a system like this. OpenSnitch and Little Snitch do it for Linux and macOS respectively. Fedora has a pretty good interface for SELinux denials.
I've used all of them, and it's a deluge: it is too much information to reasonably react to.
Your broad choice is either deny or accept, but there's no sane way to reliably know which you should do.
This is not and cannot be an individual problem: the easy part is building high fidelity access control, the hard part is making useful policy for it.
I suggested proxy capabilities, that it can easily be reprogrammed and reconfigured; if you want to disable this feature then you can do that too. It is not only allow or deny; other things are also possible (e.g. simulate various error conditions, artificially slow down the connection, go through a proxy server, etc). (This proxy capability system would be useful for stuff other than network connections too.)
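A minimal sketch of such a proxy capability: the app never holds a raw socket, only this wrapper, which enforces a default-deny policy and logs every attempt. All names here are hypothetical, and a real one would also support the prompting, throttling, and error-simulation behaviors described above:

```python
class ConnectDenied(Exception):
    pass

class NetCapability:
    """A capability the app is handed instead of raw socket access."""

    def __init__(self, policy):
        self.policy = policy   # host -> "allow" | "deny"
        self.log = []          # every attempt is recorded, allowed or not

    def connect(self, host, port):
        decision = self.policy.get(host, "deny")   # default-deny
        self.log.append((host, port, decision))
        if decision != "allow":
            raise ConnectDenied(host)
        return f"socket-to-{host}:{port}"          # stand-in for a real socket

cap = NetCapability({"api.example.com": "allow"})
print(cap.connect("api.example.com", 443))
try:
    cap.connect("cnn.com", 443)                    # the calculator-app case
except ConnectDenied:
    print("denied")
```

Because the capability is an ordinary object, swapping in a version that prompts the user, slows the connection, or routes through a proxy requires no changes to the app holding it.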
> it is too much information to reasonably react to.
Even if it asks, that does not necessarily mean it has to ask every time, if the user lets it keep the answer (either for the current session or until the user deliberately deletes this data). Also, if it asks too much because it tries to access too many remote servers, then it might be spyware, malware, etc. anyway, and is worth investigating in case that is what it is.
> the hard part is making useful policy for it.
What the default settings should be is a significant issue. However, changing the policies in individual cases for different uses, is also something that a user might do, since the default settings will not always be suitable.
If whoever manages the package repository, app store, etc is able to check for malware, then this is a good thing to do (although it should not prohibit the user from installing their own software and modifying the existing software), but security on the computer is also helpful, and neither of these is the substitute for the other; they are together.
The vast majority of revenue in the mobile app ecosystem is ads, which by design are pulled from third parties (and are part of the broader problem discussed in this post).
I am waiting for Apple to enable /etc/hosts or something similar on iOS devices.
Residential IP proxies have some weaknesses. One is that they often change IP addresses during a single web session. Second, if the IPs come from the same proxy provider, they are often concentrated within a single ASN, making them easier to detect.
We are working on an open‑source fraud prevention platform [1], and detecting fake users coming from residential proxies is one of its use cases.
The first blog post in this series[1], linked at the top of TFA, offers an analysis of the potential of using ASNs to detect such traffic. Their conclusion was that ASNs are not helpful for this use case, showing that across the 50k IPs they've blocked, there are fewer than 4 IP addresses per ASN on average.
What was done manually in the first blog is exactly what tirreno helps to achieve by analyzing traffic, here is live example [1]. Blocking an entire ASN should not be considered a strategy when real users are involved.
Regarding the first post, it's rare to see both datacenter network IPs and mobile proxy IP addresses used simultaneously. This suggests the involvement of more than one botnet.
The main idea is to avoid using IP addresses as the sole risk factor. Instead, they should be considered as just one part of the broader picture of user behavior.
>One is that they often change IP addresses during a single web session. Second, if the IPs come from the same proxy provider, they are often concentrated within a single ASN, making them easier to detect.
Both are pretty easy to mitigate with a geoip database and some smart routing. One "residential proxy" vendor even has session tokens so your source IP doesn't randomly jump between each request.
At least here in the US most residential ISPs have long leases and change infrequently, weeks or months.
Trying to understand your product, where is it intended to sit in a network? Is it a standalone tool that you use to identify these IPs and feed into something else for blockage or is it intended to be integrated into your existing site or is it supposed to proxy all your web traffic? The reason I ask is it has fairly heavyweight install requirements and Apache and PHP are kind of old school at this point, especially for new projects and companies. It's not what they would commonly be using for their site.
Indeed, if it's a real user from a residential IP address, in most cases it will be the same network. However, if it's a proxy from residential IPs, there could be 10 requests from one network, the 11th request from a second network, and the 12th request back from the same network. This is a red flag.
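That alternating-network pattern within one session can be checked mechanically. Here is a minimal sketch; the `/24` grouping as a stand-in for "same network", the function names, and the two-switch threshold are my own assumptions, not tirreno's actual logic:

```python
from ipaddress import ip_network

def network_of(ip: str) -> str:
    # Collapse an IPv4 address to its /24 as a rough proxy for "same network".
    return str(ip_network(f"{ip}/24", strict=False))

def count_network_switches(session_ips: list[str]) -> int:
    # Count how often consecutive requests in one session jump between networks.
    nets = [network_of(ip) for ip in session_ips]
    return sum(1 for a, b in zip(nets, nets[1:]) if a != b)

def looks_like_residential_proxy(session_ips: list[str], threshold: int = 2) -> bool:
    # Two or more mid-session network jumps is the hypothetical red flag here.
    return count_network_switches(session_ips) >= threshold
```

A session that goes network A, network B, back to network A produces two switches and trips the flag, while a user whose address merely changes within one provider's range does not.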
Thank you for your question. tirreno is a standalone app that needs to receive API events from your main web application. It can work perfectly well with 512GB of RAM for Postgres, or even less; however, in most cases we're dealing with millions of events, and it's those that demand resources.
It's much easier to write a stable application without dependencies based on mature technologies. tirreno is fairly 'boring software'.
Effective fraud prevention relies on both the full user context and the behavioral patterns of known online fraudsters. The key idea is that an IP address cannot be used as a red flag on its own without considering the broader context of the account.
However, if we know that the fraudsters we're dealing with are using mobile network proxies and are randomly switching between two mobile operators, that is certainly a strong risk signal.
An awful lot of free Wi-Fi networks you find in malls are operated by different providers. Walking from one side of a mall to the other while my phone connects to all the Wi-Fi networks I’ve used previously would have you flag me as a fraudster if I understand your approach correctly.
We are discussing user behavior in the context of a web system. The fact that your device has connected to different Wi-Fi networks doesn't necessarily mean that all of them were used to access the web application.
Finally, as mentioned earlier, there is no silver bullet that works for every type of online fraudster. For example, in some applications, a Tor connection might be considered a red flag. However, if we are talking about HN visitors, many of them use Tor on a daily basis.
When the enshittification initially hit the fan, I had little flashbacks of Phil Zimmermann talking about the Web of Trust, and amused myself thinking maybe we need humans proving they're humans to other humans, so we know we aren't arguing with LLMs on the internet or letting them scan our websites.
But it just doesn't scale to internet size, so I'm fucked if I know how we should fix it. We all have that cousin or dude from our high school class who would do anything for a bit of money, introducing his 'friend' Paul who is in fact a bot whose owner paid for the lie. And not like enough money to make it a moral dilemma, just drinking money or enough for a new video game. So once you get past about 10,000 people you're pretty much back where we are right now.
I think it should be possible to build something that generalises the idea of Web of Trust so that it's more flexible, and less prone to catastrophic breakdown past some scaling limit.
Binary "X trusts Y" statements, plus transitive closure, can lead to long trust paths that we probably shouldn't actually trust the endpoints of. Could we not instead assign probabilities like "X trusts Y 95%", multiply probabilities along paths starting from our own identity, and take the max at each vertex? We could then decide whether to finally trust some Z if its percentage is more than some threshold T%. (Other ways of combining in-edges may be more suitable than max(); it's just a simple and conservative choice.)
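That max-of-products propagation is essentially Dijkstra's algorithm run on probabilities instead of distances. A rough sketch of the idea (the graph shape and function names are illustrative, not from any existing system):

```python
import heapq

def trust_scores(edges: dict[str, dict[str, float]], me: str) -> dict[str, float]:
    """Best-path trust: multiply edge probabilities along a path, then take the
    max over paths. A Dijkstra variant, using a max-heap via negated scores."""
    best = {me: 1.0}  # full trust in ourselves
    heap = [(-1.0, me)]
    while heap:
        neg, x = heapq.heappop(heap)
        score = -neg
        if score < best.get(x, 0.0):
            continue  # stale heap entry, already found a better path to x
        for y, p in edges.get(x, {}).items():
            cand = score * p
            if cand > best.get(y, 0.0):
                best[y] = cand
                heapq.heappush(heap, (-cand, y))
    return best

def trusted(edges, me, z, threshold=0.5):
    # Final decision: trust Z if its propagated score clears the threshold T.
    return trust_scores(edges, me).get(z, 0.0) >= threshold
```

With edges me→A at 95%, A→C at 90%, and me→B at 60%, B→C at 99%, the path through A wins (0.855 vs. 0.594), so C is trusted at T=80% but not at T=90%.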
Perhaps a variant of backprop could be used to automatically update either (a) all or (b) just our own weights, given new information ("V has been discovered to be fraudulent").
True. Perhaps a collective vote past 2 degrees of separation, where multiple parties need to vouch for the same person before you believe they aren't a bot. Then you're using the exponential number of people to provide diminishing weight instead of an increasing likelihood of malfeasance.
> So there is a (IMHO) shady market out there that gives app developers on iOS, Android, MacOS and Windows money for including a library into their apps that sells users network bandwidth.
This is yet another reason why we need to be wary of popular apps, add-ons, extensions, and so forth changing hands, by legitimate sale or more nefarious methods. Initially innocent utilities can be quickly coopted into being parts of this sort of scheme.
Strange that HolaVPN, i.e. Bright Data, is not mentioned. They've been using users' hosts for those purposes for years, and also selling proxies en masse. Fun fact: they don't have any servers for the VPN. All the VPN traffic is routed through... other users!
> I am now of the opinion that every form of web-scraping should be considered abusive behaviour and web servers should block all of them. If you think your web-scraping is acceptable behaviour, you can thank these shady companies and the “AI” hype for moving you to the bad corner.
Why jump to that conclusion?
If a scraper clearly advertises itself, follows robots.txt, and has reasonable backoff, it's not abusive. You can easily block such a scraper, but then you're encouraging stealth scrapers because they're still getting your data.
I'd block the scrapers that try to hide and waste compute, but deliberately allow those that don't. And maybe provide a sitemap and API (which besides being easier to scrape, can be faster to handle).
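A polite scraper along those lines is easy to sketch with Python's standard library; the robots.txt content, user agent, and retry count below are made up for illustration:

```python
import time
import urllib.robotparser

# Hypothetical robots.txt a well-behaved scraper would honor.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

def make_checker(robots_txt: str) -> urllib.robotparser.RobotFileParser:
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp

def fetch_politely(rp, agent, path, fetch, max_tries=3):
    """Honor robots.txt, identify ourselves, and back off exponentially on
    failure. `fetch` is whatever function actually performs the request."""
    if not rp.can_fetch(agent, path):
        return None  # respect the Disallow rules instead of going stealth
    delay = rp.crawl_delay(agent) or 1
    for attempt in range(max_tries):
        try:
            return fetch(path)
        except IOError:
            time.sleep(delay * (2 ** attempt))  # exponential backoff
    return None
```

The point of the sketch: a scraper that is this easy to identify and rate-limit is the kind you may want to allow, because blocking it only rewards the stealthy ones.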
This isn't obvious: 99% of apps make multiple calls to multiple services, and these SDKs are embedded in the app. How can you tell what's legit outbound/inbound traffic? A fingerprint search for the worst culprits might help catch some, but it would likely be a game of cat and mouse.
Their marketing tells you it's for protection. What they omit is that it's for their revenue protection: observe that as long as you do not threaten their revenue models, or the revenue models of their partners, you are allowed through. It has never been about the users or developers.
Nobody said that; it's your choice to take whatever action fits your scenario. I have clients where VPNs are blocked, yes; it depends on the industry, fraud rate, chargeback rates, etc.
Sandboxing means you can limit network access. For example, on Android you can disallow wi-fi and cellular access (not sure about bluetooth) on a per-app basis.
Network access settings should really be more granular for apps that have a legitimate need.
App store disclosure labels should also add network usage disclosure.
That's also my reaction when the call is for Google, Apple, or Microsoft to fix this: since DDoS is illegal, shouldn't the first reaction instead be to contact law enforcement?
If you treat platforms like they are all-powerful, then that's what they are likely to become...
Let me get this straight: we want computers knowing everything, to solve current and future problems, but we don't want to give them access to our knowledge?
I don't want computers to know everything. Most knowledge on the internet is false and entirely useless.
The companies selling us computers that supposedly know everything should pay for their database, or they should give away the knowledge they gained for free. Right now, the scraping and copying is free and the knowledge is behind a subscription to access a proprietary model that forms the basis of their business.
Humanity doesn't benefit, the snake oil salesmen do.
> Let me get this straight: we want computers knowing everything, to solve current and future problems, but we don't want to give them access to our knowledge?
Who said that?
There's basically two extremes:
1. We want access to all of human knowledge, now and forever, in order to monetise it and make more money for us, and us alone.
and
2. We don't want our freely available knowledge sold back to us, with no credits to the original authors.
I feel like this could be automated. Spin up a virtual device on a monitored network. Install one app, click on some stuff for awhile, uninstall and move onto the next. If the app reaches out to a lot of random sites then flag it
Google could do this. I'm sure Apple could as well. Third parties could for a small set of apps
This is being done by a couple of SDKs, it'd be much easier to just find and flag those SDK files. Finding apps becomes a matter of a single pass scan over the application contents rather than attempting to bypass the VM detection methods malware is packed full of.
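SDK fingerprinting aside, the "flag apps that reach out to lots of random sites" heuristic from the parent comment could look something like this; the log format and the 20-domain threshold are arbitrary assumptions for the sketch:

```python
from collections import defaultdict

def flag_chatty_apps(dns_log: list[tuple[str, str]], max_domains: int = 20) -> set[str]:
    """dns_log is (app_id, domain) pairs captured on a monitored test network.
    Apps resolving an unusually wide spread of domains get flagged for review."""
    domains: dict[str, set[str]] = defaultdict(set)
    for app, domain in dns_log:
        domains[app].add(domain)
    return {app for app, ds in domains.items() if len(ds) > max_domains}
```

A weather app talking to one API endpoint passes; a flashlight app resolving fifty unrelated hosts gets flagged. The hard part, as noted above, is that malware-style VM detection and legitimately chatty apps both erode the signal.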
I think tech can still be beautiful in a less grandiose and "omniparadisical" way than people used to dream of. "A wide open internet, free as in speech this, free as in beer that, open source wonders, open gardens..." Well, there are a lot of incentives that fight that, and game theory wins. Maybe we download software dependencies from our friends, the ones we actually trust. Maybe we write more code ourselves--more homesteading families that raise their own chickens, jar their own pickled carrots, and code their own networking utilities. Maybe we operate on servers we own, or our friends own, and we don't get blindsided by news that the platforms are selling our data and scraping it for training.
Maybe it's less convenient and more expensive and onerous. Do good things require hard work? Or did we expect everyone to ignore incentives forever while the trillion-dollar hyperscalers fought for an open and noble internet and then wrapped it in affordable consumer products to our delight?
It reminds me of the post here a few weeks ago about how Netflix used to be good and "maybe I want a faster horse" - we want things to be built for us, easily, cheaply, conveniently, by companies, and we want those companies not to succumb to enshittification - but somehow when the companies just follow the game theory and turn everything into a TikToky neural-networks-maximizing-engagement-infinite-scroll-experience, it's their fault, and not ours for going with the easy path while hoping the corporations would not take the easy path.
It's funny, I've never heard of or thought about the possibility of this happening, but in hindsight it seems almost too obvious to not be a thing.
Many years ago cybercriminals used to hack computers to use them as residential proxies, now they purchase them online as a service.
In most cases they are used for conducting real financial crimes, but the police investigators are also aware that there is a very low chance that sophisticated fraud is committed directly from a residential IP address.
It's a fair point, but very dynamic to sort out. This needs a full research team to figure out. Or, you know... all of us combined! It is definitely a problem.
TINFOIL: I've sometimes wondered if Azure or AWS used bots to push site traffic hits to generate money... they know you are hosted with them. They have your info. Send out bots to drive micro-accumulation. Slow boil...
I think that's mostly that they don't care about having malicious bots on their networks as long as they pay.
GCE is rare in my experience. Most bots I see are on AWS. The DDOS-adjacent hyper aggressive bots that try random URLs and scan for exploits tend to be on Azure or use VPNs.
AWS is bad when you report malicious traffic. Azure has been completely unresponsive and didn't react, even for C&C servers.
In the sense that people are voluntarily installing and running this malware on their computers, rather than being tricked into running it? Is that the only difference?
Which is ironic considering that I strongly disagree with one of the primary walled garden justifications, used particularly in the case of Apple, which amounts to "the end user is too stupid to decide on his own". Unfortunately, even if I disagree with it as a guiding principle sometimes that statement proves true.
It’s not about stupidity, but practicality. People can’t give informed consent for 100 ToS for different companies, and keep those up to date. That’s why there are laws.
No doubt in a dense wall of text that the user must accept to use the application, or worse is deemed to have accepted by using the application at all.
> So if you as an app developer include such a 3rd party SDK in your app to make some money — you are part of the problem and I think you should be held responsible for delivering malware to your users, making them botnet members.
I suspect that this goes for many different SDKs. Personally, I am really, really sick of hearing "That's a solved problem!", whenever I mention that I tend to "roll my own," as opposed to including some dependency, recommended by some jargon-addled dependency addict.
Bad actors love the dependency addiction of modern developers, and have learned to set some pretty clever traps.
I’m constantly amazed at how careless developers are with pulling 3rd party libraries into their code. Have you audited this code? Do you know everything it does? Do you know what security vulnerabilities exist in it? On what basis do you trust it to do what it says it is doing and nothing else?
But nobody seems to do this diligence. It’s just “we are in a rush. we need X. dependency does X. let’s use X.” and that’s it!
I think developers are paid to competently deliver software to their employer, and part of that competence is properly vetting the code you are delivering. If I wrote code that ended up having serious bugs like crashing, I’d expect to have at least a minimum consequence, like root causing it and/or writing a postmortem to help avoid it in the future. Same as I’d expect if I pulled in a bad dependency.
Your expectations do not match the employment market as I have ever experienced it.
Have you ever worked anywhere that said "go ahead and slow down on delivering product features that drive business value so you can audit the code of your dependencies, that's fine, we'll wait"?
Yea, and that’s the problem. If such absolute rock bottom minimal expectations (know what the code does) are seen as too slow and onerous, the industry is cooked!
Due diligence is a sliding scale. Work at a webdev agency is "get it done as fast as possible for this MVP we need". Work at NASA or a biomedical device company? Every line of code is triple-checked. It's entirely dependent on the cost/benefit analysis.
If a car manufacturer sources a part from a third party, and that part has a serious safety problem, who will the customer blame? And who will be responsible for the recall and the repairs?
Malware, botnets… it is very similar. And people, including developers, are, in 80 percent of cases, eager to make money, because… is greed good? No, it isn't. It is a plague.
You're a developer who devoted time to develop a piece of software. You discover that you are not generating any income from it: few people can even find it in the sea of similar apps, few of those are willing to pay for it, and those who are willing to pay for it are not willing to pay much. To make matters worse, you're going to lose a cut of what is paid to the middlemen who facilitate the transaction.
Is that greed?
I can find many reasons to be critical of that developer, things like creating a product for a market segment that is saturated, and likely doing so because it is low-hanging fruit (both conceptually and in terms of complexity). I can be critical of their moral judgment for how they decided to generate income from their poor business judgment. But I don't think it's right to automatically label them as greedy. They may be greedy, but they may also be trying to generate income from their work.
Umm, yes? You are not owed anything in this life, certainly not income for your choice to spend your time on building a software product no one asked for. Not making money on it is a perfectly fine outcome. If you desperately need guaranteed money, don't build an app expecting it to sell; get a job.
> If you desperately need guaranteed money, don't build an app expecting it to sell; get a job.
Technically true, but a bit of perspective might help. The consumer market is distorted by free (as in beer) apps that do a bunch of shitty things that should in many cases be illegal, or require much more informed consent than today, like tracking everything they can. Then you have VC-funded ”free” as well, where the end game is to raise prices slowly to boil the frog. Then you have loss leaders from megacorps, and a general anti-competitive business culture.
Plus, this is not just in the Wild West shady places, like the old piratebay ads. The top result for ”timer” on the App Store (for me) is indeed a timer app, but with IAP of $800/y subscription… facilitated by Apple Inc, who gets 15-30% of the bounty.
Look, the point is it’s almost impossible to break into consumer markets because everyone else is a predator. It’s a race to the bottom, ripping off clueless customers. Everyone would benefit from a fairer market. Especially honest developers.
No I think it’s designed to catch misclicks and children operating the phone and such, sold as $17/week possibly masquerading as one-time payment. They pay for App Store ads for it too.
We could have people ask for software in a more convenient way.
Not making money could be an indication the software isn't useful, but what if it is? What can the collective do in that zone?
I imagine one could ask and pay for unwritten software then get a refund if it doesn't materialize before your deadline.
Why is discovery (of any creation) willingly handed over to a handful of megacorps?? They seem to think I want to watch and read about Trump and Elon every day.
Promoting something because it is good is a great example of a good thing that shouldn't pay.
There was an earlier discussion on HN about whether advertising should be more heavily regulated (or even banned outright). I'm starting to wonder whether most of the problems on the Web are negative side effects of the incentives created by ads (including all botnets, except those that enable ransomware and espionage). Even the current worldwide dopamine addiction is driven by apps and content created for engagement, whose entire purpose is ad revenue.
This is especially true for script kiddies, which is why I am so thankful for https://e18e.dev/
AI is making this worse than ever though, I am constantly having to tell devs that their work is failing to meet requirements, because AI is just as bad as a junior dev when it comes to reaching for a dependency. It’s like we need training wheels for the prompts juniors are allowed to write.
I agree that there are things with too many dependencies, and I try to avoid that. I think it is a good idea to minimize how many dependencies are needed (even indirect dependencies; however, in some cases a dependency is not a specific implementation, and in that case indirect dependencies are less of a problem, although having a good implementation with fewer indirect dependencies is still beneficial). I may write my own, in many cases. However, another reason for writing my own is other kinds of problems in the existing programs. Not all problems are malicious; many are just that they do not do what I need, or do too much more than what I need, or both. (However, most of my stuff is C rather than JavaScript; the problem seems to be more severe with JavaScript, but I do not use it much.)
That may be true but I think you're missing the point here.
The "network sharing" behavior in these SDKs is the sole purpose of the SDK. It isn't being included as a surprise along with some other desirable behavior. What needs to stop is developers including these SDKs as a secondary revenue source in free or ad-supported apps.
My personal beef is that most of the time it acts like hidden global dependencies, and the configuration of those dependencies, along with their lifetimes, becomes harder to understand by not being traceable in the source code.
Dependency injection is just passing your dependencies in as constructor arguments rather than as hidden dependencies that the class itself creates and manages.
It's equivalent to partial application.
An uninstantiated class that follows the dependency injection pattern is equivalent to a family of functions with N+Mk arguments, where Mk is the number of parameters in method k.
Upon instantiation by passing constructor arguments, you've created a family of functions, each with a distinct set of Mk parameters and N arguments in common.
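In Python terms, that equivalence can be shown with `functools.partial`: a one-method "class" is just a function whose dependency argument gets fixed up front (the names here are illustrative):

```python
from functools import partial

# A "class" viewed as a family of functions: report(sink, text) has one
# shared dependency (sink, the N common arguments) and per-call
# arguments (text, the Mk method parameters).
def report(sink: list, text: str) -> str:
    line = f"REPORT: {text}"
    sink.append(line)  # 'sink' is the injected dependency
    return line

# "Constructor injection" is exactly fixing the dependency argument:
# instantiation == partial application.
log: list[str] = []
reporter = partial(report, log)
```

Calling `reporter("disk full")` now behaves like a method call on an object that was constructed with `log` injected.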
> Dependency injection is just passing your dependencies in as constructor arguments rather than as hidden dependencies that the class itself creates and manages.
That's the best way to think of it fundamentally. But the main implication is that at some point something has to know how to resolve those dependencies; they can't just be constructed and then injected from magic land. So global cradles/resolvers/containers/injectors/providers (depending on your language and framework) are also typically part and parcel of DI, and that can have some big implications on the structure of your code that some people don't like. Also, you can inject functions and methods, not just constructors.
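A toy version of such a container/resolver, just to make the "something has to know how to resolve those dependencies" point concrete (this is a sketch, not any particular framework's API):

```python
class Container:
    """Minimal resolver: register factories by name, resolve on demand,
    and cache the ones marked as singletons."""
    def __init__(self):
        self._factories = {}
        self._singletons = {}

    def register(self, name, factory, singleton=False):
        self._factories[name] = (factory, singleton)

    def resolve(self, name):
        if name in self._singletons:
            return self._singletons[name]
        factory, singleton = self._factories[name]
        obj = factory(self)  # factories may resolve their own dependencies
        if singleton:
            self._singletons[name] = obj
        return obj

# Wiring lives in one place, away from the classes that use the dependencies.
c = Container()
c.register("config", lambda c: {"dsn": "sqlite://"}, singleton=True)
c.register("db", lambda c: ("db", c.resolve("config")["dsn"]))
```

Even this toy shows the trade-off the parent describes: the wiring is explicit and centralized, but from the consuming code's point of view the dependency appears "from the container" rather than from a visible call site.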
> Dependency injection is just passing your dependencies in as constructor arguments rather than as hidden dependencies that the class itself creates and manages.
This is all well and good, but you also need a bunch of code that handles resolving those dependencies, which oftentimes ends up being complex and hard to debug and will also cause runtime errors instead of compile time errors, which I find to be more or less unacceptable.
Edit: to elaborate on this, I’ve seen DI frameworks not be used in “enterprise” projects a grand total of zero times. I’ve done DI directly in personal projects and it was fine, but in most cases you don’t get to make that choice.
Just last week, when working on a Java project that’s been around for a decade or so, there were issues after migrating it from Spring to Spring Boot - when compiled through the IDE and with the configuration to allow lazy dependency resolution it would work (too many circular dependencies to change the code instead), but when built within a container by Maven that same exact code and configuration would no longer work and injection would fail.
I’m hoping it’s not one of those weird JDK platform bugs but rather an issue with how the codebase is compiled during the container image build, but the issue is mind boggling. More fun, if you take the .jar that’s built in the IDE and put it in the container, then everything works, otherwise it doesn’t. No compilation warnings, most of the startup is fine, but if you build it in the container, you get a DI runtime error about no lazy resolution being enabled even if you hardcode the setting to be on in Java code: https://docs.spring.io/spring-boot/api/kotlin/spring-boot-pr...
I’ve also seen similar issues before containers, where locally it would run on Jetty and use Tomcat on server environments, leading to everything compiling and working locally but throwing injection errors on the server.
What’s more, it’s not like you can (easily) put a breakpoint on whatever is trying to inject the dependencies - after years of Java and Spring I grow more and more convinced that anything that doesn’t generate code that you can inspect directly (e.g. how you can look at a generated MapStruct mapper implementation) is somewhat user hostile and will complicate things. At least modern Spring Boot is good in that more of the configuration is just code, because otherwise good luck debugging why some XML configuration is acting weird.
In other words, DI can make things more messy due to a bunch of technical factors around how it’s implemented (also good luck reading those stack traces), albeit even in the case of Java something like Dagger feels more sane https://dagger.dev/ despite never really catching on.
Of course, one could say that circular dependencies or configuration issues are project specific, but given enough time and projects you will almost inevitably get those sorts of headaches. So while the theory of DI is nice, you can’t just have the theory without practice.
Inclined to agree. Consider that a singleton dependency is essentially a global, and differs from a traditional global, only in that the reference is kept in a container and supplied magically via a constructor variable. Also consider that constructor calls are now outside the application layer frames of the callstack, in case you want to trace execution.
Dependency injection is not hidden. It's quite the opposite: dependency injection lists explicitly all the dependencies in a well defined place.
Hidden dependencies are: untyped context variable; global "service registry", etc. Those are hidden, the only way to find out which dependencies given module has is to carefully read its code and code of all called functions.
To me it's rather anti-functional. Normally, when you instantiate a class, the resulting object's behavior only depends on the constructor arguments you pass it (= the behavior is purely a function of the arguments). With dependency injection, the object's behavior may depend on some hidden configuration, and not even inspecting the class's source code will be able to tell you the source of that behavior, because there's only an @Inject annotation without any further information.
Conversely, when you modify the configuration of which implementation gets injected for which interface type, you potentially modify the behavior of many places in the code (including, potentially, the behavior of dependencies your project may have), without having passed that code any arguments to that effect. A function executing that code suddenly behaves differently, without any indication of that difference at the call site, or traceable from the call site. That’s the opposite of the functional paradigm.
> because there’s only an @Inject annotation without any further information
It sounds like you have a gripe with a particular DI framework and not the idea of Dependency Injection. Because
> Normally, when you instantiate a class, the resulting object’s behavior only depends on the constructor arguments you pass it (= the behavior is purely a function of the arguments)
With Dependency Injection this is generally still true, even more so than normal, because you're making the constructor's dependencies explicit in the arguments. If you have a class CriticalErrorLogger(), you can't directly tell where it logs to: is it using a flat file, stdout, or a network logger? If you instead have a class CriticalErrorLogger(logger *io.writer), then when you create it you know exactly what it's using to log, because you had to instantiate it and pass it in.
Or like Kortilla said, instead of passing in a class or struct you can pass in a function, so using the same example, something like CriticalErrorLogger(fn write)
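The same constructor-injection idea, sketched in Python for concreteness (the class name follows the comment above; the `writer` interface, i.e. anything with a `write` method, is an assumption):

```python
import io

class CriticalErrorLogger:
    # The writer is injected, so the call site decides where logs go:
    # sys.stderr, a file, a socket wrapper, or an in-memory buffer for tests.
    def __init__(self, writer):
        self.writer = writer

    def log(self, msg: str) -> None:
        self.writer.write(f"CRITICAL: {msg}\n")

# Injecting an in-memory buffer makes the logger trivially testable.
buf = io.StringIO()
logger = CriticalErrorLogger(buf)
logger.log("disk full")
```

No framework, container, or annotation is involved; the dependency is simply a constructor argument, which is the point being made above.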
I don't quite understand your example, but I don't think the particulars make much of a difference. We can go with the most general description: With dependency injection, you define points in your code where dependencies are injected. The injection point is usually a variable (this includes the case of constructor parameters), whose value (the dependency) will be set by the dependency injection framework. The behavior of the code that reads the variable and hence the injected value will then depend on the specific value that was injected.
My issue with that is this: From the point of view of the code accessing the injected value (and from the point of view of that code's callers), the value appears like out of thin air. There is no way to trace back from that code where the value came from. Similarly, when defining which value will be injected, it can be difficult to trace all the places where it will be injected.
In addition, there are often lifetime issues involved, when the injected value is itself a stateful object, or may indirectly depend on mutable, cached, or lazy-initialized, possibly external state. The time when the value's internal state is initialized or modified, or whether or not it is shared between separate injection points, is something that can't be deduced from the source code containing the injection points, but is often relevant for behavior, error handling, and general reasoning about the code.
All of this makes it more difficult to reason about the injected values, and about the code whose behavior will depend on those values, from looking at the source code.
> whose value (the dependency) will be set by the dependency injection framework
I agree with your definition except for this part, you don't need any framework to do dependency injection. It's simply the idea that instead of having an abstract base class CriticalErrorLogger, with the concrete implementations of StdOutCriticalErrorLogger, FileCriticalErrorLogger, AwsCloudwatchCriticalErrorLogger which bake their dependency into the class design; you instead have a concrete class CriticalErrorLogger(dep *dependency) and create dependency objects externally that implement identical interfaces in different ways. You do text formatting, generating a traceback, etc, and then call dep.write(myFormattedLogString), and the dependency handles whatever that means.
I agree with you that most DI frameworks are too clever and hide too much, and some forms of DI like setter injection and reflection based injection are instant spaghetti code generators. But things like Constructor Injection or Method Injection are so simple they often feel obvious and not like Dependency Injection even though they are. I love DI, but I hate DI frameworks; I've never seen a benefit except for retrofitting legacy code with DI.
And yeah it does add the issue or lifetime management. That's an easy place to F things up in your code using DI and requires careful thought in some circumstances. I can't argue against that.
But DI doesn't need frameworks or magic methods or attributes to work. And there's a lot of situations where DI reduces code duplication, makes refactoring and testing easier, and actually makes code feel less magical than using internal dependencies.
The basic principle is much simpler than most DI frameworks make it seem. Instead of initializing a dependency internally, receive the dependency in some way. It can be through overly abstracted layers or magic methods, but it can also be as simple as adding an argument to the constructor or a given method that takes a reference to the dependency and uses that.
The pattern you are describing is what I know as the Strategy pattern [0]. See the example there with the Car class that takes a BrakeBehavior as a constructor parameter [1]. I have no issue with that and use it regularly. The Strategy pattern precedes the notion of dependency injection by around ten years.
The term Dependency Injection was coined by Martin Fowler with this article: https://martinfowler.com/articles/injection.html. See how it presents the examples in terms of wiring up components from a configuration, and how it concludes with stressing the importance of "the principle of separating service configuration from the use of services within an application". The article also presents constructor injection as only one of several forms of dependency injection.
That is how everyone understood dependency injection when it became popular 10-20 years ago: A way to customize behavior at the top application/deployment level by configuration, without having to pass arguments around throughout half the code base to the final object that uses them.
Apparently there has been a divergence of how the term is being understood.
[1] The fact that Car is abstract in the example is immaterial to the pattern, and a bit unfortunate in the Wikipedia article, from a didactic point of view.
They're not really exclusive ideas. The Constructor Injection section in Fowler's article is exactly the same as the Strategy pattern. But no one talks about the Strategy pattern anymore, it's all wrapped into the idea of DI and that's what caught on.
It was interesting reading this exchange. I have a similar understanding of DI to you. I have never even heard of a DI framework and I have trouble picturing what it would look like. It was interesting to watch you two converge on where the disconnect was.
It starts off feeling like a superpower, allowing you to change a system's behaviour without changing its code directly. Every time I've encountered it, though, it has quickly devolved into a maintenance nightmare.
I'm talking more specifically about Aspect Oriented Programming though and DI containers in OOP, which seemed pretty clever in theory, but have a lot of issues in reality.
I take no issue with currying in functional programming.
AI scrapers and "sneaker bots" are just the tip of the iceberg.
Why are all these entities concentrated and metastasizing from just a few superhubs?
Why do they look, smell and behave like state-level machinery?
If you've researched this, you'll know exactly what I'm talking about.
Unless complicit, tech leaders (Apple, Google, Microsoft) have a duty to respond swiftly and decisively.
This has been going on far too long.
"Infatica is partnered with Bitdefender, a global leader in cybersecurity, to protect our SDK users from malicious web traffic and content, including infected URLs, untrusted web pages, fraudulent and phishing links, and more."
In the last week I've had to deal with two large-scale influxes of traffic on one particular web server in our organization.
The first involved requests from 300,000 unique IPs in a span of a few hours. I analyzed them and found that ~250,000 were from Brazil. I'm used to using ASNs to block network ranges sending this kind of traffic, but in this case they were spread thinly over 6,000+ ASNs! I ended up blocking all of Brazil (sorry).
A few days later this same web server was on fire again. I performed the same analysis on IPs and found a similar number of unique addresses, but spread across Turkey, Russia, Argentina, Algeria and many more countries. What is going on?! Eventually I think I found a pattern to identify the requests, in that they were using ancient Chrome user agents. Chrome 40, 50, 60 and up to 90, all released 5 to 15 years ago. Then, just before I could implement a block based on these user agents, the traffic stopped.
In both cases the traffic from datacenter networks was limited because I already rate limit a few dozen of the larger ones.
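The ancient-Chrome pattern described above could be matched with something like this sketch. The version cutoff and the regex are assumptions, and note that Edge and other Chromium-based UAs also carry a `Chrome/` token, so a real rule would need allowances:

```python
import re

# Capture the Chrome major version from a User-Agent string.
CHROME_RE = re.compile(r"Chrome/(\d+)\.")
MIN_CHROME_MAJOR = 100  # hypothetical cutoff; tune to your real traffic

def looks_ancient(user_agent: str) -> bool:
    m = CHROME_RE.search(user_agent)
    return m is not None and int(m.group(1)) < MIN_CHROME_MAJOR

ua = ("Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/40.0.2214.85 Safari/537.36")
print(looks_ancient(ua))  # the Chrome 40 UA from the attack matches
```

In practice you'd wire this into your reverse proxy or WAF rather than application code, but the matching logic is the same.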
Sysadmin life...
Try Anubis: <https://anubis.techaro.lol>
It's a reverse proxy that presents a PoW (proof-of-work) challenge to every new visitor. It shifts the initial cost of accessing your server's resources back onto the client. Assuming your uplink can handle 300k clients requesting a single 70 kB web page, it should solve most of your problems.
For science, can you estimate your peak QPS?
I've seen a few attacks where the operators placed malicious code on high-traffic sites (e.g. some government thing, larger newspapers), and then just let browsers load your site as an img. Did you see images, CSS, JS being loaded from these IPs? If they were expecting an image, they wouldn't parse the HTML or load other resources.
It's a pretty effective attack because you get large numbers of individual browsers to contribute. Hosters don't care, so unless the site owners are technical enough, these attacks can stay online quite a while.
If they work with Referrer Policy, they should be able to mask themselves fairly well - the ones I saw back then did not.
We all agree that AI crawlers are a big issue as they don't respect any established best practices, but we rarely talk about the path forward. Scraping has been around for as long as the internet, and it was mostly fine. There are many very legitimate use cases for browser automation and data extraction (I work in this space).
So what are potential solutions? We're somehow still stuck with CAPTCHAs, a 25-year-old concept that wastes millions of human hours and billions in infra costs [0].
How can we enable beneficial automation while protecting against abusive AI crawlers?
[0] https://arxiv.org/abs/2311.10911
My pet peeve is that using the term "AI crawler" for this conflates things unnecessarily. There's some people who are angry at it due to anti-AI bias and not wishing to share information, while there are others who are more concerned about it due to the large amount of bandwidth and server overloading.
Not to mention that it's unknown if these are actually from AI companies, or from people pretending to be AI companies. You can set anything as your user agent.
It's more appropriate to mention the specific issue one has with the crawlers, like "they request things too quickly" or "they're overloading my server". Then from there, it is easier to come to a solution than just "I hate AI". For example, one would realize that things like Anubis have existed forever, they are just called DDoS protection, specifically those using proof-of-work schemes (e.g. https://github.com/RuiSiang/PoW-Shield).
This also shifts the discussion away from something that adds to the discrimination against scraping in general, and more towards what is actually the issue: overloading servers, or in other words, DDoS.
I wrote an article about a possible proof of personhood solution idea: https://mjaseem.github.io/tech/2025/04/12/proof-of-humanity.....
The broad idea is to use zero knowledge proofs with certification. It sort of flips the public key certification system and adds some privacy.
For this to get into place, the powers in charge would need to be swayed.
Blame the "AI" companies for that. I am glad the small web is pushing hard against these scrapers, with the rise of Anubis as a starting point
> Blame the "AI" companies for that. I am glad the small web is pushing hard towards these scrapers, with the rise of Anubis as a starting point
Did you mean "against"?
Corrected, thanks
> So what are potential solutions?
It won't fully solve the problem, but with the problem relatively identified, you must then ask why people are engaging in this behavior. Answer: money, for the most part. Therefore, follow the money and identify the financial incentives driving this behavior. This leads you pretty quickly to a solution most people would reject out-of-hand: turn off the financial incentive that is driving the enshittification of the web. Which is to say, kill the ad-economy.
Or at least better regulate it while also levying punitive damages that are significant enough to both dissuade bad-actors and encourage entities to view data-breaches (or the potential therein) and "leakage[0]" as something that should actually be effectively secured against. After all, there are some upsides to the ad-economy that, without it, would present some hard challenges (e.g., how many people are willing to pay for search? what happens to the vibrant sphere of creators of all stripes that are incentivized by the ad-economy? etc).
Personally, I can't imagine this would actually happen. Pushback from monied interests aside, most people have given up on the idea of data-privacy or personal ownership of their data, if they ever even cared in the first place. So, in the absence of willingness to do something about the incentive for this malign behavior, we're left with few good options.
0: https://news.ycombinator.com/item?id=43716704 (see comments on all the various ways people's data is being leaked/leached/tracked/etc)
The best solution I've seen is to hit everyone with a proof of work wall and whitelist the scrapers that are welcome (search engines and such).
Running SHA hash calculations for a second or so once every week is not bad for users, but with scrapers constantly starting new sessions they end up spending most of their time running useless JavaScript, slowing them down significantly.
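The core of such a proof-of-work wall is tiny: the server issues a random challenge, the client grinds for a nonce whose hash clears a difficulty target, and the server verifies with a single hash. A minimal sketch (the difficulty constant is an illustrative assumption; real systems like Anubis add session binding and expiry):

```python
import hashlib
import secrets

DIFFICULTY_BITS = 16  # ~65k hashes on average; deployments tune this

def meets_target(challenge: bytes, nonce: int) -> bool:
    digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
    # Accept only if the first DIFFICULTY_BITS bits of the hash are zero.
    return int.from_bytes(digest, "big") >> (256 - DIFFICULTY_BITS) == 0

def solve(challenge: bytes) -> int:
    nonce = 0
    while not meets_target(challenge, nonce):
        nonce += 1
    return nonce

challenge = secrets.token_bytes(16)    # server-issued, tied to the session
nonce = solve(challenge)               # the client-side grind
assert meets_target(challenge, nonce)  # server-side check is one hash
```

The asymmetry is the point: verification costs the server one hash, while each new scraper session pays the full search cost.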
The most effective alternative to proof of work calculations seems to be remote attestation. The downside is that you're getting captchas if you're one of the 0.1% who disable secure boot and run Linux, but the vast majority of web users will live a captcha free life. This same mechanism could in theory also be used to authenticate welcome scrapers rather than relying on pure IP whitelists.
But people don’t interact with your website anymore; they ask an AI. So the AI crawler is a real user.
I say we ask Google Analytics to count an AI crawler as a real view. Let’s see who’s most popular.
I hate this but I suspect a login-only deanonymised web (made simple with chrome and WEI!) is the future. Firefox users can go to hell.
We won't.
FWIW, Trend Micro wrote up a decent piece on this space in 2023.
It is still a pretty good lay-of-the-land.
https://www.trendmicro.com/vinfo/us/security/news/vulnerabil...
> I am now of the opinion that every form of web-scraping should be considered abusive behaviour and web servers should block all of them. If you think your web-scraping is acceptable behaviour, you can thank these shady companies and the “AI” hype for moving you to the bad corner.
I imagine that e.g. Youtube would be happy to agree with this. Not that it would turn them against AI generally.
yeah, but you can't, that's the problem. Plenty of service operators would like to block every scraper that doesn't obey their robots.txt, but there's no good way to do that without blocking human traffic too (Anubis et al are okay, but they are half-measures).
On a separate note, I believe open web scraping has been a massive benefit to the internet on net, and almost entirely positive pre-2021. Web scraping & crawling enables search engines, services like Internet Archive, walled-garden-busting (like Invidious, yt-dlp, and Nitter), mashups (Spotube, IFTT, and Plaid would have been impossible to bootstrap without web scraping), and all kinds of interesting data science projects (e.g. scraping COVID-19 stats from local health departments to patch together a picture of viral spread for epidemiologists).
We should have a way to verify the user agents of valid and useful scrapers such as the Internet Archive. Some kind of cryptographic signature of their user agents, validatable by any reverse proxy, seems like a good start.
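One toy version of this idea, using a shared secret and HMAC (everything here is assumed for illustration; a real scheme would use public-key signatures so that the proxy never holds signing material, and would sign a timestamp to prevent replay):

```python
import hmac
import hashlib

# Hypothetical secret distributed to the reverse proxy out of band.
SECRET = b"demo-secret-rotate-me"

def sign_agent(user_agent: str) -> str:
    tag = hmac.new(SECRET, user_agent.encode(), hashlib.sha256).hexdigest()
    return f"{user_agent}; sig={tag}"

def verify_agent(header: str) -> bool:
    try:
        ua, sig = header.rsplit("; sig=", 1)
    except ValueError:
        return False  # no signature present
    expected = hmac.new(SECRET, ua.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)

signed = sign_agent("ArchiveBot/2.0 (+https://archive.org)")
print(verify_agent(signed))             # True
print(verify_agent("FakeBot; sig=bad")) # False
```

`compare_digest` avoids timing side channels when checking the tag.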
Self signed, I hope.
Or do you want a central authority that decides who can do new search engines?
Using DANE is probably the best idea even though it's still not mainstream
> Plenty of service operators would like to block every scraper that doesn't obey their robots.txt, but there's no good way to do that without blocking human traffic too (Anubis et al are okay, but they are half-measures)
Why is Anubis-type mitigations a half-measure?
Anubis, go-away, etc are great, don't get me wrong -- but what Anubis does is impose a cost on every query. The website operator is hoping that the compute will have a rate-limiting effect on scrapers while minimally impacting the user experience. It's almost like chemotherapy, in that you're poisoning everyone in the hope that the aggressive bad actors will be more severely affected than the less aggressive good actors. Even the Anubis readme calls it a nuclear option. In practice it appears to work pretty well, which is great!
It's a half-measure because:
1. You're slowing down scrapers, not blocking them. They will still scrape your site content in violation of robots.txt.
2. Scrapers with more compute than IP proxies will not be significantly bottlenecked by this.
3. This may lead to an arms race where AI companies respond by beefing up their scraping infrastructure, necessitating more difficult PoW challenges, and so on. The end result of this hypothetical would be a more inconvenient and inefficient internet for everyone, including human users.
To be clear: I think Anubis is a great tool for website operators, and one of the best self-hostable options available today. However, it's a workaround for the core problem that we can't reliably distinguish traffic from badly behaving AI scrapers from legitimate user traffic.
Yeah, also this means the death of archival efforts like the Internet Archive.
Welcome scrapers (IA, maybe Google and Bing) can publish their IP addresses and get whitelisted. Websites that want to prevent being on the Internet Archive can pretty much just ask for their website to be excluded (even retroactively).
[Cloudflare](https://developers.cloudflare.com/cache/troubleshooting/alwa...) tags the internet archive as operating from 207.241.224.0/20 and 208.70.24.0/21 so disabling the bot-prevention framework on connections from there should be enough.
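Checking a client IP against those two ranges is a one-liner with the stdlib, e.g. in whatever middleware decides whether to serve the PoW challenge:

```python
import ipaddress

# Internet Archive egress ranges cited above.
IA_NETS = [ipaddress.ip_network(n) for n in
           ("207.241.224.0/20", "208.70.24.0/21")]

def is_internet_archive(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in IA_NETS)

print(is_internet_archive("207.241.229.10"))  # True, inside the /20
print(is_internet_archive("198.51.100.7"))    # False
```

Unlike user-agent strings, source IPs on an established TCP connection can't be trivially spoofed, which is why IP allowlists are the usual mechanism for welcome crawlers.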
a large chunk of internet archive's snapshots are from archiveteam, where "warriors" bring their own ips (and they crawl respectfully!). save page now is important too, but you don't realise what is useful until you lose it.
That's basically asking to close the market in favor of the current actors.
New actors have the right to emerge.
They have the right to try to convince me to let them scrape me. Most of the time they're thinly veiled data traders. I haven't seen any new company try to scrape my stuff since maybe Kagi.
Kagi is welcome to scrape from their IP addresses. Other bots that behave are fine too (Huawei and various other Chinese bots don't and I've had to put an IP block on those).
No they don't.
There's no rule that you have to let anyone in who claims to be a web crawler.
which is why they will stop claiming to be one.
So who decides that you can be one? Right now it's Cloudflare, a literal monopoly...
The truth is that I sympathize with the people trying to use mobile connections to bypass such a cartel.
What Cloudflare is doing now is worse than the web crawlers themselves and the legality of blocking crawlers with a monopoly is dubious at best.
so what happened to competition fostering a better outcome for all then?
This sounds like it would be a good idea. Create a whitelist of IPs and block the rest.
It's interesting but so far there is no definitive proof it's happening.
People are jumping to conclusions a bit fast over here. Yes, technically it's possible, but this kind of behavior would be relatively easy to spot, because the app would have to make direct connections to the website it wants to scrape.
Your calculator app for instance connecting to CNN.com ...
iOS has an App Privacy Report where one can check what connections are made by each app, how often, the last one, etc.
Android by Google doesn't have such a useful feature, of course, but you can run a third-party firewall like PCAPdroid, which I highly recommend.
Macos (little snitch).
Windows (fort firewall).
Not everyone runs these apps, obviously, only the most nerdy like myself, but we're also the kind of people who would report an app using our devices to make what is, in fact, a zombie or bot network.
I'm not saying it's necessarily false but imo it remains a theory until proven otherwise.
This is a hilariously optimistic, naive, disconnected-from-reality take. What sort of "proof" would be sufficient for you? TFA of course includes data from the author's own server logs^, but it also references real SDKs and businesses selling this exact product. You can view the pricing page yourself, right next to stats on how many IPs are available for you to exploit. What else do you need to see?
^ edit: my mistake, the server logs I mentioned were from the author's prior blog post on this topic, linked to at the top of TFA: https://jan.wildeboer.net/2025/02/Blocking-Stealthy-Botnets/
> iOS has an App Privacy Report where one can check what connections are made by each app, how often, the last one, etc.
Privacy reports do not include that information. They include broad areas of information the app claims to gather. There is zero connection between those claimed areas and what the app actually does unless app review notices something that doesn't match up. But none of that information is updated dynamically, and it has never actually included the domains the app connects to. You may be confusing it with the old domain declarations for less secure HTTP connections. Once the connections met the system standards you no longer needed to declare it.
I wasn't aware of this feature. But apparently it does include that information. I just enabled it and can see the domains that apps connect to. https://support.apple.com/en-us/102188
Pretty neat, actually. Thanks for looking up that link.
There is already a lot of proof. Just ask for a sales pitch from companies selling these data and they will gladly explain everything to you.
Go to a data conference like Neudata and you will see. You can have scraped data from user devices, real-time locations, credit card, Google analytics, etc.
Given this is a thing even in browser plugins, and that so very few people analyse their firewalls, I'd not discount it at all. Much of the world's users have no clue, and app stores are notoriously bad at reacting even to publicised malware, e.g. 'free' VPNs in the iOS Store.
> iOS has an App Privacy Report where one can check what connections are made by each app, how often, the last one, etc.
How often is the average calculator app user checking their Privacy Report? My guess: not many!
All it takes is one person to find out and raise the alarm. The average user doesn't read the source code behind openssl or whatever either, that doesn't mean there's no gains in open sourcing it.
The real solution is to add a permission for network access, with the default set to deny.
The average user is also not reading these raised “alarms”. And if an app has a bad name, another one will show up with a different name on the same day.
You're on a tech forum, you must have seen one of the many post about app, either on Android or iPhone, that acts like spyware.
They happen from time to time; the last one was not more than two weeks ago, when it was shown that many apps were able to read the list of all other apps installed on an Android device, and that Google refused to fix that.
Do you really believe that an app used to make your device part of a bot network wouldn't be posted over here ?
"You're on a tech forum", that's exactly the point. The "average user" is not on a tech forum though, the average user opens the app store of their platform, types "calculator" and installs the first one that's free.
Botnets as a Service are absolutely happening, but as you allude to, the scope of the abuse is very different on iOS than, say, Windows.
> So there is a (IMHO) shady market out there that gives app developers on iOS, Android, MacOS and Windows money for including a library into their apps that sells users network bandwidth
AKA "why do Cloudflare and Google make me fill out these CAPTCHAs all day"
I don't know why Play Protect/MS Defender/whatever Apple has for antivirus don't classify apps that embed such malware as such. It's ridiculous that this is allowed to go on when detection is so easy. I don't know a more obvious example of a trojan than an SDK library making a user's device part of a botnet.
Cloudflare and Google use CAPTCHAs to sell web scrapers? I don't get your point. I was under the impression the data is used to train models.
The implication is that the users that are being constantly presented with CAPTCHAs are experiencing that because they are unwittingly proxying scrapers through their devices via malicious apps they've installed.
.. or that other people on their network/Shared public IP have installed
or just that they don't run Windows/macOS with Chrome like everyone else and it's "suspicious". I get Cloudflare CAPTCHAs all the time with Firefox on Linux... (and I'm pretty sure there's no such app in my home network!)
When a random device on your network gets infected with crap like this, your network becomes a bot egress point, and anti bot networks respond appropriately. Cloudflare, Akamai, even Google will start showing CAPTCHAs for every website they protect when your network starts hitting random servers with scrapers or DDoS attacks.
This is even worse with CG-NAT if you don't have IPv6 to solve the CG-NAT problem.
I don't think the data they collect is used to train anything these days. Cloudflare is using AI generated images for CAPTCHAs and Google's actual CAPTCHAs are easier for bots than humans at this point (it's the passive monitoring that makes it still work a little bit).
Trojans in your mobile apps ruin your IP's reputation which comes back to you in the form of frequent, annoying CAPTCHAs.
it's not technically malware, you agreed to it when you accepted the terms of service :^)
It's malware if it does something malicious.
I don't know if I should be surprised about what's described in this article, given the current state of the world. Certainly I didn't know about it before, and I agree with the article's conclusion.
Personally, I think the "network sharing" software bundled with apps should fall into the category of potentially unwanted applications along with adware and spyware. All of the above "tag along" with something the user DID want to install, and quietly misuse the user's resources. Proxies like this definitely have an impact for metered/slow connections - I'm tempted to start Wireshark'ing my devices now to look for suspicious activity.
There should be a public repository of apps known to have these shady behaviours. Having done some light web scraping for archival/automation before, it's a pity that it'll become collateral damage in the anti-AI-botfarm fight.
I agree, this should be called spyware, and malware. There are many other kinds of software that also should be, though netcat and ncat (probably) aren't malware.
I agree, but the harm done to the users is only one part of the total harm. I think it's quite plausible that many users wouldn't mind some small amount of their bandwidth being used, if it meant being able to use a handy browser extension that they would otherwise have to pay actual dollars for -- but the harm done to those running the servers remains.
Has anyone tried to compile a list of software that uses these libraries? It would be great to know what apps to avoid
> Has anyone tried to compile a list of software that uses these libraries? It would be great to know what apps to avoid
I wouldn't mind reading a comprehensive report on SOTA with regard to bot-blocking.
Sure, there's Anubis (although someone elsethread called it a half-measure, and I'd like to know why), there are CAPTCHAs, there's relying on a monopoly (Cloudflare, etc.) that probably also wants to run its own bots at some point, but what else is there?
In the case of Android, εxodus has one[1], though I couldn't find the malware library listed in TFA. Aurora Store[2], a FOSS Google Play Store client, also integrates it.
[1] https://reports.exodus-privacy.eu.org/en/trackers/ [2] https://f-droid.org/packages/com.aurora.store/
That seems to be looking at tracking and data collection libraries, though, for things like advertising and crash reporting. I don't see any mention of the kind of 'network sharing' libraries that this article is about. Have I missed it?
No but here's the thing. Being in the industry for many years I know they are required to mention it in the TOS when using the SDKs. A crawler pulling app TOSs and parsing them could be a thing. List or not, it won't be too useful outside this tech community.
A good portion of free VPN apps sell their traffic. This was a thing even before the AI bot explosion.
The broken thing about the web is that in order for data to remain readable, a unique sysadmin somewhere has to keep a server running in the face of an increasingly hostile environment.
If instead we had a content addressed model, we could drop the uniqueness constraint. Then these AI scrapers could be gossiping the data to one another (and incidentally serving it to the rest of us) without placing any burden on the original source.
Having other parties interested in your data should make your life easier (because other parties will host it for you), not harder (because now you need to work extra hard to host it for them).
there is no incentive for different companies to share data with each other, or with anyone really (facebook leeching books?)
Are there any systems like that, even if experimental?
IPFS
I had high hopes for IPFS, but even it has vectors for abuse.
See https://arxiv.org/abs/1905.11880 [Hydras and IPFS: A Decentralised Playground for Malware]
Can you point me at what you mean? I'm not immediately finding something that indicates that it is not fit for this use case. The fact that bad actors use it to resist those who want to shut them down is, if anything, an endorsement of its durability. There's a bit of overlap between resisting the AI scrapers and resisting the FBI. You can either have a single point of control and a single point of failure, or you can have neither. If you're after something that's both reliable and reliably censorable--I don't think that's in the cards.
That's not to say that it is a ready replacement for the web as we know it. If you have hash-linked everything then you wind up with problems trying to link things together, for instance. Once two pages exist, you can't after-the-fact create a link between them because if you update them to contain that link then their hashes change so now you have to propagate the new hash to people. This makes it difficult to do things like have a comments section at the bottom of a blog post. So you've got to handle metadata like that in some kind of extra layer--a layer which isn't hash linked and which might be susceptible to all the same problems that our current web is--and then the browser can build the page from immutable pieces, but the assembly itself ends up being dynamic (and likely sensitive to the users preference, e.g. dark mode as a browser thing not a page thing).
But I still think you could move maybe 95% of the data into an immutable hash-linked world (think of these as nodes in a graph), the remaining 5% just being tuples of hashes and pubic keys indicating which pages are trusted by which users, which ought to be linked to which others, which are known to be the inputs and output of various functions, and you know... structure stuff (these are our graph's edges).
The edges, being smaller, might be subject to different constraints than the web as we know it. I wouldn't propose that we go all the way to a blockchain where every device caches every edge, but it might be feasible for my devices to store all of the edges for the 5% of the web I care about, and your devices to store the edges for the 5% that you care about... the nodes only being summoned when we actually want to view them. The edges can be updated when our devices contact other devices (based on trust, like you know that device's owner personally) and ask "hey, what's new?"
I've sort of been freestyling on this idea in isolation, probably there's already some projects that scratch this itch. A while back I made a note to check out https://ceramic.network/ in this capacity, but I haven't gotten down to trying it out yet.
Assuming the right incentives can be found to prevent widespread leeching, a distributed content-addressed model indeed solves this problem, but introduces the problem of how to control your own content over time. How do you get rid of a piece of content? How do you modify the content at a given URL?
I know, as far as possible it's a good idea to have content-immutable URLs. But at some point, I need to make www.myexamplebusiness.com show new content. How would that work?
Except no one wants content addressed data - because if you knew what it was you wanted, then you would already have stored it. The web as we know it is an index - it's a way to discover that data is available and specifically we usually want the latest data that's available.
AI scrapers aren't trying to find things they already know exist, they're trying to discover what they didn't know existed.
Yes, for the reasons you describe, you can't be both a useful web-like protocol and also 100% immutable/hash-linked.
But there's a lot of middle ground to explore here. Loading a modern web page involves making dozens of requests to a variety of different servers, evaluating some JavaScript, and then doing it again a few times, potentially moving several MB of data. The part people want, the thing you don't already know exists, is hidden behind that rather heavy door. It doesn't have to be that way.
If you already know about one thing (by its cryptographic hash, say) and you want to find out which other hashes it's now associated with--associations that might not have existed yesterday--that's much easier than we've made it. It can be done:
- by moving KB, not MB; we're just talking about a tuple of hashes here, maybe a public key and a signature
- without placing additional burden on whoever authored the first thing, they don't even have to be the ones who published the pair of hashes that your scraper is interested in
Once you have the second hash, you can then reenter immutable-space to get whatever it references. I'm not sure if there's already a protocol for such things, but if not then we can surely make one that's more efficient and durable than what we're doing now.
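The node/edge split sketched above fits in a few lines. This is a toy in-memory version (a real system would gossip the edge tuples and sign them, as the comment suggests):

```python
import hashlib

store = {}   # hash -> immutable content: the "nodes"
edges = []   # (src_hash, dst_hash) tuples: the small, mutable layer

def put(content: bytes) -> str:
    key = hashlib.sha256(content).hexdigest()
    store[key] = content          # idempotent: same bytes, same key
    return key

def link(src: str, dst: str) -> None:
    edges.append((src, dst))      # cheap to publish separately, any time

post = put(b"original blog post")
comment = put(b"a comment on it")
link(post, comment)               # added after the fact; hashes unchanged
```

Note how `link` never touches `store`: associating two existing pages doesn't rewrite either one, which is exactly the property the comment is after.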
But we already have HEAD requests and etags.
It is entirely possible to serve a fully cached response that says "you already have this". The problem is...people don't implement this well.
> because if you knew what it was you wanted, then you would already have stored it.
"Content-addressable" has a broader meaning than what you seem to be thinking of -- roughly speaking, it applies if any function of the data is used as the "address". E.g., git commits are content-addressable by their SHA1 hashes.
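For the git example: a blob's address is the SHA-1 of a short header plus the content, so anyone holding the same bytes derives the same address. A minimal reimplementation:

```python
import hashlib

def git_blob_hash(data: bytes) -> str:
    # Git hashes "blob <size>\0" followed by the content.
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()

print(git_blob_hash(b"hello\n"))
# → ce013625030ba8dba906f756967f9e9ca394464a, same as `git hash-object`
```

The address is a pure function of the data, with no server or URL involved — which is the property that lets anyone re-serve the content verifiably.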
Are there any lists with known c&c servers for these services that can be added to Pihole/etc?
You can use one of the list from here: https://github.com/hagezi/dns-blocklists
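Those lists are mostly in the plain hosts/domain format, so consuming one outside Pi-hole is straightforward. A sketch with made-up domain names, matching a domain and its parents against the list:

```python
# Tiny inline sample in the hosts format these lists commonly use
# (the domains here are invented placeholders, not real entries).
sample = """
# comment line
0.0.0.0 proxy-sdk.example.com
0.0.0.0 residential-net.example.org
"""

blocked = set()
for line in sample.splitlines():
    line = line.strip()
    if line and not line.startswith("#"):
        blocked.add(line.split()[-1])   # last field is the domain

def is_blocked(domain: str) -> bool:
    # Match the domain itself and every parent domain on the list.
    parts = domain.split(".")
    return any(".".join(parts[i:]) in blocked for i in range(len(parts)))

print(is_blocked("api.proxy-sdk.example.com"))  # True via parent match
print(is_blocked("example.net"))                # False
```

In a real setup you'd fetch the list over HTTPS on a schedule and feed it to your resolver instead of checking in application code.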
We need a list of apps that include these libraries, and every malware scanner - including Windows Defender, Play Protect and whatever Apple calls theirs - needs to put infected applications into quarantine immediately. Just because it's not directly causing damage to the device the malware is running on, that doesn't mean it's not malware.
Apps should be required to ask for permission to access specific domains. Similar to the tracking protection, Apple introduced a while ago.
Not sure how this could work for browsers, but the other 99% of apps I have on my phone should work fine with just a single permitted domain.
My iPhone occasionally displays an interrupt screen to remind me that my weather app has been accessing my location in the background and to confirm continued access.
It should also do something similar for apps making chatty background requests to domains not specified at app review time. The legitimate use cases for that behaviour are few.
On the one hand, yes this could work for many cases. On the other hand, good bye p2p. Not every app is a passive client-server request-response. One needs to be really careful with designing permission systems. Apple has already killed many markets before they had a chance to even exist, such as companion apps for watches and other peripherals.
P2P was practically dead on iPhone even back in 2010. The whole "don't burn the user's battery" thing precludes mobile phones doing anything with P2P other than leeching off of it. The only exceptions are things like AirDrop; i.e. locally peer-to-peer things that are only active when in use and don't try to form an overlay or mesh network that would require the phone to become a router.
And, AFAIK, you already need special permission for anything other than HTTPS to specific domains on the public Internet. That's why apps ping you about permissions to access "local devices".
> On the other hand, good bye p2p.
You mean, good bye using my bandwidth without my permission? That's good. And if I install a bittorrent client on my phone, I'll know to give it permission.
> such as companion apps for watches and other peripherals
That's just apple abusing their market position in phones to push their watch. What does it have to do with p2p?
> using my bandwidth without my permission
What are you talking about?
> What does it have to do with p2p?
It’s an example of when you design sandboxes/firewalls it’s very easy to assume all apps are one big homogenous blob doing rest calls and everything else is malicious or suspicious. You often need strange permissions to do interesting things. Apple gives themselves these perms all the time.
Wait, why should applications be allowed to do REST calls by default?
> What are you talking about?
That’s the main use case for p2p in an application, isn’t it? Reducing the vendor’s bandwidth bill…
Maybe there could be a special entitlement that Apple's reviewers would only grant to applications that have a legitimate reason to require such connections. Then only applications granted that permission would be able to make requests to arbitrary domains / IP addresses.
That's how it works with other permissions most applications should not have access to, like accessing user locations. (And private entitlements third party applications can't have are one way Apple makes sure nobody can compete with their apps, but that's a separate issue.)
Android is so fucking anti-privacy that they still don't have an INTERNET access revoke toggle. The one they have currently is broken and can easily be bypassed with google play services (another highly privileged process running for no reason other than to sell your soul to google). GrapheneOS has this toggle luckily. Whenever you install an app, you can revoke the INTERNET access at the install screen and there is no way that app can bypass it
Asus added this to their phones which is nice.
Do you suggest outright forbidding TCP connections for user software? Because you can compile OpenSSL or any other TLS library and make a TCP connection to port 443 which will be opaque to the operating system. They can do wild things like kernel-level DPI on outgoing connections to find out the host, but that quickly turns into ridiculous competition.
> but that quickly turns into ridiculous competition.
Except the platform providers hold the trump card. Fuck around, if they figure it out you'll be finding out.
I think capability based security with proxy capabilities is the way to do it, and this would make it possible for the proxy capability to intercept the request and ask permission, or to do whatever else you want it to do (e.g. redirections, log any accesses, automatically allow or disallow based on a file, use or ignore the DNS cache, etc).
The system may have some such functions built in, and asking permission might be a reasonable thing to include by default.
Try actually using a system like this. OpenSnitch and LittleSnitch do it for Linux and MacOS respectively. Fedora has a pretty good interface for SELinux denials.
I've used all of them, and it's a deluge: it is too much information to reasonably react to.
Your broad choice is either deny or accept, but there's no sane way to reliably know which you should pick.
This is not and cannot be an individual problem: the easy part is building high fidelity access control, the hard part is making useful policy for it.
I suggested proxy capabilities, that it can easily be reprogrammed and reconfigured; if you want to disable this feature then you can do that too. It is not only allow or deny; other things are also possible (e.g. simulate various error conditions, artificially slow down the connection, go through a proxy server, etc). (This proxy capability system would be useful for stuff other than network connections too.)
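To make the idea concrete, here is a minimal Python sketch of such a proxy capability. The `policy` callback, the delay knob, and all names are illustrative assumptions, not a real API; the point is that the capability object, not the app, decides what happens to each connection request:

```python
import socket
import time

class ProxyCapability:
    """A stand-in 'proxy capability' for outbound connections.

    Code is handed this object instead of opening sockets directly,
    so each request can be allowed, denied, logged, delayed, or
    redirected by whoever configured the capability.
    """

    def __init__(self, policy, delay_s=0.0, log=print):
        self.policy = policy      # callable: (host, port) -> bool
        self.delay_s = delay_s    # artificial slowdown, if any
        self.log = log

    def connect(self, host, port):
        self.log(f"connection requested: {host}:{port}")
        if not self.policy(host, port):
            raise PermissionError(f"blocked: {host}:{port}")
        if self.delay_s:
            time.sleep(self.delay_s)  # simulate a slow or throttled link
        return socket.create_connection((host, port), timeout=10)

# Example policy: allow only an explicit allowlist of (host, port) pairs.
allowed = {("example.com", 443)}
cap = ProxyCapability(policy=lambda h, p: (h, p) in allowed)
```

Swapping in a different `policy` (ask the user, consult a file, always deny) needs no change to the code holding the capability, which is the appeal of the pattern.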
> it is too much information to reasonably react to.
Even if it asks, that does not necessarily mean it has to ask every time, if the user lets it keep the answer (either for the current session or until the user deliberately deletes this data). Also, if it asks too much because the app tries to access too many remote servers, then it might be spyware, malware, etc. anyways, and is worth investigating in case that is what it is.
> the hard part is making useful policy for it.
What the default settings should be is a significant issue. However, changing the policies in individual cases for different uses, is also something that a user might do, since the default settings will not always be suitable.
If whoever manages the package repository, app store, etc is able to check for malware, then this is a good thing to do (although it should not prohibit the user from installing their own software and modifying the existing software), but security on the computer is also helpful, and neither of these is the substitute for the other; they are together.
The vast majority of revenue in the mobile app ecosystem is ads, which are by design pulled from 3rd parties (and are part of the broader problem discussed in this post).
I am waiting for Apple to enable /etc/hosts or something similar on iOS devices.
Oh, that's an interesting idea. A local DNS where I have to add every entry. A white list rather than Australia's national blacklist.
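Worth noting that /etc/hosts alone can't quite do this: names it doesn't list still fall through to normal DNS. A local resolver can, though. A rough sketch with dnsmasq, deny-by-default; the specific host entry is a made-up example:

```
# dnsmasq.conf -- allowlist resolver (sketch)
no-resolv                               # never forward to upstream DNS
address=/#/0.0.0.0                      # wildcard: answer everything with 0.0.0.0
host-record=example.com,93.184.216.34   # explicit allowlist entries win over the wildcard
```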
Residential IP proxies have some weaknesses. One is that they often change IP addresses during a single web session. Second, if IPs come from the same proxy provider, they are often concentrated within a single ASN, making them easier to detect.
We are working on an open‑source fraud prevention platform [1], and detecting fake users coming from residential proxies is one of its use cases.
[1] https://www.github.com/tirrenotechnologies/tirreno
The first blog post in this series[1], linked to at the top of TFA, offers an analysis of the potential of using ASNs to detect such traffic. Their conclusion was that ASNs are not helpful for this use case, showing that across the 50k IPs they've blocked there are fewer than 4 IP addresses per ASN, on average.
[1] https://jan.wildeboer.net/2025/02/Blocking-Stealthy-Botnets/
What was done manually in the first blog is exactly what tirreno helps to achieve by analyzing traffic; here is a live example [1]. Blocking an entire ASN should not be considered a strategy when real users are involved.
Regarding the first post, it's rare to see both datacenter network IPs and mobile proxy IP addresses used simultaneously. This suggests the involvement of more than one botnet. The main idea is to avoid using IP addresses as the sole risk factor. Instead, they should be considered as just one part of the broader picture of user behavior.
[1] https://play.tirreno.com
>One is that they often change IP addresses during a single web session. Second, if IPs come from the same proxy provider, they are often concentrated within a single ASN, making them easier to detect.
Both are pretty easy to mitigate with a geoip database and some smart routing. One "residential proxy" vendor even has session tokens so your source IP doesn't randomly jump between each request.
And this is the exact reason why IP addresses cannot be considered as the one and only signal for fraud prevention.
At least here in the US most residential ISPs have long leases and change infrequently, weeks or months.
Trying to understand your product, where is it intended to sit in a network? Is it a standalone tool that you use to identify these IPs and feed into something else for blockage or is it intended to be integrated into your existing site or is it supposed to proxy all your web traffic? The reason I ask is it has fairly heavyweight install requirements and Apache and PHP are kind of old school at this point, especially for new projects and companies. It's not what they would commonly be using for their site.
Indeed, if it's a real user from a residential IP address, in most cases it will be the same network. However, if it's a proxy over residential IPs, there could be 10 requests from one network, the 11th request from a second network, and the 12th request back from the first. This is a red flag.
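That flip-flop pattern is easy to sketch. A toy Python check, where the (ip, asn) input shape and the switch threshold are assumptions; one legitimate network change (home wi-fi to cellular) stays under the threshold:

```python
def flag_network_flapping(requests, max_switches=2):
    """Flag a session whose requests flip between networks.

    `requests` is an ordered list of (ip, asn) pairs for one session.
    A real user may change networks once or twice; bouncing back and
    forth between networks repeatedly is the proxy pattern.
    """
    switches = 0
    prev_asn = None
    for _ip, asn in requests:
        if prev_asn is not None and asn != prev_asn:
            switches += 1
        prev_asn = asn
    return switches > max_switches

# One hop (home wi-fi -> mobile) is fine; A->B->A->B is suspicious.
session = [("203.0.113.5", "AS100")] * 10 + [("198.51.100.7", "AS200")]
```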
Thank you for your question. tirreno is a standalone app that needs to receive API events from your main web application. It can work fine with 512 MB of RAM for Postgres, or even less; however, in most cases we're talking about millions of events, which do require resources.
It's much easier to write a stable application without dependencies based on mature technologies. tirreno is fairly 'boring software'.
My phone will be on the home network until I walk out of the house and then it will change networks. This should not be a red flag.
Effective fraud prevention relies on both the full user context and the behavioral patterns of known online fraudsters. The key idea is that an IP address cannot be used as a red flag on its own without considering the broader context of the account. However, if we know that the fraudsters we're dealing with are using mobile networks proxies and are randomly switching between two mobile operators, that is certainly a strong risk signal.
An awful lot of free Wi-Fi networks you find in malls are operated by different providers. Walking from one side of a mall to the other while my phone connects to all the Wi-Fi networks I’ve used previously would have you flag me as a fraudster if I understand your approach correctly.
We are discussing user behavior in the context of a web system. The fact that your device has connected to different Wi-Fi networks doesn't necessarily mean that all of them were used to access the web application.
Finally, as mentioned earlier, there is no silver bullet that works for every type of online fraudster. For example, in some applications, a Tor connection might be considered a red flag. However, if we are talking about HN visitors, many of them use Tor on a daily basis.
When the enshittification initially hit the fan, I had little flashbacks of Phil Zimmerman talking about Web of Trust and amusing myself thinking maybe we need humans proving they're humans to other humans so we know we aren't arguing with LLMs on the internet or letting them scan our websites.
But it just doesn't scale to internet size so I'm fucked if I know how we should fix it. We all have that cousin or dude in our highschool class who would do anything for a bit of money and introducing his 'friend' Paul who is in fact a bot whose owner paid for the lie. And not like enough money to make it a moral dilemma, just drinking money or enough for a new video game. So once you get past about 10,000 people you're pretty much back where we are right now.
I think it should be possible to build something that generalises the idea of Web of Trust so that it's more flexible, and less prone to catastrophic breakdown past some scaling limit.
Binary "X trusts Y" statements, plus transitive closure, can lead to long trust paths that we probably shouldn't actually trust the endpoints of. Could we not instead assign probabilities like "X trusts Y 95%", multiply probabilities along paths starting from our own identity, and take the max at each vertex? We could then decide whether to finally trust some Z if its percentage is more than some threshold T%. (Other ways of combining in-edges may be more suitable than max(); it's just a simple and conservative choice.)
Perhaps a variant of backprop could be used to automatically update either (a) all or (b) just our own weights, given new information ("V has been discovered to be fraudulent").
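The max-over-paths-of-products rule above maps directly onto a Dijkstra-style search, since multiplying probabilities ≤ 1 behaves like accumulating non-negative costs. A rough Python sketch using the simple max() combination; the edge weights are made up:

```python
import heapq

def trust_scores(edges, me):
    """Propagate trust as the max over paths of products of edge weights.

    edges: {(x, y): p} meaning "x trusts y with probability p" (0..1).
    Returns the best achievable trust score for every reachable node.
    Because all weights are <= 1, a best-first search pops each node
    at its final (highest) score, like Dijkstra on -log(p).
    """
    graph = {}
    for (x, y), p in edges.items():
        graph.setdefault(x, []).append((y, p))
    best = {me: 1.0}
    heap = [(-1.0, me)]  # max-heap via negated scores
    while heap:
        neg_score, node = heapq.heappop(heap)
        score = -neg_score
        if score < best.get(node, 0.0):
            continue  # stale heap entry
        for nxt, p in graph.get(node, []):
            cand = score * p
            if cand > best.get(nxt, 0.0):
                best[nxt] = cand
                heapq.heappush(heap, (-cand, nxt))
    return best

edges = {("me", "alice"): 0.95, ("alice", "bob"): 0.9, ("me", "bob"): 0.5}
scores = trust_scores(edges, "me")
# bob is better reached via alice: 0.95 * 0.9 = 0.855, beating direct 0.5
```

Long chains decay naturally here: a path of ten 0.95 links bottoms out around 0.6, so a threshold T handles the "transitive closure goes too far" problem without a hard hop limit.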
True. Perhaps a collective vote past 2 degrees of separation out, where multiple parties need to vouch for the same person before you believe they aren't a bot. Then you're using the exponential number of people to provide diminishing weight instead of increasing likelihood of malfeasance.
But do we need an infinite and global web of trust?
How about restricting them to everyone-knows-everyone sized groups, of like a couple hundred people?
One can be a member of multiple groups so you're not actually limited. But the groups will be small enough to self regulate.
> So there is a (IMHO) shady market out there that gives app developers on iOS, Android, MacOS and Windows money for including a library into their apps that sells users network bandwidth.
This is yet another reason why we need to be wary of popular apps, add-ons, extensions, and so forth changing hands, by legitimate sale or more nefarious methods. Initially innocent utilities can be quickly coopted into being parts of this sort of scheme.
Strange that HolaVPN, i.e. Brightdata, is not mentioned. They've been using user hosts for those purposes for over a decade, and also selling proxies en masse. Fun fact: they don't have any servers for the VPN. All the VPN traffic is routed through ... other users!
Hola is mentioned in the author's prior post on this topic, linked to at the top of TFA: https://jan.wildeboer.net/2025/02/Blocking-Stealthy-Botnets/
They are even the first to do it and the most litigious of all. Trying to push patents on everything possible, even on water if they can.
Is it really strange if the logo is right there in the article?
> I am now of the opinion that every form of web-scraping should be considered abusive behaviour and web servers should block all of them. If you think your web-scraping is acceptable behaviour, you can thank these shady companies and the “AI” hype for moving you to the bad corner.
Why jump to that conclusion?
If a scraper clearly advertises itself, follows robots.txt, and has reasonable backoff, it's not abusive. You can easily block such a scraper, but then you're encouraging stealth scrapers because they're still getting your data.
I'd block the scrapers that try to hide and waste compute, but deliberately allow those that don't. And maybe provide a sitemap and API (which besides being easier to scrape, can be faster to handle).
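A well-behaved scraper along those lines is short to sketch. A Python version using only the stdlib; the user agent string is a made-up example:

```python
import time
import urllib.error
import urllib.request
import urllib.robotparser

AGENT = "ExampleBot/1.0 (+https://example.com/bot)"  # hypothetical UA

def allowed(url, robots_txt):
    """Check an already-fetched robots.txt body for our user agent."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(AGENT, url)

def polite_fetch(url, robots_txt, delay=1.0, retries=3):
    """Fetch only if robots.txt allows, with exponential backoff on errors."""
    if not allowed(url, robots_txt):
        return None  # disallowed: a well-behaved scraper walks away
    for attempt in range(retries):
        try:
            req = urllib.request.Request(url, headers={"User-Agent": AGENT})
            with urllib.request.urlopen(req, timeout=10) as resp:
                return resp.read()
        except urllib.error.URLError:
            time.sleep(delay * 2 ** attempt)  # back off before retrying
    return None
```

A clearly identifying user agent, honoring robots.txt, and backing off on errors are exactly the traits that make such a scraper trivial to block, which is the commenter's point about the perverse incentive.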
What is the point of app stores holding up releases for review if they don't even catch obvious malware like this?
They pretend to do a review to justify their 30% cartel tax.
Oh no, they review thoroughly, to make sure you don’t try to avoid the tax.
This isn't obvious; 99% of apps make multiple calls to multiple services, and these SDKs are embedded into the app. How can you tell what's legit outbound/inbound? Doing a fingerprint search for the worst culprits might help catch some, but it would likely be a game of cat and mouse.
> How can you tell what's legit outbound/inbound?
If the app isn't a web browser, none are legit?
Their marketing tells you it's for protection. What they omit is that it's for their revenue protection - observe that as long as you do not threaten their revenue models, or the revenue models of their partners, you are allowed through. It has never been about the users or developers.
Money
The definition of malware is fuzzy.
I've had some success catching most of them at https://visitorquery.com
I went to your website.
Is the premise that users should not be allowed to use vpns in order to participate in ecommerce?
Nobody said that; it's your choice to take whatever action fits your scenario. I have clients where VPNs are blocked, yes; it depends on the industry, fraud rate, chargeback rates, etc.
Checked my connection via VPN by Google/Cloudflare WARP: "Proxy/VPN not detected"
Could be, I don't claim 100% success rate. I'll have a look at one of those and see why I missed it. Thank you for letting me know.
Measuring latency between different endpoints? I see the WebRTC TURN relay request..
further reading
https://krebsonsecurity.com/?s=infatica
https://krebsonsecurity.com/tag/residential-proxies/
https://spur.us/blog/
https://bright-sdk.com/ <- way bigger than infatica
I thought the closed-garden app stores were supposed to protect us from this sort of thing?
That's what they want you to think.
Once again this demonstrates that closed gardens only benefit the owners of the garden, not the users.
What good is all the app vetting and sandbox protection in iOS (dunno about Android) if it doesn't really protect me from those crappy apps...
At the very least, Apple should require conspicuous disclosure of this kind of behavior that isn't just hidden in the TOS.
Sandboxing means you can limit network access. For example, on Android you can disallow wi-fi and cellular access (not sure about bluetooth) on a per-app basis.
Network access settings should really be more granular for apps that have a legitimate need.
App store disclosure labels should also add network usage disclosure.
Also, my reaction when the call is for Google, Apple, and Microsoft to fix this: DDoS being illegal, shouldn't the first reaction instead be to contact law enforcement?
If you treat platforms like they are all-powerful, then that's what they are likely to become...
If you find yourself in a walled garden, understand that you're the crop being grown and harvested.
Let me get this straight: we want computers knowing everything, to solve current and future problems, but we don't want to give them access to our knowledge?
I don't want computers to know everything. Most knowledge on the internet is false and entirely useless.
The companies selling us computers that supposedly know everything should pay for their database, or they should give away the knowledge they gained for free. Right now, the scraping and copying is free and the knowledge is behind a subscription to access a proprietary model that forms the basis of their business.
Humanity doesn't benefit, the snake oil salesmen do.
> Let me get this straight: we want computers knowing everything, to solve current and future problems, but we don't want to give them access to our knowledge?
Who said that?
There's basically two extremes:
1. We want access to all of human knowledge, now and forever, in order to monetise it and make more money for us, and us alone.
and
2. We don't want our freely available knowledge sold back to us, with no credits to the original authors.
Most people don’t want computers to know everything - ask the average person if they want more or less of their lives recorded and stored.
I don't want your computer to know everything about me, in fact.
Not sure we do.
How would I know if an app on my device was doing this?
Install a network monitor or go even deeper and sniff packets.
I feel like this could be automated. Spin up a virtual device on a monitored network. Install one app, click on some stuff for a while, uninstall, and move on to the next. If the app reaches out to a lot of random sites, then flag it.
Google could do this. I'm sure Apple could as well. Third parties could for a small set of apps
This is being done by a couple of SDKs, it'd be much easier to just find and flag those SDK files. Finding apps becomes a matter of a single pass scan over the application contents rather than attempting to bypass the VM detection methods malware is packed full of.
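That single-pass scan is simple in outline. A Python sketch treating an APK as the zip archive it is; the fingerprint strings here are placeholders, not real SDK package names (a real list would come from published malware research):

```python
import zipfile

# Hypothetical fingerprints: package paths or marker strings that would
# identify known bandwidth-selling SDKs. These are placeholders.
SDK_FINGERPRINTS = [
    b"com/example/proxysdk",
    b"peer-bandwidth-share",
]

def scan_apk(apk_file):
    """Return which fingerprints appear in any entry of an APK (a zip).

    Accepts a path or a file-like object, as zipfile.ZipFile does.
    """
    hits = set()
    with zipfile.ZipFile(apk_file) as apk:
        for name in apk.namelist():
            data = apk.read(name)
            hits.update(fp for fp in SDK_FINGERPRINTS if fp in data)
    return hits
```

Unlike dynamic analysis, this never runs the app, so the VM-detection tricks malware ships with don't apply; the trade-off is that packers and string obfuscation can defeat it.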
I think tech can still be beautiful in a less grandiose and "omniparadisical" way than people used to dream of. "A wide open internet, free as in speech this, free as in beer that, open source wonders, open gardens..." Well, there are a lot of incentives that fight that, and game theory wins. Maybe we download software dependencies from our friends, the ones we actually trust. Maybe we write more code ourselves--more homesteading families that raise their own chickens, jar their own pickled carrots, and code their own networking utilities. Maybe we operate on servers we own, or our friends own, and we don't get blindsided by news that the platforms are selling our data and scraping it for training.
Maybe it's less convenient and more expensive and onerous. Do good things require hard work? Or did we expect everyone to ignore incentives forever while the trillion-dollar hyperscalers fought for an open and noble internet and then wrapped it in affordable consumer products to our delight?
It reminds me of the post here a few weeks ago about how Netflix used to be good and "maybe I want a faster horse" - we want things to be built for us, easily, cheaply, conveniently, by companies, and we want those companies not to succumb to enshittification - but somehow when the companies just follow the game theory and turn everything into a TikToky neural-networks-maximizing-engagement-infinite-scroll-experience, it's their fault, and not ours for going with the easy path while hoping the corporations would not take the easy path.
it's funny, i've never heard of or thought about the possibility of this happening but actually in hindsight it seems almost too obvious to not be a thing.
This is nasty in other ways too. What happens when someone uses these P2B residential proxies to commit crimes that get traced back to you?
Anything incorporating anything like this is malware.
Many years ago cybercriminals used to hack computers to use them as residential proxies, now they purchase them online as a service.
In most cases they are used for conducting real financial crimes, but the police investigators are also aware that there is a very low chance that sophisticated fraud is committed directly from a residential IP address.
Couldn't Apple and Google (and, to a lesser extent, Microsoft) pretty easily shut down almost all the apps that steal bandwidth?
How can I detect such behaviour on my devices / in my home network?
I'd expect this to be against app store and google play rules, they are very picky.
Are ad blockers like AdBlock, uBlock effective against these?
i don't believe extensions can modify other extensions
It's a fair point but very dynamic to sort out. This needs a full research team to figure out. Or you know.. all of us combined!! It is definitely a problem.
TINFOIL: I've sometimes wondered if Azure or AWS used bots to push site traffic hits to generate money... they know you are hosted with them.. They have your info.. Send out bots to drive micro accumulation. Slow boil..
I think that's mostly that they don't care about having malicious bots on their networks as long as they pay.
GCE is rare in my experience. Most bots I see are on AWS. The DDOS-adjacent hyper aggressive bots that try random URLs and scan for exploits tend to be on Azure or use VPNs.
AWS is bad when you report malicious traffic. Azure has been completely unresponsive and didn't react, even for C&C servers.
Do you think there’s a realistic path forward for better transparency or detection—maybe at the OS level or through network-level anomaly detection?
I’m really struggling to understand how this is different than malware we’ve had forever. Can someone explain what’s novel about this?
That it's not being treated like malware.
In the sense that people are voluntarily installing and running this malware on their computers, rather than being tricked into running it? Is that the only difference?
They are still tricked into running it, since it's normally not an advertised "feature" of any app that uses such SDKs.
I think it is funny that the mobile OS is trying to be as secure as possible, but then they allow this to run on top
"Peer-to-business network"! Amazing. uBlock Origin gets rid of this, right?
>Apple, Microsoft and Google should act.
Do nothing, win.
They are the primary beneficiaries buying this data, since they are the largest AI players.
How is this not just illegal? Surely there’s something in GDPR that makes this not allowed.
IIUC, they do actually ask the user for permission
Which is ironic considering that I strongly disagree with one of the primary walled garden justifications, used particularly in the case of Apple, which amounts to "the end user is too stupid to decide on his own". Unfortunately, even if I disagree with it as a guiding principle sometimes that statement proves true.
It’s not about stupidity, but practicality. People can’t give informed consent for 100 ToS for different companies, and keep those up to date. That’s why there are laws.
No doubt in a dense wall of text that the user must accept to use the application, or worse is deemed to have accepted by using the application at all.
when the shit hits the fan, this seems like the product.
> So if you as an app developer include such a 3rd party SDK in your app to make some money — you are part of the problem and I think you should be held responsible for delivering malware to your users, making them botnet members.
I suspect that this goes for many different SDKs. Personally, I am really, really sick of hearing "That's a solved problem!", whenever I mention that I tend to "roll my own," as opposed to including some dependency, recommended by some jargon-addled dependency addict.
Bad actors love the dependency addiction of modern developers, and have learned to set some pretty clever traps.
I’m constantly amazed at how careless developers are with pulling 3rd party libraries into their code. Have you audited this code? Do you know everything it does? Do you know what security vulnerabilities exist in it? On what basis do you trust it to do what it says it is doing and nothing else?
But nobody seems to do this diligence. It’s just “we are in a rush. we need X. dependency does X. let’s use X.” and that’s it!
> Have you audited this code?
Wrong question. “Are you paid to audit this code?” And “if you fail to audit this code, who’se problem is it?”
I think developers are paid to competently deliver software to their employer, and part of that competence is properly vetting the code you are delivering. If I wrote code that ended up having serious bugs like crashing, I’d expect to have at least a minimum consequence, like root causing it and/or writing a postmortem to help avoid it in the future. Same as I’d expect if I pulled in a bad dependency.
Your expectations do not match the employment market as I have ever experienced it.
Have you ever worked anywhere that said "go ahead and slow down on delivering product features that drive business value so you can audit the code of your dependencies, that's fine, we'll wait"?
I haven't.
Yea, and that’s the problem. If such absolute rock bottom minimal expectations (know what the code does) are seen as too slow and onerous, the industry is cooked!
Yeah, about that, businesses are pushing and introducing code written by AI/LLM now, so now you won't even know what your own code does.
Due diligence is a sliding scale. Work at a webdev agency is "get it done as fast as possible for this MVP we need". Work at NASA or a biomedical device company? Every line of code is triple-checked. It's entirely dependent on the cost/benefit analysis.
"who'se" is wild.
If a car manufacturer sources a part from a third party, and that part has a serious safety problem, who will the customer blame? And who will be responsible for the recall and the repairs?
Malware, botnets… it is very similar. And people, including developers, are - in 80 percent of cases - eager to make money, because… Is greed good? No, it isn’t. It is a plague.
You're a developer who devoted time to develop a piece of software. You discover that you are not generating any income from it: few people can even find it in the sea of similar apps, few of those are willing to pay for it, and those who are willing to pay for it are not willing to pay much. To make matters worse, you're going to lose a cut of what is paid to the middlemen who facilitate the transaction.
Is that greed?
I can find many reasons to be critical of that developer, things like creating a product for a market segment that is saturated, and likely doing so because it is low hanging fruit (both conceptually and in terms of complexity). I can be critical of their moral judgement for how they decided to generate income from their poor business judgment. But I don't thinks it's right to automatically label them as greedy. They may be greedy, but they may also be trying to generate income from their work.
> Is that greed?
Umm, yes? You are not owed anything in this life, certainly not income for your choice to spend your time on building a software product no one asked for. Not making money on it is a perfectly fine outcome. If you desperately need guaranteed money, don't build an app expecting it to sell; get a job.
> If you desperately need guaranteed money, don't build an app expecting it to sell; get a job.
Technically true but a bit of perspective might help. The consumer market is distorted by free (as in beer) apps that do a bunch of shitty things that should in many cases be illegal or require much more informed consent than today, like tracking everything they can. Then you have VC-funded "free" as well, where the end game is to raise prices slowly to boil the frog. Then you have loss leaders from megacorps, and a general anti-competitive business culture.
Plus, this is not just in the Wild West shady places, like the old piratebay ads. The top result for ”timer” on the App Store (for me) is indeed a timer app, but with IAP of $800/y subscription… facilitated by Apple Inc, who gets 15-30% of the bounty.
Look, the point is it’s almost impossible to break into consumer markets because everyone else is a predator. It’s a race to the bottom, ripping off clueless customers. Everyone would benefit from a fairer market. Especially honest developers.
>$800/year IAP
That’s got to be money laundering or something else illicit? No one is actually paying that for a timer app?
No I think it’s designed to catch misclicks and children operating the phone and such, sold as $17/week possibly masquerading as one-time payment. They pay for App Store ads for it too.
I prefer to focus on the technical shortcomings.
We could have people ask for software in a more convenient way.
Not making money could be an indication the software isn't useful, but what if it is? What can the collective do in that zone?
I imagine one could ask and pay for unwritten software then get a refund if it doesn't materialize before your deadline.
Why is discovery (of many creations) willingly handed over to a handful of mega corps?? They seem to think I want to watch and read about Trump and Elon every day.
Promoting something because it is good is a great example of a good thing that shouldn't pay.
There was an earlier discussion on HN about whether advertising should be more heavily regulated (or even banned outright). I'm starting to wonder whether most of the problems on the Web are negative side effects of the incentives created by ads (including all botnets, except those that enable ransomware and espionage). Even the current worldwide dopamine addiction is driven by apps and content created for engagement, whose entire purpose is ad revenue.
These are kind of separate issues. Apps using Infatica know that they're selling access to their users' bandwidth. It's intentional.
This is especially true for script kiddies, which is why I am so thankful for https://e18e.dev/
AI is making this worse than ever though, I am constantly having to tell devs that their work is failing to meet requirements, because AI is just as bad as a junior dev when it comes to reaching for a dependency. It’s like we need training wheels for the prompts juniors are allowed to write.
I agree that there are things with too many dependencies and I try to avoid that. I think it is a good idea to minimize how many dependencies are needed (even indirect dependencies; however, in some cases a dependency is not a specific implementation, and in that case indirect dependencies are less of a problem, although having a good implementation with less indirect dependencies is still beneficial). I may write my own, in many cases. However, another reason for writing my own is because of other kind of problems in the existing programs. Not all problems are malicious; many are just that they do not do what I need, or do too much more than what I need, or both. (However, most of my stuff is C rather than JavaScript; the problem seems to be more severe with JavaScript, but I do not use that much.)
That may be true but I think you're missing the point here.
The "network sharing" behavior in these SDKs is the sole purpose of the SDK. It isn't being included as a surprise along with some other desirable behavior. What needs to stop is developers including these SDKs as a secondary revenue source in free or ad-supported apps.
> I think you're missing the point here
Doubt it. This is just one -of many- carrots that are used to entice developers to include dodgy software into their apps.
The problem is a lot bigger than these libraries. It's an endemic cultural issue. Much more difficult to quantify or fix.
"Bad actors love the dependency addiction of modern developers"
Brings a new meaning to dependency injection.
I mean, as far as patterns go, dependency injection is also quite bad.
I have found that the dependency injection pattern makes it far easier to write clean tests for my code.
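To make that concrete, here is a minimal sketch (all names are illustrative, not from anyone's actual codebase): because the clock is injected through the constructor, the test can substitute a fake and stays fully deterministic.

```python
import time

class Clock:
    """Real dependency: wraps the system clock."""
    def now(self):
        return time.time()

class SessionChecker:
    def __init__(self, clock):
        # The clock is injected, so tests can substitute a fake.
        self.clock = clock

    def is_expired(self, started_at, ttl):
        return self.clock.now() - started_at > ttl

class FakeClock:
    """Test double with a fixed, controllable time."""
    def __init__(self, t):
        self.t = t
    def now(self):
        return self.t

# Test: no sleeping, no real clock, fully deterministic.
checker = SessionChecker(FakeClock(t=1000))
assert checker.is_expired(started_at=0, ttl=500)       # 1000 - 0 > 500
assert not checker.is_expired(started_at=900, ttl=500) # 1000 - 900 <= 500
```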
Elaborate on this please. It seems a great boon in having pushed the OO world towards more functional principles, but I'm willing to hear dissent.
How is dependency injection more functional?
My personal beef is that most of the time it acts like hidden global dependencies, and the configuration of those dependencies, along with their lifetimes, becomes harder to understand by not being traceable in the source code.
Dependency injection is just passing your dependencies in as constructor arguments rather than as hidden dependencies that the class itself creates and manages.
It's equivalent to partial application.
An uninstantiated class that follows the dependency injection pattern is equivalent to a family of functions, where function k has N+Mk arguments: the N constructor parameters plus the Mk parameters of method k.
Upon instantiation by passing constructor arguments, you've created a family of functions, each with its own distinct set of Mk parameters, and the N arguments fixed in common.
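The equivalence can be shown in a few lines (a sketch; the names are illustrative): here N=1 constructor argument and one method with M1=1 parameter.

```python
from functools import partial

# DI style: N=1 constructor argument, a method with its own M1=1 parameter.
class Greeter:
    def __init__(self, greeting):   # dependency fixed at construction
        self.greeting = greeting
    def greet(self, name):          # M1 = 1
        return f"{self.greeting}, {name}!"

# Plain-function style: one function with N + M1 = 2 arguments.
def greet(greeting, name):
    return f"{greeting}, {name}!"

hello = Greeter("Hello")            # "instantiation" ...
hello_fn = partial(greet, "Hello")  # ... is partial application

assert hello.greet("Ada") == hello_fn("Ada") == "Hello, Ada!"
```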
> Dependency injection is just passing your dependencies in as constructor arguments rather than as hidden dependencies that the class itself creates and manages.
That's the best way to think of it fundamentally. But the main implication is that at some point something has to know how to resolve those dependencies - i.e. they can't just be constructed and then injected from magic land. So global cradles/resolvers/containers/injectors/providers (depending on your language and framework) are also typically part and parcel of DI, and that can have some big implications on the structure of your code that some people don't like. Also you can inject functions and methods, not just constructors.
I don't understand what you're describing has to do with dependency injection. See https://news.ycombinator.com/item?id=43740196.
> Dependency injection is just passing your dependencies in as constructor arguments rather than as hidden dependencies that the class itself creates and manages.
This is all well and good, but you also need a bunch of code that handles resolving those dependencies, which oftentimes ends up being complex and hard to debug and will also cause runtime errors instead of compile time errors, which I find to be more or less unacceptable.
Edit: to elaborate on this, I’ve seen DI frameworks not be used in “enterprise” projects a grand total of zero times. I’ve done DI directly in personal projects and it was fine, but in most cases you don’t get to make that choice.
Just last week, when working on a Java project that’s been around for a decade or so, there were issues after migrating it from Spring to Spring Boot - when compiled through the IDE and with the configuration to allow lazy dependency resolution it would work (too many circular dependencies to change the code instead), but when built within a container by Maven that same exact code and configuration would no longer work and injection would fail.
I’m hoping it’s not one of those weird JDK platform bugs but rather an issue with how the codebase is compiled during the container image build, but the issue is mind boggling. More fun, if you take the .jar that’s built in the IDE and put it in the container, then everything works, otherwise it doesn’t. No compilation warnings, most of the startup is fine, but if you build it in the container, you get a DI runtime error about no lazy resolution being enabled even if you hardcode the setting to be on in Java code: https://docs.spring.io/spring-boot/api/kotlin/spring-boot-pr...
I’ve also seen similar issues before containers, where locally it would run on Jetty and use Tomcat on server environments, leading to everything compiling and working locally but throwing injection errors on the server.
What’s more, it’s not like you can (easily) put a breakpoint on whatever is trying to inject the dependencies - after years of Java and Spring I grow more and more convinced that anything that doesn’t generate code that you can inspect directly (e.g. how you can look at a generated MapStruct mapper implementation) is somewhat user hostile and will complicate things. At least modern Spring Boot is good in that more of the configuration is just code, because otherwise good luck debugging why some XML configuration is acting weird.
In other words, DI can make things more messy due to a bunch of technical factors around how it’s implemented (also good luck reading those stack traces), albeit even in the case of Java something like Dagger feels more sane https://dagger.dev/ despite never really catching on.
Of course, one could say that circular dependencies or configuration issues are project specific, but given enough time and projects you will almost inevitably get those sorts of headaches. So while the theory of DI is nice, you can’t just have the theory without practice.
Inclined to agree. Consider that a singleton dependency is essentially a global, and differs from a traditional global only in that the reference is kept in a container and supplied magically via a constructor variable. Also consider that constructor calls are now outside the application-layer frames of the call stack, in case you want to trace execution.
Dependency injection is not hidden. It's quite the opposite: dependency injection lists explicitly all the dependencies in a well defined place.
Hidden dependencies are: untyped context variable; global "service registry", etc. Those are hidden, the only way to find out which dependencies given module has is to carefully read its code and code of all called functions.
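The contrast can be sketched like this (hypothetical names; REGISTRY stands in for a global service-registry style):

```python
# Hidden dependency: the class reaches into a global registry.
REGISTRY = {"mailer": lambda msg: print("sent:", msg)}

class SignupHidden:
    def register(self, user):
        # Invisible from the class's signature; you must read the body to find it.
        REGISTRY["mailer"](f"welcome {user}")

# Explicit dependency: everything the class needs is listed in one place.
class SignupExplicit:
    def __init__(self, mailer):
        self.mailer = mailer  # the constructor IS the dependency list
    def register(self, user):
        self.mailer(f"welcome {user}")

sent = []
SignupExplicit(mailer=sent.append).register("ada")
assert sent == ["welcome ada"]
```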
Because you’re passing functions to call.
??? What functions?
To me it‘s rather anti-functional. Normally, when you instantiate a class, the resulting object’s behavior only depends on the constructor arguments you pass it (= the behavior is purely a function of the arguments). With dependency injection, the object’s behavior may depend on some hidden configuration, and not even inspecting the class’ source code will be able to tell you the source of that behavior, because there’s only an @Inject annotation without any further information.
Conversely, when you modify the configuration of which implementation gets injected for which interface type, you potentially modify the behavior of many places in the code (including, potentially, the behavior of dependencies your project may have), without having passed that code any arguments to that effect. A function executing that code suddenly behaves differently, without any indication of that difference at the call site, or traceable from the call site. That’s the opposite of the functional paradigm.
> because there’s only an @Inject annotation without any further information
It sounds like you have a gripe with a particular DI framework and not the idea of Dependency Injection. Because
> Normally, when you instantiate a class, the resulting object’s behavior only depends on the constructor arguments you pass it (= the behavior is purely a function of the arguments)
With Dependency Injection this is generally still true, even more so than normal because you're making the constructor's dependencies explicit in the arguments. If you have a class CriticalErrorLogger(), you can't directly tell where it logs to, is it using a flat file or stdout or a network logger? If you instead have a class CriticalErrorLogger(logger *io.writer), then when you create it you know exactly what it's using to log because you had to instantiate it and pass it in.
Or like Kortilla said, instead of passing in a class or struct you can pass in a function, so using the same example, something like CriticalErrorLogger(fn write)
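In Python terms the same idea looks like this (a sketch; the class and parameter names follow the example above): the call site decides where logs go, and the class never knows or cares.

```python
import io
import sys

class CriticalErrorLogger:
    def __init__(self, writer):
        # Anything with a .write() method works: a file, a socket, sys.stderr...
        self.writer = writer

    def log(self, msg):
        self.writer.write(f"CRITICAL: {msg}\n")

# Inject an in-memory buffer (e.g. for a test):
buf = io.StringIO()
CriticalErrorLogger(buf).log("disk full")
assert buf.getvalue() == "CRITICAL: disk full\n"

# Same class, real destination: just pass a different writer.
CriticalErrorLogger(sys.stderr).log("disk full")
```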
I don't quite understand your example, but I don't think the particulars make much of a difference. We can go with the most general description: With dependency injection, you define points in your code where dependencies are injected. The injection point is usually a variable (this includes the case of constructor parameters), whose value (the dependency) will be set by the dependency injection framework. The behavior of the code that reads the variable and hence the injected value will then depend on the specific value that was injected.
My issue with that is this: From the point of view of the code accessing the injected value (and from the point of view of that code's callers), the value appears like out of thin air. There is no way to trace back from that code where the value came from. Similarly, when defining which value will be injected, it can be difficult to trace all the places where it will be injected.
In addition, there are often lifetime issues involved, when the injected value is itself a stateful object, or may indirectly depend on mutable, cached, or lazy-initialized, possibly external state. The time when the value's internal state is initialized or modified, or whether or not it is shared between separate injection points, is something that can't be deduced from the source code containing the injection points, but is often relevant for behavior, error handling, and general reasoning about the code.
All of this makes it more difficult to reason about the injected values, and about the code whose behavior will depend on those values, from looking at the source code.
> whose value (the dependency) will be set by the dependency injection framework
I agree with your definition except for this part, you don't need any framework to do dependency injection. It's simply the idea that instead of having an abstract base class CriticalErrorLogger, with the concrete implementations of StdOutCriticalErrorLogger, FileCriticalErrorLogger, AwsCloudwatchCriticalErrorLogger which bake their dependency into the class design; you instead have a concrete class CriticalErrorLogger(dep *dependency) and create dependency objects externally that implement identical interfaces in different ways. You do text formatting, generating a traceback, etc, and then call dep.write(myFormattedLogString), and the dependency handles whatever that means.
I agree with you that most DI frameworks are too clever and hide too much, and some forms of DI like setter injection and reflection based injection are instant spaghetti code generators. But things like Constructor Injection or Method Injection are so simple they often feel obvious and not like Dependency Injection even though they are. I love DI, but I hate DI frameworks; I've never seen a benefit except for retrofitting legacy code with DI.
And yeah it does add the issue or lifetime management. That's an easy place to F things up in your code using DI and requires careful thought in some circumstances. I can't argue against that.
But DI doesn't need frameworks or magic methods or attributes to work. And there's a lot of situations where DI reduces code duplication, makes refactoring and testing easier, and actually makes code feel less magical than using internal dependencies.
The basic principle is much simpler than most DI frameworks make it seem. Instead of initializing a dependency internally, receive the dependency in some way. It can be through overly abstracted layers or magic methods, but it can also be as simple as adding an argument to the constructor or a given method that takes a reference to the dependency and uses that.
edit: made some examples less ambiguous
The pattern you are describing is what I know as the Strategy pattern [0]. See the example there with the Car class that takes a BrakeBehavior as a constructor parameter [1]. I have no issue with that and use it regularly. The Strategy pattern precedes the notion of dependency injection by around ten years.
The term Dependency Injection was coined by Martin Fowler with this article: https://martinfowler.com/articles/injection.html. See how it presents the examples in terms of wiring up components from a configuration, and how it concludes with stressing the importance of "the principle of separating service configuration from the use of services within an application". The article also presents constructor injection as only one of several forms of dependency injection.
That is how everyone understood dependency injection when it became popular 10-20 years ago: A way to customize behavior at the top application/deployment level by configuration, without having to pass arguments around throughout half the code base to the final object that uses them.
Apparently there has been a divergence of how the term is being understood.
[0] https://en.wikipedia.org/wiki/Strategy_pattern
[1] The fact that Car is abstract in the example is immaterial to the pattern, and a bit unfortunate in the Wikipedia article, from a didactic point of view.
They're not really exclusive ideas. The Constructor Injection section in Fowler's article is exactly the same as the Strategy pattern. But no one talks about the Strategy pattern anymore, it's all wrapped into the idea of DI and that's what caught on.
I'm curious, which language/dev communities did you pick this up from? Because I don't think it's universal, certainly not in the Java world.
DI in Java is almost completely disconnected from what the Strategy pattern is, so it doesn't make sense to use one to refer to the other there.
It was interesting reading this exchange. I have a similar understanding of DI to you. I have never even heard of a DI framework and I have trouble picturing what it would look like. It was interesting to watch you two converge on where the disconnect was.
How is the configuration hidden? Presumably you configured the DI container.
It starts off feeling like a superpower, allowing you to change a system's behaviour without changing its code directly. Every time I've encountered it, though, it has quickly devolved into a maintenance nightmare.
I'm talking more specifically about Aspect Oriented Programming though and DI containers in OOP, which seemed pretty clever in theory, but have a lot of issues in reality.
I take no issues with currying in functional programming.
[flagged]
AI scrapers and "sneaker bots" are just the tip of the iceberg. Why are all these entities concentrated and metastasizing from just a few superhubs? Why do they look, smell and behave like state-level machinery? If you've researched you'll know exactly what I'm talking about.
Unless complicit, tech leaders (Apple Google Microsoft) have a duty to respond swiftly and decisively. This has been going on far too long.
"Infatica is partnered with Bitdefender, a global leader in cybersecurity, to protect our SDK users from malicious web traffic and content, including infected URLs, untrusted web pages, fraudulent and phishing links, and more."
That's not good.
[flagged]
@dang can this entire account be banned please
the next time you copy paste an ad remove the trailing quotation mark