
Herman Martinus

Creator of the no-nonsense blogging platform, Bear, and a few other things.

Rss preview of Blog of Herman Martinus

Digital hygiene: Passwords

2025-08-13 17:25:00

This is part 3 of a 3 part series on digital hygiene. I suggest starting at part 1.

Whenever I watch heist movies, I always roll my eyes at the "hacker" character. They can consistently hack a building's camera system; or download the contents of a target's phone for use later in the heist. They also manage to hack the bank, which calls into question the need for a heist in the first place.

While there are real-world programmatic attack vectors that can be exploited, they're generally opportunistic. When a new vulnerability has been discovered, nefarious actors try to exploit it at scale before it's patched. The chances of finding and executing a "hack" on the spot (via Bluetooth or something equally ridiculous) are very slim.

But I digress. The most common vulnerability is significantly more boring: compromised passwords. These can be stolen through social engineering, like phishing, that exposes account details; but they're just as likely to be exposed through a data leak, where a service hasn't stored passwords securely and thousands of email+password pairs are stolen. These credentials are then systematically tested on a bunch of other services in the hope that some people have re-used their passwords, which would give the attacker control over those accounts too.

And that brings me to the topic of today's post: Password hygiene.

Leaked or stolen passwords are by far the most effective way to hack an account. And so it is imperative that everyone who uses the internet, which accounts for 93% of people in the developed world, spends some time ensuring that their accounts and login information are secure.

On Bear Blog, the blogging platform I run, it is interesting to see the frequency with which the Forgot Password flow is used. This is a pretty good indication of the number of people who do not store their passwords properly, since it should never be the case that you've forgotten your password. You should never have to remember your passwords in the first place.

I wonder how many work hours are lost globally due to people following the forgot password flow.

I have hundreds of accounts online, everything from my bank, to a free tool for converting ebooks. If I don't reuse any passwords (which I don't, see above) I'll have hundreds of email+password pairs. I certainly can't remember hundreds of different passwords and match them to the relevant services; and outside of a small subset of people, neither can anyone else.

Before I saw the light and started using a password manager, I used to use a password cipher of my own design. I'd take a string of letters, symbols, and numbers, say !xlk-bd15j-hjk, then replace a certain character with the first letter of the service I was accessing. So for example, if I was trying to access Amazon and the character I'd replace was the 6th one, the password for that service would be !xlk-ad15j-hjk.

This setup isn't very secure (but it is still better than using the same password everywhere). It works until it doesn't. The first issue I ran into with this is that some services had extra password requirements like needing at least one capital letter or a number. The second issue is that this leads to password re-use for all services with the same starting character in their name. And finally, some services do require that you change your password regularly (more on that later), so I'd have to remember which accounts had updated passwords, generally by adding a 1 at the end of it.
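
A minimal sketch of that cipher in Python, for illustration only. The function name and parameters are hypothetical; the base string and position are the ones from the example above.

```python
def cipher_password(base: str, service: str, position: int) -> str:
    """Replace the character at a 1-based position with the service's first letter."""
    if not service:
        raise ValueError("service name must not be empty")
    if not 1 <= position <= len(base):
        raise ValueError("position is outside the base string")
    index = position - 1
    return base[:index] + service[0].lower() + base[index + 1:]


print(cipher_password("!xlk-bd15j-hjk", "Amazon", 6))  # !xlk-ad15j-hjk
print(cipher_password("!xlk-bd15j-hjk", "Apple", 6))   # !xlk-ad15j-hjk
```

Both calls produce the same string, which is the re-use problem in action: the only thing separating one service from another is a single character.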

It is possible to get extra creative with this, and I did for a while, running a bash script to generate passwords on the fly by taking in the name of the service and hashing it. A storage-less password manager, if you will. But this turned out to be pretty inconvenient, especially since this is a solved problem.
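
To give a rough idea of what a storage-less generator looks like, here is a sketch in Python rather than bash. It is not the original script: the function name, the choice of PBKDF2, the iteration count, and the output length are all assumptions made for illustration.

```python
import base64
import hashlib


def derive_password(master_secret: str, service: str, length: int = 20) -> str:
    """Derive a repeatable password from a master secret and a service name."""
    if not 1 <= length <= 43:
        raise ValueError("length must be between 1 and 43")
    digest = hashlib.pbkdf2_hmac(
        "sha256",
        master_secret.encode("utf-8"),
        service.lower().encode("utf-8"),  # the service name acts as the salt
        100_000,                          # illustrative iteration count
    )
    # URL-safe Base64 keeps the result typeable: letters, digits, '-' and '_'.
    return base64.urlsafe_b64encode(digest).decode("ascii")[:length]


print(derive_password("correct horse battery staple", "amazon"))
print(derive_password("correct horse battery staple", "bearblog"))
```

Nothing is stored, so the same inputs always give the same password. That is also the catch: changing a single password, or satisfying a site's particular character rules, means remembering extra state per service, which is the job a password manager already does.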

Enter the password manager.

Keeping passwords and 2FA recovery codes safe is easy: you just need to decide on a tool and stick with it.

There are lots of great password managers out there like Dashlane, 1Password, or Bitwarden. I'm quite partial to Apple's built-in password manager because it syncs between my devices and integrates seamlessly with Apple's biometric authentication, making every login a simple fingerprint scan.

Once you've chosen a password manager, you set a master password. This is the most important password, so it must never be forgotten or written down. I find that a passphrase is both higher entropy and easier to remember than a password. Here's a classic XKCD comic explaining password entropy.
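
The arithmetic behind that comic is short enough to sketch. This assumes every word is picked uniformly at random from a fixed list (a passphrase you make up yourself will have less entropy), and the 7776-word list size is the Diceware convention, not something from this post.

```python
import math


def entropy_bits(pool_size: int, length: int) -> float:
    """Entropy in bits of `length` items chosen uniformly at random from a pool."""
    return length * math.log2(pool_size)


# Four random words from a 2048-word list, as in the XKCD comic.
print(entropy_bits(2048, 4))            # 44.0
# Six random words from a 7776-word Diceware-style list.
print(round(entropy_bits(7776, 6), 1))  # 77.5
```

For comparison, the comic estimates roughly 28 bits for a typical "clever" human-chosen password, so each extra random word adds far more than another substituted symbol does.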

Now, every time you log into a service or use the forgot password flow, ensure that you put the password into your password manager, or generate a brand new password using the password manager's built-in generator. You'll only need to do this once per service, and from then on you can use the password manager to log in to that service. Another reason I like Apple's password ecosystem is that a lot of this is done by default, without having to manually copy and paste passwords. Password managers do have browser extensions and mobile apps to make this easier across devices. Use them.

Your password manager will also generally alert you to password re-use. If a password has been used multiple times, I'd recommend going and updating all of the affected accounts. The best way to think about this is that at some point the password will be leaked. Which accounts are you comfortable having compromised? Naturally something like banking or email needs to be updated as a priority, but if it's for a background removal tool...actually, still update it. Why not?

Let's talk about 2-factor authentication (2FA), also known as multi-factor authentication (MFA). While there is a slight difference between 2FA and MFA (hint: it's the number of factors), I'll be using them interchangeably here.

MFA is a security measure to prevent access to an account where the login details have been compromised. Generally, if you have good password hygiene and are vigilant about phishing attacks, a compromise is unlikely. However, for high priority accounts it is a necessary security step.

SMS 2FA tends to get a lot of hate, justifiably, due to SIM-swap attacks. However, the reason many retail services (like banks) still use SMS instead of TOTP authentication is that retail customers generally don't store and back up their recovery codes properly. If you use Google Authenticator or a similar tool, and do not back up your codes, losing your device is an effective way to lock yourself out of your account. Banks rely on the assumption that you'll reclaim your mobile number, whereas the same cannot be said about lost TOTP recovery codes.

That all being said, if you have the option to use a TOTP authentication code instead of SMS or email 2FA, I highly recommend you do that. You'll just need to ensure you've backed up your recovery codes.
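
For the curious, TOTP itself is a small algorithm (RFC 6238): the service and your authenticator share a secret, and both hash it together with the current 30-second time step. Below is a minimal sketch using only Python's standard library; the secret is the published RFC test value, and the function name is just for illustration.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, at_time: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Compute a time-based one-time password (RFC 6238, HMAC-SHA1)."""
    if at_time is None:
        at_time = time.time()
    counter = int(at_time) // step  # number of time steps since the Unix epoch
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F         # dynamic truncation, as defined in RFC 4226
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


# RFC test secret; a real one comes from the QR code the service shows you.
print(totp(b"12345678901234567890", at_time=59))  # 287082
```

The code on your phone is only as recoverable as that shared secret. If the device holding it is lost and nothing was backed up, there is no way to regenerate the codes.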

I'm going to say something quite controversial here: I think it's okay to back up your recovery codes in your password manager.

While it does mean that if your password manager is compromised, then all of your accounts (including the ones protected by MFA) are exposed, MFA is generally there to protect against compromised login details and not against a compromised password manager. If your password manager is hacked...I'm sorry. You're going to have a tough time.

At the end of the day, the best tools are the tools you use. I like how Apple's 2FA codes also populate with biometric authentication, removing the need for me to go and find my phone (which I generally leave in another room while I'm working).

Some side notes:

  • Changing passwords regularly, especially as a requirement, leads to worse and not better security. Mostly because users don't use password managers correctly, and end up defaulting to a rotation of memorised passwords. The act of changing a password is also a well-known phishing attack vector.
  • Ensure you take regular (encrypted!) backups of your passwords to store offsite, just in case you lose access to your password manager, however unlikely.
  • Hardware authentication devices are neat, but most people don't work on systems important enough to warrant that level of security. There will always be a trade-off between usability and security, and more security isn't always better. I once misplaced a Yubikey and all of the accounts I used it on had TOTP authentication as a redundancy, so I guess it didn't add much extra security?
  • Developers, please stop logging people out of their accounts! There is very little to gain from having short sessions. It's annoying and leads to users forgetting and needing to recover their passwords more often.

As frustrating as it is, it's up to us developers to account for human folly and bad password hygiene. I'd love to create a webservice that only has a username and password with no need for an email address. But I know that I'll receive regular emails asking about account recovery due to a lost or forgotten password.

tldr; Get a password manager, and use it exclusively. Don't try to remember passwords. It's easier and more secure this way. Having good password hygiene makes you significantly less likely to wake up one day with your bank account drained.

As the old joke goes: If you're running away from a bear, you don't need to be faster than a bear, just faster than everyone else.

Digital hygiene: Notifications

2025-08-06 17:05:00

This is part 2 of a 3 part series on digital hygiene. I suggest starting at part 1.

Over the past few years I've cultivated a decent relationship with my phone. Not a good one, mind you, but one I'm fairly comfortable with. There is a part of me that yearns for a return to simple, black-and-white phones, with Internet access limited to whichever room in the house had the phone line and computer. But there's no going back; and so I had to find a way to live with the Internet (and the hyper-connectivity it entails) in my pocket.

Developing a good relationship to your phone is an intentional process. It doesn't happen by accident. All apps and media, by design, are fighting for your attention. I've heard the term "attention economy" thrown around, and I feel like it's an apt description of the battle for our increasingly fractured attentions.

And the easiest way to grab your attention is via notifications.

Sometimes I see a person's phone covered in notifications and I get anxiety-by-proxy. Red badges in the triple-digits; the notification bar an endless list of banners, messages, friend requests, and marketing content. I can't imagine this is a pleasant experience, but it seems to be the norm.

In my opinion, notifications need to be reeled in as a priority. At the end of the day my phone is a tool. I want to choose how to use it. I don't want it to "keep me engaged" or sell me things. I want to own my own time, and have full control of my attention.

A more utilitarian reason to get notifications under control is that when all notifications are active, none of them are. When I use the Reminders app on my iPhone I actually want to be reminded of something, instead of that notification being buried beneath unimportant stuff.

Here's the method I used for breaking free of notification hell (you'll notice a lot of overlap with my previous post on emails):

1. Remove social media apps (or completely mute them at the very least)

I don't have traditional social media (think Instagram, Facebook, Twitter, or LinkedIn). I've written about it before. But in a nutshell, these apps consume my time and energy without giving me much value in return. Instead I try to nurture in-person relationships, or use longer-form digital communication, like calls or email.

Regardless of my personal preferences around having social media, it goes without saying that those apps should, at the very least, be muted. No banners, no badges, no sounds. It should be up to you when to engage with these platforms; because if left up to them you would never log off. Another way to manage this is to put social media apps on a separate device, like an iPad left at home, which takes this pernicious time-suck out of your pocket.

If you're trying to smoke less, don't carry a box of cigarettes around with you.

2. Opt-in instead of opt-out notifications

A simple but effective way of cleaning up phone notifications is to go and turn them all off, then selectively turn on the ones you actually need. The idea is that all notifications should be opt-in, instead of opt-out. Notifications should also be set to the least-intrusive method, depending on the application. For example, here are my only notifications on my phone:

  • Messaging apps (Telegram, WhatsApp, Messages)
    • ✅ Badges
    • ❌ Banners
    • ❌ Sounds
  • Phone calls
    • ❌ Badges
    • ✅ Banners
    • ✅ Sounds
  • Calendar and reminders
    • ✅ Badges
    • ✅ Banners
    • ✅ Sounds
  • Uptime monitor
    • ❌ Badges
    • ✅ Banners
    • ✅ Sounds
  • RSS reader
    • ✅ Badges
    • ❌ Banners
    • ❌ Sounds

Everything else is left turned off, since most things aren't time sensitive. I used to have Uber's notifications turned on, since I didn't want to miss my ride, but found that Uber doesn't respect marketing opt-out and would send me "special offers" that were impossible to turn off. Now I just make sure I don't forget that I've ordered an Uber.

With group-chats on messaging apps (which can become overwhelming), I mute all of the ones that have a lot of noise and archive them; checking them every now and then.

3. Managing sounds

Sounds and vibrations are the worst kinds of notifications since they grab your attention even when not using your device. Because of this you'll notice that only calls, calendar and reminders, and uptime monitors have sounds enabled, since these are the time sensitive ones. But even then I still have sleep mode active after 7pm, so only my uptime monitor and repeat phone calls get through.

4. Report telemarketers and robo-calls

In South Africa we have a public National Opt-Out Register (you may have something like this in your jurisdiction). This can be used by companies to determine if you're open to direct marketing communications. When I receive a marketing call from a company, if it's a human I politely ask them to remove my number from their marketing list. If I receive another call from that company (or any robo-call) I report them to the Information Regulator of South Africa for processing my personal information without my consent, as well as not respecting the National Opt-Out Register. I then leave a public review for that company stating that they've broken the law by contacting me.

I'm very careful to never opt-in to any marketing communications, so I can say with certainty that any direct marketing I receive is definitively against privacy legislation where I live.

This has proven to be very effective. I have not received a single robo-call or direct marketing call in the past few months. It may seem like a lot of work up front, but it pays dividends since I never get pulled out of whatever I'm doing just to answer a call from a company I don't care about. It's also punishment for them trying to advertise to me in the privacy of my own home. That feels like crossing a boundary.

All of this applies to the computer as well. I don't allow any banners to pop up on my computer, since they easily pull me out of work. Slack only shows a small red badge (sans the number) to let me know there's an unread message, and even then the sidebar on my Mac is hidden by default. Your mileage may vary, depending on the work you do, but protecting deep work is important. At least to me.

So why all of this effort? I try to live an intentional and present life. I want to be here right now. Technology isn't going to regress back to the 90s and so we need to cultivate good relationships with our devices, so we can cultivate a good relationship with ourselves and the people around us.

Take back your attention.

Digital hygiene: Emails

2025-07-01 18:50:00

This is part 1 of a 3 (or 4, I haven't decided yet) part series on digital hygiene.

Email is, arguably, the backbone of the modern internet. Not only is it a means of communication, but it is the de facto identity for operating online. In this way, email isn't just how I communicate, but who I am online. Yes, some services still operate with usernames and passwords; but the vast majority of services use email as user identity. This arguably makes email the most important online account. Everything else relies on email.

Email is also where I do most of my work. From technical support, to replying to friendly emails, to receiving invoices; it is the workspace through which my occupation operates. And for all of these reasons, my email is well organised and easy to use.

People regularly comment on how quickly and personally I respond to their emails, and it's because they're generally only one of a few emails in my inbox. This isn't because I just naturally don't receive emails (I run two B2C web-services!). Instead it is because I am very active in maintaining a clean workspace. In the same way a carpenter keeps his tools neat and tidy; or a barista cleans his equipment and the counter after every coffee brewed; I put away my emails and wipe down my inbox after every use.

What's fairly interesting, though, is that people assume this is difficult. But it's not. Once I started keeping a clean inbox I actually had significantly less work, since every email I received actually warranted attention. The important ones weren't buried beneath a heap of newsletters, spam, receipts, and all the other cruft that can clog up the workspace.

Here's how I do it:

To kick things off, I have 2 email addresses. The first one is the one that I use as my identity. This is a gmail address that I've had since high school, and I use it to sign up to online services, fill in forms, and all the other things that require an online identity. I don't bother with email aliases since it just makes my identity harder to control. I respect people who do this to track data-leaks, but I couldn't be bothered.

The second email address is my conversational address. This is where people can contact me, whether it be for support or to just say hi. I don't have any web-services or online identities associated with this email address, and so every email I receive here is from a person.

My email is also not a place for adverts and marketing (we'll get to this later), or a place where I read newsletters. This is a place for work and communication. For newsletters I use an RSS reader (Reeder is my choice of client). If a website or newsletter doesn't support RSS (which is very rare) and I really want to receive updates I use Kill the Newsletter, which creates an RSS feed of received emails. This could also be another email account specifically for newsletters, if RSS isn't to your liking.

This is one of the most important parts of my email strategy. My inbox isn't a place for leisurely reading. When I open my email it's with purpose. If I want to catch up on my newsletters and blogs I follow, I can flop down on the couch, open my RSS reader, and enjoy them when I'm not also trying to work.

Since my first email address is used for signing up to websites, apps, and all the rest; it inevitably receives marketing emails (even though I religiously never check that checkbox). Whenever I receive a marketing newsletter I always hit unsubscribe. I'm not interested. If I receive another email from that company I report their email as spam. This is a non-negotiable. Companies that disregard unsubscribes should be penalised, and the only way to do this is to make their email deliverability metrics slightly worse. Maybe they'll learn. Probably not.

When a new email enters my inbox I explicitly act on it. Every single email is attended to like this:

  1. If it warrants a reply, reply to it or act on the information.
  2. If it requires action or mulling over I either create an item on my todo list, or snooze the email for later.
  3. If it is a marketing email or a newsletter I unsubscribe, or mark as spam and block if they persist.
  4. If it is a receipt for a subscription, or any other recurring email that I can't block, I set up a filter to auto-archive those emails where I can find them if needed.
  5. Finally archive it. I archive all of my emails once they're completed, so the inbox only has unread and unattended emails. You can start this by simply selecting all of your emails and archiving them immediately, then follow the above steps going forward.

And I just keep doing that. I found that over time, once the cruft has been unsubscribed, filtered, or moved elsewhere; the only emails that hit my inbox are important ones that require my attention. Additionally, I only receive between 5 and 15 emails a day; they aren't buried, and they require significantly less cognitive load to address.

Naturally everyone's workflow will look different based on the work that they do and personal preference. However I think it is universal to say being active about email creates a better experience for you and the people and services you interact with. This is just how I like to do it.

It's also possible to control what kinds of emails you receive. If you take a look at my contact page, you'll notice that it's intentionally formatted, and tweaked regularly. I have a big picture of my face to remind people that they're interacting with a human being (this is particularly useful for support requests). It then provides easy links to the most frequently requested resources. I specify my working hours, establishing that people will need to be patient when waiting for a response from me, especially over weekends (this is also particularly useful for support requests). And then finally my email is at the bottom in non-copy-and-paste (and maybe anti-bot) format.

I encourage people to randomly email me. Especially if it's to discuss a post of mine, invite me for a coffee, or just open a line of communication. The page is set up to point people in the right direction when looking for information, help them quickly resolve their queries, and to remind them that I'm not a nameless customer support agent.

I am quite privileged to decide what emails are important, since I work for myself. However, even if you receive a lot of email that can't be filtered out, having a system around what you can archive or unsubscribe from will inevitably make life easier.

Email is a great tool when used well. It is a place of slow(er) communication, and for some a place for connection. In many ways it is an extension of oneself. I like to keep it tidy.

Nerding out about heaters

2025-05-28 18:05:00

The weather in Cape Town is slowly descending into a cold, wet winter (except for today, which is beautiful). I've brought the heater up from the garage, doubled up the duvets on the bed, and have started wearing long pyjamas to sleep.

Generally at this time of year Emma and I are making plans to head somewhere warm, but this year we've decided to stick it out in South Africa with a month-long road-trip to the North West for some proper time in the bushveld, and then a wedding in Joburg.

For people not from South Africa, it does get cold here. Not in the same way it gets cold in Canada or Northern Europe, but in some ways it gets colder. Let me explain:

In Canada the Canadians know that it's going to get very, very cold. Unbelievably cold. Because of this they do a whole bunch of reasonable things like insulate their homes, ensure they have a robust heating system, and purchase proper winter attire. But not in South Africa. Our weather is great most of the year, with only about 2 to 3 months of cold weather.

It can get down to 5 degrees Celsius on a cold day; and while that may not seem cold to people where the weather gets to -20, we're always completely unprepared for it. Because of our good weather the rest of the year, we forget. We forget to insulate our houses. We forget to get winter sheets. And so when the time comes even the insides of our homes are cold.

I've had Polish and German friends suffer in the South African winter due to a lack of good heating and insulation. And that brings me to my topic of interest today: Heaters.

There are so many different kinds of heaters on the market. Everything from wood fireplaces to under-floor hot water heating. Now, unless you're lucky and your house or apartment has built in heating or a fireplace of some sort, you're stuck purchasing a heater from a store (or layering up, or both).

At the end of the day there are only 3 different kinds of heating (for the purposes of this write-up):

Conduction - where you physically touch the heated surface and that heat is transferred to you via the conductivity of the surface and the body in contact.

Radiation - where infrared light is radiated from a heating element and absorbed by our skin (or the surrounding objects) over a distance.

Convection - where the heated surface warms the surrounding air, which in turn warms us. While convection involves conduction at a molecular level (heat transfer between particles), it’s distinct from conduction because it relies on the movement of a fluid (air in this case).

Most heaters rely on a combination of these mechanisms. Wood fireplaces predominately rely on radiation (infrared from the fire) and convection (warming up the surrounding air). Under-floor heating relies on conduction (direct contact with the floor) and some convection, although this isn't the main mechanism. Electric quartz heaters (those pretty glowing bars) are predominately radiation, and the oil-filled space heaters are predominately convection.

Store bought heaters come in a variety of shapes, sizes, and price points. But I noticed that by-and-large they all have the same max wattage of 2000 watts (I believe in the US it's 1500 watts due to their 120 volt power supply). Now, theoretically energy is energy is energy. Therefore, if you have a sealed room each one of these heaters will emit the same amount of heat into the room at the same rate, regardless of whether this is an expensive heater or one of those cheapie blower types you put under your desk. And since every 2000 watt heater will cause the same amount of heating and draw the same amount of power, then the mechanism of action shouldn't be that important. Or is it?
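
A quick back-of-the-envelope check of those numbers, as a sketch. The 230 V figure is an assumption (nominal South African mains) and isn't stated above; the function names are illustrative.

```python
def current_amps(power_watts: float, voltage: float) -> float:
    """Current drawn by a resistive heater: I = P / V."""
    return power_watts / voltage


def energy_kwh(power_watts: float, hours: float) -> float:
    """Electrical energy used, essentially all of which ends up as heat."""
    return power_watts * hours / 1000


print(round(current_amps(2000, 230), 1))  # 8.7  (2000 W on 230 V mains)
print(round(current_amps(1500, 120), 1))  # 12.5 (1500 W on 120 V mains)
print(round(current_amps(2000, 120), 1))  # 16.7 (2000 W on 120 V mains)
print(energy_kwh(2000, 1))                # 2.0
```

At 120 V a 2000 watt heater would pull more than a typical 15 amp household circuit allows, which is a plausible reason for the lower cap there. Either way, any 2000 watt heater run for an hour puts 2 kWh of heat into the room.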

We have 3 heaters in our apartment. A quartz bar heater, a fin space heater, and an electric blanket. And they each have their own strengths and weaknesses.

The quartz bar heater is by far my favourite. As soon as I turn it on it starts radiating a gentle warmth directly onto me. This is lovely since I don't sleep with the heater on, so when the apartment is cold in the morning I can sit in front of it with my coffee and be all toasty without having to warm up the entire space. It also warms up the couch I'm sitting on, which provides conductive heating as well. Warming up the entire space is also not easily (or cheaply) done due to South African buildings' lack of insulation.

Bar heater

Left on for long enough, it does gradually increase the temperature of the room, since that same wattage is being radiated into the space where it's absorbed by the floor and furniture, heating them up. Those objects inevitably heat up the air around them, and boom! You have convection.

The fin space heater is nowhere near as nice. Because it relies on convection to heat up the entire space, it isn't very efficient (again, no insulation). Instead we use this exclusively in the home office which is a smaller space, has slightly better insulation, and a carpet which I wouldn't like to rest the bar heater on due to the localised heat.

It is a great example of how doing something useful with the heat first is important. With the bar heater it warms me and the objects I interact with up before trying to warm up the surrounding air. Again, in the end it's 2000 watts regardless of method, so may as well make use of that energy locally before trying to warm up the room.

My third heater is the electric blanket. If I put this on 10 minutes before getting into bed it warms it up so well that it's such a toasty pleasure to get into. This is also by far the most efficient form of heating since it is insulated by default, and relies entirely on conduction. It warms up the specific area that I will inhabit without losing much heat to the surrounding area. Due to this the power draw of an electric blanket is negligible compared to actual space heaters (about 7-10%). This is a great example of super localised heat. There's no way that I'm going to use my electric blanket to heat up the apartment.
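
To put rough numbers on that, here is a small sketch. The 7-10% range and the 10-minute preheat come from the paragraph above; the 45-minute space heater session is just an illustrative duration, and the names are hypothetical.

```python
HEATER_WATTS = 2000  # the store-bought space heaters discussed earlier


def blanket_watts(percent: int, heater_watts: int = HEATER_WATTS) -> float:
    """Blanket power draw, given as a percentage of a space heater's."""
    return heater_watts * percent / 100


def energy_wh(power_watts: float, minutes: float) -> float:
    """Energy used in watt-hours over a number of minutes."""
    return power_watts * minutes / 60


print(blanket_watts(7), blanket_watts(10))            # 140.0 200.0
print(round(energy_wh(blanket_watts(10), 10), 1))     # 33.3
print(energy_wh(HEATER_WATTS, 45))                    # 1500.0
```

Under these assumptions, preheating the bed costs about 33 Wh at most, while 45 minutes in front of a 2000 watt heater costs 1500 Wh.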

I grew up with gas heaters (and briefly an anthracite stove heater) and have spent a decent amount of time with wood-fired stoves and fireplaces, and I can honestly say that electricity is, subjectively, just so much better. No having to leave the window open to vent the carbon dioxide (meaning more gas needs to be burnt to compensate). No ash coating surfaces around the fireplace, no smell of smoke in the house. No having to constantly replenish stocks of wood or haul around gas canisters.

Also, being able to flick the bar heater on and off as needed is great when compared to starting and maintaining a fire in a wood-burning stove. I only spend 45 minutes having coffee before heading to gym in the morning and really couldn't be bothered to make a fire. I'm also a fan of consistent heat instead of the fluctuations of burning hydrocarbons. But that's personal preference and relegated to indoors. I love a good campfire and the ambiance of fire in general.

There are many other heater designs, but narrowing them down to wattage and mechanism helps me reason about them better. I guess what can be taken away from this write-up is that if you're stuck deciding between heaters, just know that they'll probably all heat up your space at the same rate, but finding a way to use that heat locally first is good practice. Also, save your money. Expensive heaters are not better by default.

Yes, I will have coffee with you

2025-04-14 19:49:11

Every now and then a reader of my blog finds themselves in Cape Town and reaches out to me to ask for recommendations, or grab a coffee, or just say hi. And I love it. It's opened up a new avenue for meeting interesting people and potential friends. I'm quite far away from my reader-base, which tends to be predominately in the US, Western Europe, and South East Asia. So this happens very infrequently, but when it does, you can be sure that I'm keen.

There's also the secondary benefit of meeting people who use Bear Blog: It gives me an opportunity to solicit feedback, get a general vibe of what works and what doesn't, and to keep a finger on the pulse of the project. It allows me to talk shop with someone who understands what it is I do. One of the downsides of working solo is that it can get occupationally lonely. My friends know what Bear is, but very few people actively blog.

So with all that being said, this post is an open-invitation. If you're in Cape Town, either living here or passing through: Yes. I will have coffee with you.

The Great Scrape

2025-03-26 17:02:00

LLMs feed on data. Vast quantities of text are needed to train these models, which are in turn receiving valuations in the billions. This data is scraped from the broader internet, from blogs, websites, and forums, without the authors' permission, with all content treated as opted-in by default.

Needless to say, this is unethical. But as Meta has proven, it's much easier to ask for forgiveness than permission. It is unlikely they will be ordered to "un-train" their next generation models due to some copyright complaints.

I wish the problem ended with the violation of consent for how our writing is used. But there's another, more immediate problem: The actual scraping.

These companies are racing to create the next big LLM, and in order to do that they need more and more novel data with which to train these models. This incentivises them to ruthlessly scrape every corner of the internet for any bit of new data to feed the machine. Unfortunately these scrapers are terrible netizens and have been taking down site after site in an unintentional, widespread DDoS attack.

Over the past 6 months Bear, and every other content host on the internet, has been affected. Both Sourcehut and LWN have written about their difficulties in holding back the scourge of AI scrapers. This seems to be happening to big and small players alike. Self-hosted bloggers have had to figure out rate-limiting and CDNs too, which is pretty unfair for someone who just wants to write on the internet.

Bear is hit daily by bot networks requesting tens of thousands of pages in short time periods, and while I now have systems in place to prevent it actually taking down the server, when it started happening a few months ago it certainly had an impact on performance.

This is a difficult problem to solve because of the way these scrapers are designed. Only a small portion of them identify themselves as scrapers. Those are all blocked at the WAF (Web Application Firewall) level and never reach any Bear blogs (about 500,000 requests have been blocked in the last 24 hours). However, the vast majority of scrapers identify themselves as regular web browsers and use multiple servers and IP addresses, making the usual tools like rate-limiting and user-agent parsing ineffective. Not to mention that they all completely ignore robots.txt and other self-regulation rules.
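To make that concrete, here is a minimal Python sketch of the two usual tools and why a distributed scraper slips past them. This is not Bear's actual setup: the bot name list, the `SlidingWindowLimiter` class, and the `simulate` helper are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict, deque

# Illustrative list of User-Agent tokens for bots that announce themselves.
# An assumption for this sketch, not the list any real WAF uses.
SELF_IDENTIFIED_BOTS = ("GPTBot", "CCBot", "ClaudeBot")


def is_self_identified_bot(user_agent):
    """True if the User-Agent openly names a known scraper."""
    ua = user_agent.lower()
    return any(name.lower() in ua for name in SELF_IDENTIFIED_BOTS)


class SlidingWindowLimiter:
    """Allow at most `limit` requests per IP within `window` seconds."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip, now):
        recent = self.hits[ip]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True


def simulate(ips, total_requests, limiter):
    """Spread requests round-robin over `ips` within one second.

    Returns how many requests the limiter lets through.
    """
    allowed = 0
    for i in range(total_requests):
        ip = ips[i % len(ips)]
        now = i / total_requests
        if limiter.allow(ip, now):
            allowed += 1
    return allowed


# One IP sending 10,000 requests is stopped almost immediately...
single = simulate(["203.0.113.1"], 10_000, SlidingWindowLimiter(10, 60))

# ...but the same 10,000 requests spread over 1,000 IPs all get through.
many_ips = [f"10.0.{i // 256}.{i % 256}" for i in range(1000)]
spread = simulate(many_ips, 10_000, SlidingWindowLimiter(10, 60))
```

The user-agent check only catches scrapers that are honest about who they are, and the per-IP limiter only catches traffic that comes from one place. A scraper that claims to be a browser and rotates addresses passes both.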

One of the mitigation options is to add a challenge to every single page (like Cloudflare's managed challenge), but this is an unpleasant user-experience and blocks bots that are actually welcome, such as search engine crawlers. So while it is possible to mitigate all bot traffic, that would effectively make all blogs non-searchable on all the major search engines. Some of the LLM scrapers cheekily identify themselves as Googlebot or Yandexbot as well. This option would also affect anyone who runs scripts on their own site for backups or custom automations. Not ideal.

I've had to remove RSS subscriber analytics since I can't mitigate bots very well on RSS feeds, which are explicitly designed for bots. This influx has caused the RSS analytics to be completely wrong, and it felt better to remove them than to display incorrect information.

As of right now I have several strategies in place to combat this deluge that are working well. If you're a service provider or sysadmin being negatively impacted by these scrapers, contact me and I'd be happy to show you what's worked for me.

Right now everything is under control on Bear. Over the past month bots have only managed to impact performance on Bear once, and that endpoint has since been protected. I've added significantly more active monitoring, and any time I see a spike of requests I find a common pattern, block it, and monitor whether it has affected any real users.
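Finding a common pattern in a spike can be as simple as counting. The sketch below is a rough illustration in Python, not the tooling I actually run: the record fields, the `/24` prefix grouping, and the 50% threshold are all assumptions.

```python
from collections import Counter


def to_record(ip, user_agent, path):
    """Build a log record; `ip_prefix` is the first three IPv4 octets."""
    return {
        "ip_prefix": ip.rsplit(".", 1)[0],
        "user_agent": user_agent,
        "path": path,
    }


def dominant_patterns(requests,
                      fields=("user_agent", "ip_prefix", "path"),
                      threshold=0.5):
    """Return (field, value, share) for any single value that accounts for
    at least `threshold` of the requests, largest share first."""
    total = len(requests)
    found = []
    if total == 0:
        return found
    for field in fields:
        counts = Counter(r[field] for r in requests)
        value, count = counts.most_common(1)[0]
        share = count / total
        if share >= threshold:
            found.append((field, value, share))
    return sorted(found, key=lambda item: item[2], reverse=True)


# A hypothetical spike: eight requests from one network with varied
# user agents, plus two unrelated visitors.
spike = [to_record(f"203.0.113.{i}", f"UA-{i}", "/a/") for i in range(8)]
spike.append(to_record("198.51.100.7", "UA-x", "/b/"))
spike.append(to_record("192.0.2.9", "UA-y", "/c/"))

patterns = dominant_patterns(spike)
```

Here the user agents are all different, so they are useless as a signal, but the shared network prefix and path stand out. Whatever stands out becomes the candidate rule to block, followed by checking that no real users got caught by it.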

Thankfully, none of these scrapers render CSS, and therefore don't get logged as visitors on Bear's analytics.

The best case scenario is that the AI companies find another way to train their models without ruthlessly slashing and burning the internet. However, I doubt this will happen. Instead I see it getting worse before it gets better. More tools are being released to combat this. One interesting tool from Cloudflare is the AI Labyrinth, which traps AI scrapers that ignore robots.txt in a never-ending maze of no-follow links. This is how the arms race begins.

And I'm ready for it. Let's fight this exploitation of the commons.