Links are one of the core features of the web. We use them to browse websites and to refer to content hosted somewhere else. The problem with links is that they might stop working at any time. Websites change and die, content is moved, modified and deleted, services introduce paywalls and login pages, laws make sites inaccessible. This is usually referred to as “link rot”.
Just a few hours ago Twitter decided to put all tweets[a] behind a login wall. This change might not be permanent if we are to believe a tweet from the owner (and of course you need an account to read it), but just like that, millions of links shared over the years, bookmarks, and open tabs no longer work as expected. And some of those links are important.
Popular services have been doing similar things for years. For example, Facebook and Instagram redirect users they don’t like (IP, browser, etc.) to a login page regardless of the content they’re trying to access (it could be a meme or an announcement from your government). Reddit started hiding content from mobile users, requiring them to log in or install its app (even though the content is right there behind the popup). Imgur, a very popular image hosting service, now has a problem with hosting images, started deleting content, and redirects users accessing images to pages with ads and trackers. Google tried very hard to create a Facebook competitor with Google+, but the service closed and a lot of content was lost.
This affects embedded content too. On top of the privacy problems of adding external content to pages, if the content isn’t really on the page, it might disappear, change or be put behind some wall. For some content this is not a big deal, but sometimes it is. For example, some news websites embed public posts from politicians. What if the post is removed? Did the person ever say what the site claims they did? Or what if the post is updated to say something else?
How to mitigate this?
There’s nothing we can do to stop links we don’t control from breaking, but we can duplicate the content so it exists in more than one place. We can:
- Take screenshots of the page/content.
- Archive the content on services like the Wayback Machine and Archive.today.
- Provide different sources.
- Keep our own copy of the content.
And so on.
A few examples:
» If you’re on social media and want to make a comment about something <famous person> said, instead of quoting their post, take a screenshot and post the text + link to the original post + screenshot.
» When providing sources for something, link to at least 2 different sites and/or to the archived page. Avoid URL shorteners or special links, link directly to the content.
» If you’re writing an article and want to add a social media post, instead of copying the code the social media platform gives you, link to the source and add a screenshot of the content instead. This doesn’t work well for video content, but it’s fine for text and images.
» If you publish content, can you host it on multiple platforms? For example, if you have a podcast or create music, maybe you can post it on SoundCloud, YouTube, and archive.org? Same for video. For written content, you can easily archive it on the sites mentioned above.
» If you can host the content yourself and are committed to keeping it online for a long time, maybe you can do that instead of using a 3rd party service? For example, I hosted the screenshot of the tweet mentioned at the start of this post because I wanted to provide a mirror for those who couldn’t read the tweet. And yes, the image is also on the Wayback Machine.
» Do you use Dropbox, Google Drive, etc., to share files publicly? These services have different limits and restrict access to popular content. They may also stop working in a few years. Can you host the content in at least two different places?
» Let’s say you want to create a tutorial with links and images and post it on a forum. When adding a link, make sure you also save it on one of the archiving sites mentioned above. This ensures that 1) there’s a copy of the content somewhere else and 2) that you can always view the content as it was when you linked to it. For images, if the forum allows you to upload files, perhaps you could use it to host the images and then have a copy on some image hosting website or cloud service? You can then link to the copies by having a [mirror] link next to the original link and link to the backup image by adding a “full size” link (or something like that). And then you can archive your own post so there’s a copy of it if the forum goes down.
» Keep a copy of important content. It can be a folder on your computer or on some cloud service. Keep a copy of important files you’ve shared online. For 3rd party text content, you can use “read it later” services like Pocket or Instapaper. You can also save screenshots or screen recordings. Browsers like Firefox or Chrome let you take full page screenshots. There are also extensions like SingleFile that save pages as a single HTML file with all images embedded. If you’re using a phone, screenshots and screen recordings are also easy to do.
» If you manage a website/platform, can you (and would it be acceptable to) fix broken URLs? For example, upgrade old images served via http to https or redirect links/images that no longer work to the Internet Archive’s Wayback Machine?
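For that last example, here is a rough Python sketch of what such a fix could look like. It is only an illustration: the function names, the `alive` flag (the result of whatever link check you run yourself) and the `linked_on` date are all made up for this example, and blindly switching http to https assumes the host actually serves the same file over https. The `https://web.archive.org/web/<timestamp>/<url>` format is the one the Wayback Machine uses to show the capture closest to a given date.

```python
from urllib.parse import urlsplit, urlunsplit

WAYBACK = "https://web.archive.org"


def upgrade_to_https(url: str) -> str:
    """Return the same URL with an http scheme replaced by https."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


def wayback_snapshot_url(url: str, timestamp: str = "") -> str:
    """Build a Wayback Machine link for `url`.

    `timestamp` is YYYYMMDDhhmmss and may be partial (e.g. "2023").
    Without it, the Wayback Machine picks the capture to show.
    """
    if timestamp:
        return f"{WAYBACK}/web/{timestamp}/{url}"
    return f"{WAYBACK}/web/{url}"


def fix_link(url: str, alive: bool, linked_on: str = "") -> str:
    """Hypothetical helper: keep working links, send dead ones to the archive.

    `alive` is an assumption: you decide how to check whether a link works.
    `linked_on` is when the link was added, so readers get a copy from
    around that time instead of whatever the page became later.
    """
    if alive:
        return upgrade_to_https(url)
    return wayback_snapshot_url(url, linked_on)


if __name__ == "__main__":
    print(fix_link("http://example.com/photo.png", alive=True))
    print(fix_link("http://example.com/gone", alive=False, linked_on="2023"))
```

Passing the date the link was added matters: a dead link sent to the latest capture may show a parked domain or a login page, not the content you linked to.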
Me trying to do this:
I blogged about the 2022 Russian Invasion of Ukraine a few months ago. I wanted to have a record of how I followed the first days of the war with enough context so those reading it now or 10 years from now could understand why I had certain opinions or was reaching certain conclusions.
Since I host this site myself, I don’t have the same limitations as someone using social media sites, so I went beyond what most people can do:
- Web pages: I was going to use the Wayback Machine copy for the news articles, but they sometimes hide sites/content when asked, so I decided to host pages myself. Between a screenshot and the full (HTML) page, I decided to host the page. If you click on the [a] (“a” for “archived”) link, you’ll see the copy hosted by me.
- Social media content: A lot of the content was being posted on Reddit and Twitter. I went with a mix of HTML pages and full or partial screenshots (in the context of this post, the Twitter screenshots are very useful now).
- The “nasty” content: Some content doesn’t stay online for a long time. Sometimes people’s sensibilities change, sometimes the platform hosting the content wants to get rid of it for financial reasons. I hosted the videos myself since I could.
- Quotes: I copied the relevant paragraph when quoting something. The links to the source and to the copy are available, but if that stops working, the text – which is important for my post – is still there.
By doing this, I can be fairly sure that at least one copy of that context will be there for as long as the post remains online. I even saved Wikipedia pages in case they are updated. And of course, the post and all its files and external links are also on the Wayback Machine.
Don’t add to the problem!
Once upon a time, I had the “brilliant” idea to create an image hosting site and a URL shortening site. They were mostly for me, but anyone could use them. There was no revenue, but I thought it was a “cool thing” to do, so I did it. Well, the image hosting site quickly hit the bandwidth limit of my shared hosting plan. The URL shortener didn’t use many resources, but someone started using it to hide phishing pages. Long story short, they didn’t last long.
“With great power comes great responsibility”, so if you decide to create a website or host content people use, take some time to think about it. I know you want to do it, but can you do it reliably and for a long time? Can you do it cheaply so you don’t have to shut it down if you lose your job or if you don’t hit your donations target a few months in a row?
If you’re already running a site, try not to break old links. Avoid changing domains and if you do, keep the old domain for a few years and redirect users to the new one. You should also avoid changing the path to files/pages as someone might have linked to them. If the URL changes, make sure there’s a redirect in place and that it will continue to work even when you decide to make new changes. If you have to shut the site down, try to archive the content somewhere else so it doesn’t get completely lost.
We all make mistakes. Those two sites I created? I created them because I could, but I shouldn’t have. I’ve learned that lesson and when I started using this site to host content that other people expect to be here, I did it with the knowledge that it wouldn’t be a simple personal blog anymore. I have redirects that need to work for a long time and free/cheap alternatives to host the files if, for some reason, I’m unable to continue hosting them here.
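One way to keep redirects working after several rounds of changes is to keep a single table of old path → new path and check it every time you move something. The Python sketch below is just an illustration of that idea, not how any particular web server does it: the `resolve` and `flatten` names, the `max_hops` limit and the example paths are all made up. It follows chains of redirects to the final destination and refuses to continue if a change accidentally creates a loop.

```python
def resolve(path: str, redirects: dict[str, str], max_hops: int = 10) -> str:
    """Follow `redirects` from `path` until reaching a path with no redirect.

    Raises ValueError on a loop or when the chain is longer than `max_hops`.
    """
    seen = {path}
    for _ in range(max_hops + 1):
        if path not in redirects:
            return path  # final destination
        path = redirects[path]
        if path in seen:
            raise ValueError(f"redirect loop at {path}")
        seen.add(path)
    raise ValueError("too many redirects")


def flatten(redirects: dict[str, str]) -> dict[str, str]:
    """Point every old path straight at its final destination."""
    return {old: resolve(old, redirects) for old in redirects}


if __name__ == "__main__":
    # Hypothetical history: the same post was moved twice.
    table = {
        "/blog/1": "/posts/1",
        "/posts/1": "/articles/first-post",
    }
    print(flatten(table))
```

Running `flatten` after each change means a link to the oldest URL still gets to the current page in one hop, and a mistake like redirecting A to B and B back to A is caught before visitors see it.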
Essentially, we should all try to be better netizens[a]. You will break links, but you can try to avoid doing it. If you must break stuff, at least try to minimise the disruption it may cause. And please, don’t redirect links everyone expects to be public to a login page.