Downloading from Wayback

From Flashpoint Datahub
Latest revision as of 20:55, 20 January 2024

This page describes how to download web game/animation files from Wayback Machine. Once the files are downloaded, they can be curated using Flashpoint Infinity.

Understanding Wayback Machine Capture URLs

Let's break down this example of a typical Wayback Machine capture URL:

https://web.archive.org/web/20150118221400/http://www.teagames.com/games/crazygolf2

First we have the Wayback Machine base URL: https://web.archive.org/web/. This is present at the beginning of all Wayback Machine capture URLs.

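Next is the date code: 20150118221400. This indicates the date (2015-01-18) and the time (22:14:00) that the capture was taken. Wayback Machine automatically finds the closest capture to any date code you give it. It treats any incomplete date code as a range and searches for the closest capture to the end of that range. Here are some examples:

  • https://web.archive.org/web/2/http://example.com/ will load the latest capture of example.com.
  • https://web.archive.org/web/0/http://example.com/ will load the earliest capture of example.com.
  • https://web.archive.org/web/2005/http://example.com/ will load the capture closest to the end of the year 2005.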

The last part of the capture URL is the URL from the original site. In this case it is http://www.teagames.com/games/crazygolf2.

  • The original URL (not the full capture URL!) will always be your launch command in Flashpoint.
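The breakdown above can be sketched in Python. This is only an illustrative helper, not part of Flashpoint or Wayback Machine: the names CAPTURE_RE, CaptureURL, and parse_capture_url are made up for this example, and the pattern assumes that a modifier suffix is always two lowercase letters followed by an underscore.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape: base URL, date code (digits), optional modifier suffix
# (two lowercase letters plus "_"), a slash, then the original URL.
CAPTURE_RE = re.compile(
    r"^(https?://web\.archive\.org/web/)(\d+)([a-z]{2}_)?/(.+)$"
)


class CaptureURL(NamedTuple):
    base: str
    date_code: str
    modifier: Optional[str]  # None when the URL has no suffix
    original: str


def parse_capture_url(url: str) -> CaptureURL:
    """Split a capture URL into the parts described above."""
    match = CAPTURE_RE.match(url)
    if match is None:
        raise ValueError(f"not a Wayback capture URL: {url}")
    return CaptureURL(*match.groups())


parts = parse_capture_url(
    "https://web.archive.org/web/20150118221400/"
    "http://www.teagames.com/games/crazygolf2"
)
print(parts.date_code)  # the date code
print(parts.original)   # the original URL, i.e. the launch command
```

The original field is the value you would paste into the Launch Command field.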

Downloading the Original Files

By default, Wayback rewrites links and inserts a toolbar at the top of all archived pages. But when curating for Flashpoint, we want the original, unmodified files! To get those, you must add a modifier suffix to the end of the capture's date code. Wayback supports several suffixes, but the one you should use is id_, which means "identical" (to the original). Here is the capture URL from before, with the id_ suffix added:

https://web.archive.org/web/20150118221400id_/http://www.teagames.com/games/crazygolf2

Notice that the Wayback toolbar no longer appears. When downloading files for Flashpoint curations, you should always use the id_ suffix!
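If you have many capture URLs to convert, the suffix can be added with a small script. The following is a minimal sketch; with_id_suffix is a hypothetical helper name, and it assumes any existing suffix is two lowercase letters followed by an underscore.

```python
import re


def with_id_suffix(capture_url: str) -> str:
    """Return the capture URL with its modifier suffix set to id_."""
    # Match "/web/<date code>" plus any existing suffix, up to the slash.
    pattern = r"(/web/\d+)(?:[a-z]{2}_)?/"
    new_url, count = re.subn(pattern, r"\1id_/", capture_url, count=1)
    if count == 0:
        raise ValueError(f"no date code found in: {capture_url}")
    return new_url


print(with_id_suffix(
    "https://web.archive.org/web/20150118221400/"
    "http://www.teagames.com/games/crazygolf2"
))
```

Only the first match is replaced, so an original URL that happens to contain a similar path segment is left untouched.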

Searching Wayback Machine for Captures

There are two main methods of searching Wayback Machine for captures. Keep in mind that for unknown reasons, sometimes only one method will bring up a certain result. If you can't find a URL or asset using one method, try the other.

Quick Search

Let's say you want to search for all captures of this URL: http://www.teagames.com/games/crazygolf2.

Use the same structure as a capture URL, but replace the date code and modifier suffix with an asterisk: https://web.archive.org/web/*/http://www.teagames.com/games/crazygolf2.

This will show a calendar with colored circles for each capture. Hover over a circle and click a time code to access a capture.

  • A blue circle means the file was found at the requested URL (200 OK). This is generally what you want.
  • A green circle means there was a redirect (301 or 302).
  • A yellow circle means the URL was not found (4xx).
  • A red circle means there was a server error on the original website (5xx).

Now let's say you want to look at all the SWFs in TeaGames's swf directory.

As before, use an asterisk instead of a date code in the capture URL. But now, you also need to put an asterisk at the end of the original URL portion. Here is what you will get: http://web.archive.org/web/*/https://teagames.com/swf/*

Use the "filter results" box on the top-right to search for specific filenames or URLs. You can also search for specific MIME types, such as application/x-shockwave-flash.

CDX Search

This section goes over some basic examples of searches with Wayback's CDX API. The full documentation is the wayback-cdx-server README in the Internet Archive's wayback repository on GitHub.

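This search will list every captured URL starting with www.bbc.co.uk/cbbc/games/ (matchType=prefix). collapse=urlkey lists each URL only once, fl=original returns only the original URL field, and the filter is meant to exclude captures with 4xx or 5xx status codes: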
http://web.archive.org/cdx/search?url=www.bbc.co.uk/cbbc/games/&matchType=prefix&collapse=urlkey&filter=!statuscode%3A[45]&fl=original

This search will list all captures with status 200 OK from the domain www.happyfeet-game.com, including its subdomains. These portions of the URL accomplish that: matchType=domain and filter=statuscode%3A200.
https://web.archive.org/cdx/search?url=www.happyfeet-game.com&matchType=domain&collapse=urlkey&filter=statuscode%3A200&fl=original

You can also add a date range to a CDX search. For example, adding &from=2006&to=2008 to the previous search will return only captures from 2006 to 2008:
https://web.archive.org/cdx/search?url=www.happyfeet-game.com/&matchType=domain&collapse=urlkey&filter=statuscode%3A200&fl=original&from=2006&to=2008

Finally, this search will list all captures of SWF, DCR, and CCT files from URLs starting with superdudes.net:
https://web.archive.org/cdx/search/cdx?url=superdudes.net&matchType=prefix&filter=original:.*(\.swf|\.dcr|\.cct).*&fl=original&collapse=urlkey
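Search URLs like the ones above can also be assembled programmatically. The sketch below uses only the query parameters that appear in this section (url, matchType, collapse, filter, fl, from, to); build_cdx_url is a hypothetical helper name, and the endpoint is taken from the last example.

```python
from urllib.parse import urlencode

# Endpoint as written in the last example above.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(url, match_type="prefix", filters=(),
                  date_from=None, date_to=None):
    """Compose a CDX search URL that returns each original URL once."""
    params = [
        ("url", url),
        ("matchType", match_type),
        ("collapse", "urlkey"),  # one result per unique URL
        ("fl", "original"),      # output only the original URL field
    ]
    params += [("filter", f) for f in filters]
    if date_from:
        params.append(("from", date_from))
    if date_to:
        params.append(("to", date_to))
    # urlencode percent-encodes reserved characters, e.g. ":" becomes %3A.
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


print(build_cdx_url("www.happyfeet-game.com", "domain",
                    filters=["statuscode:200"],
                    date_from="2006", date_to="2008"))
```

The printed URL can be opened in a browser or fetched with any HTTP client; the result is a plain-text list of original URLs, one per line.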

Using Wayback Download Mode

Wayback Download Mode is a new mode of Flashpoint Core that allows you to download files directly from Wayback Machine. These files can then be added to your curations.

Setting Up Wayback Download Mode

Wayback Download Mode is compatible with Flashpoint Core 9 and beyond. Here are the steps to activate this mode:

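  1. Download the Wayback Download Patch (https://cdn.discordapp.com/attachments/496132309498724391/1118021954276962314/WaybackDownloadPatch_Core11.zip) and extract it into your Flashpoint folder. Replace files when prompted.
    1. Core 9 and 10 use this legacy patch instead: https://cdn.discordapp.com/attachments/496132309498724391/845203473070948372/WaybackDownloadPatch_Core10.zip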
  2. Restart Flashpoint Launcher, then switch to the Config tab.
  3. Click the "Server" dropdown menu and switch to "Wayback Download Mode."
  4. Click "Save and Restart." Flashpoint Core will now use Wayback Download Mode when you launch games or curations.

Using Wayback Download Mode

  1. In Flashpoint Core, click New Game on the bottom-right corner of the launcher. If you do not see a New Game button, switch to the Config tab and check "Enable Editing."
    • Alternatively, you can switch to the Curate tab, then click New Curation instead of New Game. For more information about using the Curate tab, see Curation Tutorial.
  2. Enter the title and platform of the game, and any other metadata you wish. For the Launch Command, paste the original URL that you want to download from Wayback.
    • Note that if the original URL uses HTTPS, you will need to replace https with http.
  3. For the Application Path, click the dropdown and choose the appropriate application in Flashpoint for the type of game you are curating.
  4. Double-click the game to launch it. The game should load in the application you specified. Watch the Logs tab of Flashpoint Launcher as you play the game.
  5. Once all of the assets seem to have loaded, navigate to Flashpoint's Legacy\htdocs folder and sort the contents by Date Modified. Determine which files and folders belong to the game you just downloaded.
  6. Copy the files and folders from the htdocs folder to your curation's content folder, retaining the same structure. Refer to the Curation Tutorial.
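As an alternative to sorting by Date Modified in a file manager, a short script can list the files that changed after you launched the game. This is a rough sketch: files_modified_since is a hypothetical helper, and the htdocs path in the usage comment is an assumption that depends on where Flashpoint is installed.

```python
from pathlib import Path


def files_modified_since(root, since):
    """Return files under root modified at or after `since`, newest first.

    `since` is a timestamp in seconds since the epoch, e.g. a value
    recorded with time.time() just before launching the game.
    """
    root = Path(root)
    recent = [
        path for path in root.rglob("*")
        if path.is_file() and path.stat().st_mtime >= since
    ]
    return sorted(recent, key=lambda path: path.stat().st_mtime, reverse=True)


# Example usage (the path is an assumption; adjust it to your install):
#   import time
#   started = time.time()
#   ... launch the game and play until all assets have loaded ...
#   for path in files_modified_since(r"C:\Flashpoint\Legacy\htdocs", started):
#       print(path)
```

You still need to check the list by hand, since other games launched in the same session will also write files to htdocs.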

Turning Off Wayback Download Mode

After you're done downloading from Wayback, you will need to turn it off to use Flashpoint Core normally. To do this, click the "Config" tab of Flashpoint Launcher, then switch back to "Legacy Webserver" ("Apache Webserver" for Core 9 and 10) using the "Server" dropdown menu.

Using cURLsDownloader

You can also download files from Wayback Machine using a combination of CDX Search and cURLsDownloader by following the steps below.

  1. Use CDX Search to obtain a list of original URLs to download from Wayback Machine.
  2. Copy and paste these URLs into a text editor such as Notepad++ or Sublime Text.
  3. Now you need to turn these original URLs into capture URLs. First, determine a suitable date code.
    • For example, if the captures you want are from around January 2014, 201401 is a good date code.
  4. Compose a capture URL prefix using the Wayback Machine base URL, your date code, and the id_ suffix.
    • An example capture URL prefix would be https://web.archive.org/web/201401id_/.
  5. Use your text editor's Macros or Find & Replace functionality to insert your capture URL prefix before each original URL in the list.
  6. That's it! Save the text file, then refer to the Curation Tutorial and the cURLsDownloader Manual to complete your curation.
    • After the files are finished downloading, cURLsDownloader will give you the option to automatically move the downloaded files to their original URLs. Be sure to take advantage of this option!
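Steps 3 to 5 above can also be done with a few lines of Python instead of a text editor's Find & Replace. This is a minimal sketch; to_capture_urls is a hypothetical helper name, and the file names in the usage comment are placeholders.

```python
def to_capture_urls(original_urls, date_code):
    """Prefix each original URL with the base URL, date code, and id_ suffix."""
    prefix = f"https://web.archive.org/web/{date_code}id_/"
    # Strip whitespace and newlines, and skip blank lines.
    return [prefix + url.strip() for url in original_urls if url.strip()]


urls = to_capture_urls(
    ["http://example.com/game.swf\n", "\n", "http://example.com/data.xml\n"],
    "201401",
)
print("\n".join(urls))

# Example usage with files (names are placeholders):
#   with open("original_urls.txt") as src, open("capture_urls.txt", "w") as dst:
#       dst.write("\n".join(to_capture_urls(src, "201401")) + "\n")
```

The resulting list is what you would save to the text file in step 6.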

Using Wayback Machine Downloader

Wayback Machine Downloader is a command-line tool that can automatically download whole websites from the Wayback Machine. You can also use different filters to customize exactly which files are downloaded. See its GitHub page for more information: https://github.com/hartator/wayback-machine-downloader