Discussion in 'General Sega Discussion' started by Black Squirrel, Mar 6, 2020.
Two days in a row I've managed to crash this site just by using it. They're archiving Digitiser - my short-term aim was to replace our text-only reviews, but obviously there's ten years of broadcasting here and a massive chunk of it is relevant to our interests. And an unstable(?) website is a good example of why mirrors are a good thing.
I had to come up with a special means of handling Digitiser many moons ago:
archive interviews from sega-16. The interviews get moved around and the links won't work.
I'd do a more thorough job but it's tedious without a means of mass uploading files. Here's all the relevant reviews that Super Page 58 have archived for 1993.
How quickly the rest are done will depend on whether I can be bothered to do my real life job I'm actually paid for.
...so maybe by Tuesday afternoon.
Do you need help?... What have I to do?... upload everything?
I'm planning on doing the rest of the reviews, but everything else will be up for grabs.
Ok... if you need something just give me a shout... and don't forget the instructions... I'm not good with this stuff and I can easily get lost...
Right, so here's how Digitiser worked:
Digitiser was a teletext service. Teletext was a protocol(?) that was big in Europe (especially in the UK), where digital information was encoded between the analogue frames of a television signal. With the correct receiver (aka all new TVs from the late 1970s) that data could be interpreted into a digital image. It was how we got on-demand news before the internet, and it lasted well beyond its natural life. It could do text, it could do blocky graphics, it could flash some things... and that's about it.
Digitiser was a section that covered video games on Channel 4's teletext service. It lasted ten years and was updated almost daily, so although limited by 1970s technology, it was able to report news well in advance of physical magazines. Also it was free as long as you had access to a television and could receive Channel 4, which you almost certainly could by 1993.
That is to say, its readership was probably quite large, and so is kinda important.
However, teletext wasn't "archived" in the traditional sense. Things would be up for one or two days max, so loads of it has been lost to time. But the nature of the broadcast meant that if you were recording a program on VHS... you'd get teletext data too, so efforts are being made to source old tapes and extract information. The archive is incomplete, but it's slowly being filled. It's a British institution, some are more excited than others.
Sonic/Sega Retro cares because Sonic/Sega = video games. So ideally we want an archive too.
I forget the terms - Digitiser had half a dozen pages, and those could be divided into 255 sub-pages but you probably wouldn't see that many, and whatever. I made a bunch of tables to house everything:
(Digitiser didn't get permanent page numbers until 1999 which is why there's x71, x72 etc.)
Currently the only sensible way to do this is to represent the page as an already decoded image (different TVs give slightly different results, so it's not ideal). I started naming files "Digitiser UK <DATE> <PAGE> <SUBPAGE>.png" and sticking them on Retro CDN. I've only tackled reviews but as you can see, there was news, previews, tips, letters and all sorts of strange things involving worms and swans. Someone else can make the call on whether we need to archive things like Roaming Thomas.
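A minimal sketch of that naming scheme in Python, in case anyone wants to rename files in bulk. The post doesn't say how <DATE> is written, so the YYYY-MM-DD format here is an assumption, and the helper name digitiser_filename and the example date are made up for illustration.

```python
from datetime import date


def digitiser_filename(day: date, page: str, subpage: int) -> str:
    """Build a 'Digitiser UK <DATE> <PAGE> <SUBPAGE>.png' file name.

    Assumption: <DATE> is written as YYYY-MM-DD. The thread does not
    specify the date format, so adjust this to match existing uploads.
    """
    return f"Digitiser UK {day.isoformat()} {page} {subpage}.png"


if __name__ == "__main__":
    # Hypothetical example: page x71, sub-page 1, on an arbitrary date.
    print(digitiser_filename(date(1993, 1, 4), "x71", 1))
```

The page is kept as a string so that pre-1999 placeholders like x71 pass through untouched.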
As far as using Digitiser on the wiki goes, I had to jump through a lot of hoops to get it down to this
add each image to the archive above, and you should be able to generate a nice set of thumbnails. It's easy, but time consuming. If there's a page number it can't cope with, let me know.
I've been mirroring images from fan site Super Page 58 (which has a lot more information than us), but I find that it tends to break if you make too many requests in a short amount of time. I don't want to completely undermine their efforts since we're only going to use a sub-set of what they offer, but if the site dies after being forced to load a set of 8-colour images.... eeehhhhh. The good news is that Saturdays on Digitiser usually were just a round-up of the week, and nothing happened on Sundays. The service also occasionally took breaks, so we're not likely to need to cover every single day of its existence anyway.
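If anyone scripts the mirroring, spacing the requests out is probably the kindest thing for their server. Here's a rough Python sketch of that idea. The function names, the 5-second delay and the URL list you feed it are all assumptions on my part, not anything Super Page 58 documents.

```python
import time
import urllib.request
from typing import Callable, Dict, Iterable


def fetch_url(url: str) -> bytes:
    """Download one URL and return the raw bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def mirror_politely(
    urls: Iterable[str],
    fetch: Callable[[str], bytes] = fetch_url,
    delay_seconds: float = 5.0,  # assumed value; tune to what the site tolerates
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, bytes]:
    """Fetch each URL in turn, pausing between requests.

    The pause happens before every request except the first, so a list
    of N URLs causes N - 1 pauses.
    """
    results: Dict[str, bytes] = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)
        results[url] = fetch(url)
    return results
```

The fetch and sleep arguments are only there so the loop can be tried out with stand-ins before pointing it at the real site.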
You'll also notice some of the images have dates that don't match the dates on Super Page 58 - that's because Digitiser updated in the early mornings, and people often left their VCRs to record things overnight.
Longer term I want to do things differently. I want to store raw teletext data and interpret it through the web browser - technology already exists to do that but we haven't got the raw data and while I've thought about reverse-engineering images, at the moment this is a pipe dream that's years, if not decades away.
(and yes we are absolutely interested in similar services offered in other countries, teletext-based or otherwise)
Some of RTP's teletext pages survived on their old website thanks to archive.org and its wonderful Wayback Machine... but unfortunately RTP had no section that covered video games... so those pages are just crap (unless you want to know your horoscope from ages ago or some nice recipes...)
A few more scans of Arcadia, Gamest, etc. ...
Somewhat-crusty quality PDFs, but these scans of a Turkish Nintendo magazine do have reviews of at least 9 Sega games for the DS and Wii.
A big batch of VideoGames (& Computer Entertainment) turned up last night. I've got copies with correct page numbering ready to go, but as Retro CDN is refusing to upload them, it could take a while.
Just so we're not stepping on each others' toes.
Found a bunch of scans of Play (US), and as of now I'm downloading all the ones we don't already have. They're very hefty scans going up to 400MB, so to make them easier to load on our servers (and to upload on my connection) I'm also compressing them to 150dpi.
I am officially giving up on Surge issue #2. It was added to Retromags a few days ago, I chopped out any supplements to make the page numbers line up, sorted out review scores on Sega Retro with the intention of having the file uploaded and... Retro CDN just won't take it.
Problems are almost certainly server-side, but after 30-40 attempts, it's no longer worth the effort. Perhaps someone else can get better results - I was resisting compressing the thing, but maybe 389MB is an unlucky number in Mediawiki land.
I had the same problem with the upload. The page on Sega Retro has a link to the (non-existing) pdf, maybe that's a problem for Retro CDN? What about the file name? I changed the dpi settings to 150 dpi, maybe it'll work now.
Sega Retro shouldn't affect Retro CDN - I had all those VideoGames (& Computer Entertainment) scans linked up before the uploads completed.
It's hard to know exactly why it was failing given it just throws out a generic "your internet sucks" error - I can only guess it's something about the data being uploaded, be it the file size, how the thumbnails were generated, the position of local planets, who knows.
Once upon a time I could use FTP and bypass everything. I really miss those days.
I keep meaning to have a third round of that Sega Tat tumblr I made a few years ago.
Sega Toys have a range for the pandemic.
^ hey look wrong topic
Something I thought we'd done, but turns out we haven't - archive.org have (many) Dreamcast manuals that aren't on Sega Retro. I uploaded a handful then got bored - it's a perfect task to delegate.
The quality ranges from "good enough" to "terrible don't upload this". None should be uploaded without checking the contents first - some pages are missing, others are unreadable, and I think there might be a few fakes in there too. It requires a working set of eyes.
Steps, as always:
if there's only a PDF:
if there are double page spreads, split the images in two:
correct the dpi of each image so the thumbnails don't look like arse:
recompile as PDF:
of particular note are the PAL manuals. There are a lot of incomplete, French-only scans which are of no use to us, however with the Dreamcast, there were legitimate French-only manuals too. Check the product codes and page counts and whatever. I've also noticed a few with extra scans tagged on the end such as boxes or registration cards - these are separate items so would need to be uploaded separately (if they don't already exist).
If there are missing pages, e.g. this one, put in a placeholder image so that the page numbering lines up.
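To work out where placeholders need to go, something like this Python sketch would do. It assumes pages are numbered from 1 and that you already know the total page count of the physical manual. The function name is made up.

```python
from typing import Iterable, List


def missing_pages(present: Iterable[int], total_pages: int) -> List[int]:
    """Return the page numbers (1-based) that have no scan.

    Each number returned needs a placeholder image so that the PDF's
    page numbering lines up with the printed manual.
    """
    have = set(present)
    return [n for n in range(1, total_pages + 1) if n not in have]


if __name__ == "__main__":
    # Hypothetical 6-page manual where the scans of pages 3 and 6 are absent.
    print(missing_pages([1, 2, 4, 5], 6))
```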
I switched to using ImageMagick for splitting images. Here's the batch file for splitting vertically:
for %%f in (*.png *.jpg *.jpeg) do magick convert -crop 2x1@ "%%f" "out_%%~nf_%%d.png"
Replace 2x1 with 1x2 to split horizontally.
And here's the batch file for converting PNGs to 96 dpi JPGs:
for %%f in (*.png *.ppm) do magick convert -strip -interlace Plane -quality 95 -units PixelsPerInch -density 96 "%%f" "%%~nf.jpg"
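For anyone not on Windows, here's a rough Python equivalent of the two batch files. It only assembles the same ImageMagick arguments and hands them to magick, so it assumes ImageMagick 7 is installed and on your PATH. The function names are made up, and I've only mirrored the flags used above rather than adding anything new.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable, List


def split_command(src: Path, grid: str = "2x1") -> List[str]:
    """Arguments for splitting one image into equal tiles.

    "2x1" gives a left and a right half; "1x2" gives a top and a bottom half.
    The %d in the output name is filled in by ImageMagick with the tile index.
    """
    return ["magick", "convert", "-crop", f"{grid}@",
            src.name, f"out_{src.stem}_%d.png"]


def jpg_command(src: Path) -> List[str]:
    """Arguments for converting one image to a 96 dpi JPG."""
    return ["magick", "convert", "-strip", "-interlace", "Plane",
            "-quality", "95", "-units", "PixelsPerInch", "-density", "96",
            src.name, f"{src.stem}.jpg"]


def run_all(folder: Path, build: Callable[[Path], List[str]],
            patterns: Iterable[str]) -> None:
    """Run the command built by `build` for every matching file in `folder`."""
    for pattern in patterns:
        for src in sorted(folder.glob(pattern)):
            # Run inside the folder so outputs land next to the inputs.
            subprocess.run(build(src), check=True, cwd=folder)


if __name__ == "__main__":
    here = Path(".")
    run_all(here, split_command, ["*.png", "*.jpg", "*.jpeg"])
```

Swap split_command for jpg_command (and the patterns for "*.png" and "*.ppm") to do the JPG conversion instead.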
I was circa 500 manuals away from downloading the whole site, that's when I realized no, I'm not that crazy (and that there are multiple versions of the same manuals by different uploaders). xP
Bookmarked it for later.