Questions about php - Sonic and Sega Retro Message Board


Questions about php Topic renamed because I have more questions :P

#1 User is offline nineko 

Posted 07 March 2015 - 09:15 AM

  • I am the Holy Cat
  • Posts: 5684
  • Joined: 17-August 06
  • Gender:Male
  • Location:italy
  • Project:I... don't even know anymore :U
  • Wiki edits:5,251
Basically, I want to implement something on one of the websites I manage. Before anyone asks, no, I can't easily use a database due to the nature of the project, even if I'm aware that a database would be the optimal solution, so moving on.

I will have a lot of data to deal with: on the order of many hundreds to a few thousand lines, with a dozen or two fields. Since the data is going to be very repetitive by nature, I'm thinking of using one or more arrays as LUTs and storing all the data as numbers to save space, and also because the data will need to be filterable by most (if not all) of the fields. Now, here's my question.

Amongst the fields there will be date fields, e.g. year, month, day, (hour, minute). Which is why I'm wondering: should I keep the thousands of lines in a single file and perform a lot of IFs over them, or should I maybe split the data by hand into smaller files? Let me explain.

Case 1:
all_data.csv
-> lots of IFs

Case 2:
2014.csv
2015.csv
-> the year filter can be applied "outside" with a FOR, while IFs for month and day will still apply

Case 3:
201501.csv
201502.csv
-> both year and month filters can be applied "outside"

Case 4:
20150101.csv
20150102.csv
-> the whole date is filtered outside (with nested FORs or DOs or something); however, this would lead to 365 files per year, most of which would have zero or very few lines.
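
The cases above differ in where the filtering happens. As a rough sketch, the reading side of Case 3 (with Case 1 as a fallback) might look like this; the field positions are an assumption (0 = year, 1 = month):

```php
<?php
// Sketch of Case 3 vs Case 1. File names follow the examples above;
// field layout is hypothetical: column 0 = year, column 1 = month.
function loadRows(int $year, int $month): array {
    $rows = [];
    $file = sprintf('%04d%02d.csv', $year, $month);   // Case 3: one fopen
    if (($fh = @fopen($file, 'r')) !== false) {
        while (($r = fgetcsv($fh)) !== false) {
            $rows[] = $r;
        }
        fclose($fh);
        return $rows;
    }
    // Fallback, Case 1: scan the single big file with IFs on each row.
    $fh = fopen('all_data.csv', 'r');
    while (($r = fgetcsv($fh)) !== false) {
        if ((int)$r[0] === $year && (int)$r[1] === $month) {
            $rows[] = $r;
        }
    }
    fclose($fh);
    return $rows;
}
```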

So I'm wondering. Which is easier on the server? Lots of IFs, or lots of FOPENs?

Since I'm generating the data with another program I wrote, it wouldn't be a problem to split it differently. What I can't do is feed that data into a database with an automatic procedure, so yeah. Which is the best plan B?
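
As an aside, the LUT idea described at the top could look something like the following; all the field values and LUT names here are made up for illustration:

```php
<?php
// A minimal sketch of the LUT approach: repetitive string values are
// stored in the CSV as small integer indices into lookup arrays, and
// translated back when rendering. These LUTs are hypothetical examples.
$platformLut = ['Mega Drive', 'Game Gear', 'Saturn', 'Dreamcast'];
$regionLut   = ['JP', 'US', 'EU'];

function encodeField(string $value, array $lut): int {
    $idx = array_search($value, $lut, true);
    if ($idx === false) {
        throw new InvalidArgumentException("Unknown value: $value");
    }
    return $idx;
}

function decodeField(int $idx, array $lut): string {
    return $lut[$idx];
}

// "Saturn,EU" is stored in the CSV as "2,2".
$stored = encodeField('Saturn', $platformLut) . ',' . encodeField('EU', $regionLut);
echo $stored . "\n";                                     // 2,2
[$p, $r] = array_map('intval', explode(',', $stored));
echo decodeField($p, $platformLut) . ' / ' . decodeField($r, $regionLut) . "\n"; // Saturn / EU
```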
This post has been edited by nineko: 13 March 2015 - 08:14 PM

#2 User is offline Glitch 

Posted 07 March 2015 - 12:56 PM

  • Posts: 158
  • Joined: 22-September 08
  • Gender:Male
  • Project:Sonic 2 LD
  • Wiki edits:22
How you store the data will depend on what you want to do with it once it's on disk.

Both fread and fopen will require syscalls, so they'll have the context-switching overhead. If you're going for read speed you'll want to avoid that like the plague, so big files with sizeable read buffers would be ideal. From what you've said it looks like you'll be querying based on date ranges, so I'd suggest:

Use big files with a fixed size limit. Keep appending rows until you hit that limit then start a new file. Maintain a separate index file containing your date values with pointers to your data in the fixed size files.

The main problems with lots of small files are: a) most filesystems don't cope very well with directories containing many small files (you'd need something like ReiserFS), and b) if you're on a shared VPS, chances are you've got an inode limit.

So, yes, I'm suggesting you build your own basic database.
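
A rough sketch of that layout, under the assumption of CSV data files capped at a fixed size plus a flat `index.csv` of `dateKey,fileNo,offset` entries (all names here are invented):

```php
<?php
// Append rows to data_N.csv until a size cap, then roll over to a new
// file; a separate index.csv maps each row's date key to its location.
const MAX_BYTES = 1048576; // 1 MiB per data file (arbitrary cap)

function appendRow(string $dateKey, string $csvLine, int &$fileNo): void {
    clearstatcache(); // make filesize() reflect previous appends
    $file = "data_$fileNo.csv";
    if (file_exists($file) && filesize($file) >= MAX_BYTES) {
        $fileNo++;
        $file = "data_$fileNo.csv";
    }
    $offset = file_exists($file) ? filesize($file) : 0;
    file_put_contents($file, $csvLine . "\n", FILE_APPEND);
    file_put_contents('index.csv', "$dateKey,$fileNo,$offset\n", FILE_APPEND);
}

// Lookup: scan the (small) index, then fseek straight to matching rows.
function findRows(string $dateKey): array {
    $out = [];
    foreach (file('index.csv', FILE_IGNORE_NEW_LINES) as $entry) {
        [$key, $no, $off] = explode(',', $entry);
        if ($key === $dateKey) {
            $fh = fopen("data_$no.csv", 'r');
            fseek($fh, (int)$off);
            $out[] = fgets($fh);
            fclose($fh);
        }
    }
    return $out;
}
```

The point of the index is that only the small index file is scanned line by line; the big data files are read with a single seek per hit.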

#3 User is offline nineko 

Posted 07 March 2015 - 01:29 PM

  • I am the Holy Cat
  • Posts: 5684
  • Joined: 17-August 06
  • Gender:Male
  • Location:italy
  • Project:I... don't even know anymore :U
  • Wiki edits:5,251
Thanks, I too assumed that having many small files was a bad idea, but since I hate to use IFs I wanted a second opinion from someone who knows more than me. I was already inclined towards a hybrid approach, and now you've given me confirmation. For now I'll just start with one big file to see how it goes; I might split by year eventually. I might also store redundant data for year / month / day combinations and perform checks on them if both filters are enabled, e.g. filter for "201501" at once instead of filtering for "2015" and "01" separately; I am quite sure a few wasted bytes per line are well worth the removal of one IF. I could also do that at run time, I guess, by comparing "201501" to (year * 100 + month), or something.
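
The run-time variant of that composite-key trick is a one-liner; the row layout is an assumption (column 0 = year, column 1 = month):

```php
<?php
// Compare one composite number instead of two separate fields.
// $row layout is hypothetical: 0 = year, 1 = month (as CSV strings).
function matchesYearMonth(array $row, int $wanted): bool {
    // e.g. $wanted = 201501 matches year 2015, month 1
    return (int)$row[0] * 100 + (int)$row[1] === $wanted;
}

var_dump(matchesYearMonth(['2015', '01'], 201501)); // bool(true)
var_dump(matchesYearMonth(['2015', '02'], 201501)); // bool(false)
```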
This post has been edited by nineko: 07 March 2015 - 01:30 PM

#4 User is offline nineko 

Posted 13 March 2015 - 08:18 PM

  • I am the Holy Cat
  • Posts: 5684
  • Joined: 17-August 06
  • Gender:Male
  • Location:italy
  • Project:I... don't even know anymore :U
  • Wiki edits:5,251
Sorry to post again, but I have another question. Is there a way, in php, to detect whether a page is being loaded into a frame and return its name? I'm considering allowing other people to embed a page from my website inside an iframe, and I would like the presentation to change a little in those cases. I looked on Google with no success; it doesn't seem to be one of the $_SERVER variables (even 'REQUEST_URI' returns the URI of the framed page, not that of the container).

#5 User is offline GerbilSoft 

Posted 14 March 2015 - 08:02 AM

  • RickRotate'd.
  • Posts: 2846
  • Joined: 11-January 03
  • Gender:Male
  • Location:USA
  • Project:Gens/GS
  • Wiki edits:5,000 + one spin
That's entirely client-side. The only way to handle it would be to include client-side JavaScript that issues a different page request depending on whether or not the page is in an iframe.

#6 User is offline nineko 

Posted 14 March 2015 - 09:43 AM

  • I am the Holy Cat
  • Posts: 5684
  • Joined: 17-August 06
  • Gender:Male
  • Location:italy
  • Project:I... don't even know anymore :U
  • Wiki edits:5,251
Thank you. My venture on Google indeed gave me the impression that I'd need to use javascript, but I hoped there could be a workaround for that. Too bad there isn't, but I won't pollute my project with something I consider on par with a sin (javascript). I added a GET variable for the purpose, the average user won't even know how to mess with it.
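
The GET-variable approach could be as simple as this; the parameter name "embed" is invented for the example, and embedding sites would just link to `page.php?embed=1`:

```php
<?php
// Switch presentation on a query-string flag instead of detecting the
// iframe. The parameter name "embed" is a hypothetical choice.
function isEmbedded(array $get): bool {
    return isset($get['embed']) && $get['embed'] === '1';
}

// e.g. pick a stripped-down layout when embedded (no header/footer).
$layoutClass = isEmbedded($_GET) ? 'embed' : 'full';
echo "<div class=\"$layoutClass\">...</div>\n";
```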

#7 User is offline Vangar 

Posted 16 March 2015 - 06:21 PM

  • Posts: 3473
  • Joined: 08-January 04
  • Gender:Not Telling
  • Location:America
  • Wiki edits:2
I'm interested in why you can't use a database for what seems like database data.

#8 User is offline nineko 

Posted 16 March 2015 - 08:17 PM

  • I am the Holy Cat
  • Posts: 5684
  • Joined: 17-August 06
  • Gender:Male
  • Location:italy
  • Project:I... don't even know anymore :U
  • Wiki edits:5,251
It's data I gather with another program I wrote, and that program runs on my own computer. It reads the data from two other websites and cleans it up a lot. Once the data I extracted from those two websites is cleaned and sorted, I have to save it somewhere. Since I'm doing it with a program on my own computer, the most practical solution is to save it as a file, and CSV is quite handy. Storing it in a database would require me to either save it as a file made up of SQL instructions and somehow process it (from a MySQL console or from a custom php file), or I could still output a CSV as I'm doing now and call a php page which takes that CSV just once and puts it into a database. I have to admit those are two valid options, but they would add more steps to my procedure, which I finalised in the past few days and which works quite well.

I know it would be *MUCH* better if I somehow managed to gather the data directly from a php script or something, so I didn't need to pass through my computer, but that's definitely too hard for me. It was hard enough to do it in a language and an environment I'm familiar with (VBA code in Microsoft Excel, which I also use to sort / process the data, which spans 58 sheets, by the way); no way I'm doing all that importing / cleaning / sorting in php, and I don't know if that would even be possible.

I predict a maximum size of ~2100 rows for the CSV file, and so far (~1900 rows) the server is doing well with it.
This post has been edited by nineko: 16 March 2015 - 08:20 PM

#9 User is offline Billy 

Posted 16 March 2015 - 09:18 PM

  • RIP Oderus Urungus
  • Posts: 1817
  • Joined: 24-June 05
  • Gender:Male
  • Location:Colorado, USA
  • Project:retrooftheweek.net - Give it a visit and tell me what you think!
  • Wiki edits:15
Have you looked into SQLite or anything like that? It's not client-server, so it's great for embedding into desktop apps (the db is saved as a file), and there are bindings for PHP.
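
For instance, via PHP's PDO SQLite driver (bundled with most PHP builds) the one-off CSV import and the filtering could look roughly like this; the table and column names are made up, and the CSV layout is assumed to be year, month, day, title:

```php
<?php
// Sketch: import the existing CSV into SQLite once, then filter with
// SQL instead of IFs. Schema and file names are hypothetical.
$db = new PDO('sqlite::memory:'); // or 'sqlite:data.db' for a file on disk
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE IF NOT EXISTS entries (
    y INTEGER, m INTEGER, d INTEGER, title TEXT)');

// One-off import from the existing CSV, if present.
$stmt = $db->prepare('INSERT INTO entries VALUES (?, ?, ?, ?)');
if (($fh = @fopen('all_data.csv', 'r')) !== false) {
    while (($r = fgetcsv($fh)) !== false) {
        $stmt->execute($r);
    }
    fclose($fh);
}

// Filtering becomes a query instead of a pile of IFs.
$q = $db->prepare('SELECT * FROM entries WHERE y = ? AND m = ?');
$q->execute([2015, 1]);
$rows = $q->fetchAll(PDO::FETCH_NUM);
```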

#10 User is offline nineko 

Posted 16 March 2015 - 09:32 PM

  • I am the Holy Cat
  • Posts: 5684
  • Joined: 17-August 06
  • Gender:Male
  • Location:italy
  • Project:I... don't even know anymore :U
  • Wiki edits:5,251
That's another option I admit I overlooked.

#11 User is offline Skeledroid 

Posted 17 March 2015 - 07:16 PM

  • Posts: 227
  • Joined: 17-November 06
  • Gender:Male
  • Wiki edits:1
I would just convert the CSV to JSON so you can use json_encode/decode in PHP to mess around with all the data as arrays in RAM.
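
That idea, sketched out (the file names are invented, and the field layout of year in column 0 and month in column 1 is an assumption):

```php
<?php
// Parse the CSV once, cache it as JSON, and work on plain arrays in
// RAM on later requests. File names here are hypothetical.
function csvToArrays(string $file): array {
    $rows = [];
    $fh = fopen($file, 'r');
    while (($r = fgetcsv($fh)) !== false) {
        $rows[] = $r;
    }
    fclose($fh);
    return $rows;
}

if (file_exists('all_data.csv')) {
    // One-off conversion.
    file_put_contents('all_data.json', json_encode(csvToArrays('all_data.csv')));

    // Later requests decode the cache and filter in memory
    // (assuming field 0 = year, 1 = month).
    $data = json_decode(file_get_contents('all_data.json'), true);
    $jan2015 = array_filter($data,
        fn($r) => (int)$r[0] === 2015 && (int)$r[1] === 1);
}
```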

