Hi all!
I have been following this discussion closely because, since we upgraded from 9.72 to 10.3.5 and then to 10.3.6, we have been experiencing this issue almost every single day. In the worst cases we lost or zeroed out ALL the .dat files.
Actually, I have to disagree with Justin, Martin and everyone else who says this is not an E-Blah problem. But before this turns into a pointless argument, let me say that I personally have a lot to thank the E-Blah software for, and of course Justin and all the collaborators on this team.
I'm going to explain my idea after months of having to "fight" with this problem. I have to rebuild the user list very often, some days 3 or 4 times.
All the information shown below was taken at the moment I am writing this:
Site:
http://foros.pesca.org.mx/
Server specs:
==========
CPU info:
[root@124780-web1 Members]# cat /proc/cpuinfo
vendor_id : AuthenticAMD
cpu family : 15
model : 67
model name : Dual-Core AMD Opteron(tm) Processor 1212
stepping : 3
cpu MHz : 2010.947
cache size : 1024 KB
Hard disk stats:
[root@124780-web1 Members]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda5 227G 61G 155G 29% /
/dev/sda1 99M 12M 82M 13% /boot
none 1014M 0 1014M 0% /dev/shm
/dev/sda2 2.0G 161M 1.8G 9% /tmp
This server also hosts some other small sites, but it is about 90% dedicated to our forum. As you can see, the server has plenty of disk space that is not logically divided or shared, so the lack-of-space argument does not explain the problem, at least not in our case.
OS:
[root@124780-web1 Members]# uname -a
Linux 124780-web1.bir.com.mx 2.6.9-55.ELsmp #1 SMP Fri Apr 20 17:03:35 EDT 2007 i686 athlon i386 GNU/Linux
(Red Hat Enterprise Edition)
Just two weeks ago we installed an additional GB of RAM:
[root@124780-web1 Members]# cat /proc/meminfo
MemTotal: 2074780 kB
MemFree: 81712 kB
Buffers: 121752 kB
Cached: 1632724 kB
SwapCached: 4 kB
Active: 855152 kB
Inactive: 967108 kB
HighTotal: 1179456 kB
HighFree: 2944 kB
LowTotal: 895324 kB
LowFree: 78768 kB
SwapTotal: 1052248 kB
SwapFree: 1052044 kB
Dirty: 3064 kB
Writeback: 0 kB
Mapped: 87780 kB
Slab: 154072 kB
CommitLimit: 2089636 kB
Committed_AS: 634916 kB
PageTables: 4468 kB
VmallocTotal: 106488 kB
VmallocUsed: 1884 kB
VmallocChunk: 104552 kB
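To see how much of that memory is really held by processes, a rough check is to add MemFree, Buffers and Cached, since the kernel gives page cache back when programs need memory. The sketch below is a hypothetical Perl helper (parse_meminfo and reclaimable_kb are my own names, not part of E-Blah), and the sum is only an approximation: newer kernels (3.14 and later) export a MemAvailable field that estimates this more carefully, but the 2.6.9 kernel above does not.

```perl
use strict;
use warnings;

# Parse the text of /proc/meminfo into a hash ref of field => value (kB).
sub parse_meminfo {
    my ($text) = @_;
    my %info;
    for my $line (split /\n/, $text) {
        # Lines look like "MemFree:         81712 kB"
        if ($line =~ /^(\w+):\s+(\d+)/) {
            $info{$1} = $2;
        }
    }
    return \%info;
}

# Rough estimate of memory available to processes: free pages plus
# buffers plus page cache (cache is reclaimed on demand).
sub reclaimable_kb {
    my ($text) = @_;
    my $m = parse_meminfo($text);
    return ($m->{MemFree} // 0) + ($m->{Buffers} // 0) + ($m->{Cached} // 0);
}

# Only runs where /proc/meminfo exists (Linux).
if (-r '/proc/meminfo') {
    open my $fh, '<', '/proc/meminfo' or die "Cannot read /proc/meminfo: $!";
    my $text = do { local $/; <$fh> };
    close $fh;
    printf "Approx. free + reclaimable: %d kB\n", reclaimable_kb($text);
}
```

With the figures posted above, the sum is 81712 + 121752 + 1632724 = 1836188 kB.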
Well, it all started when we (actually Justin did it) upgraded the forum. Because the user database structure differs between versions, we "lost" about 30% of our users. I say "lost" because their data was not erased or zeroed, but it could no longer be changed or updated, so now we see a lot of messages signed by "guest" (even from my own original user).
Then, at the beginning of this month, we started experiencing the problem because of the amount of traffic generated by the users, in particular when we
RAN LOW ON MEMORY. So we limited access to 250 users, without success, and then limited it again to 200, again without success. Since the server is monitored by the provider (RackSpace, for your information), we received lots of reports and they recommended a RAM expansion.

So we expanded the RAM from 1 GB to 2 GB. The problem went away, but only for 2 or 3 days once we removed the limitation and users could log in again. Look at this top command output:
top - 15:06:14 up 7 days, 20:25, 1 user, load average: 0.19, 0.28, 0.29
Tasks: 123 total, 2 running, 120 sleeping, 0 stopped, 1 zombie
Cpu(s): 1.7% us, 0.3% sy, 0.0% ni, 98.0% id, 0.0% wa, 0.0% hi, 0.0% si
Mem: 2074780k total, 2011732k used, 63048k free, 125452k buffers
Swap: 1052248k total, 204k used, 1052044k free, 1649304k cached
Consider that this output was taken today, Saturday, when we have low activity because all the members went fishing!

If you visit our portal on any weekday, you can see how fast the users post their messages: it shows 100 replies, and in a very short time the portal is renewed, at least the replies section.
So, instead of a disk space problem, I now believe it is more RAM related, and RAM-related problems are often caused by the software. I also think the lock management is somewhat deficient.
I understand a little Perl, and while reviewing the code I noticed that in some places the approach to reading and writing files is:
open A, $file;
@CONTAINER = <A>;    # reads the entire file into memory at once
close A;
foreach (@CONTAINER) {
    # ... process each line ...
}
Although in newer versions this approach was changed to:
open A, $file;
while (<A>) {    # reads one line at a time
    # ... process each line ...
}
close A;
This change was applied in some places but not in others. Since the first method is very RAM-hungry (it loads the whole file into memory at once), when many users are reading and writing at the same time it is very easy for the List2.txt file to become "corrupted" (in some cases it becomes binary instead of ASCII).
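A minimal sketch of the line-by-line approach combined with locking, assuming the member list stores one entry per line (I have not checked the real List2.txt format). count_members is a hypothetical helper, not an E-Blah function. Note that flock locks are advisory: this only helps if every piece of code that writes the file also takes a lock on the same file.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical helper: counts the non-empty lines of a file while reading
# it one line at a time, so memory use does not grow with the file size.
sub count_members {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    # Shared lock: compatible with other readers, but excludes any
    # writer taking LOCK_EX on this same file.
    flock($fh, LOCK_SH) or die "Cannot lock $file: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';
        $count++;    # ... real code would parse the entry here ...
    }
    close $fh;    # closing the handle also releases the lock
    return $count;
}
```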
Something similar happens with the .DAT files and, in rare cases, with other files such as PMs. Only once did we experience the loss of a large discussion thread. My explanation for an empty .DAT file: if the server runs low on memory or out of resources at the moment the file is open, the file gets closed with no data, or the variable holding the file handle is lost, and then the .DAT file is lost too.
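On the empty .DAT files: in Perl, opening a file with '>' truncates it immediately, so if the process dies (for example, when it is killed for lack of memory) between the open and the print, the file is left empty. A common way to avoid this is to write the new content to a temporary file in the same directory and rename() it over the original, since a rename within one filesystem replaces the file in a single step. The sketch below assumes that approach; write_file_atomically and the ".lock" file name are hypothetical, not E-Blah's code.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);
use File::Basename qw(dirname);
use IO::Handle;

# Hypothetical helper: replace $file with @lines without ever leaving
# a truncated or half-written file in place.
sub write_file_atomically {
    my ($file, @lines) = @_;

    # Writers lock a separate file: the data file itself is replaced by
    # rename(), so a lock held on its old inode would not protect the new one.
    open my $lock, '>>', "$file.lock" or die "Cannot open $file.lock: $!";
    flock($lock, LOCK_EX) or die "Cannot lock $file.lock: $!";

    # Temp file in the same directory so rename() stays on one filesystem.
    my ($tmp, $tmpname) = tempfile(DIR => dirname($file), UNLINK => 0);

    # tempfile() creates the file with mode 0600; copy the original mode if any.
    my @st = stat $file;
    chmod($st[2] & 07777, $tmpname) if @st;

    print {$tmp} @lines or die "Write to $tmpname failed: $!";
    $tmp->flush or die "Flush failed: $!";
    $tmp->sync  or die "fsync failed: $!";   # push data to disk before renaming
    close $tmp  or die "Close failed: $!";

    # Readers now see either the complete old file or the complete new one.
    rename($tmpname, $file) or die "Rename to $file failed: $!";

    close $lock;    # releases the exclusive lock
    return 1;
}
```

With this pattern readers do not need to lock the data file at all, because they always open either the old file or the fully written new one.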
The final "mystery", for which I cannot imagine a reason, is that we have three particular users who very often lose their .DAT files, and sometimes they lose ALL their files. One in particular, named "JOSERRA", has lost his user data at least 7 times in two months, and 2 of those times he lost all his files.
As I told you all, it is not my intention to start a debate. I believe E-Blah is very good software and it is worth reviewing it to make it even better.
What do you think?
h yamasaki