High disk load +mount/atacontrol/NFS/SMBFS crashes the system
Alejandro Pulver
2007-04-14 21:47:19 UTC
Hello.

I have run into the following problem a couple of times on two
different machines and FreeBSD versions (see below): when the disk is
continuously reading/writing (copying or extracting a file, checking
the filesystem in the background, etc.) my system sometimes crashes
(it's not an everyday thing, but quite frustrating when it happens).

When copying more than one file at the same time from another machine
over NFS/SMBFS (or when using the disk as described above), the system
often crashes (and the disk activity light turns off). Running
"atacontrol mode ad0 UDMA100" while the disk was in UDMA133, in an
attempt to work around the problem, also crashed the system (that time
the disk activity light stayed on). Also, while I was installing a port
that installs many files on the second machine, without NFS/SMBFS
involved, trying to mount a local NTFS filesystem (with the kernel
driver) crashed the system.

The first machine is an Athlon XP 2400+ with FreeBSD 6.1-RELEASE and a
custom kernel (see below), and the second one is a new Athlon64 X2 3500
with FreeBSD 6.2-RELEASE running in i386 mode with the generic SMP kernel.
See the boot messages and kernel config here:

http://people.freebsd.org/~alepulver/disk-crash.tar.bz2

I also got the following error on the first machine (only twice, when
checking the filesystem after one of these crashes). I don't know
whether it is related to the problems above:

fsync: giving up on dirty
0xc51d6990: tag devfs, type VCHR
usecount 1, writecount 0, refcount 806 mountedhere 0xc51a4000
flags ()
v_object 0xc144cb58 ref 0 pages 3232
lock type devfs: EXCL (count 1) by thread 0xc54e2c00 (pid 837)
dev ad2s1f

I would appreciate any help. If you need more information just ask.

Thanks and Best Regards,
Ale

P.S.: please CC me as I'm not subscribed.
Garrett Cooper
2007-04-15 00:40:38 UTC
Post by Alejandro Pulver
Hello.
I have experienced the following problem a couple of times in 2
different machines and FreeBSD versions (see below): when the disk is
continuously reading/writing (like when copying/extracting a file,
checking the filesystem in the background, etc.) my system crashes
sometimes (it's not an everyday thing, but quite frustrating when it
happens).
When copying from another machine by NFS/SMBFS more than one file at
the same time (or when using the disk, like described above) often
crashes (and the disk light indicator turns off). Running "atacontrol
ad0 mode UDMA100" when it was UDMA133 crashed the system (the disk
activity indicator was always on) when I tried to solve the problem
that way. Also when I was installing a port which installs many files
on the second machine without using NFS/SMBFS, trying to mount a local
NTFS filesystem (with kernel driver) crashed.
The first machine is an Athlon XP 2400+ with FreeBSD 6.1-RELEASE and
custom kernel (see below) and the second one a new Athlon64 X2 3500
with FreeBSD 6.2-RELEASE running in i386 mode, with generic SMP kernel.
http://people.freebsd.org/~alepulver/disk-crash.tar.bz2
Also I got (only twice, when checking the filesystem after one of these
crashes) the following error on the first machine, that I don't know if
fsync: giving up on dirty
0xc51d6990: tag devfs, type VCHR
usecount 1, writecount 0, refcount 806 mountedhere 0xc51a4000
flags ()
v_object 0xc144cb58 ref 0 pages 3232
lock type devfs: EXCL (count 1) by thread 0xc54e2c00 (pid 837)
dev ad2s1f
I would appreciate any help. If you need more information just ask.
Thanks and Best Regards,
Ale
P.S.: please CC me as I'm not subscribed.
Ale,
Could you provide more information about your machine, in particular
the devices attached (lspci -vv from sysutils/pciutils does the trick)
and the options enabled in your custom kernel please?
Also, which settings are you using for NFS and SMBFS (-rsize, -wsize,
special mountd/rpcbind options, etc.)?
-Garrett
Garrett Cooper
2007-04-16 06:33:47 UTC
Post by Alejandro Pulver
On Sat, 14 Apr 2007 17:40:38 -0700
Post by Garrett Cooper
Post by Alejandro Pulver
Hello.
I have experienced the following problem a couple of times in 2
different machines and FreeBSD versions (see below): when the disk is
continuously reading/writing (like when copying/extracting a file,
checking the filesystem in the background, etc.) my system crashes
sometimes (it's not an everyday thing, but quite frustrating when it
happens).
When copying from another machine by NFS/SMBFS more than one file at
the same time (or when using the disk, like described above) often
crashes (and the disk light indicator turns off). Running "atacontrol
ad0 mode UDMA100" when it was UDMA133 crashed the system (the disk
activity indicator was always on) when I tried to solve the problem
that way. Also when I was installing a port which installs many files
on the second machine without using NFS/SMBFS, trying to mount a local
NTFS filesystem (with kernel driver) crashed.
[...]
Post by Garrett Cooper
Ale,
Hello.
Thank you for your reply.
Post by Garrett Cooper
Could you provide more information about your machine, in particular
the devices attached (lspci -vv from sysutils/pciutils does the trick)
and the options enabled in your custom kernel please?
Sure. I have updated the file (added pci_machine_1.txt and
pci_machine_2.txt). The kernel configuration is already there (named
ATHLON-PHOBOS), the second machine has a default SMP kernel.
http://people.freebsd.org/~alepulver/disk-crash.tar.bz2
Post by Garrett Cooper
Also, could you provide more information about what the settings are
that you are using for NFS and SMBFS (-rsize, -wsize, special
mountd/rpcbind options, etc).
-Garrett
rpcbind_enable="YES"
nfs_server_enable="YES"
nfs_client_enable="YES"
# mount deimos:/wxp /mnt
After both FreeBSD machines crashed when the problem happened (because
of the NFS waiting infinitely), I started using "-i". The second
command was to copy some data from a Windows machine.
BTW I don't think the problem is related to NFS/SMBFS but to the disk
drivers, since it happens without them too. One is ATA (has an year)
and the other is SATA (new). However I am not experienced in this to
tell.
Thanks and Best Regards,
Ale
Ale,
I'm not sure what's going on exactly based on the information you
provided, but I would try the following steps to isolate the issue:

1) See if you can upgrade the first machine to a later version of
FreeBSD, say 6.2. I believe that there were related issues resolved in
6.2, but my memory could be incorrect. See if your problems occur after
that.
2) Try grabbing a different machine if possible and see if the same
issue occurs when you put the new machine as server and client with one
of the other machines.
3) Try switching roles with the 2 machines. If machine 1 is usually
server, let it play client and vice versa with machine 2.
4) Remove the new drive if possible, see if issue goes away. If it does,
try acquiring a cheap(er) drive and put it

Also, it appears that another FreeBSD team member had a similar issue
(see: http://people.freebsd.org/~pho/stress/log/cons205.html and
http://people.freebsd.org/~pho/stress/log/cons225.html). I dunno how but
it showed up as one of the leading searches on Google.

It looks like a (localized) filesystem issue, but I'm not sure what it
is exactly.

-Garrett
Roger Olofsson
2007-04-23 16:27:41 UTC
Post by Alejandro Pulver
On Sun, 15 Apr 2007 23:33:47 -0700
Post by Garrett Cooper
Ale,
I'm not sure what's going on exactly based on the information you
1) See if you can upgrade the first machine to a later version of
FreeBSD, say 6.2. I believe that there were related issues resolved in
6.2, but my memory could be incorrect. See if your problems occur after
that.
I did that.
Post by Garrett Cooper
2) Try grabbing a different machine if possible and see if the same
issue occurs when you put the new machine as server and client with one
of the other machines.
I used a Win XP machine as client / server.
Post by Garrett Cooper
3) Try switching roles with the 2 machines. If machine 1 is usually
server, let it play client and vice versa with machine 2.
Also did this.
Post by Garrett Cooper
4) Remove the new drive if possible, see if issue goes away. If it does,
try acquiring a cheap(er) drive and put it
It's the only drive it has, I meant the second machine is all new, not
just the disk.
Post by Garrett Cooper
Also, it appears that another FreeBSD team member had a similar issue
(see: http://people.freebsd.org/~pho/stress/log/cons205.html and
http://people.freebsd.org/~pho/stress/log/cons225.html). I dunno how but
it showed up as one of the leading searches on Google.
It looks like a (localized) filesystem issue, but I'm not sure what it
is exactly.
The fsync() problem seems to be related to that, but the rest could be
be a different thing. Also I only got it twice. Maybe the filesystem
issues were only derived from the crashes.
I was unable to reproduce the problem in the first machine, maybe it
was fixed on FreeBSD 6.2 as you said. The only things I also did when
testing was unloading fuse.ko (unused) and linprocfs.ko (after
umounting it). However I will test it a few times more, and let you
know the results.
The strange crash in the new 6.2 machine when using atacontrol is still
unexplained and I couldn't make it happen again (it now refuses to
switch to UDMA100 mode when it is SATA300, maybe they aren't supported
in SATA drives, but the other time it just crashed without advise).
Thank you for your help with this.
Best Regards,
Ale
Dear Ale,

I experienced something similar to what you described at the start of
this thread. The solution for me was to exchange the NIC I had for one
that worked better. I learned that using cheap NICs with Realtek chips
causes crashes even on the most stable operating system in the world.

When I browsed the source code of the driver for the Realtek-based NIC,
I regretted not having done so earlier. The comments were _crystal_
clear about its design and performance. See /usr/src/sys/pci/if_rl.c. I
particularly liked the following bit:

/*
 * Here's a totally undocumented fact for you. When the
 * RealTek chip is in the process of copying a packet into
 * RAM for you, the length will be 0xfff0. If you spot a
 * packet header with this value, you need to stop. The
 * datasheet makes absolutely no mention of this and
 * RealTek should be shot for this.
 */

Hope you will solve the issue!

Greetings
/Roger
Alejandro Pulver
2007-04-15 19:17:53 UTC
On Sat, 14 Apr 2007 17:40:38 -0700
Post by Garrett Cooper
Post by Alejandro Pulver
Hello.
I have experienced the following problem a couple of times in 2
different machines and FreeBSD versions (see below): when the disk is
continuously reading/writing (like when copying/extracting a file,
checking the filesystem in the background, etc.) my system crashes
sometimes (it's not an everyday thing, but quite frustrating when it
happens).
When copying from another machine by NFS/SMBFS more than one file at
the same time (or when using the disk, like described above) often
crashes (and the disk light indicator turns off). Running "atacontrol
ad0 mode UDMA100" when it was UDMA133 crashed the system (the disk
activity indicator was always on) when I tried to solve the problem
that way. Also when I was installing a port which installs many files
on the second machine without using NFS/SMBFS, trying to mount a local
NTFS filesystem (with kernel driver) crashed.
[...]
Post by Garrett Cooper
Ale,
Hello.

Thank you for your reply.
Post by Garrett Cooper
Could you provide more information about your machine, in particular
the devices attached (lspci -vv from sysutils/pciutils does the trick)
and the options enabled in your custom kernel please?
Sure. I have updated the file (added pci_machine_1.txt and
pci_machine_2.txt). The kernel configuration is already there (named
ATHLON-PHOBOS), the second machine has a default SMP kernel.

http://people.freebsd.org/~alepulver/disk-crash.tar.bz2
Post by Garrett Cooper
Also, could you provide more information about what the settings are
that you are using for NFS and SMBFS (-rsize, -wsize, special
mountd/rpcbind options, etc).
-Garrett
I am not using anything special here. In rc.conf:

rpcbind_enable="YES"
nfs_server_enable="YES"
nfs_client_enable="YES"

And the commands (at different times):

# mount deimos:/wxp /mnt
# mount -t smbfs //***@mariana/c /mnt

After both FreeBSD machines crashed when the problem happened (because
NFS kept waiting indefinitely), I started using "-i". The second
command was used to copy some data from a Windows machine.
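
For reference, an interruptible NFS mount might be set up roughly as in the sketch below. It assumes that "-i" refers to the interruptible-mount flag of mount_nfs(8). The helper name nfs_mount_cmd and the RUN variable are hypothetical and used only for illustration. The script prints the command and runs it only when RUN=1 is set (which needs root on a FreeBSD host).

```shell
#!/bin/sh
# Sketch only: build an interruptible NFS mount command.
# Assumption: "-i" is the mount_nfs(8) flag that makes the mount
# interruptible, so processes stuck on a dead server can be signalled.

nfs_mount_cmd() {
    server_path=$1   # remote export, e.g. deimos:/wxp (from the thread)
    mount_point=$2   # local mount point, e.g. /mnt
    echo "mount_nfs -i $server_path $mount_point"
}

cmd=$(nfs_mount_cmd deimos:/wxp /mnt)
echo "$cmd"

# Execute only on explicit request; by default this is a dry run.
if [ "${RUN:-0}" = "1" ]; then
    $cmd
fi
```

Printing the command first makes it easy to check the server path and mount point before anything touches the system.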

BTW I don't think the problem is related to NFS/SMBFS but to the disk
drivers, since it happens without them too. One disk is ATA (about a
year old) and the other is SATA (new). However, I am not experienced
enough in this area to tell.

Thanks and Best Regards,
Ale
Alejandro Pulver
2007-04-23 02:26:33 UTC
On Sun, 15 Apr 2007 23:33:47 -0700
Post by Garrett Cooper
Ale,
I'm not sure what's going on exactly based on the information you
1) See if you can upgrade the first machine to a later version of
FreeBSD, say 6.2. I believe that there were related issues resolved in
6.2, but my memory could be incorrect. See if your problems occur after
that.
I did that.
Post by Garrett Cooper
2) Try grabbing a different machine if possible and see if the same
issue occurs when you put the new machine as server and client with one
of the other machines.
I used a Win XP machine as client / server.
Post by Garrett Cooper
3) Try switching roles with the 2 machines. If machine 1 is usually
server, let it play client and vice versa with machine 2.
Also did this.
Post by Garrett Cooper
4) Remove the new drive if possible, see if issue goes away. If it does,
try acquiring a cheap(er) drive and put it
It's the only drive it has, I meant the second machine is all new, not
just the disk.
Post by Garrett Cooper
Also, it appears that another FreeBSD team member had a similar issue
(see: http://people.freebsd.org/~pho/stress/log/cons205.html and
http://people.freebsd.org/~pho/stress/log/cons225.html). I dunno how but
it showed up as one of the leading searches on Google.
It looks like a (localized) filesystem issue, but I'm not sure what it
is exactly.
The fsync() problem seems to be related to that, but the rest could be
a different thing. Also, I only got it twice. Maybe the filesystem
issues were just a consequence of the crashes.

I was unable to reproduce the problem on the first machine; maybe it
was fixed in FreeBSD 6.2 as you said. The only other things I did when
testing were unloading fuse.ko (unused) and linprocfs.ko (after
unmounting it). However, I will test it a few more times and let you
know the results.

The strange crash on the new 6.2 machine when using atacontrol is still
unexplained, and I couldn't make it happen again (it now refuses to
switch to UDMA100 mode when it is in SATA300; maybe UDMA modes aren't
supported on SATA drives, but the other time it just crashed without
warning).

Thank you for your help with this.

Best Regards,
Ale
Alejandro Pulver
2007-04-24 00:58:58 UTC
On Sun, 22 Apr 2007 23:26:33 -0300
Alejandro Pulver <***@FreeBSD.org> wrote:

[...]
Post by Alejandro Pulver
The strange crash in the new 6.2 machine when using atacontrol is still
unexplained and I couldn't make it happen again (it now refuses to
switch to UDMA100 mode when it is SATA300, maybe they aren't supported
in SATA drives, but the other time it just crashed without advise).
On the machine that was recently upgraded to 6.2, using "atacontrol"
while the disk is reading/writing crashes the system half of the time
or more.

Maybe it's a BIOS or hard disk issue, but I can't try it on the new
machine because it uses "SATA300" and other modes aren't documented in
the manual page. The other machine supports UDMA* modes.

Otherwise there is a problem in "atacontrol". When this happens, the
disk light keeps blinking but the system freezes, and keeps waiting for
the disk to respond forever.

Best Regards,
Ale
Rick C. Petty
2007-04-24 03:59:44 UTC
Post by Alejandro Pulver
In the machine which was recently upgraded to 6.2 using "atacontrol"
when the disk is reading/writing crashes the system half or more of the
times.
Which atacontrol command were you running? Just plain "atacontrol" shouldn't
do anything useful.

-- Rick C. Petty
Alejandro Pulver
2007-04-25 00:20:27 UTC
On Mon, 23 Apr 2007 22:59:44 -0500
Post by Rick C. Petty
Post by Alejandro Pulver
In the machine which was recently upgraded to 6.2 using "atacontrol"
when the disk is reading/writing crashes the system half or more of the
times.
Which atacontrol command were you doing? Just plain "atacontrol" shouldn't
do anything useful.
To change DMA modes, like:

# atacontrol mode ad0 UDMA100
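
Note the argument order: the device comes before the mode. A minimal sketch of a guard around this command follows. The helper names is_known_mode and mode_cmd are hypothetical. The list of mode names is an assumption based on the atacontrol(8) manual page and should be checked against the installed version. The script only prints the command it would run, so it can be tried safely while the disk is busy.

```shell
#!/bin/sh
# Sketch only: validate a transfer mode name before building the
# "atacontrol mode <device> <mode>" command line.
# Assumption: these are the mode names documented in atacontrol(8).

is_known_mode() {
    case "$1" in
        PIO[0-4]|WDMA2|UDMA2|UDMA33|UDMA4|UDMA66|UDMA5|UDMA100|UDMA6|UDMA133)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Print (never execute) the mode-change command; reject unknown modes.
mode_cmd() {
    dev=$1    # ATA device, e.g. ad0
    mode=$2   # requested mode, e.g. UDMA100
    if is_known_mode "$mode"; then
        echo "atacontrol mode $dev $mode"
    else
        echo "refusing: unknown mode '$mode'" >&2
        return 1
    fi
}

mode_cmd ad0 UDMA100
mode_cmd ad0 SATA300 || true   # not in the assumed list, so it is rejected
```

Rejecting undocumented names such as SATA300 up front avoids sending the driver a request it may not handle.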

Best Regards,
Ale
