Discussion:
msleep() on recursivly locked mutexes
(too old to reply)
Hans Petter Selasky
2007-04-26 19:36:33 UTC
Permalink
Hi,

In the new USB stack I have defined the following:

u_int32_t
mtx_drop_recurse(struct mtx *mtx)
{
u_int32_t recurse_level = mtx->mtx_recurse;
u_int32_t recurse_curr = recurse_level;

mtx_assert(mtx, MA_OWNED);

while(recurse_curr--) {
mtx_unlock(mtx);
}

return recurse_level;
}

void
mtx_pickup_recurse(struct mtx *mtx, u_int32_t recurse_level)
{
mtx_assert(mtx, MA_OWNED);

while(recurse_level--) {
mtx_lock(mtx);
}
return;
}

When I do a msleep() I do it like this:

level = mtx_drop_recurse(ctd->p_mtx);

error = msleep(ctd, ctd->p_mtx, 0,
"config td sleep", timeout);

mtx_pickup_recurse(ctd->p_mtx, level);

Are there any comments on integrating this functionality into msleep(), and
adding mtx_drop_recurse() and mtx_pickup_recurse() to the FreeBSD kernel?

--HPS
Julian Elischer
2007-04-26 21:18:00 UTC
Permalink
Post by Hans Petter Selasky
Hi,
u_int32_t
mtx_drop_recurse(struct mtx *mtx)
{
u_int32_t recurse_level = mtx->mtx_recurse;
u_int32_t recurse_curr = recurse_level;
mtx_assert(mtx, MA_OWNED);
while(recurse_curr--) {
mtx_unlock(mtx);
}
return recurse_level;
}
The reason that mutexes ever recurse in the first place is usually because one
piece of code calls itself (or a related piece of code) in a blind manner.. in other
words, it doesn't know it is doing so. The whole concept of recursing mutexes is a bit
gross. The concept of blindly unwinding them is I think just asking for trouble.

Further the idea that holding a mutex "except for when we sleep" is a generally
bright idea is also a bit odd to me.. If you hold a mutex and release it
during sleep you probably should invalidate all assumptions you made during the
period before you slept as whatever you were protecting has possibly been raped
while you slept. I have seen too many instances where people just called msleep
and dropped the mutex they held, picked it up again on wakeup, and then blithely
continued on without checking what happened while they were asleep.

It seems to me that if someone else held that lock for some purpose when you slept,
then you don't know what it was that they were trying to protect, so you don't
know what to recheck when you wake up.. I think this is extremely dangerous as you
don't know when you are in this situation..

Personally I think Matt Dillon had a good idea when he suggested that the
need to sleep when holding a lock should require the lock holder to back out as
far as needed as to be able to see why the lock was held and to comfortably cope
with the situaution when the protected structures got frobbed while it slept.
(He actually at one time committed some primitives to do this. I forget their names
and htey were never used, but the idea has some merrit.)


This change may be OK in the situations you plan but it makes me very uncomfortable.
it basically should have a big sign on it saying "You could spend years looking
for the wierd bugs you are likely to get if you use this" on it.
Post by Hans Petter Selasky
void
mtx_pickup_recurse(struct mtx *mtx, u_int32_t recurse_level)
{
mtx_assert(mtx, MA_OWNED);
while(recurse_level--) {
mtx_lock(mtx);
}
return;
}
level = mtx_drop_recurse(ctd->p_mtx);
error = msleep(ctd, ctd->p_mtx, 0,
"config td sleep", timeout);
mtx_pickup_recurse(ctd->p_mtx, level);
Are there any comments on integrating this functionality into msleep(), and
adding mtx_drop_recurse() and mtx_pickup_recurse() to the FreeBSD kernel?
--HPS
_______________________________________________
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
Julian Elischer
2007-04-27 18:07:39 UTC
Permalink
Post by Matthew Dillon
The real culprit here is passing held mutexes to unrelated procedures
in the first place because those procedures might have to block, in
order so those procedures can release and reacquire the mutex.
That's just bad coding in my view. The unrelated procedure has no
clue as to what the mutex is or why it is being held and really has no
business messing with it.
[description of DFLY spinlocks removed for brevity]
Post by Matthew Dillon
If I were to offer advise it would be: Just stop trying to mix water
and hot wax. Stop holding mutexes across potentially blocking procedure
calls. Stop passing mutexes into unrelated bits of code in order for
them to be released and reacquired somewhere deep in that code. Just
doing that will probably solve all of the problems being reported.
Actually Matt, I don't think there is much argument here.. It has come to the time
where locking needs a big broom, and I suspect that there will be such a broom
activated at BSDCan. There is some work going on (by john and others) to
'unify and sanitise' the locking. We'll see where it goes.

Or is that wax and hot water?
Attilio Rao
2007-05-02 16:38:45 UTC
Permalink
Post by Matthew Dillon
The real culprit here is passing held mutexes to unrelated procedures
in the first place because those procedures might have to block, in
order so those procedures can release and reacquire the mutex.
That's just bad coding in my view. The unrelated procedure has no
clue as to what the mutex is or why it is being held and really has no
business messing with it.
What I did was implement spinlocks with VERY restricted capabilities,
far more restricted then the capabilities of your mutexes. Our
spinlocks are meant only to be used to lock up tiny pieces of code
(like for ref counting or structural or flag-changing operations).
Plus the kernel automatically acts as if it were in a critical section
if it takes an interrupt while the current thread is holding a spinlock.
That way mainline code can just use a spinlock to deal with small bits
of interlocked information without it costing much in the way of
overhead.
Well, this is currently what our spinmutexes do too.
The couplet mtx_lock_spin()/mtx_unlock_spin() simply starts/exits a
critical section (disabling interrupts in the while and avoiding
preemption). They are intended to be used for very small pieces of code too.
Post by Matthew Dillon
I made the decision that ANYTHING more complex then that would have to
use a real lock, like a lockmgr lock or a token, depending on the
characteristics desired. To make it even more desireable I also
stripped down the lockmgr() lock implementation, removing numerous
bits that were inherited from very old code methodologies that have no
business being in a modern operating system, like LK_DRAIN. And I
removed the passing of an interlocking spinlock to the lockmgr code,
because that methodology was being massively abused in existing code
(and I do mean massively).
Well, if you add a more smart interface, you have *exactly* our sx locks
implementation.
Basically, sx and lockmgr in FreeBSD just differs beacause of the
lockmgr's stupid API, beacause of draining and beacause of interlock.
But they are basically very very similar*.
Post by Matthew Dillon
I'm not quite sure what the best way to go is for FreeBSD, because
you guys have made your mutexes just as or even more sophisticated
then your normal locks in many respects, and you have like 50 different
types of locks now (I can't keep track of them all).
Not sure what you mean with 'more sophisticated'... anyways...
The only one problem I currently see with our locking primitives is that
they are not very well documented (or part of the documentation is
stale) and this can be a problem when there are a couple of locking
primitives as we have but this doesn't mean that they are complex.
Really, any primitive is very simple and is thought to be used in its
particular context. The restriction we have on locks just are a sort of
warning for people developing wrong locking strategies.
For example, there are not tecnological difficulties in allowing holding
mutexes when sleeping but if this really happen, probabilly there is a
problem in your locking scheme.
Post by Matthew Dillon
If I were to offer advise it would be: Just stop trying to mix water
and hot wax. Stop holding mutexes across potentially blocking procedure
calls. Stop passing mutexes into unrelated bits of code in order for
them to be released and reacquired somewhere deep in that code. Just
doing that will probably solve all of the problems being reported.
I cannot understand what part of the codes you are referring with this...

Thanks,
Attilio

* Another difference is about upgrading, but I consider FreeBSD's
lockmgr upgrading a really bad choice of design, and world could be a
very better place without it
Attilio Rao
2007-04-26 21:38:55 UTC
Permalink
Post by Hans Petter Selasky
Hi,
u_int32_t
mtx_drop_recurse(struct mtx *mtx)
{
u_int32_t recurse_level = mtx->mtx_recurse;
u_int32_t recurse_curr = recurse_level;
mtx_assert(mtx, MA_OWNED);
while(recurse_curr--) {
mtx_unlock(mtx);
}
return recurse_level;
}
void
mtx_pickup_recurse(struct mtx *mtx, u_int32_t recurse_level)
{
mtx_assert(mtx, MA_OWNED);
while(recurse_level--) {
mtx_lock(mtx);
}
return;
}
level = mtx_drop_recurse(ctd->p_mtx);
error = msleep(ctd, ctd->p_mtx, 0,
"config td sleep", timeout);
mtx_pickup_recurse(ctd->p_mtx, level);
Are there any comments on integrating this functionality into msleep(), and
adding mtx_drop_recurse() and mtx_pickup_recurse() to the FreeBSD kernel?
Several times I thought to adding recursed mutex handling in msleep &
condvar, but in the end the better approach is mantaining the current
behaviour.

Recursed mutexes often are results of incorrect locking strategies and
catering that kind of errors is necessarily a bad idea.
It is a lot better handling potential recursing mutexes outside from
sleeping points (that is what your implementation does).

Just, please, don't deal with internal struct mtx fields for that (so,
please, don't refer to mtx->mtx_recurse) just use the mutex(9) API
provided.

Attilio
--
Peace can only be achieved by understanding - A. Einstein
M. Warner Losh
2007-04-26 21:47:13 UTC
Permalink
In message: <***@c2i.net>
Hans Petter Selasky <***@c2i.net> writes:
: Are there any comments on integrating this functionality into msleep(), and
: adding mtx_drop_recurse() and mtx_pickup_recurse() to the FreeBSD kernel?

This is generally a bad idea. We allow it for Giant because it has to
be acquired everywhere and we have to drop it in some extremely ugly
places. To enshrine this behavior for all mutexes strikes me as ill
advised. Adding additional hacks to recursive mutexes seems wrong to
my way of thinking.

Warner
Attilio Rao
2007-04-26 21:50:16 UTC
Permalink
Post by Julian Elischer
The reason that mutexes ever recurse in the first place is usually because one
piece of code calls itself (or a related piece of code) in a blind manner.. in other
words, it doesn't know it is doing so. The whole concept of recursing mutexes is a bit
gross. The concept of blindly unwinding them is I think just asking for trouble.
Further the idea that holding a mutex "except for when we sleep" is a generally
bright idea is also a bit odd to me.. If you hold a mutex and release it
during sleep you probably should invalidate all assumptions you made during the
period before you slept as whatever you were protecting has possibly been raped
while you slept. I have seen too many instances where people just called msleep
and dropped the mutex they held, picked it up again on wakeup, and then blithely
continued on without checking what happened while they were asleep.
Basically, the idea you cannot hold "blocking" locks (mutexes and
rwlocks) while sleeping, cames from the difference there is behind
turnstiles and sleepqueues.

Turnstiles are thought to serve syncronous events, for short period of
time (or rather short) while sleepqueues are thought to serve
asyncronous events, so that the path to be protected can be
definitively bigger. If you fit in the situation you have to call
first a blocking lock and later a sleeping lock, probabilly you are
just using a wrong locking strategy and you should really revisit it.

As you mention, it is not always possible to drop the blocking lock
before to sleep since you can break your critical path and free the
way for races of various genre. Even unlocking Giant, that is
auto-magically done by sleeping primitives, can lead to very difficult
to discover races (I can remind one in tty code, old of some months,
that can be a good proof-of-concept for that).

Attilio
--
Peace can only be achieved by understanding - A. Einstein
Hans Petter Selasky
2007-04-27 05:53:05 UTC
Permalink
Post by Julian Elischer
Post by Hans Petter Selasky
Hi,
u_int32_t
mtx_drop_recurse(struct mtx *mtx)
{
u_int32_t recurse_level = mtx->mtx_recurse;
u_int32_t recurse_curr = recurse_level;
mtx_assert(mtx, MA_OWNED);
while(recurse_curr--) {
mtx_unlock(mtx);
}
return recurse_level;
}
The reason that mutexes ever recurse in the first place is usually because
one piece of code calls itself (or a related piece of code) in a blind
manner.. in other words, it doesn't know it is doing so. The whole concept
of recursing mutexes is a bit gross. The concept of blindly unwinding them
is I think just asking for trouble.
Further the idea that holding a mutex "except for when we sleep" is a
generally bright idea is also a bit odd to me.. If you hold a mutex and
release it during sleep you probably should invalidate all assumptions you
made during the period before you slept as whatever you were protecting has
possibly been raped while you slept. I have seen too many instances where
people just called msleep and dropped the mutex they held, picked it up
again on wakeup, and then blithely continued on without checking what
happened while they were asleep.
It seems to me that if someone else held that lock for some purpose when
you slept, then you don't know what it was that they were trying to
protect, so you don't know what to recheck when you wake up.. I think this
is extremely dangerous as you don't know when you are in this situation..
Personally I think Matt Dillon had a good idea when he suggested that the
need to sleep when holding a lock should require the lock holder to back
out as far as needed as to be able to see why the lock was held and to
comfortably cope with the situaution when the protected structures got
frobbed while it slept. (He actually at one time committed some primitives
to do this. I forget their names and htey were never used, but the idea has
some merrit.)
This change may be OK in the situations you plan but it makes me very
uncomfortable. it basically should have a big sign on it saying "You could
spend years looking for the wierd bugs you are likely to get if you use
this" on it.
Yes, but could we factor this code out into a msleep_recurse() function then?
And the mtx_pickup/mtx_drop functions I would rather see in "kern_synch.c",
because it is so much faster to change the recurse level, than it is to do a
while loop.

--HPS
Hans Petter Selasky
2007-04-27 05:48:49 UTC
Permalink
Post by Attilio Rao
Post by Julian Elischer
The reason that mutexes ever recurse in the first place is usually
because one piece of code calls itself (or a related piece of code) in a
blind manner.. in other words, it doesn't know it is doing so. The whole
concept of recursing mutexes is a bit gross. The concept of blindly
unwinding them is I think just asking for trouble.
Further the idea that holding a mutex "except for when we sleep" is a
generally bright idea is also a bit odd to me.. If you hold a mutex and
release it during sleep you probably should invalidate all assumptions
you made during the period before you slept as whatever you were
That is not always correct. If you run your code in a separate
thread/taskqueue, then you simply wait for this thread/taskqueue to complete
somewhere else. This is basically when I need to exit mutexes.
Post by Attilio Rao
Post by Julian Elischer
protecting has possibly been raped while you slept. I have seen too many
instances where people just called msleep and dropped the mutex they
held, picked it up again on wakeup, and then blithely continued on
without checking what happened while they were asleep.
Basically, the idea you cannot hold "blocking" locks (mutexes and
rwlocks) while sleeping, cames from the difference there is behind
turnstiles and sleepqueues.
Turnstiles are thought to serve syncronous events, for short period of
time (or rather short) while sleepqueues are thought to serve
asyncronous events, so that the path to be protected can be
definitively bigger. If you fit in the situation you have to call
first a blocking lock and later a sleeping lock, probabilly you are
just using a wrong locking strategy and you should really revisit it.
The suggestion is just for convenience. Usually you don't have a recursed
mutex to sleep on. It is just to catch some rare cases where you will end up
with a doubly locked mutex, which is not part of the ordinary code path. I
don't have such cases in the kernel of the new USB stack, but there are some
cases in the USB device drivers, which is due to some mutex locking moves.
Those I can fix.

My idea was that by allowing recursive mutexes to sleep, you will end up with
less panics in the end for the unwary code developer. You just protect your
code with mutexes and if they recurse calling synchronous USB functions, you
don't have to care.
Post by Attilio Rao
As you mention, it is not always possible to drop the blocking lock
before to sleep since you can break your critical path and free the
way for races of various genre. Even unlocking Giant, that is
auto-magically done by sleeping primitives, can lead to very difficult
to discover races (I can remind one in tty code, old of some months,
that can be a good proof-of-concept for that).
--HPS
Julian Elischer
2007-04-27 18:09:40 UTC
Permalink
Post by Julian Elischer
I think trying to sleep with a recursed mutex should be an instant panic,
even if the mutex IS marked as being allowed to sleep.
I mean marked as being able to recurse.
Hans Petter Selasky
2007-04-27 05:50:04 UTC
Permalink
Hi,
Could you perhaps describe some of the codepaths in the USB stack that
require this behavior?
There are no requirements for that. It is just for the convenience of unwary
USB device driver developers.

--HPS
John-Mark Gurney
2007-04-27 07:05:21 UTC
Permalink
Post by Hans Petter Selasky
Hi,
Could you perhaps describe some of the codepaths in the USB stack that
require this behavior?
There are no requirements for that. It is just for the convenience of unwary
USB device driver developers.
The reason we have restrictions on such things as prevention of recursion
on mutexes is for the convenience of the unwary developer so they don't
do something stupid, and not realize it till it's causing bugs...
Multithreaded programming is hard, and even w/ the power of witness,
is still hard to get right, and we need to help capable programmers not
make mistakes...

There is always ugen if the driver developer doesn't want to deal w/
making their driver multithread safe...
--
John-Mark Gurney Voice: +1 415 225 5579

"All that I will do, has been done, All that I have, has not."
Attilio Rao
2007-04-27 16:32:47 UTC
Permalink
Post by Hans Petter Selasky
Post by Attilio Rao
Post by Julian Elischer
The reason that mutexes ever recurse in the first place is usually
because one piece of code calls itself (or a related piece of code) in a
blind manner.. in other words, it doesn't know it is doing so. The whole
concept of recursing mutexes is a bit gross. The concept of blindly
unwinding them is I think just asking for trouble.
Further the idea that holding a mutex "except for when we sleep" is a
generally bright idea is also a bit odd to me.. If you hold a mutex and
release it during sleep you probably should invalidate all assumptions
you made during the period before you slept as whatever you were
That is not always correct. If you run your code in a separate
thread/taskqueue, then you simply wait for this thread/taskqueue to complete
somewhere else. This is basically when I need to exit mutexes.
Post by Attilio Rao
Post by Julian Elischer
protecting has possibly been raped while you slept. I have seen too many
instances where people just called msleep and dropped the mutex they
held, picked it up again on wakeup, and then blithely continued on
without checking what happened while they were asleep.
Basically, the idea you cannot hold "blocking" locks (mutexes and
rwlocks) while sleeping, cames from the difference there is behind
turnstiles and sleepqueues.
Turnstiles are thought to serve syncronous events, for short period of
time (or rather short) while sleepqueues are thought to serve
asyncronous events, so that the path to be protected can be
definitively bigger. If you fit in the situation you have to call
first a blocking lock and later a sleeping lock, probabilly you are
just using a wrong locking strategy and you should really revisit it.
The suggestion is just for convenience. Usually you don't have a recursed
mutex to sleep on. It is just to catch some rare cases where you will end up
with a doubly locked mutex, which is not part of the ordinary code path. I
don't have such cases in the kernel of the new USB stack, but there are some
cases in the USB device drivers, which is due to some mutex locking moves.
Those I can fix.
I don't think you got my point.
I can understand this is for convenience and I can understand why you
want it but definitively you should never have a recursed mutex, That's all.
This is why msleep(), condvar() and lockmgr() panics if a recursed mutex
is passed. Definitively, we mustn't cater a misbehaving approach with
extra-hack in the kernel code.

Attilio
Bosko Milekic
2007-04-26 21:52:16 UTC
Permalink
Hi,
Could you perhaps describe some of the codepaths in the USB stack that
require this behavior?
--
Bosko Milekic <***@gmail.com>
http://www.crowdedweb.com/
Daniel Eischen
2007-04-27 13:14:10 UTC
Permalink
Post by Hans Petter Selasky
Post by Julian Elischer
The reason that mutexes ever recurse in the first place is usually
because one piece of code calls itself (or a related piece of code) in a
blind manner.. in other words, it doesn't know it is doing so. The whole
concept of recursing mutexes is a bit gross. The concept of blindly
unwinding them is I think just asking for trouble.
Further the idea that holding a mutex "except for when we sleep" is a
generally bright idea is also a bit odd to me.. If you hold a mutex and
release it during sleep you probably should invalidate all assumptions
you made during the period before you slept as whatever you were
That is not always correct. If you run your code in a separate
thread/taskqueue, then you simply wait for this thread/taskqueue to complete
somewhere else. This is basically when I need to exit mutexes.
I know there are reasons why we have msleep(), but use of msleep()
and recursive mutexes should raise be frowned upon. If you want to
sleep, that's why we have condvar's. And if you are releasing a
synchronization lock in a different thread, then why aren't you
using condvar's wrapped by mutexes() to protect the internal state?

When you hold a mutex, it should be for a very short time. And
I agree with the other comment that all drivers should be multi-thread
safe, so we shouldn't add cruft to allow for non MT-safe drivers.
--
DE
Julian Elischer
2007-04-27 17:41:37 UTC
Permalink
Post by Hans Petter Selasky
Post by Attilio Rao
Post by Julian Elischer
The reason that mutexes ever recurse in the first place is usually
because one piece of code calls itself (or a related piece of code) in a
blind manner.. in other words, it doesn't know it is doing so. The whole
concept of recursing mutexes is a bit gross. The concept of blindly
unwinding them is I think just asking for trouble.
Further the idea that holding a mutex "except for when we sleep" is a
generally bright idea is also a bit odd to me.. If you hold a mutex and
release it during sleep you probably should invalidate all assumptions
you made during the period before you slept as whatever you were
That is not always correct. If you run your code in a separate
thread/taskqueue, then you simply wait for this thread/taskqueue to complete
somewhere else. This is basically when I need to exit mutexes.
then you are probably using the wrong synchronization primitive.
Post by Hans Petter Selasky
Post by Attilio Rao
Post by Julian Elischer
protecting has possibly been raped while you slept. I have seen too many
instances where people just called msleep and dropped the mutex they
held, picked it up again on wakeup, and then blithely continued on
without checking what happened while they were asleep.
Basically, the idea you cannot hold "blocking" locks (mutexes and
rwlocks) while sleeping, cames from the difference there is behind
turnstiles and sleepqueues.
Turnstiles are thought to serve syncronous events, for short period of
time (or rather short) while sleepqueues are thought to serve
asyncronous events, so that the path to be protected can be
definitively bigger. If you fit in the situation you have to call
first a blocking lock and later a sleeping lock, probabilly you are
just using a wrong locking strategy and you should really revisit it.
that's what I think..
if you are trying to make a nail with threading, then you
probably should be using a screw to start with.

You obviously know a lot about USB.
You have a good grask of how that works.
Maybe what you need to do is write a small document on how the USB side
should work, and what needs to be locked, and enter into a cooperative
partnership with some other people who are active in the locking side of
things to produce an optimal locking strategy for your module.
Post by Hans Petter Selasky
The suggestion is just for convenience. Usually you don't have a recursed
mutex to sleep on. It is just to catch some rare cases where you will end up
with a doubly locked mutex, which is not part of the ordinary code path. I
don't have such cases in the kernel of the new USB stack, but there are some
cases in the USB device drivers, which is due to some mutex locking moves.
Those I can fix.
Basically you shouldn't have a recursed mutex FULL STOP. We have a couple
of instances in the kernel where we allow a mutex to recurse, but they had to be
hard fought, and the general rule is "Don't". If you are recursing on
a mutex you need to switch to some other method of doing things.
e.g. reference counts, turnstiles, whatever.. use the mutex to create these
but don't hold the mutex for long enough to need to recurse on it. A mutex should
generally lock, dash-in and work, unlock. We have some cases where that is
not true, but we are trying to get rid of them, not add more.

also look at read-write type locks for use as much as possible,
(man 9 locking) should be a guide to this (if people were to add to this
it would be even better).
Post by Hans Petter Selasky
My idea was that by allowing recursive mutexes to sleep, you will end up with
less panics in the end for the unwary code developer. You just protect your
code with mutexes and if they recurse calling synchronous USB functions, you
don't have to care.
I think trying to sleep with a recursed mutex should be an instant panic,
even if the mutex IS marked as being allowed to sleep.
Post by Hans Petter Selasky
Post by Attilio Rao
As you mention, it is not always possible to drop the blocking lock
before to sleep since you can break your critical path and free the
way for races of various genre. Even unlocking Giant, that is
auto-magically done by sleeping primitives, can lead to very difficult
to discover races (I can remind one in tty code, old of some months,
that can be a good proof-of-concept for that).
--HPS
Matthew Dillon
2007-04-27 17:39:16 UTC
Permalink
The real culprit here is passing held mutexes to unrelated procedures
in the first place because those procedures might have to block, in
order so those procedures can release and reacquire the mutex.
That's just bad coding in my view. The unrelated procedure has no
clue as to what the mutex is or why it is being held and really has no
business messing with it.

What I did was implement spinlocks with VERY restricted capabilities,
far more restricted then the capabilities of your mutexes. Our
spinlocks are meant only to be used to lock up tiny pieces of code
(like for ref counting or structural or flag-changing operations).
Plus the kernel automatically acts as if it were in a critical section
if it takes an interrupt while the current thread is holding a spinlock.
That way mainline code can just use a spinlock to deal with small bits
of interlocked information without it costing much in the way of
overhead.

I made the decision that ANYTHING more complex then that would have to
use a real lock, like a lockmgr lock or a token, depending on the
characteristics desired. To make it even more desireable I also
stripped down the lockmgr() lock implementation, removing numerous
bits that were inherited from very old code methodologies that have no
business being in a modern operating system, like LK_DRAIN. And I
removed the passing of an interlocking spinlock to the lockmgr code,
because that methodology was being massively abused in existing code
(and I do mean massively).

I'm not quite sure what the best way to go is for FreeBSD, because
you guys have made your mutexes just as or even more sophisticated
then your normal locks in many respects, and you have like 50 different
types of locks now (I can't keep track of them all).

If I were to offer advise it would be: Just stop trying to mix water
and hot wax. Stop holding mutexes across potentially blocking procedure
calls. Stop passing mutexes into unrelated bits of code in order for
them to be released and reacquired somewhere deep in that code. Just
doing that will probably solve all of the problems being reported.

-Matt
Julian Elischer
2007-04-27 18:01:27 UTC
Permalink
First of all: Where is FreeBSD's locking strategy document?
It is just started..
man 9 locking. it needs a lot of work still.
We should have a
global strategy when we write device drivers, so that various modules can be
connected together without races and locking order reversals.
In my view there are two categories of mutexes.
1) Global mutexes.
2) Private mutexes.
Rule: The Global mutex is locked last.
errr I'm not sure I'm following but this sounds kind-of reversed from normal behaviour.
How do we organize this.
1a) The global mutex protects interrupts/callbacks into a hardware driver.
This is the lowest level.
2a) Private mutexes protects sub-devices of the hardware driver. This might be
as simple as an IF-QUEUE.
I'm having trouble following already.
you mean subcomponents?
P0 indicates process 0. P1 indicates an optional second process.
P0 lock(2);
P0 lock(1);
P0 unlock(1);
P0 unlock(2);
this looks "interesting".
Can you give a more concrete example of this?
what work is done in the upcall? WHo is upcalling to who?
P1 lock(1);
P1 wakeup P0 // for example
P1 unlock(1);
pretty normal.
P0 lock(2); // the new USB stack currently uses P1 here
P0 unlock(2);
Sure, in the up-call you lock #2 longer than lock #1, that is why lock #1 has
got to be a short as possible. But it does not make sense to make lock #2
lock shorter than lock #1. I cannot see that the system will go faster that
way.
In the downcall I see no problems at all. #1 does its things as fast as
possible. In parallell #2 can still execute, until the "callback".
I hope you understand my semantics.
What do you want to change from what I have explained?
Any comments?
My conclusion: If you want more paralellism, then use more mutexes. Don't try
to push an existing mutex by unlocking it!
that may or may not be true depending on how busy the mutex is..
but I don't think there is an argument about this.


shouldn't this be somewhere other than "hackers"?
--HPS
Hans Petter Selasky
2007-04-27 18:10:14 UTC
Permalink
Post by Matthew Dillon
The real culprit here is passing held mutexes to unrelated procedures
in the first place because those procedures might have to block, in
order so those procedures can release and reacquire the mutex.
That's just bad coding in my view. The unrelated procedure has no
clue as to what the mutex is or why it is being held and really has no
business messing with it.
What I did was implement spinlocks with VERY restricted capabilities,
far more restricted then the capabilities of your mutexes. Our
spinlocks are meant only to be used to lock up tiny pieces of code
(like for ref counting or structural or flag-changing operations).
Plus the kernel automatically acts as if it were in a critical section
if it takes an interrupt while the current thread is holding a
spinlock. That way mainline code can just use a spinlock to deal with small
bits of interlocked information without it costing much in the way of
overhead.
I made the decision that ANYTHING more complex then that would have to
use a real lock, like a lockmgr lock or a token, depending on the
characteristics desired. To make it even more desireable I also
stripped down the lockmgr() lock implementation, removing numerous
bits that were inherited from very old code methodologies that have no
business being in a modern operating system, like LK_DRAIN. And I
removed the passing of an interlocking spinlock to the lockmgr code,
because that methodology was being massively abused in existing code
(and I do mean massively).
I'm not quite sure what the best way to go is for FreeBSD, because
you guys have made your mutexes just as or even more sophisticated
then your normal locks in many respects, and you have like 50 different
types of locks now (I can't keep track of them all).
If I were to offer advise it would be: Just stop trying to mix water
and hot wax. Stop holding mutexes across potentially blocking
procedure calls. Stop passing mutexes into unrelated bits of code in order
for them to be released and reacquired somewhere deep in that code. Just
doing that will probably solve all of the problems being reported.
Maybe one should put some characters into blocking and non-blocking function
names so that the compiler can check if non-blocking functions call blocking
functions.

Doing everything blocking will make the system work, but suddently it can
dead-lock, when two threads are waiting for eachother, and you will have zero
clue about it.

I like the current FreeBSD. It's a brain exercise and it clearly shows that
MPSAFE/Giant-free is quality mark, and not something you acchieve by adding
some lock/unlock statements here and there. Sometimes it means you have to
completely rewrite code. And that is the most difficult part to convince
other developers about. Been there done that :-)

--HPS
Hans Petter Selasky
2007-04-27 17:17:29 UTC
Permalink
Post by Daniel Eischen
Post by Hans Petter Selasky
Post by Julian Elischer
The reason that mutexes ever recurse in the first place is usually
because one piece of code calls itself (or a related piece of code) in
a blind manner.. in other words, it doesn't know it is doing so. The
whole concept of recursing mutexes is a bit gross. The concept of
blindly unwinding them is I think just asking for trouble.
Further the idea that holding a mutex "except for when we sleep" is a
generally bright idea is also a bit odd to me.. If you hold a mutex and
release it during sleep you probably should invalidate all assumptions
you made during the period before you slept as whatever you were
That is not always correct. If you run your code in a separate
thread/taskqueue, then you simply wait for this thread/taskqueue to
complete somewhere else. This is basically when I need to exit mutexes.
I know there are reasons why we have msleep(), but use of msleep()
and recursive mutexes should raise be frowned upon. If you want to
sleep, that's why we have condvar's. And if you are releasing a
synchronization lock in a different thread, then why aren't you
using condvar's wrapped by mutexes() to protect the internal state?
When I find time, I will convert my new USB stack into using condvar's, but
they will do the same as msleep(), and that is to exit the passed mutex.
Post by Daniel Eischen
When you hold a mutex, it should be for a very short time. And
I agree with the other comment that all drivers should be multi-thread
safe, so we shouldn't add cruft to allow for non MT-safe drivers.
Yes, and no.

Mutexes are used to get the CPU out of the code. Therefore you should not
lock/unlock all the time, to ensure that the locked time is as short as
possible. Because then you get double work checking the state after that you
lock a mutex again. Surely, in a "static" environment there is nothing to
check. But in a dynamic environment you need to check that "descriptors" of
all kinds are still present, after that you lock a mutex. Unlocking a mutex
allows "anything" to happen. Keeping a mutex locked prevents certain things
from happening.

First of all: Where is FreeBSD's locking strategy document? We should have a
global strategy when we write device drivers, so that various modules can be
connected together without races and locking order reversals.

In my view there are two categories of mutexes.

1) Global mutexes.

2) Private mutexes.

Rule: The Global mutex is locked last.

How do we organize this.

1a) The global mutex protects interrupts/callbacks into a hardware driver.
This is the lowest level.

2a) Private mutexes protects sub-devices of the hardware driver. This might be
as simple as an IF-QUEUE.

I have chosen the following model:

P0 indicates process 0. P1 indicates an optional second process.

Up-call:

P0 lock(2);
P0 lock(1);
P0 unlock(1);
P0 unlock(2);

Down-call:

P1 lock(1);
P1 wakeup P0 // for example
P1 unlock(1);

P0 //callback:
P0 lock(2); // the new USB stack currently uses P1 here
P0 unlock(2);

Sure, in the up-call you lock #2 longer than lock #1, that is why lock #1 has
got to be a short as possible. But it does not make sense to make lock #2
lock shorter than lock #1. I cannot see that the system will go faster that
way.

In the downcall I see no problems at all. #1 does its things as fast as
possible. In parallell #2 can still execute, until the "callback".

I hope you understand my semantics.

What do you want to change from what I have explained?

Any comments?

My conclusion: If you want more paralellism, then use more mutexes. Don't try
to push an existing mutex by unlocking it!

--HPS
Daniel Eischen
2007-04-27 18:36:21 UTC
Permalink
Post by Hans Petter Selasky
Post by Daniel Eischen
When you hold a mutex, it should be for a very short time. And
I agree with the other comment that all drivers should be multi-thread
safe, so we shouldn't add cruft to allow for non MT-safe drivers.
Yes, and no.
Mutexes are used to get the CPU out of the code. Therefore you should not
lock/unlock all the time, to ensure that the locked time is as short as
possible. Because then you get double work checking the state after that you
lock a mutex again. Surely, in a "static" environment there is nothing to
check. But in a dynamic environment you need to check that "descriptors" of
all kinds are still present, after that you lock a mutex. Unlocking a mutex
allows "anything" to happen. Keeping a mutex locked prevents certain things
from happening.
If you need to prevent "things" from happening, and it is at more
of a macro level than micro level, then you probably want a condvar
or barrier sort of synchroninzation, or possibly a rwlock. When
the thread currently in the guts of your driver exits, he releases
the CV or rwlock and allows another thread to enter (which possibly
causes another "anything" to happen).
--
DE
John Baldwin
2007-04-27 15:49:41 UTC
Permalink
Post by Hans Petter Selasky
Are there any comments on integrating this functionality into msleep(), and
adding mtx_drop_recurse() and mtx_pickup_recurse() to the FreeBSD kernel?
Nope. Fix the code to not recurse instead, or to know it has a recursed mutex
and make sure it doesn't call mtx_sleep() or cv_wait() with a recursed lock.
It's not that hard to do. The rest of the kernel manages that restriction
fine.
--
John Baldwin
Hans Petter Selasky
2007-04-27 18:32:20 UTC
Permalink
Post by Julian Elischer
First of all: Where is FreeBSD's locking strategy document?
It is just started..
man 9 locking. it needs a lot of work still.
Excellent.
Post by Julian Elischer
We should have a
global strategy when we write device drivers, so that various modules can
be connected together without races and locking order reversals.
In my view there are two categories of mutexes.
1) Global mutexes.
2) Private mutexes.
Rule: The Global mutex is locked last.
errr I'm not sure I'm following but this sounds kind-of reversed from
normal behaviour.
Yes, it is. But you have to make a choice. Either this way or the other way.
But not both! The reason for my choice is that I consider it to be more
complicated to be a Global device than a private device. Complicated means
that the Global device, which is locked last, has to unlock its lock before
it calls back the "private" device, like I have explained below.

Hence there are fewer Global devices (e.g. USB host controllers), then private
devices (USB device drivers), we end up with less code/headache locking the
Global devices last. If that is unclear I can explain more.
Post by Julian Elischer
How do we organize this.
1a) The global mutex protects interrupts/callbacks into a hardware
driver. This is the lowest level.
2a) Private mutexes protects sub-devices of the hardware driver. This
might be as simple as an IF-QUEUE.
I'm having trouble following already.
you mean subcomponents?
Yes, children of devices. Sub-devices.
Post by Julian Elischer
P0 indicates process 0. P1 indicates an optional second process.
P0 lock(2);
P0 lock(1);
The work done [here] might be to setup DMA-able memory, and link it into the
hardware.
Post by Julian Elischer
P0 unlock(1);
P0 unlock(2);
this looks "interesting".
Can you give a more concrete example of this?
what work is done in the upcall? WHo is upcalling to who?
For example an USB device driver might be up-calling to the USB host
controller driver. Down call is when the transfer finishes.
Post by Julian Elischer
P1 lock(1);
P1 wakeup P0 // for example
P1 unlock(1);
pretty normal.
P0 lock(2); // the new USB stack currently uses P1 here
P0 unlock(2);
Sure, in the up-call you lock #2 longer than lock #1, that is why lock #1
has got to be a short as possible. But it does not make sense to make
lock #2 lock shorter than lock #1. I cannot see that the system will go
faster that way.
In the downcall I see no problems at all. #1 does its things as fast as
possible. In parallell #2 can still execute, until the "callback".
I hope you understand my semantics.
What do you want to change from what I have explained?
Any comments?
My conclusion: If you want more paralellism, then use more mutexes. Don't
try to push an existing mutex by unlocking it!
that may or may not be true depending on how busy the mutex is..
but I don't think there is an argument about this.
shouldn't this be somewhere other than "hackers"?
I've CC'ed freebsd-arch.

--HPS
Sergey Babkin
2007-04-27 22:09:18 UTC
Permalink
Post by Julian Elischer
Basically you shouldn't have a recursed mutex FULL STOP. We have a couple
of instances in the kernel where we allow a mutex to recurse, but they had to be
hard fought, and the general rule is "Don't". If you are recursing on
a mutex you need to switch to some other method of doing things.
e.g. reference counts, turnstiles, whatever.. use the mutex to create these
One typical problem is when someone holds a mutex
and needs to call a function that also tried to get the mutex.
The typical solution for it is to provide two versions of
this function, one expecting the mutex being already held
by the caller, the other being a wrapper that grabs the mutex and
then calls the actual worker function.

-SB
Attilio Rao
2007-05-02 17:11:21 UTC
Permalink
Post by Sergey Babkin
Post by Julian Elischer
Basically you shouldn't have a recursed mutex FULL STOP. We have a couple
of instances in the kernel where we allow a mutex to recurse, but they had to be
hard fought, and the general rule is "Don't". If you are recursing on
a mutex you need to switch to some other method of doing things.
e.g. reference counts, turnstiles, whatever.. use the mutex to create these
One typical problem is when someone holds a mutex
and needs to call a function that also tried to get the mutex.
The typical solution for it is to provide two versions of
this function, one expecting the mutex being already held
by the caller, the other being a wrapper that grabs the mutex and
then calls the actual worker function.
If that happens rather frequently, the better thing to do is assuming
that the lock is alredy held in the function (adding an assertion for
it) than acquiring/releasing before/after the function.

Attilio
Robert Watson
2007-04-28 09:52:16 UTC
Permalink
Post by Julian Elischer
Further the idea that holding a mutex "except for when we sleep" is a
generally bright idea is also a bit odd to me.. If you hold a mutex and
release it during sleep you probably should invalidate all assumptions you
made during the period before you slept as whatever you were protecting has
possibly been raped while you slept. I have seen too many instances where
people just called msleep and dropped the mutex they held, picked it up
again on wakeup, and then blithely continued on without checking what
happened while they were asleep.
And interesting observation here is that FreeBSD 4.x and earlier were actually
rife with exactly this sort of race condition, exercised only when under
kernel memory pressure because sleeping occurred only then. The explicit
locking model we use now makes these races larger due increased concurrency
(preemption, parallelism, etc), but also makes our assertion model stronger.

Robert N M Watson
Computer Laboratory
University of Cambridge
Robert Watson
2007-04-28 09:57:34 UTC
Permalink
Basically you shouldn't have a recursed mutex FULL STOP. We have a couple of
instances in the kernel where we allow a mutex to recurse, but they had to
be hard fought, and the general rule is "Don't". If you are recursing on a
mutex you need to switch to some other method of doing things. e.g.
reference counts, turnstiles, whatever.. use the mutex to create these but
don't hold the mutex for long enough to need to recurse on it. A mutex
should generally lock, dash-in and work, unlock. We have some cases where
that is not true, but we are trying to get rid of them, not add more.
Most of these instances have to do with legacy code and data structures that
involve high levels of code recursion and reentrance. This is frequently an
unreliable way to organize code anyway, and often involves other bugs that are
less visible. Over time, it's my hope that we can eliminate quite a few
sources of remaining lock recursion, but there are some tricky cases involving
repeated callbacks between layers that make that harder. For example, in the
socket/network pcb relationship, there's a lack of clarity on which side
drives the overlapping state machines present in both sets of data structures.
Over time, we're migrating towards a model in which the socket infrastructure
is more of a "library" in service to network protocols that will drive the
actual transitions, but in the mean time, lock recursion is required.

For any significantly rewritten or new code, I would expect that recursion
would be avoided in almost all cases.

Robert N M Watson
Computer Laboratory
University of Cambridge
John Baldwin
2007-04-30 19:41:22 UTC
Permalink
Post by Hans Petter Selasky
Post by Julian Elischer
P0 unlock(1);
P0 unlock(2);
this looks "interesting".
Can you give a more concrete example of this?
what work is done in the upcall? WHo is upcalling to who?
For example an USB device driver might be up-calling to the USB host
controller driver. Down call is when the transfer finishes.
I think in this case you don't want to keep the periph locked while you ask
the controller to process requests. Instead, the periph drivers should queue
requests to the controller and receive replies, but they should be considered
as two independent objects. For example, network drivers drop their lock
when passing a packet (request) up the stack.
--
John Baldwin
Attilio Rao
2007-05-02 16:41:22 UTC
Permalink
First of all: Where is FreeBSD's locking strategy document?
It is just started.. man 9 locking. it needs a lot of work still.
I'm working with rwatson@ about a document that can nicely fit in
locking(9), but we are a little bit stuck in terminology.
In order to not confuse even more locking consumers we are trying to
find right and unique terms for kernel events linked to locks.


Attilio
Julian Elischer
2007-05-02 16:53:01 UTC
Permalink
Post by Attilio Rao
First of all: Where is FreeBSD's locking strategy document?
It is just started.. man 9 locking. it needs a lot of work still.
locking(9), but we are a little bit stuck in terminology.
In order to not confuse even more locking consumers we are trying to
find right and unique terms for kernel events linked to locks.
Attilio
send me what you have :-)
I'll look at it too.

Loading...