OpenBSD CVS

CVS log for src/sys/kern/sched_bsd.c




Default branch: MAIN


Revision 1.92 / (download) - annotate - [select for diffs], Wed May 29 18:55:45 2024 UTC (3 days, 22 hours ago) by claudio
Branch: MAIN
CVS Tags: HEAD
Changes since 1.91: +11 -17 lines
Diff to previous 1.91 (colored)

Convert SCHED_LOCK from a recursive kernel lock to a mutex.

Over the last weeks the last SCHED_LOCK recursion was removed, so this
is now possible and will allow splitting up the SCHED_LOCK in an
upcoming step.

Instead of implementing an MP and SP version of SCHED_LOCK this just
always uses the mutex implementation.
While this makes the local `s' argument unused (the spl is now tracked by
the mutex itself), it is still there to keep this diff minimal.

Tested by many.
OK jca@ mpi@

Revision 1.91 / (download) - annotate - [select for diffs], Sat Mar 30 13:33:20 2024 UTC (2 months ago) by mpi
Branch: MAIN
Changes since 1.90: +1 -3 lines
Diff to previous 1.90 (colored)

Prevent a recursion inside wakeup(9) when scheduler tracepoints are enabled.

Tracepoints like "sched:enqueue" and "sched:unsleep" were called from inside
the loop iterating over sleeping threads as part of wakeup_proc().  When such
tracepoints were enabled they could result in another wakeup(9) possibly
corrupting the sleepqueue.

Rewrite wakeup(9) in two stages: first dequeue threads from the sleepqueue,
then call setrunnable() and any enabled tracepoints for each of them.

This requires moving unsleep() outside of setrunnable() because it messes with
the sleepqueue.

ok claudio@

Revision 1.90 / (download) - annotate - [select for diffs], Wed Jan 24 19:23:38 2024 UTC (4 months, 1 week ago) by cheloha
Branch: MAIN
CVS Tags: OPENBSD_7_5_BASE, OPENBSD_7_5
Changes since 1.89: +5 -5 lines
Diff to previous 1.89 (colored)

clockintr: switch from callee- to caller-allocated clockintr structs

Currently, clockintr_establish() calls malloc(9) to allocate a
clockintr struct on behalf of the caller.  mpi@ says this behavior is
incompatible with dt(4).  In particular, calling malloc(9) during the
initialization of a PCB outside of dt_pcb_alloc() is (a) awkward and
(b) may conflict with future changes/optimizations to PCB allocation.

To side-step the problem, this patch changes the clockintr subsystem
to use caller-allocated clockintr structs instead of callee-allocated
structs.

clockintr_establish() is named after softintr_establish(), which uses
malloc(9) internally to create softintr objects.  The clockintr subsystem
is no longer using malloc(9), so the "establish" naming is no longer apt.
To avoid confusion, this patch also renames "clockintr_establish" to
"clockintr_bind".

Requested by mpi@.  Tweaked by mpi@.

Thread: https://marc.info/?l=openbsd-tech&m=170597126103504&w=2

ok claudio@ mlarkin@ mpi@

Revision 1.89 / (download) - annotate - [select for diffs], Tue Oct 17 00:04:02 2023 UTC (7 months, 2 weeks ago) by cheloha
Branch: MAIN
Changes since 1.88: +3 -3 lines
Diff to previous 1.88 (colored)

clockintr: move callback-specific API behaviors to "clockrequest" namespace

The API's behavior when invoked from a callback function is impossible
to document.  Move the special behavior into a distinct namespace,
"clockrequest".

- Add a 'struct clockrequest'.  Basically a stripped-down 'struct clockintr'
  for exclusive use during clockintr_dispatch().
- In clockintr_queue, replace the "cq_shadow" clockintr with a "cq_request"
  clockrequest.  They serve the same purpose.
- CLST_SHADOW_PENDING -> CR_RESCHEDULE; different namespace, same meaning.
- CLST_IGNORE_SHADOW -> CLST_IGNORE_REQUEST; same meaning.
- Move shadow branch in clockintr_advance() to clockrequest_advance().
- clockintr_request_random() becomes clockrequest_advance_random().
- Delete dead shadow branches in clockintr_cancel(), clockintr_schedule().
- Callback functions now get a clockrequest pointer instead of a special
  clockintr pointer: update all prototypes, callers.

No functional change intended.

Revision 1.88 / (download) - annotate - [select for diffs], Wed Oct 11 15:42:44 2023 UTC (7 months, 3 weeks ago) by cheloha
Branch: MAIN
Changes since 1.87: +2 -2 lines
Diff to previous 1.87 (colored)

kernel: expand fixed clock interrupt periods to 64-bit values

Technically, all the current fixed clock interrupt periods fit within
an unsigned 32-bit value.  But 32-bit multiplication is an accident
waiting to happen.  So, expand the fixed periods for hardclock,
statclock, profclock, and roundrobin to 64-bit values.

One exception: statclock_mask remains 32-bit because random(9) yields
32-bit values.  Update the initclocks() comment to make it clear that
this is not an accident.

Revision 1.87 / (download) - annotate - [select for diffs], Sun Sep 17 13:02:24 2023 UTC (8 months, 2 weeks ago) by cheloha
Branch: MAIN
CVS Tags: OPENBSD_7_4_BASE, OPENBSD_7_4
Changes since 1.86: +12 -21 lines
Diff to previous 1.86 (colored)

scheduler_start: move static timeout structs into callback functions

Move the schedcpu() and update_loadavg() timeout structs from
scheduler_start() into their respective callback functions and
statically initialize them with TIMEOUT_INITIALIZER(9).

The structs are already hidden from the global namespace and the
timeouts are already self-managing, so we may as well fully
consolidate things.

Thread: https://marc.info/?l=openbsd-tech&m=169488184019047&w=2

"Sure." claudio@

Revision 1.86 / (download) - annotate - [select for diffs], Sun Sep 10 03:08:05 2023 UTC (8 months, 3 weeks ago) by cheloha
Branch: MAIN
Changes since 1.85: +2 -2 lines
Diff to previous 1.85 (colored)

clockintr: support an arbitrary callback function argument

Callers can now provide an argument pointer to clockintr_establish().
The pointer is kept in a new struct clockintr member, cl_arg.  The
pointer is passed as the third parameter to clockintr.cl_func when it
is executed during clockintr_dispatch().  Like the callback function,
the callback argument is immutable after the clockintr is established.

At present, nothing uses this.  All current clockintr_establish()
callers pass a NULL arg pointer.  However, I am confident that dt(4)'s
profile provider will need this in the near future.

Requested by dlg@ back in March.

Revision 1.85 / (download) - annotate - [select for diffs], Wed Aug 30 09:02:38 2023 UTC (9 months ago) by claudio
Branch: MAIN
Changes since 1.84: +2 -2 lines
Diff to previous 1.84 (colored)

Preempt a running proc even if there is no other process/thread queued
on that CPU's runqueue. This way mi_switch() is invoked, which is necessary
to a) signal smr that the cpu changed context, b) update runtime stats,
and c) check requests to stop the CPU.
This should fix the issue reported by Eric Wong (e at 80x24 org) that
RLIMIT_CPU is unreliable on idle systems.
OK kettenis@ cheloha@

Revision 1.84 / (download) - annotate - [select for diffs], Tue Aug 29 16:19:34 2023 UTC (9 months ago) by claudio
Branch: MAIN
Changes since 1.83: +3 -3 lines
Diff to previous 1.83 (colored)

Remove p_rtime from struct proc and replace it by passing the timespec
as argument to the tuagg_locked function.

- Remove incorrect use of p_rtime in other parts of the tree. p_rtime was
almost always 0 so including it in any sum did not alter the result.
- In main() the update of time can be further simplified since at that time
only the primary cpu is running.
- Add missing nanouptime() call in cpu_hatch() for hppa
- Rename tuagg_unlocked to tuagg_locked like it is done in the rest of
  the tree.

OK cheloha@ dlg@

Revision 1.83 / (download) - annotate - [select for diffs], Sat Aug 19 11:14:11 2023 UTC (9 months, 2 weeks ago) by claudio
Branch: MAIN
Changes since 1.82: +9 -12 lines
Diff to previous 1.82 (colored)

Refetch the spc pointer after cpu_switchto() since the value is stale
after the proc switch. With the value refetched the rest of the code
can be simplified.
Input guenther@, OK cheloha@, miod@

Revision 1.82 / (download) - annotate - [select for diffs], Fri Aug 18 09:18:52 2023 UTC (9 months, 2 weeks ago) by claudio
Branch: MAIN
Changes since 1.81: +48 -2 lines
Diff to previous 1.81 (colored)

Move the loadavg calculation to sched_bsd.c as update_loadav()

With this uvm_meter() is no more and update_loadav() uses a simple timeout
instead of getting called via schedcpu().

OK deraadt@ mpi@ cheloha@

Revision 1.81 / (download) - annotate - [select for diffs], Mon Aug 14 08:33:24 2023 UTC (9 months, 2 weeks ago) by mpi
Branch: MAIN
Changes since 1.80: +5 -2 lines
Diff to previous 1.80 (colored)

Extend scheduler tracepoints to follow CPU jumping.

- Add two new tracepoints sched:fork & sched:steal
- Include selected CPU number in sched:wakeup
- Add sched:unsleep corresponding to sched:sleep, which matches add/removal
of threads on the sleep queue

ok claudio@

Revision 1.80 / (download) - annotate - [select for diffs], Fri Aug 11 22:02:50 2023 UTC (9 months, 3 weeks ago) by cheloha
Branch: MAIN
Changes since 1.79: +8 -9 lines
Diff to previous 1.79 (colored)

hardclock(9), roundrobin: make roundrobin() an independent clock interrupt

- Remove the roundrobin() call from hardclock(9).

- Revise roundrobin() to make it a valid clock interrupt callback.
  It is still periodic and it still runs at one tenth of the hardclock
  frequency.

- Account for multiple expirations in roundrobin(): if two or more
  roundrobin periods have elapsed, set SPCF_SHOULDYIELD on the running
  thread immediately to simulate normal behavior.

- Each schedstate_percpu has its own roundrobin() handle, spc_roundrobin.
  spc_roundrobin is started/advanced during clockintr_cpu_init().
  Intervals elapsed across suspend/resume are discarded.

- rrticks_init and schedstate_percpu.spc_rrticks are now useless:
  delete them.

Tweaked by mpi@.  With input from mpi@ and claudio@.

Thread: https://marc.info/?l=openbsd-tech&m=169127381314651&w=2

ok mpi@ claudio@

Revision 1.79 / (download) - annotate - [select for diffs], Sat Aug 5 20:07:55 2023 UTC (9 months, 4 weeks ago) by cheloha
Branch: MAIN
Changes since 1.78: +13 -3 lines
Diff to previous 1.78 (colored)

hardclock(9): move setitimer(2) code into itimer_update()

- Move the setitimer(2) code responsible for updating the ITIMER_VIRTUAL
  and ITIMER_PROF timers from hardclock(9) into a new clock interrupt
  routine, itimer_update().  itimer_update() is periodic and runs at the
  same frequency as the hardclock.

  + Revise itimerdecr() to run within itimer_mtx instead of entering
    and leaving it.

- Each schedstate_percpu has its own itimer_update() handle, spc_itimer.
  A new scheduler flag, SPCF_ITIMER, indicates whether spc_itimer was
  started during the last mi_switch() and needs to be stopped during the
  next mi_switch() or sched_exit().

- A new per-process flag, PS_ITIMER, indicates whether ITIMER_VIRTUAL
  and/or ITIMER_PROF are running.  Checking the flag is easier than
  entering itimer_mtx to check process.ps_timer[].  The flag is set
  and cleared in a new helper function, process_reset_itimer_flag().

- In setitimer(), call need_resched() when the state of ITIMER_VIRTUAL
  or ITIMER_PROF is changed to force an mi_switch() and update
  spc_itimer.

claudio@ notes that ITIMER_PROF could be implemented as a high-res
timer using the thread's execution time as a guide for when to
interrupt the process and assert SIGPROF.  This would probably work
really well in single-threaded processes.  ITIMER_VIRTUAL would be
more difficult to make high-res, though, as you need to exclude time
spent in the kernel.

Tested on powerpc64 by gkoehler@.  With input from claudio@.

Thread: https://marc.info/?l=openbsd-tech&m=169038818517101&w=2

ok claudio@

Revision 1.78 / (download) - annotate - [select for diffs], Tue Jul 25 18:16:19 2023 UTC (10 months, 1 week ago) by cheloha
Branch: MAIN
Changes since 1.77: +16 -1 lines
Diff to previous 1.77 (colored)

statclock: move profil(2), GPROF code to profclock(), gmonclock()

This patch isolates profil(2) and GPROF from statclock().  Currently,
statclock() implements both profil(2) and GPROF through a complex
mechanism involving both platform code (setstatclockrate) and the
scheduler (pscnt, psdiv, and psratio).  We have a machine-independent
interface to the clock interrupt hardware now, so we no longer need to
do it this way.

- Move profil(2)-specific code from statclock() to a new clock
  interrupt callback, profclock(), in subr_prof.c.  Each
  schedstate_percpu has its own profclock handle.  The profclock is
  enabled/disabled for a given CPU when it is needed by the running
  thread during mi_switch() and sched_exit().

- Move GPROF-specific code from statclock() to a new clock interrupt
  callback, gmonclock(), in subr_prof.c.  Where available, each cpu_info
  has its own gmonclock handle .  The gmonclock is enabled/disabled for
  a given CPU via sysctl(2) in prof_state_toggle().

- Both profclock() and gmonclock() have a fixed period, profclock_period,
  that is initialized during initclocks().

- Export clockintr_advance(), clockintr_cancel(), clockintr_establish(),
  and clockintr_stagger() via <sys/clockintr.h>.  They have external
  callers now.

- Delete pscnt, psdiv, psratio.  From schedstate_percpu, also delete
  spc_pscnt and spc_psdiv.  The statclock frequency is not dynamic
  anymore so these variables are now useless.

- Delete code/state related to the dynamic statclock frequency from
  kern_clockintr.c.  The statclock frequency can still be pseudo-random,
  so move the contents of clockintr_statvar_init() into clockintr_init().

With input from miod@, deraadt@, and claudio@.  Early revisions
cleaned up by claudio.  Early revisions tested by claudio@.  Tested by
cheloha@ on amd64, arm64, macppc, octeon, and sparc64 (sun4v).
Compile- and boot- tested on i386 by mlarkin@.  riscv64 compilation
bugs found by mlarkin@.  Tested on riscv64 by jca@.  Tested on
powerpc64 by gkoehler@.

Revision 1.77 / (download) - annotate - [select for diffs], Tue Jul 11 07:02:43 2023 UTC (10 months, 3 weeks ago) by claudio
Branch: MAIN
Changes since 1.76: +5 -1 lines
Diff to previous 1.76 (colored)

Rework sleep_setup()/sleep_finish() to no longer hold the scheduler lock
between calls.

Instead of forcing an atomic operation across multiple calls use a three
step transaction.
1. setup sleep state by calling sleep_setup()
2. recheck sleep condition to ensure that the event did not fire before
   sleep_setup() registered the proc onto the sleep queue
3. call sleep_finish() to either sleep or keep on running based on the
   step 2 outcome and any possible signal delivery

To make this work wakeup from signals, single thread api and wakeup(9) need
to be aware if a process is between step 1 and step 3 so that the process
is not enqueued back onto the runqueue while going to sleep. Introduce
the p_flag P_WSLEEP to detect this situation.

On top of this remove the spl dance in msleep() which is no longer required.
It is ok to process interrupts between step 1 and 3.

OK mpi@ cheloha@

Revision 1.76 / (download) - annotate - [select for diffs], Wed Jun 21 21:16:21 2023 UTC (11 months, 1 week ago) by cheloha
Branch: MAIN
Changes since 1.75: +2 -2 lines
Diff to previous 1.75 (colored)

Revert "schedcpu, uvm_meter(9): make uvm_meter() an independent timeout"

Sometimes causes boot hang after mounting root partition.

Thread 1: https://marc.info/?l=openbsd-misc&m=168736497407357&w=2
Thread 2: https://marc.info/?l=openbsd-misc&m=168737429214370&w=2

Revision 1.75 / (download) - annotate - [select for diffs], Tue Jun 20 16:30:30 2023 UTC (11 months, 1 week ago) by cheloha
Branch: MAIN
Changes since 1.74: +2 -2 lines
Diff to previous 1.74 (colored)

schedcpu, uvm_meter(9): make uvm_meter() an independent timeout

uvm_meter(9) should not base its periodic uvm_loadav() call on the UTC
clock.  It also no longer needs to periodically wake up proc0 because
proc0 doesn't do any work.  schedcpu() itself may change or go away,
but as kettenis@ notes we probably can't completely remove the concept
of a "load average" from OpenBSD, given its long Unix heritage.

So, (1) remove the uvm_meter() call from schedcpu(), (2) make
uvm_meter() an independent timeout started alongside schedcpu() during
scheduler_start(), and (3) delete the vestigial periodic proc0 wakeup.

With input from deraadt@, kettenis@, and claudio@.  deraadt@ cautions
that this change may confuse administrators who hold the load average
in high regard.

Thread: https://marc.info/?l=openbsd-tech&m=168710929409153&w=2

general agreement with this direction from kettenis@
ok claudio@

Revision 1.74 / (download) - annotate - [select for diffs], Sat Feb 4 19:33:03 2023 UTC (15 months, 3 weeks ago) by cheloha
Branch: MAIN
CVS Tags: OPENBSD_7_3_BASE, OPENBSD_7_3
Changes since 1.73: +4 -14 lines
Diff to previous 1.73 (colored)

kernel: stathz is always non-zero after cpu_initclocks()

Now that the clockintr switch is complete, cpu_initclocks() always
initializes stathz to a non-zero value.  We don't call statclock()
from hardclock(9) anymore and, more broadly, we don't need to test
whether stathz is non-zero before using it.

With input from kettenis@.

Link: https://marc.info/?l=openbsd-tech&m=167434223309668&w=2

ok kettenis@ miod@

Revision 1.73 / (download) - annotate - [select for diffs], Mon Dec 5 23:18:37 2022 UTC (17 months, 3 weeks ago) by deraadt
Branch: MAIN
Changes since 1.72: +2 -2 lines
Diff to previous 1.72 (colored)

zap a pile of dangling tabs

Revision 1.72 / (download) - annotate - [select for diffs], Sun Aug 14 01:58:27 2022 UTC (21 months, 2 weeks ago) by jsg
Branch: MAIN
CVS Tags: OPENBSD_7_2_BASE, OPENBSD_7_2
Changes since 1.71: +1 -2 lines
Diff to previous 1.71 (colored)

remove unneeded includes in sys/kern
ok mpi@ miod@

Revision 1.71 / (download) - annotate - [select for diffs], Tue May 10 22:18:06 2022 UTC (2 years ago) by solene
Branch: MAIN
Changes since 1.70: +3 -3 lines
Diff to previous 1.70 (colored)

make the CPU frequency scaling duration relative to the load

in the pre-change behavior, if the CPU frequency was raised, it stayed up
for 5 cycles minimum (with one cycle being run every 100ms).
With this change, the time to keep the frequency raised is incremented at
each cycle, up to 5. This means a short load that triggers the frequency
increase will keep it raised for less than the current minimum of 500ms.

this only affects the automatic mode when on battery, extending the battery
life for most interactive use scenarios and idle loads.

tested by many with good results
ok kettenis@

Revision 1.70 / (download) - annotate - [select for diffs], Sat Oct 30 23:24:48 2021 UTC (2 years, 7 months ago) by deraadt
Branch: MAIN
CVS Tags: OPENBSD_7_1_BASE, OPENBSD_7_1
Changes since 1.69: +38 -27 lines
Diff to previous 1.69 (colored)

Change hw.perfpolicy=auto by default, at startup.  If the system has AC
power connected (default is yes when no driver differentiates) then default
to 100% performance. On battery, use the existing auto algorithm (which is
admittedly somewhat unrefined).
This change overrides the system/BIOS speed and puts OpenBSD in control.
As this happens very early during boot, besides speedups in all usage
patterns, some surprises: unhibernate and sysupgrade times are cut in half.
note: on a few architectures, the setperf fn pointer is changed late, and
thus the auto algorithm stops timing out.  kettenis and i will look for
a solution.
in snaps for more than a week.
ok kettenis

Revision 1.69 / (download) - annotate - [select for diffs], Thu Sep 9 18:41:39 2021 UTC (2 years, 8 months ago) by mpi
Branch: MAIN
CVS Tags: OPENBSD_7_0_BASE, OPENBSD_7_0
Changes since 1.68: +2 -2 lines
Diff to previous 1.68 (colored)

Add THREAD_PID_OFFSET to tracepoint arguments that pass a TID to userland.

Bring these values in sync with the `tid' builtin, which already includes
the offset.  This is necessary to build scripts comparing them, like:

tracepoint:sched:enqueue
{
	@ts[arg0] = nsecs;
}

tracepoint:sched:on__cpu
/@ts[tid]/
{
	latency = nsecs - @ts[tid];
}

Discussed with and ok bluhm@

Revision 1.68 / (download) - annotate - [select for diffs], Mon Aug 2 15:15:47 2021 UTC (2 years, 10 months ago) by tb
Branch: MAIN
Changes since 1.67: +5 -2 lines
Diff to previous 1.67 (colored)

Don't call cpu_setperf() when reading hw.setperf.

"makes perfect sense to me" chris
ok gnezdo jca

Revision 1.67 / (download) - annotate - [select for diffs], Mon May 10 18:01:24 2021 UTC (3 years ago) by mpi
Branch: MAIN
Changes since 1.66: +7 -1 lines
Diff to previous 1.66 (colored)

Revert previous, it introduced a regression with breakpoints in gdb.

Revision 1.66 / (download) - annotate - [select for diffs], Thu May 6 09:33:22 2021 UTC (3 years ago) by mpi
Branch: MAIN
Changes since 1.65: +1 -7 lines
Diff to previous 1.65 (colored)

Refactor routines to stop/unstop processes and save the corresponding signal.

- Move the "hack" involving P_SINTR to avoid grabbing the SCHED_LOCK()
recursively closer to where it is necessary, in proc_stop()

- Introduce proc_unstop(), the symmetric routine to proc_stop(), which
manipulates `ps_xsig', and use it whenever an SSTOPed thread needs to be
awakened.

- Manipulate `ps_xsig' only in proc_stop/unstop()

ok kettenis@

Revision 1.65 / (download) - annotate - [select for diffs], Thu Dec 10 04:26:50 2020 UTC (3 years, 5 months ago) by gnezdo
Branch: MAIN
CVS Tags: OPENBSD_6_9_BASE, OPENBSD_6_9
Changes since 1.64: +7 -13 lines
Diff to previous 1.64 (colored)

Use sysctl_int_bounded for sysctl_hwsetperf

Removed some trailing whitespace while there.

ok gkoehler@

Revision 1.64 / (download) - annotate - [select for diffs], Thu Oct 15 07:49:55 2020 UTC (3 years, 7 months ago) by mpi
Branch: MAIN
Changes since 1.63: +1 -3 lines
Diff to previous 1.63 (colored)

Stop asserting that the NET_LOCK() shouldn't be held in yield().

This creates too many false positives when setting pool_debug=2.

Prodded by deraadt@, ok mvs@

Revision 1.63 / (download) - annotate - [select for diffs], Sat May 30 14:42:59 2020 UTC (4 years ago) by solene
Branch: MAIN
CVS Tags: OPENBSD_6_8_BASE, OPENBSD_6_8
Changes since 1.62: +3 -1 lines
Diff to previous 1.62 (colored)

In automatic performance mode on systems with CPUs offline because of the
SMT mitigation, the algorithm was still accounting for the offline CPUs,
leading to a code path that would never be reached.

This should allow better frequency scaling on systems with many CPUs.
The frequency should scale up if one of two conditions is true.
    - if at least one CPU has less than 25% of idle cpu time
    - if the average of all idle time is under 33%

The second condition was never met because offline CPUs are always
accounted as 100% idle.

A bit more explanation about the auto scaling, in case someone wants to
improve this later: when one condition is met, the CPU frequency is set to
maximum and a counter is set to 5; the function then runs again 100ms
later and decrements the counter whenever neither condition is met
anymore. Once the counter reaches 0, the frequency is set to minimum.
This means it can take up to 100ms to scale up and up to 500ms to scale
down.

ok brynet@
looks good tedu@

Revision 1.62 / (download) - annotate - [select for diffs], Thu Jan 30 08:51:27 2020 UTC (4 years, 4 months ago) by mpi
Branch: MAIN
CVS Tags: OPENBSD_6_7_BASE, OPENBSD_6_7
Changes since 1.61: +11 -12 lines
Diff to previous 1.61 (colored)

Split `p_priority' into `p_runpri' and `p_slppri'.

Using different fields to remember in which runqueue or sleepqueue
threads currently are will make it easier to split the SCHED_LOCK().

With this change, the (potentially boosted) sleeping priority is no
longer overwriting the thread priority.  This let us get rids of the
logic required to synchronize `p_priority' with `p_usrpri'.

Tested by many, ok visa@

Revision 1.61 / (download) - annotate - [select for diffs], Tue Jan 21 16:16:23 2020 UTC (4 years, 4 months ago) by mpi
Branch: MAIN
Changes since 1.60: +6 -1 lines
Diff to previous 1.60 (colored)

Import dt(4) a driver and framework for Dynamic Profiling.

The design is fairly simple: events, in the form of descriptors on a
ring, are being produced in any kernel context and being consumed by
a userland process reading /dev/dt.

Code and hooks are all guarded under '#if NDT > 0' so this commit
shouldn't introduce any change as long as dt(4) is disable in GENERIC.

ok kettenis@, visa@, jasper@, deraadt@

Revision 1.60 / (download) - annotate - [select for diffs], Wed Dec 11 07:30:09 2019 UTC (4 years, 5 months ago) by guenther
Branch: MAIN
Changes since 1.59: +7 -6 lines
Diff to previous 1.59 (colored)

Replace p_xstat with ps_xexit and ps_xsig
Convert those to a consolidated status when needed in wait4(), kevent(),
	and sysctl()
Pass exit code and signal separately to exit1()
(This also serves as prep for adding waitid(2))

ok mpi@

Revision 1.59 / (download) - annotate - [select for diffs], Mon Nov 4 18:06:03 2019 UTC (4 years, 6 months ago) by visa
Branch: MAIN
Changes since 1.58: +0 -1 lines
Diff to previous 1.58 (colored)

Restore the old way of dispatching dead procs through idle proc.
The new way needs more thought.

Revision 1.58 / (download) - annotate - [select for diffs], Sat Nov 2 05:31:20 2019 UTC (4 years, 7 months ago) by visa
Branch: MAIN
Changes since 1.57: +2 -1 lines
Diff to previous 1.57 (colored)

Move dead procs to the reaper queue immediately after context switch.
This eliminates a forced context switch to the idle proc. In addition,
sched_exit() no longer needs to sum proc runtime because mi_switch()
will do it.

OK mpi@ a while ago

Revision 1.57 / (download) - annotate - [select for diffs], Fri Nov 1 20:58:01 2019 UTC (4 years, 7 months ago) by mpi
Branch: MAIN
Changes since 1.56: +1 -27 lines
Diff to previous 1.56 (colored)

Kill resched_proc() and instead call need_resched() when a thread is
added to the runqueue of a CPU.

This fixes out-of-sync cases where the priority of a thread didn't reflect
the runqueue it was sitting in, leading to unnecessary context switches.

ok visa@

Revision 1.56 / (download) - annotate - [select for diffs], Tue Oct 15 10:05:43 2019 UTC (4 years, 7 months ago) by mpi
Branch: MAIN
Changes since 1.55: +5 -12 lines
Diff to previous 1.55 (colored)

Reduce the number of places where `p_priority' and `p_stat' are set.

This refactoring will help future scheduler locking, in particular to
shrink the SCHED_LOCK().

No intended behavior change.

ok visa@

Revision 1.55 / (download) - annotate - [select for diffs], Mon Jul 15 20:44:48 2019 UTC (4 years, 10 months ago) by mpi
Branch: MAIN
CVS Tags: OPENBSD_6_6_BASE, OPENBSD_6_6
Changes since 1.54: +5 -4 lines
Diff to previous 1.54 (colored)

Stop calling resched_proc() after changing the nice(3) value of a process.

Changing the scheduling priority of a process happens rarely, so it isn't
strictly necessary to update the current priority of every thread
instantly.

Moreover resched_proc() isn't well suited to perform this action: it doesn't
consider the state of each thread nor move them to another runqueue.

ok visa@

Revision 1.54 / (download) - annotate - [select for diffs], Mon Jul 8 18:53:18 2019 UTC (4 years, 10 months ago) by mpi
Branch: MAIN
Changes since 1.53: +35 -33 lines
Diff to previous 1.53 (colored)

Untangle code setting the scheduling priority of a thread.

- `p_estcpu' and `p_usrpri' represent the priority and are now only set
in a single function.

- Call resched_proc() after updating the priority and stop calling it
from schedclock() since `spc_curpriority' should match curproc's priority.

- Rename updatepri() to match decay_cpu() and stop updating per-thread
member.

- Merge two resched_proc() in one inside setrunnable().

Tweak and ok visa@

Revision 1.53 / (download) - annotate - [select for diffs], Sat Jun 1 14:11:17 2019 UTC (5 years ago) by mpi
Branch: MAIN
Changes since 1.52: +4 -2 lines
Diff to previous 1.52 (colored)

Revert to using the SCHED_LOCK() to protect time accounting.

It currently creates a lock ordering problem because SCHED_LOCK() is taken
by hardclock().  That means the "priorities" of a thread should be moved
out of the SCHED_LOCK() first in order to make progress.

Reported-by: syzbot+8e4863b3dde88eb706dc@syzkaller.appspotmail.com
via anton@ as well as by kettenis@

Revision 1.52 / (download) - annotate - [select for diffs], Fri May 31 19:51:10 2019 UTC (5 years ago) by mpi
Branch: MAIN
Changes since 1.51: +2 -4 lines
Diff to previous 1.51 (colored)

Use a per-process mutex to protect time accounting instead of SCHED_LOCK().

Note that hardclock(9) still increments p_{u,s,i}ticks without holding a
lock.

ok visa@, cheloha@

Revision 1.51 / (download) - annotate - [select for diffs], Sat May 25 18:11:10 2019 UTC (5 years ago) by mpi
Branch: MAIN
Changes since 1.50: +2 -2 lines
Diff to previous 1.50 (colored)

Do not account spinning time as running time when a thread crosses a
tick boundary while spinning on the schedlock.

This reduces the contention on the SCHED_LOCK() when the current thread
is already spinning.

Prompted by deraadt@, ok visa@

Revision 1.50 / (download) - annotate - [select for diffs], Tue Feb 26 14:24:21 2019 UTC (5 years, 3 months ago) by visa
Branch: MAIN
CVS Tags: OPENBSD_6_5_BASE, OPENBSD_6_5
Changes since 1.49: +4 -1 lines
Diff to previous 1.49 (colored)

Introduce safe memory reclamation, a mechanism for reclaiming shared
objects that readers can access without locking. This provides a basis
for read-copy-update operations.

Readers access SMR-protected shared objects inside SMR read-side
critical section where sleeping is not allowed. To reclaim
an SMR-protected object, the writer has to ensure mutual exclusion of
other writers, remove the object's shared reference and wait until
read-side references cannot exist any longer. As an alternative to
waiting, the writer can schedule a callback that gets invoked when
reclamation is safe.

The mechanism relies on CPU quiescent states to determine when an
SMR-protected object is ready for reclamation.

The <sys/smr.h> header additionally provides an implementation of
singly- and doubly-linked lists that can be used together with SMR.
These lists allow lockless read access with a concurrent writer.

Discussed with many
OK mpi@ sashan@

Revision 1.49 / (download) - annotate - [select for diffs], Mon Jan 28 11:48:13 2019 UTC (5 years, 4 months ago) by mpi
Branch: MAIN
Changes since 1.48: +13 -1 lines
Diff to previous 1.48 (colored)

Stop accounting/updating priorities for Idle threads.

Idle threads are never placed on the runqueue so their priority doesn't
matter.

This fixes an accounting bug where top(1) would report a high CPU usage
for Idle threads of secondary CPUs right after booting.  That's because
schedcpu() would give 100% CPU time to the Idle thread until "real"
threads get scheduled on the corresponding CPU.

Issue reported by bluhm@, ok visa@, kettenis@

Revision 1.48 / (download) - annotate - [select for diffs], Sun Jan 6 12:59:45 2019 UTC (5 years, 4 months ago) by visa
Branch: MAIN
Changes since 1.47: +1 -19 lines
Diff to previous 1.47 (colored)

Fix unsafe use of ptsignal() in mi_switch().

ptsignal() has to be called with the kernel lock held. As ensuring the
locking in mi_switch() is not easy, and deferring the signaling using
the task API is not possible because of lock order issues in
mi_switch(), move the CPU time checking into a periodic timer where
the kernel can be locked without issues.

With this change, each process has a dedicated resource check timer.
The timer gets activated only when a CPU time limit is set. Because the
checking is not done as frequently as before, some precision is lost.

Use of timers adapted from FreeBSD.

OK tedu@

Reported-by: syzbot+2f5d62256e3280634623@syzkaller.appspotmail.com

Revision 1.47 / (download) - annotate - [select for diffs], Mon Dec 4 09:38:20 2017 UTC (6 years, 5 months ago) by mpi
Branch: MAIN
CVS Tags: OPENBSD_6_4_BASE, OPENBSD_6_4, OPENBSD_6_3_BASE, OPENBSD_6_3
Changes since 1.46: +2 -2 lines
Diff to previous 1.46 (colored)

Use _kernel_lock_held() instead of __mp_lock_held(&kernel_lock).

ok visa@

Revision 1.46 / (download) - annotate - [select for diffs], Tue Feb 14 10:31:15 2017 UTC (7 years, 3 months ago) by mpi
Branch: MAIN
CVS Tags: OPENBSD_6_2_BASE, OPENBSD_6_2, OPENBSD_6_1_BASE, OPENBSD_6_1
Changes since 1.45: +2 -8 lines
Diff to previous 1.45 (colored)

Convert most of the manual checks for CPU hogging to sched_pause().

The distinction between preempt() and yield() stays, as it is useful
to know whether a thread decided to yield by itself or the kernel told
it to go away.

ok tedu@, guenther@

Revision 1.45 / (download) - annotate - [select for diffs], Thu Feb 9 10:27:03 2017 UTC (7 years, 3 months ago) by mpi
Branch: MAIN
Changes since 1.44: +1 -2 lines
Diff to previous 1.44 (colored)

Do not select a CPU to execute the current thread when being preempt()ed.

Calling sched_choosecpu() at this moment often results in moving the thread
to a different CPU.  This does not help the scheduler and creates a domino
effect, resulting in kernel threads moving to other CPUs.

Tested by many without performance impact.  Simon Mages measured a small
performance improvement and a smaller variance with an http proxy.

Discussed with kettenis@, ok martijn@, beck@, visa@

Revision 1.44 / (download) - annotate - [select for diffs], Wed Jan 25 06:15:50 2017 UTC (7 years, 4 months ago) by mpi
Branch: MAIN
Changes since 1.43: +3 -1 lines
Diff to previous 1.43 (colored)

Enable the NET_LOCK(), take 2.

Recursions are currently known and marked as XXXSMP.

Please report any assert to bugs@

Revision 1.43 / (download) - annotate - [select for diffs], Wed Mar 9 13:38:50 2016 UTC (8 years, 2 months ago) by mpi
Branch: MAIN
CVS Tags: OPENBSD_6_0_BASE, OPENBSD_6_0
Changes since 1.42: +7 -11 lines
Diff to previous 1.42 (colored)

Correct some comments and definitions, from Michal Mazurek.

Revision 1.42 / (download) - annotate - [select for diffs], Sun Nov 8 20:45:57 2015 UTC (8 years, 6 months ago) by naddy
Branch: MAIN
CVS Tags: OPENBSD_5_9_BASE, OPENBSD_5_9
Changes since 1.41: +2 -2 lines
Diff to previous 1.41 (colored)

keep all the setperf timeout(9) handling in one place; ok tedu@

Revision 1.41 / (download) - annotate - [select for diffs], Sat Mar 14 03:38:50 2015 UTC (9 years, 2 months ago) by jsg
Branch: MAIN
CVS Tags: OPENBSD_5_8_BASE, OPENBSD_5_8
Changes since 1.40: +1 -2 lines
Diff to previous 1.40 (colored)

Remove some includes that include-what-you-use claims provide no
directly used symbols.  Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@

Revision 1.40 / (download) - annotate - [select for diffs], Sat Dec 13 21:05:33 2014 UTC (9 years, 5 months ago) by doug
Branch: MAIN
CVS Tags: OPENBSD_5_7_BASE, OPENBSD_5_7
Changes since 1.39: +3 -3 lines
Diff to previous 1.39 (colored)

yet more mallocarray() changes.

ok tedu@ deraadt@

Revision 1.39 / (download) - annotate - [select for diffs], Wed Nov 12 22:27:45 2014 UTC (9 years, 6 months ago) by tedu
Branch: MAIN
Changes since 1.38: +5 -2 lines
Diff to previous 1.38 (colored)

take a few more ticks to actually throttle down. hopefully helps in
situations where e.g. web browsing is cpu intense but intermittently idle.
subject to further refinement and tuning.

Revision 1.38 / (download) - annotate - [select for diffs], Mon Nov 3 03:08:00 2014 UTC (9 years, 7 months ago) by deraadt
Branch: MAIN
Changes since 1.37: +3 -2 lines
Diff to previous 1.37 (colored)

pass size argument to free()
ok doug tedu

Revision 1.37 / (download) - annotate - [select for diffs], Fri Oct 17 15:34:55 2014 UTC (9 years, 7 months ago) by deraadt
Branch: MAIN
Changes since 1.36: +7 -7 lines
Diff to previous 1.36 (colored)

cpu_setperf and perflevel must remain exposed, otherwise a bunch of
MD code needs excess #ifndef SMALL_KERNEL

Revision 1.36 / (download) - annotate - [select for diffs], Fri Oct 17 01:51:39 2014 UTC (9 years, 7 months ago) by tedu
Branch: MAIN
Changes since 1.35: +151 -1 lines
Diff to previous 1.35 (colored)

redo the performance throttling in the kernel.
introduce a new sysctl, hw.perfpolicy, that governs the policy.
when set to anything other than manual, hw.setperf then becomes read only.
phessler was heading in this direction, but this is slightly different. :)

Revision 1.35 / (download) - annotate - [select for diffs], Fri Jul 4 05:58:31 2014 UTC (9 years, 11 months ago) by guenther
Branch: MAIN
CVS Tags: OPENBSD_5_6_BASE, OPENBSD_5_6
Changes since 1.34: +1 -2 lines
Diff to previous 1.34 (colored)

Track whether a process is a zombie or not yet fully built via flags
PS_{ZOMBIE,EMBRYO} on the process instead of peeking into the process's
thread data.  This eliminates the need for the thread-level SDEAD state.

Change kvm_getprocs() (both the sysctl() and kvm backends) to report the
"most active" scheduler state for the process's threads.

tweaks kettenis@
feedback and ok matthew@

Revision 1.34 / (download) - annotate - [select for diffs], Thu May 15 03:52:25 2014 UTC (10 years ago) by guenther
Branch: MAIN
Changes since 1.33: +1 -2 lines
Diff to previous 1.33 (colored)

Move from struct proc to process the reference-count-holding pointers
to the process's vmspace and filedescs.  struct proc continues to
keep copies of the pointers, copying them on fork, clearing them
on exit, and (for vmspace) refreshing on exec.
Also, make uvm_swapout_threads() thread aware, eliminating p_swtime
in kernel.

particular testing by ajacoutot@ and sebastia@

Revision 1.33 / (download) - annotate - [select for diffs], Mon Jun 3 16:55:22 2013 UTC (11 years ago) by guenther
Branch: MAIN
CVS Tags: OPENBSD_5_5_BASE, OPENBSD_5_5, OPENBSD_5_4_BASE, OPENBSD_5_4
Changes since 1.32: +10 -10 lines
Diff to previous 1.32 (colored)

Convert some internal APIs to use timespecs instead of timevals

ok matthew@ deraadt@

Revision 1.32 / (download) - annotate - [select for diffs], Sun Jun 2 20:59:09 2013 UTC (11 years ago) by guenther
Branch: MAIN
Changes since 1.31: +4 -3 lines
Diff to previous 1.31 (colored)

Use long long and %lld for printing tv_sec values

ok deraadt@

Revision 1.31 / (download) - annotate - [select for diffs], Thu Mar 28 16:55:25 2013 UTC (11 years, 2 months ago) by deraadt
Branch: MAIN
Changes since 1.30: +1 -2 lines
Diff to previous 1.30 (colored)

do not include machine/cpu.h from a .c file; it is the responsibility of
.h files to pull it in, if needed
ok tedu

Revision 1.30 / (download) - annotate - [select for diffs], Mon Jul 9 17:27:32 2012 UTC (11 years, 10 months ago) by haesbaert
Branch: MAIN
CVS Tags: OPENBSD_5_3_BASE, OPENBSD_5_3, OPENBSD_5_2_BASE, OPENBSD_5_2
Changes since 1.29: +2 -11 lines
Diff to previous 1.29 (colored)

Tedu old comment concerning cpu affinity which does not apply anymore.

ok blambert@ krw@ tedu@ miod@

Revision 1.29 / (download) - annotate - [select for diffs], Fri Mar 23 15:51:26 2012 UTC (12 years, 2 months ago) by guenther
Branch: MAIN
Changes since 1.28: +12 -6 lines
Diff to previous 1.28 (colored)

Make rusage totals, itimers, and profile settings per-process instead
of per-rthread.  Handling of per-thread tick and runtime counters
inspired by how FreeBSD does it.

ok kettenis@

Revision 1.28 / (download) - annotate - [select for diffs], Mon Feb 20 22:23:39 2012 UTC (12 years, 3 months ago) by guenther
Branch: MAIN
Changes since 1.27: +2 -2 lines
Diff to previous 1.27 (colored)

First steps for making ptrace work with rthreads:
 - move the P_TRACED and P_INEXEC flags, and p_oppid, p_ptmask, and
   p_ptstat member from struct proc to struct process
 - sort the PT_* requests into those that take a PID vs those that
   can also take a TID
 - stub in PT_GET_THREAD_FIRST and PT_GET_THREAD_NEXT

ok kettenis@

Revision 1.27 / (download) - annotate - [select for diffs], Thu Jul 7 18:00:33 2011 UTC (12 years, 10 months ago) by guenther
Branch: MAIN
CVS Tags: OPENBSD_5_1_BASE, OPENBSD_5_1, OPENBSD_5_0_BASE, OPENBSD_5_0
Changes since 1.26: +1 -6 lines
Diff to previous 1.26 (colored)

Functions used in files other than where they are defined should be
declared in .h files, not in each .c.  Apply that rule to endtsleep(),
scheduler_start(), updatepri(), and realitexpire()

ok deraadt@ tedu@

Revision 1.26 / (download) - annotate - [select for diffs], Wed Jul 6 01:49:42 2011 UTC (12 years, 11 months ago) by art
Branch: MAIN
Changes since 1.25: +5 -3 lines
Diff to previous 1.25 (colored)

Stop using the P_BIGLOCK flag to figure out when we should release the
biglock in mi_switch and just check if we're holding the biglock.

The idea is that the first entry point into the kernel uses KERNEL_PROC_LOCK
and recursive calls use KERNEL_LOCK. This assumption is violated in at
least one place and has been causing confusion for lots of people.

Initial bug report and analysis from Pedro.

kettenis@ beck@ oga@ thib@ dlg@ ok

Revision 1.25 / (download) - annotate - [select for diffs], Mon Mar 7 07:07:13 2011 UTC (13 years, 3 months ago) by guenther
Branch: MAIN
Changes since 1.24: +3 -2 lines
Diff to previous 1.24 (colored)

The scheduling 'nice' value is per-process, not per-thread, so move it
into struct process.

ok tedu@ deraadt@

Revision 1.24 / (download) - annotate - [select for diffs], Fri Sep 24 13:21:30 2010 UTC (13 years, 8 months ago) by matthew
Branch: MAIN
CVS Tags: OPENBSD_4_9_BASE, OPENBSD_4_9
Changes since 1.23: +2 -1 lines
Diff to previous 1.23 (colored)

Add stricter asserts to DIAGNOSTIC kernels to help catch mutex and
rwlock misuse.  In particular, this commit makes the following
changes:

  1. i386 and amd64 now count the number of active mutexes so that
assertwaitok(9) can detect attempts to sleep while holding a mutex.

  2. i386 and amd64 check that we actually hold mutexes when passed to
mtx_leave().

  3. Calls to rw_exit*() now call rw_assert_{rd,wr}lock() as
appropriate.

ok krw@, oga@; "sounds good to me" deraadt@; assembly bits double
checked by pirofti@

Revision 1.23 / (download) - annotate - [select for diffs], Wed Jun 30 22:38:17 2010 UTC (13 years, 11 months ago) by art
Branch: MAIN
CVS Tags: OPENBSD_4_8_BASE, OPENBSD_4_8
Changes since 1.22: +2 -2 lines
Diff to previous 1.22 (colored)

This comment is unnecessarily confusing.

Revision 1.22 / (download) - annotate - [select for diffs], Sun Jan 3 19:17:33 2010 UTC (14 years, 5 months ago) by kettenis
Branch: MAIN
CVS Tags: OPENBSD_4_7_BASE, OPENBSD_4_7
Changes since 1.21: +6 -7 lines
Diff to previous 1.21 (colored)

Use atomic operations to access the per-cpu scheduler flags.

Revision 1.21 / (download) - annotate - [select for diffs], Tue Apr 14 09:13:25 2009 UTC (15 years, 1 month ago) by art
Branch: MAIN
CVS Tags: OPENBSD_4_6_BASE, OPENBSD_4_6
Changes since 1.20: +3 -1 lines
Diff to previous 1.20 (colored)

Some tweaks to the cpu affinity code.
 - Split up choosing of cpu between fork and "normal" cases. Fork is
   very different and should be treated as such.
 - Instead of implicitly choosing a cpu in setrunqueue, do it outside
   where it actually makes sense.
 - Just because a cpu is marked as idle doesn't mean it will be soon.
   There could be a thundering herd effect if we call wakeup from an
   interrupt handler, so subtract cpus with queued processes when
   deciding which cpu is actually idle.
 - some simplifications allowed by the above.

kettenis@ ok (except one bugfix that was not in the initial diff)

Revision 1.20 / (download) - annotate - [select for diffs], Mon Mar 23 13:25:11 2009 UTC (15 years, 2 months ago) by art
Branch: MAIN
Changes since 1.19: +6 -4 lines
Diff to previous 1.19 (colored)

Processor affinity for processes.
 - Split up run queues so that every cpu has one.
 - Make setrunqueue choose the cpu where we want to make this process
   runnable (this should be refined and less brutal in the future).
 - When choosing the cpu where we want to run, make some kind of educated
   guess where it will be best to run (very naive right now).
Other:
 - Set operations for sets of cpus.
 - load average calculations per cpu.
 - sched_is_idle() -> curcpu_is_idle()

tested, debugged and prodded by many@

Revision 1.19 / (download) - annotate - [select for diffs], Thu Nov 6 22:11:36 2008 UTC (15 years, 6 months ago) by art
Branch: MAIN
CVS Tags: OPENBSD_4_5_BASE, OPENBSD_4_5
Changes since 1.18: +3 -5 lines
Diff to previous 1.18 (colored)

Some paranoia and deconfusion.
 - setrunnable should never be run on SIDL processes. That's a bug and will
   cause all kinds of trouble. Change the switch statement to panic
   if that happens.
 - p->p_stat == SRUN implies that p != curproc since curproc will always be
   SONPROC. This is a leftover from before SONPROC.

deraadt@ "commit"

Revision 1.18 / (download) - annotate - [select for diffs], Wed Sep 10 14:01:23 2008 UTC (15 years, 8 months ago) by blambert
Branch: MAIN
Changes since 1.17: +2 -2 lines
Diff to previous 1.17 (colored)

Convert timeout_add() calls using multiples of hz to timeout_add_sec()

Really just the low-hanging fruit of (hopefully) forthcoming timeout
conversions.

ok art@, krw@

Revision 1.17 / (download) - annotate - [select for diffs], Fri Jul 18 23:43:31 2008 UTC (15 years, 10 months ago) by art
Branch: MAIN
CVS Tags: OPENBSD_4_4_BASE, OPENBSD_4_4
Changes since 1.16: +3 -1 lines
Diff to previous 1.16 (colored)

Add a macro that clears the want_resched flag that need_resched sets.
Right now when mi_switch picks up the same proc, we didn't clear the
flag which would mean that every time we service an AST we would attempt
a context switch. For some architectures, amd64 being probably the
most extreme, that meant attempting to context switch for every
trap and interrupt.

Now we clear_resched explicitly after every context switch, even if it
didn't do anything. Which also allows us to remove some more code
in cpu_switchto (not done yet).

miod@ ok

Revision 1.16 / (download) - annotate - [select for diffs], Thu May 22 14:07:14 2008 UTC (16 years ago) by thib
Branch: MAIN
Changes since 1.15: +2 -4 lines
Diff to previous 1.15 (colored)

kill 2 bogus ARGSUSED and use the LIST_FOREACH() macro
instead of handrolling...

ok miod@

Revision 1.15 / (download) - annotate - [select for diffs], Mon Nov 26 17:15:29 2007 UTC (16 years, 6 months ago) by art
Branch: MAIN
CVS Tags: OPENBSD_4_3_BASE, OPENBSD_4_3
Changes since 1.14: +4 -2 lines
Diff to previous 1.14 (colored)

Move the implementation of __mp_lock (biglock) into machine dependent
code. At this moment all architectures get the copy of the old code
except i386 which gets a new shiny implementation that doesn't spin
at splhigh (doh!) and doesn't try to grab the biglock when releasing
the biglock (double doh!).

Shaves 10% of system time during kernel compile and might solve a few
bugs as a bonus.

Other architectures coming shortly.

miod@ deraadt@ ok

Revision 1.14 / (download) - annotate - [select for diffs], Thu Oct 11 10:34:08 2007 UTC (16 years, 7 months ago) by art
Branch: MAIN
Changes since 1.13: +1 -15 lines
Diff to previous 1.13 (colored)

sched_lock_idle and sched_unlock_idle are obsolete now.

Revision 1.13 / (download) - annotate - [select for diffs], Wed Oct 10 15:53:53 2007 UTC (16 years, 7 months ago) by art
Branch: MAIN
Changes since 1.12: +29 -39 lines
Diff to previous 1.12 (colored)

Make context switching much more MI:
 - Move the functionality of choosing a process from cpu_switch into
   a much simpler function: cpu_switchto. Instead of having the locore
   code walk the run queues, let the MI code choose the process we
   want to run and only implement the context switching itself in MD
   code.
 - Let MD context switching run without worrying about spls or locks.
 - Instead of having the idle loop implemented with special contexts
   in MD code, implement one idle proc for each cpu. make the idle
   loop MI with MD hooks.
 - Change the proc lists from the old style vax queues to TAILQs.
 - Change the sleep queue from vax queues to TAILQs. This makes
   wakeup() go from O(n^2) to O(n)

there will be some MD fallout, but it will be fixed shortly.
There's also a few cleanups to be done after this.

deraadt@, kettenis@ ok

Revision 1.12 / (download) - annotate - [select for diffs], Fri May 18 16:10:15 2007 UTC (17 years ago) by art
Branch: MAIN
CVS Tags: OPENBSD_4_2_BASE, OPENBSD_4_2
Changes since 1.11: +4 -6 lines
Diff to previous 1.11 (colored)

Widen the SCHED_LOCK in two cases to protect p_estcpu and p_priority.

kettenis@ ok

Revision 1.11 / (download) - annotate - [select for diffs], Wed May 16 17:27:30 2007 UTC (17 years ago) by art
Branch: MAIN
Changes since 1.10: +1 -78 lines
Diff to previous 1.10 (colored)

The world of __HAVEs and __HAVE_NOTs is reducing. All architectures
have cpu_info now, so kill the option.

eyeballed by jsg@ and grange@

Revision 1.10 / (download) - annotate - [select for diffs], Tue Feb 6 18:42:37 2007 UTC (17 years, 3 months ago) by art
Branch: MAIN
CVS Tags: OPENBSD_4_1_BASE, OPENBSD_4_1
Changes since 1.9: +2 -2 lines
Diff to previous 1.9 (colored)

Use atomic.h operation for manipulating p_siglist in struct proc. Solves
the problem with lost signals in MP kernels.

miod@, kettenis@ ok

Revision 1.9 / (download) - annotate - [select for diffs], Wed Nov 29 12:24:18 2006 UTC (17 years, 6 months ago) by miod
Branch: MAIN
Changes since 1.8: +3 -8 lines
Diff to previous 1.8 (colored)

Kernel stack can be swapped. This means that stuff that's on the stack
should never be referenced outside the context of the process to which
this stack belongs unless we do the PHOLD/PRELE dance. Loads of code
doesn't follow the rules here. Instead of trying to track down all
offenders and fix this hairy situation, it makes much more sense
to not swap kernel stacks.

From art@, tested by many some time ago.

Revision 1.8 / (download) - annotate - [select for diffs], Wed Nov 15 17:25:40 2006 UTC (17 years, 6 months ago) by jmc
Branch: MAIN
Changes since 1.7: +2 -2 lines
Diff to previous 1.7 (colored)

typos; from bret lambert

Revision 1.7 / (download) - annotate - [select for diffs], Sat Oct 21 02:18:00 2006 UTC (17 years, 7 months ago) by tedu
Branch: MAIN
Changes since 1.6: +6 -6 lines
Diff to previous 1.6 (colored)

tbert sent me a diff to change some 0 to NULL
i got carried away and deleted a whole bunch of useless casts
this is C, not C++.  ok md5

Revision 1.6 / (download) - annotate - [select for diffs], Mon Oct 9 00:31:11 2006 UTC (17 years, 7 months ago) by tedu
Branch: MAIN
Changes since 1.5: +15 -21 lines
Diff to previous 1.5 (colored)

bret lambert sent a patch removing register.  i made it ansi.

Revision 1.5 / (download) - annotate - [select for diffs], Fri Jun 17 22:33:34 2005 UTC (18 years, 11 months ago) by niklas
Branch: MAIN
CVS Tags: OPENBSD_4_0_BASE, OPENBSD_4_0, OPENBSD_3_9_BASE, OPENBSD_3_9, OPENBSD_3_8_BASE, OPENBSD_3_8
Changes since 1.4: +7 -14 lines
Diff to previous 1.4 (colored)

A second approach at fixing the telnet localhost & problem
(but I tend to call it ssh localhost & now when telnetd is
history).  This is more localized patch, but leaves us with
a recursive lock for protecting scheduling and signal state.
Better care is taken to actually be symmetric over mi_switch.
Also, the dolock cruft in psignal can go with this solution.
Better test runs by more people for longer time has been
carried out compared to the c2k5 patch.

Long term the current mess with interruptible sleep, the
default action on stop signals and wakeup interactions need
to be revisited.  ok deraadt@, art@

Revision 1.4 / (download) - annotate - [select for diffs], Sun May 29 03:20:41 2005 UTC (19 years ago) by deraadt
Branch: MAIN
Changes since 1.3: +42 -39 lines
Diff to previous 1.3 (colored)

sched work by niklas and art backed out; causes panics

Revision 1.3 / (download) - annotate - [select for diffs], Thu May 26 18:10:40 2005 UTC (19 years ago) by art
Branch: MAIN
Changes since 1.2: +2 -1 lines
Diff to previous 1.2 (colored)

Fix yield() to change p_stat from SONPROC to SRUN.
yield() is not used anywhere yet, that's why we didn't notice this.
Noticed by tedu@ who just started using it.

Revision 1.2 / (download) - annotate - [select for diffs], Wed May 25 23:17:47 2005 UTC (19 years ago) by niklas
Branch: MAIN
Changes since 1.1: +39 -42 lines
Diff to previous 1.1 (colored)

This patch is mostly art's work and was done *a year* ago.  Art wants to thank
everyone for the prompt review and ok of this work ;-)  Yeah, that includes me
too, or maybe especially me.  I am sorry.

Change the sched_lock to a mutex. This fixes, among other things, the infamous
"telnet localhost &" problem.  The real bug in that case was that the sched_lock
which is by design a non-recursive lock, was recursively acquired, and not
enough releases made us hold the lock in the idle loop, blocking scheduling
on the other processors.  Some of the other processors would hold the biglock though,
which made it impossible for cpu 0 to enter the kernel...  A nice deadlock.
Let me just say debugging this for days just to realize that it was all fixed
in an old diff no one ever ok'd was somewhat of an anti-climax.

This diff also changes splsched to be correct for all our architectures.

Revision 1.1 / (download) - annotate - [select for diffs], Thu Jul 29 06:25:45 2004 UTC (19 years, 10 months ago) by tedu
Branch: MAIN
CVS Tags: OPENBSD_3_7_BASE, OPENBSD_3_7, OPENBSD_3_6_BASE, OPENBSD_3_6

put the scheduler in its own file.  reduces clutter, and logically separates
"put this process to sleep" and "find a process to run" operations.
no functional change.  ok art@
