OpenBSD CVS

CVS log for src/sys/sys/timetc.h




Default branch: MAIN


Revision 1.14, Sat Feb 4 19:19:35 2023 UTC (16 months ago) by cheloha
Branch: MAIN
CVS Tags: OPENBSD_7_5_BASE, OPENBSD_7_5, OPENBSD_7_4_BASE, OPENBSD_7_4, OPENBSD_7_3_BASE, OPENBSD_7_3, HEAD
Changes since 1.13: +1 -9 lines

timecounting: remove incomplete PPS support

The timecounting code has had stubs for pulse-per-second (PPS) polling
since it was imported in 2004.  At this point it seems unlikely that
anyone is going to finish adding PPS support, so let's remove the stubs:

- Delete the dead tc_poll_pps() call from tc_windup().
- Remove all tc_poll_pps symbols from the kernel.

Link: https://marc.info/?l=openbsd-tech&m=167519035723210&w=2

ok miod@

Revision 1.13, Fri Aug 12 02:20:36 2022 UTC (21 months, 4 weeks ago) by cheloha
Branch: MAIN
CVS Tags: OPENBSD_7_2_BASE, OPENBSD_7_2
Changes since 1.12: +2 -1 lines

amd64: simplify TSC synchronization testing

Computing a per-CPU TSC skew value is error-prone, especially on
multisocket machines and VMs.  My best guess is that larger latencies
appear to the current skew measurement test as TSC desync, and so the
TSC is demoted to a kernel timecounter on these machines or marked
non-monotonic.

This patch eliminates per-CPU TSC skew values.  Instead of trying to
measure and correct for TSC desync we only try to detect desync, which
is less error-prone.  This approach should allow a wider variety of
machines to use the TSC as a timecounter when running OpenBSD.

In the new sync test, both CPUs repeatedly try to detect whether their
TSC is trailing the other CPU's TSC.  The upside to this approach is
that it yields no false positives.  The downside to this approach is
that it takes more time than the current skew measurement test.  Each
test round takes 1ms, and we run up to two rounds per CPU, so this
patch slows boot down by 2ms per AP.
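The core comparison in the new test can be modeled, very loosely, as a single-threaded scan over TSC reads whose true order is already known. The real test runs concurrently on two CPUs and has to establish that ordering through shared memory; this sketch assumes it away, and struct tsc_read and tsc_find_lagging() are hypothetical names. It does illustrate why the check yields no false positives: if both TSCs are in sync, a read that happens later can never return a smaller value than an earlier read on the other CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry in a hypothetical, globally ordered log of TSC reads. */
struct tsc_read {
	int cpu;        /* which CPU read its own TSC: 0 or 1 */
	uint64_t tsc;   /* the value that CPU read */
};

/*
 * Scan the log in order.  If a CPU reads a value smaller than the
 * other CPU's most recent (strictly earlier) read, its TSC must be
 * trailing.  Returns a bitmask: bit i set means CPU i was caught lagging.
 */
static unsigned
tsc_find_lagging(const struct tsc_read *log, size_t n)
{
	uint64_t last[2] = { 0, 0 };
	int seen[2] = { 0, 0 };
	unsigned lagging = 0;

	for (size_t i = 0; i < n; i++) {
		int me = log[i].cpu;
		int other = !me;

		assert(me == 0 || me == 1);
		if (seen[other] && log[i].tsc < last[other])
			lagging |= 1u << me;
		last[me] = log[i].tsc;
		seen[me] = 1;
	}
	return lagging;
}
```

A CPU whose TSC runs ahead is never flagged by its own reads; it is the partner's reads that expose the gap, which is one reason both CPUs take part in each round.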

If any CPU fails the sync test, the TSC is marked non-monotonic and a
different timecounter is activated.  The TC_USER flag remains intact.
There is no middle ground where we fall back to only using the TSC in
the kernel.
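The fallback described above can be sketched as: change the failing counter's quality, then, if it was the active counter, switch to the best remaining one. This is a rough, hypothetical model rather than the kernel's code. It uses a plain array instead of the kernel's timecounter list, assumes a negative quality excludes a counter from automatic selection, and breaks quality ties by frequency as an assumption. The counter names and numbers in the test are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, pared-down timecounter record. */
struct tc_sketch {
	const char *name;
	int quality;         /* negative: never chosen automatically (assumed) */
	uint64_t frequency;  /* Hz */
};

/* Highest quality wins; among equals, the faster counter wins. */
static struct tc_sketch *
tc_choose_best(struct tc_sketch *tcs, size_t n)
{
	struct tc_sketch *best = NULL;

	for (size_t i = 0; i < n; i++) {
		struct tc_sketch *tc = &tcs[i];

		if (tc->quality < 0)
			continue;
		if (best == NULL || tc->quality > best->quality ||
		    (tc->quality == best->quality &&
		    tc->frequency > best->frequency))
			best = tc;
	}
	return best;
}

/*
 * Loose analogue of tc_reset_quality(): update a counter's quality and,
 * if it is the active counter, return the counter that should now be
 * active.  Keeps the old counter if nothing else is eligible.
 */
static struct tc_sketch *
tc_reset_quality_sketch(struct tc_sketch *tcs, size_t n,
    struct tc_sketch *active, struct tc_sketch *tc, int quality)
{
	struct tc_sketch *best;

	assert(tc != NULL && active != NULL);
	tc->quality = quality;
	if (tc != active)
		return active;
	best = tc_choose_best(tcs, n);
	return best != NULL ? best : active;
}
```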

Before running the test, we check for the IA32_TSC_ADJUST register and
reset it if necessary.  This is a trivial way to work around firmware
bugs that desync the TSC before we reach the kernel.  Unfortunately,
at the moment this register appears to only be available on Intel
processors.  I cannot find an equivalent but differently-named MSR for
AMD processors.

Because there is no per-CPU skew value, there is also no concept of
TSC drift anymore.

Miscellaneous notes:

- This patch adds a new timecounter utility function, tc_reset_quality().
  Used after sync test failure to mark the TSC non-monotonic.

- I have left TSC_DEBUG enabled for now.  Unsure if we should leave it
  enabled for release or not.  If we disable it we no longer run the
  sync test after failing it once.  Running the test even after failure
  provides information about the desync on every CPU.

- Taking 1ms per test round is fairly conservative.  We can experiment
  with and discuss shorter test rounds.  My main goal with a relatively
  long test round is ensuring VMs actually run the test.  It would be
  bad if a hypervisor interrupted the test for so long that it concealed
  desync.

- The use of two test rounds is mostly a diagnostic tool: it would be
  very strange if a CPU passed the first round but failed the second.
  If we ever saw this in the wild it would indicate something odd.

- Most of the desync seen in test reports is on Ryzen CPUs.  I
  believe, but cannot prove, that this is due to a widespread
  firmware bug on AMD motherboards.  Hopefully AMD and/or the
  downstream vendors fix it.

- Fixing TSC desync by writing the TSC directly with WRMSR is very
  difficult.  The TSC is a moving target incrementing very quickly and
  compensating for WRMSR overhead is non-trivial.  We can experiment
  with this, but my confidence is low that we can make it work reliably.

Prompted by deraadt@ and kettenis@ in 2021. Shepherded along by
deraadt@ throughout.  Reprompted by Yuichiro Naito several times.
With input from Yuichiro Naito, naddy@, sthen@, dv@, and deraadt@.

Tested by florian@, gnezdo@, sthen@, Josh Rickmar, dv@, Mohamed Aslan,
Hrvoje Popovski, Yuichiro Naito, semarie@, mlarkin@, asou@, jmatthew@,
Renato Aguiar, and Timo Myyra.

Patch v1: https://marc.info/?l=openbsd-tech&m=164330092208035&w=2
Patch v2: https://marc.info/?l=openbsd-tech&m=164558519712957&w=2
Patch v3: https://marc.info/?l=openbsd-tech&m=165698681018991&w=2
Patch v4: https://marc.info/?l=openbsd-tech&m=165835507113680&w=2
Patch v5: https://marc.info/?l=openbsd-tech&m=165923705118770&w=2

"just commit it" deraadt@

Revision 1.12, Mon Jul 6 13:33:09 2020 UTC (3 years, 11 months ago) by pirofti
Branch: MAIN
CVS Tags: OPENBSD_7_1_BASE, OPENBSD_7_1, OPENBSD_7_0_BASE, OPENBSD_7_0, OPENBSD_6_9_BASE, OPENBSD_6_9, OPENBSD_6_8_BASE, OPENBSD_6_8
Changes since 1.11: +26 -2 lines

Add support for timecounting in userland.

This diff exposes parts of clock_gettime(2) and gettimeofday(2) to
userland via libc, freeing processes from the need for a context
switch every time they want to count the passage of time.

If a timecounter clock can be exposed to userland, then it needs to set
its tc_user member to a non-zero value. Tested with one or multiple
counters per architecture.

The timing data is shared through a pointer found in the new ELF
auxiliary vector AUX_openbsd_timekeep containing timehands information
that is frequently updated by the kernel.

Timing differences between the last kernel update and the current time
are adjusted in userland by the tc_get_timecount() function inside the
MD usertc.c file.
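The userland adjustment can be sketched roughly as follows: take the current counter value, subtract the counter value recorded at the kernel's last update, mask off wraparound, scale the difference into units of 2^-64 seconds, and add it to the published time. The struct layout and all names here are hypothetical, not the real timekeep structure. A real reader would also have to detect a concurrent kernel update (for example with a generation number), which this single-threaded sketch omits. The counter value is passed in where the message says tc_get_timecount() would read the hardware.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical snapshot of kernel-published timehands data. */
struct timekeep_sketch {
	uint64_t offset_sec;    /* time at last kernel update: whole seconds */
	uint64_t offset_frac;   /* ...plus this many 2^-64 s */
	uint64_t scale;         /* about 2^64 / counter frequency */
	uint32_t offset_count;  /* counter value at last kernel update */
	uint32_t counter_mask;  /* valid counter bits */
};

/* Convert a raw counter value into a timespec using the snapshot. */
static struct timespec
usertc_time_sketch(const struct timekeep_sketch *tk, uint32_t count)
{
	struct timespec ts;
	/* Masked unsigned subtraction copes with the counter wrapping. */
	uint32_t delta = (count - tk->offset_count) & tk->counter_mask;
	uint64_t sec = tk->offset_sec;
	/* Assumes delta spans less than one second, so the product fits. */
	uint64_t frac = tk->offset_frac + tk->scale * delta;

	assert(tk != NULL);
	if (frac < tk->offset_frac)	/* fraction overflowed: carry a second */
		sec++;
	ts.tv_sec = (time_t)sec;
	/* Top 32 bits of the fraction, scaled to nanoseconds (truncates). */
	ts.tv_nsec = (long)((1000000000ULL * (frac >> 32)) >> 32);
	return ts;
}
```

For a 1 MHz counter (scale set to UINT64_MAX / 1000000), reading 500000 ticks past offset_count yields offset_sec plus just under 0.5 s, because each step of the conversion rounds down.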

This permits a much more responsive environment, quite visible in
browsers, office programs and gaming (apparently one is able to fly
in Minecraft now).

Tested by robert@, sthen@, naddy@, kmos@, phessler@, and many others!

OK from at least kettenis@, cheloha@, naddy@, sthen@

Revision 1.11, Sat Jul 4 08:06:08 2020 UTC (3 years, 11 months ago) by anton
Branch: MAIN
Changes since 1.10: +4 -4 lines

It's been agreed upon that global locks should be expressed using
capital letters in locking annotations. Therefore harmonize the existing
annotations.

Also, if multiple locks are required they should be delimited using
commas.

ok mpi@

Revision 1.10, Sat Oct 26 21:16:38 2019 UTC (4 years, 7 months ago) by cheloha
Branch: MAIN
CVS Tags: OPENBSD_6_7_BASE, OPENBSD_6_7
Changes since 1.9: +4 -1 lines

clock_getres(2): actually return the resolution of the given clock

Currently we return (1000000000 / hz) from clock_getres(2) as the
resolution for every clock.  This is often untrue.

For CPUTIME clocks, if we have a separate statclock interrupt the
resolution is (1000000000 / stathz).  Otherwise it is as we currently
claim: (1000000000 / hz).

For the REALTIME/MONOTONIC/UPTIME/BOOTTIME clocks the resolution is
that of the active timecounter.  During tc_init() we can compute the
precision of a timecounter by examining its tc_counter_mask and store
it for lookup later in a new member, tc_precision.  The resolution of
a clock backed by a timecounter "tc" is then

	tc.tc_precision * (2^64 / tc.tc_frequency)

fractional seconds.
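The resolution formula can be worked through in a short sketch. It assumes, as the message implies but does not spell out, that tc_precision is the lowest set bit of tc_counter_mask (the smallest step the visible counter can take). It converts the 2^-64-second result to nanoseconds by keeping the top 32 bits of the fraction. Because 2^64 does not fit in 64 bits, UINT64_MAX / frequency stands in for 2^64 / tc_frequency. The function names, and the choice to report at least 1 ns, are assumptions of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest step of the counter: the lowest set bit of its mask (assumed). */
static uint64_t
tc_precision_sketch(uint64_t counter_mask)
{
	uint64_t tmp;

	assert(counter_mask != 0);
	for (tmp = 1; (tmp & counter_mask) == 0; tmp <<= 1)
		continue;
	return tmp;
}

/* precision * (2^64 / frequency) fractional seconds, in nanoseconds. */
static uint64_t
tc_resolution_ns_sketch(uint64_t counter_mask, uint64_t frequency)
{
	uint64_t frac, ns;

	assert(frequency != 0);
	/* UINT64_MAX / f approximates 2^64 / f, which overflows 64 bits. */
	frac = tc_precision_sketch(counter_mask) * (UINT64_MAX / frequency);
	/* Keep the top 32 bits of the fraction and scale to nanoseconds. */
	ns = (1000000000ULL * (frac >> 32)) >> 32;
	/* Rounding down can give 0 for fast counters; report at least 1 ns. */
	return ns != 0 ? ns : 1;
}
```

For a 3579545 Hz counter with a 24-bit mask (values chosen for illustration), the sketch reports 279 ns, which is one tick of 1/3579545 s rounded down.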

While here we can clean up sys_clock_getres() a bit.

Standards input from guenther@.  Lots of input, feedback from
kettenis@.

ok kettenis@

Revision 1.9, Wed May 22 19:59:37 2019 UTC (5 years ago) by cheloha
Branch: MAIN
CVS Tags: OPENBSD_6_6_BASE, OPENBSD_6_6
Changes since 1.8: +4 -2 lines

SLIST-ify the timecounter list.

Call it "tc_list" instead of "timecounters", which is too similar to
the variable "timecounter" for my taste.
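SLIST refers to the singly linked list macros in <sys/queue.h>. A minimal sketch of registering counters on such a list and finding one by name, the kind of lookup a handler for the kern.timecounter.hardware sysctl might need. The struct tc_node type, tc_register() and tc_lookup() are hypothetical; only tc_list is named in the message. <sys/queue.h> is a BSD header, and glibc also ships a version with these SLIST macros.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical, pared-down timecounter with a list linkage. */
struct tc_node {
	const char *tc_name;
	SLIST_ENTRY(tc_node) tc_next;
};

/* All registered counters, most recently registered first. */
static SLIST_HEAD(, tc_node) tc_list = SLIST_HEAD_INITIALIZER(tc_list);

static void
tc_register(struct tc_node *tc)
{
	assert(tc != NULL);
	SLIST_INSERT_HEAD(&tc_list, tc, tc_next);
}

/* Return the registered counter with the given name, or NULL. */
static struct tc_node *
tc_lookup(const char *name)
{
	struct tc_node *tc;

	assert(name != NULL);
	SLIST_FOREACH(tc, &tc_list, tc_next) {
		if (strcmp(tc->tc_name, name) == 0)
			return tc;
	}
	return NULL;
}
```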

ok mpi@ visa@

Revision 1.8, Mon Mar 25 23:32:00 2019 UTC (5 years, 2 months ago) by cheloha
Branch: MAIN
CVS Tags: OPENBSD_6_5_BASE, OPENBSD_6_5
Changes since 1.7: +21 -11 lines

MP-safe timecounting: new rwlock: tc_lock

tc_lock allows adjfreq(2) and the kern.timecounter.hardware sysctl(2)
to read/write the active timecounter pointer and the .tc_adj_freq
member of the active timecounter safely.  This eliminates any possibility
of a torn read/write for the .tc_adj_freq member when we drop the
KERNEL_LOCK from the timecounting layer.  It also ensures the active
timecounter does not change in the midst of an adjfreq(2) call.

Because these are not high-traffic paths, we can get away with using
tc_lock in write-mode to ensure combination read/write adjtime(2) calls
are relatively atomic (a) to other writer adjtime(2) calls, and (b) to
settimeofday(2)/clock_settime(2) calls, which cancel ongoing adjtime(2)
adjustment.

When the KERNEL_LOCK is dropped, an unprivileged user will be able to
create some tc_lock contention via adjfreq(2); it is very unlikely to
ever be a problem.  If it ever is actually a problem a lockless read
could be added to address it.
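The locking pattern can be sketched with a POSIX reader/writer lock standing in for the kernel rwlock. A caller that only reads the adjustment takes the lock shared; a caller that sets it (optionally returning the old value) or switches the active counter takes it exclusive, so a read-and-replace is atomic with respect to other writers and the active counter cannot change mid-call. All names are hypothetical, and pthread_rwlock_t is a userland substitute, not the kernel's rwlock API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter record; adj_freq is protected by tc_lock. */
struct tc_freq_sketch {
	const char *name;
	int64_t adj_freq;
};

static pthread_rwlock_t tc_lock = PTHREAD_RWLOCK_INITIALIZER;
static struct tc_freq_sketch *active_tc;	/* protected by tc_lock */

/*
 * adjfreq(2)-style access: optionally return the active counter's
 * adjustment and optionally replace it.  Writers hold the lock
 * exclusively, so read-then-replace is atomic against other writers.
 */
static void
adjfreq_sketch(const int64_t *newfreq, int64_t *oldfreq)
{
	int rv;

	if (newfreq != NULL)
		rv = pthread_rwlock_wrlock(&tc_lock);
	else
		rv = pthread_rwlock_rdlock(&tc_lock);
	assert(rv == 0);
	assert(active_tc != NULL);
	if (oldfreq != NULL)
		*oldfreq = active_tc->adj_freq;
	if (newfreq != NULL)
		active_tc->adj_freq = *newfreq;
	rv = pthread_rwlock_unlock(&tc_lock);
	assert(rv == 0);
	(void)rv;
}

/* sysctl-style switch of the active counter, also under the write lock. */
static void
tc_set_active_sketch(struct tc_freq_sketch *tc)
{
	int rv = pthread_rwlock_wrlock(&tc_lock);

	assert(rv == 0);
	active_tc = tc;
	rv = pthread_rwlock_unlock(&tc_lock);
	assert(rv == 0);
	(void)rv;
}
```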

While here, reorganize sys_adjfreq()/sys_adjtime() to minimize code
under the lock.  Also while here, make tc_adjfreq() void, as it cannot
fail under any circumstance.  Also also while here, annotate various
globals/struct members with lock ordering details.

With lots of input from mpi@ and visa@.

ok visa@

Revision 1.7, Sun Mar 10 21:16:15 2019 UTC (5 years, 3 months ago) by cheloha
Branch: MAIN
Changes since 1.6: +2 -1 lines

Move adjtimedelta from kern_time.c to kern_tc.c.

This will simplify upcoming MP-safety diffs for the timecounting layer.

adjtimedelta is now accessed nowhere outside of kern_tc.c, so we can
remove its extern declaration from kernel.h.  Zeroing adjtimedelta
within timecounter_mtx before we jump the real-time clock is also a
bit safer than what we do now, as we are not racing a simultaneous
tc_windup() call from hardclock(), which itself can modify adjtimedelta
via ntp_update_second().

Discussed with visa@ and mpi@.

ok visa@

Revision 1.6, Mon May 28 18:05:42 2018 UTC (6 years ago) by guenther
Branch: MAIN
CVS Tags: OPENBSD_6_4_BASE, OPENBSD_6_4
Changes since 1.5: +3 -3 lines

Constipate a bunch of time functions

ok tb@ kettenis@

Revision 1.5, Thu Apr 3 17:58:31 2014 UTC (10 years, 2 months ago) by beck
Branch: MAIN
CVS Tags: OPENBSD_6_3_BASE, OPENBSD_6_3, OPENBSD_6_2_BASE, OPENBSD_6_2, OPENBSD_6_1_BASE, OPENBSD_6_1, OPENBSD_6_0_BASE, OPENBSD_6_0, OPENBSD_5_9_BASE, OPENBSD_5_9, OPENBSD_5_8_BASE, OPENBSD_5_8, OPENBSD_5_7_BASE, OPENBSD_5_7, OPENBSD_5_6_BASE, OPENBSD_5_6
Changes since 1.4: +2 -0 lines

fix $OpenBSD$, noticed by philip

Revision 1.4, Thu Apr 3 15:55:29 2014 UTC (10 years, 2 months ago) by beck
Branch: MAIN
Changes since 1.3: +18 -9 lines

I have discussed these licenses with Poul-Henning Kamp and he has agreed to
this license change. We will remember that we all still like beer.

Revision 1.3, Thu May 24 07:17:42 2012 UTC (12 years ago) by guenther
Branch: MAIN
CVS Tags: OPENBSD_5_5_BASE, OPENBSD_5_5, OPENBSD_5_4_BASE, OPENBSD_5_4, OPENBSD_5_3_BASE, OPENBSD_5_3, OPENBSD_5_2_BASE, OPENBSD_5_2
Changes since 1.2: +2 -1 lines

On resume, run forward the monotonic and realtime clocks instead of jumping
just the realtime clock, triggering and adjusting timeouts to reflect that.

ok matthew@ deraadt@

Revision 1.2, Mon Oct 30 20:19:33 2006 UTC (17 years, 7 months ago) by otto
Branch: MAIN
CVS Tags: OPENBSD_5_1_BASE, OPENBSD_5_1, OPENBSD_5_0_BASE, OPENBSD_5_0, OPENBSD_4_9_BASE, OPENBSD_4_9, OPENBSD_4_8_BASE, OPENBSD_4_8, OPENBSD_4_7_BASE, OPENBSD_4_7, OPENBSD_4_6_BASE, OPENBSD_4_6, OPENBSD_4_5_BASE, OPENBSD_4_5, OPENBSD_4_4_BASE, OPENBSD_4_4, OPENBSD_4_3_BASE, OPENBSD_4_3, OPENBSD_4_2_BASE, OPENBSD_4_2, OPENBSD_4_1_BASE, OPENBSD_4_1
Changes since 1.1: +4 -1 lines

Timecounter-based implementation of adjfreq(2). Largely from art@.
Tested by various people using not-yet-committed amd64 timecounter code.
ok deraadt@

Revision 1.1, Wed Jul 28 17:15:12 2004 UTC (19 years, 10 months ago) by tholo
Branch: MAIN
CVS Tags: OPENBSD_4_0_BASE, OPENBSD_4_0, OPENBSD_3_9_BASE, OPENBSD_3_9, OPENBSD_3_8_BASE, OPENBSD_3_8, OPENBSD_3_7_BASE, OPENBSD_3_7, OPENBSD_3_6_BASE, OPENBSD_3_6

This touches only MI code, and adds new time keeping code.  The
code is all conditionalized on __HAVE_TIMECOUNTER, and not
enabled on any platforms.

adjtime(2) support exists, courtesy of nordin@; sysctl(2) support
and a concept of quality for each attached time source also exist.

High-quality time sources exist for the PIIX4 ACPI timer as well as
some AMD power management chips.  This will have to be redone
once we actually add ACPI support (at that time we need to use
the ACPI interfaces to get at these clocks).

ok art@ ken@ miod@ jmc@ and many more
