[BACK]Return to error.h CVS log [TXT][DIR] Up to [local] / src / usr.bin / make

File: [local] / src / usr.bin / make / error.h (download)

Revision 1.2, Fri Jun 23 16:27:29 2000 UTC (23 years, 11 months ago) by espie
Branch: MAIN
Changes since 1.1: +5 -1 lines

This is the speed-up patch, which doubles make speed (almost).

Use the open hashing functions for global contexts instead of List in
var.c.

All the preliminary work to trim down local contexts means that we don't
suffer from the heavy initialization work that a hash table entails.

There is some make kludgery to:
- build the hashing functions as a library,
- recreate hashconsts.h, even if make depend was not invoked.

One point of the hashing scheme written was to separate the computation
of the hash function, and the hash lookup itself. This is very convenient
for make, because of those pesky special variables. hashconsts.h is there
to pre-hash the correct values, which replaces a few expensive string
comparisons with quick hash value comparisons, followed by one expensive
string comparison. The modulus MAGICSLOTS chosen in the Makefile is
ad-hoc: it is small enough to write a small switch without collision,
and will need changing if the hash function changes...

The function quick_lookup is the most important:
it either returns an index, for a local variable, or it does compute a
hashing value, and returns -1.

Another somewhat controversial decision is the use of string intervals.
This avoids either copying a string, or twiddling with a byte for cases
such as ${VAR}.

Finally, the variable name is stored within the variable itself. Since
a given variable name never changes, this makes sense. All that was needed
was a hash library with support for this.  Note that the hashing table
holds only a variable pointer AND the corresponding hashing value, WITHOUT
a modulo hashtablesize. Two reasons:
- hash resizes can be done faster, without having to recompute hashing values.
- locality of access. The hash table fits into memory without problem. Once
a candidate slot is found, we check the complete hashing value. Probability
of a collision is very small (32 bits...). So bringing up the whole
variable in memory at once is good: the name will almost always match, in
which case we want the variable value as well, so it makes sense to put
them together.

The ohash functions implement open hashing, as described in Knuth, but with
a variable table size.  Choosing powers of 2 sizes does not yield more
collisions, but it makes the hashing scheme much simpler. The thresholds at
which to expand/shrink the tables seem to work well in practice. The
default sizes were chosen such that the tables hardly ever shrink or expand
anyways (though I've tried with smaller/larger sizes to verify that the
shrinking/expanding worked correctly): larger Makefiles hold roughly
500/600 variables, which fits without trouble into a 1024-sized variable.

Disregard #ifdef STATS_HASH, this is some internal scaffolding I'm using
to measure make performance.

The only known issue with open-hashing is that deletions cannot create
empty slots, but do leave slots marked as `occupied once' so that lookup
works.  We use a well-known optimization which records those pseudo-empty
slots while looking up values. If the value is not found, the pseudo-empty
slot is returned to be filled. If the value is found, it is swapped with
the pseudo-empty slot. This is an improvement in both cases, since this
shortens the length of lookup chains, eventually pushing the pseudo-empty
slots to the end.

Reviewed by millert@ and miod@

/* $OpenBSD: error.h,v 1.2 2000/06/23 16:27:29 espie Exp $ */

/*-
 * Copyright (c) 1988, 1989, 1990, 1993
 *	The Regents of the University of California.  All rights reserved.
 * Copyright (c) 1989 by Berkeley Softworks
 * All rights reserved.
 *
 * This code is derived from software contributed to Berkeley by
 * Adam de Boor.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. All advertising materials mentioning features or use of this software
 *    must display the following acknowledgement:
 *	This product includes software developed by the University of
 *	California, Berkeley and its contributors.
 * 4. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 *	from: @(#)nonints.h	8.3 (Berkeley) 3/19/94
 */
#ifndef ERROR_H
#define ERROR_H
void *emalloc __P((size_t));
char *estrdup __P((const char *));
void *erealloc __P((void *, size_t));
void *ecalloc __P((size_t, size_t));
int eunlink __P((const char *));

/* efree(x) works when x==NULL. STDC behavior, may need some different
 * definition for cross-builds on deficient systems */
#define efree	free

void *hash_alloc __P((size_t, void *));
void hash_free __P((void *, size_t, void *));
void *element_alloc __P((size_t, void *));

#endif	/* ERROR_H */