<?xml version='1.0' encoding='utf-8' ?>
<!--  If you are running a bot please visit this policy page outlining rules you must respect. https://www.livejournal.com/bots/  -->
<rss version='2.0'  xmlns:lj='http://www.livejournal.org/rss/lj/1.0/' xmlns:media='http://search.yahoo.com/mrss/' xmlns:atom10='http://www.w3.org/2005/Atom'>
<channel>
  <title>Ulrich Drepper</title>
  <link>https://udrepper.livejournal.com/</link>
  <description>Ulrich Drepper - LiveJournal.com</description>
  <lastBuildDate>Wed, 18 Dec 2013 04:34:12 GMT</lastBuildDate>
  <generator>LiveJournal / LiveJournal.com</generator>
  <lj:journal>udrepper</lj:journal>
  <lj:journalid>4163110</lj:journalid>
  <lj:journaltype>personal</lj:journaltype>
  <item>
  <guid isPermaLink='true'>https://udrepper.livejournal.com/22429.html</guid>
  <pubDate>Wed, 18 Dec 2013 04:34:12 GMT</pubDate>
  <title>Closing</title>
  <author>udrepper</author>
  <link>https://udrepper.livejournal.com/22429.html</link>
  <description>I will not use this blog anymore.  Instead I am hosting one on my own server with a much simpler (self-written) platform.  Use the RSS file &lt;a href=&quot;http://www.akkadia.org/drepper/blog/rss&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;here&lt;/a&gt;.</description>
  <comments>https://udrepper.livejournal.com/22429.html?view=comments#comments</comments>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
  </item>
  <item>
  <guid isPermaLink='true'>https://udrepper.livejournal.com/22255.html</guid>
  <pubDate>Fri, 01 Jun 2012 01:49:56 GMT</pubDate>
  <title>putils</title>
  <author>udrepper</author>
  <link>https://udrepper.livejournal.com/22255.html</link>
  <description>&lt;p&gt;The original plan was to have some programs I wrote added to the procps or util-linux package, but the maintainers haven&apos;t been responsive.  Therefore here they are in a package of their own.&lt;/p&gt;

&lt;p&gt;I call the package &lt;q&gt;putils&lt;/q&gt; (available from my &lt;a href=&quot;http://www.akkadia.org/drepper/putils.src.rpm&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;private server&lt;/a&gt;) and the following programs are available so far:&lt;/p&gt;

&lt;dl&gt;

&lt;dt&gt;&lt;b&gt;plimit&lt;/b&gt;&lt;/dt&gt;&lt;dd&gt;Show or set the limits of a process&lt;/dd&gt;

&lt;dt&gt;&lt;b&gt;pfiles&lt;/b&gt;&lt;/dt&gt;&lt;dd&gt;Show information about the files open inside a process&lt;/dd&gt;

&lt;/dl&gt;

&lt;p&gt;These programs will be familiar to Solaris users.  There are likely a few more programs to follow.&lt;/p&gt;</description>
  <comments>https://udrepper.livejournal.com/22255.html?view=comments#comments</comments>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
  </item>
  <item>
  <guid isPermaLink='true'>https://udrepper.livejournal.com/21828.html</guid>
  <pubDate>Thu, 31 May 2012 23:51:09 GMT</pubDate>
  <title>pagein</title>
  <author>udrepper</author>
  <link>https://udrepper.livejournal.com/21828.html</link>
  <description>&lt;p&gt;I&apos;ve updated the pagein tool to compile with a recent valgrind version.  The tarball also contains a .spec file.  I had to work around a bug in valgrind in Fedora 16 and 17.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href=&quot;http://www.akkadia.org/drepper/valgrind-pagein-1.1.tar.bz2&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;The tarball&lt;/a&gt;</description>
  <comments>https://udrepper.livejournal.com/21828.html?view=comments#comments</comments>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
  </item>
  <item>
  <guid isPermaLink='true'>https://udrepper.livejournal.com/21541.html</guid>
  <pubDate>Thu, 05 Aug 2010 01:38:07 GMT</pubDate>
  <title>Cancellation and C++ Exceptions</title>
  <author>udrepper</author>
  <link>https://udrepper.livejournal.com/21541.html</link>
  <description>&lt;h1&gt;Cancellation and C++ Exceptions&lt;/h1&gt;

&lt;p&gt;In NPTL, thread cancellation is implemented using exceptions.  In general this does not conflict with the mixed use of cancellation and exceptions in C++ programs; it works just fine.  Some people, though, write code which doesn&apos;t behave as they expect.  Here is a short example:&lt;/p&gt;

&lt;pre&gt;
#include &amp;lt;cstdlib&amp;gt;
#include &amp;lt;iostream&amp;gt;
#include &amp;lt;pthread.h&amp;gt;

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t c = PTHREAD_COND_INITIALIZER;

static void *tf (void *)
{
  try {
    ::pthread_mutex_lock(&amp;amp;m);
    ::pthread_cond_wait (&amp;amp;c, &amp;amp;m);
  } catch (...) {
    // do something
  }
}

int main ()
{
  pthread_t th;
  ::pthread_create (&amp;amp;th, NULL, tf, NULL);
  // do some work; simulate using sleep
  std::cout &amp;lt;&amp;lt; &quot;Wait a bit&quot; &amp;lt;&amp;lt; std::endl;
  sleep (1);
  // cancel the child thread
  ::pthread_cancel (th);
  // wait for it
  ::pthread_join (th, NULL);
}
&lt;/pre&gt;

&lt;p&gt;The problem is in function &lt;tt&gt;tf&lt;/tt&gt;.  This function contains a catch-all clause which does not rethrow the exception.  One might be tempted to write such code, but it should really never appear in any program.  The rules C++ experts developed state that catch-all clauses must rethrow.  If they don&apos;t, strange things can happen, since one does not always know exactly which exceptions are thrown.  The code above is just one example.  Running it will abort the process:&lt;/p&gt;

&lt;pre&gt;
$ ./test
Wait a bit
FATAL: exception not rethrown
Aborted (core dumped)
&lt;/pre&gt;

&lt;p&gt;The exception used for cancellation is special: it cannot be ignored.  This is why the program aborts.&lt;/p&gt;

&lt;p&gt;Simply adding the rethrow will cure the problem:&lt;/p&gt;

&lt;pre&gt;
@@ -13,6 +13,7 @@
     ::pthread_cond_wait (&amp;amp;c, &amp;amp;m);
   } catch (...) {
     // do something
+    throw;
   }
 }
 
&lt;/pre&gt;

&lt;p&gt;But this code might not have the expected semantics.  Therefore the more general solution is to change the code as follows:&lt;/p&gt;

&lt;pre&gt;
@@ -1,6 +1,7 @@
 #include &amp;lt;cstdlib&amp;gt;
 #include &amp;lt;iostream&amp;gt;
 #include &amp;lt;pthread.h&amp;gt;
+#include &amp;lt;cxxabi.h&amp;gt;
 
 static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
 static pthread_cond_t c = PTHREAD_COND_INITIALIZER;
@@ -11,6 +12,8 @@
   try {
     ::pthread_mutex_lock(&amp;amp;m);
     ::pthread_cond_wait (&amp;amp;c, &amp;amp;m);
+  } catch (abi::__forced_unwind&amp;amp;) {
+    throw;
   } catch (...) {
     // do something
   }
&lt;/pre&gt;

&lt;p&gt;The header &lt;tt&gt;cxxabi.h&lt;/tt&gt; comes with gcc since, I think, gcc 4.3.  It defines a special tag which corresponds to the exception used for cancellation.  This exception cannot be swallowed, as already said, which is why it is called &lt;tt&gt;abi::__forced_unwind&lt;/tt&gt;.&lt;/p&gt;

&lt;p&gt;That&apos;s all that is needed.  This code can easily be added to existing code, maybe even hidden in a single macro:&lt;/p&gt;

&lt;pre&gt;
#define CATCHALL catch (abi::__forced_unwind&amp;amp;) { throw; } catch (...)
&lt;/pre&gt;

&lt;p&gt;This macro can be defined predicated on the gcc version and the platform.&lt;/p&gt;

&lt;p&gt;I still think it is better to always rethrow the exception, though.&lt;/p&gt;</description>
  <comments>https://udrepper.livejournal.com/21541.html?view=comments#comments</comments>
  <category>linux nptl c++ cancellation exceptions</category>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
  </item>
  <item>
  <guid isPermaLink='true'>https://udrepper.livejournal.com/21384.html</guid>
  <pubDate>Fri, 07 May 2010 05:47:15 GMT</pubDate>
  <title>IDN Support</title>
  <author>udrepper</author>
  <link>https://udrepper.livejournal.com/21384.html</link>
  <description>glibc has had IDN support for &lt;tt&gt;getaddrinfo&lt;/tt&gt; and &lt;tt&gt;getnameinfo&lt;/tt&gt; for quite some time.  It has to be explicitly enabled in the calls, though.  Now that I can actually test this I’ve enabled IDN support in the &lt;tt&gt;getent&lt;/tt&gt; program which comes with glibc.  This screenshot shows &lt;tt&gt;getent&lt;/tt&gt; in action.  Apparently the site doesn’t use an IDN CNAME yet.&lt;br /&gt;&lt;br /&gt;&lt;img src=&quot;https://imgprx.livejournal.net/d4a001282a5637b76110f56d3b524d3bb52af8477785e07a1954852c57852f05/P2WlxyVijxKghGBq98xWWUMdsf-ah7h0z0uNV75WwcLW9xDVgY-mB0dpBFVyDl10pA1SmSnbbRcIFFYC0wg1-AQS:3yl8YEpcDxI7ajIOS6r0yw&quot; width=&quot;494&quot; height=&quot;352&quot; title=&quot;getent with IDN&quot; /&gt;&lt;br /&gt;&lt;br /&gt;The changes are minimal and pushed into the git archive.  IDN is enabled by default; non-IDN names continue to work.  In case there is some sort of problem the &lt;tt&gt;--no-idn&lt;/tt&gt; option can be used to disable IDN support.&lt;br /&gt;&lt;br /&gt;It is quite easy for other programs to enable IDN as well.  Whether it all should just work automatically is another question.  There is the problem of look-alike characters in the Unicode range which might undermine the certificate system.</description>
  <comments>https://udrepper.livejournal.com/21384.html?view=comments#comments</comments>
  <lj:security>public</lj:security>
  <lj:reply-count>1</lj:reply-count>
  </item>
  <item>
  <guid isPermaLink='true'>https://udrepper.livejournal.com/21140.html</guid>
  <pubDate>Tue, 04 May 2010 16:27:06 GMT</pubDate>
  <title>Fedora and USB Mobile Broadband</title>
  <author>udrepper</author>
  <link>https://udrepper.livejournal.com/21140.html</link>
  <description>Outside the US I use a USB stick for Internet access.  It is a Huawei E161, similar enough to other sticks like the E160 etc.&lt;br /&gt;&lt;br /&gt;Inserting it into a standard Fedora 12 system causes only the simulated CDROM to be mounted.  This dual-mode design is the root of the problem.&lt;br /&gt;&lt;br /&gt;Looking for a solution one comes across many different &lt;q&gt;solutions&lt;/q&gt;.  The provider I use, Fonic, has something to download for Linux (a plus, even though they don’t provide any support for it).  This seems to be an NDIS driver.  There are several other ways (including wvdial and some KDE programs) which are documented.&lt;br /&gt;&lt;br /&gt;It’s all much simpler with recent Fedora distributions.  Just make sure you have the &lt;tt&gt;usb_modeswitch&lt;/tt&gt; and the accompanying &lt;tt&gt;usb_modeswitch-data&lt;/tt&gt; package installed.  Version 1.1.2-3 is what I use.  Make sure you reboot before trying to use it.&lt;br /&gt;&lt;br /&gt;The &lt;tt&gt;usb_modeswitch&lt;/tt&gt; package contains a program which switches the USB stick from mass storage mode into modem mode.  It also contains appropriate &lt;tt&gt;udev&lt;/tt&gt; rules to make this automatic if the device is known in the config files.  If it is not known it’s quite simple to add.&lt;br /&gt;&lt;br /&gt;Anyway, when inserting the stick the mode should then automatically be switched and then &lt;tt&gt;NetworkManager&lt;/tt&gt; takes over.  It recognizes the modem and the built-in rules for wireless broadband providers guide you through the rest of the installation process.  You only need to know the provider and possibly the plan.  That’s it.&lt;br /&gt;&lt;br /&gt;Now when I insert the stick all I get asked is the PIN, which has to be provided every time.  Why, I don’t know; IMO it should be stored in the keyring just like all the other access information.&lt;br /&gt;&lt;br /&gt;Anyway, for everybody with wireless broadband devices on Fedora, make sure &lt;tt&gt;usb_modeswitch&lt;/tt&gt; is installed.  There is an open bug in the Red Hat bugzilla to make &lt;tt&gt;NetworkManager&lt;/tt&gt; depend on this package so that everything &lt;q&gt;just works&lt;/q&gt; for more people.</description>
  <comments>https://udrepper.livejournal.com/21140.html?view=comments#comments</comments>
  <lj:security>public</lj:security>
  <lj:reply-count>3</lj:reply-count>
  </item>
  <item>
  <guid isPermaLink='true'>https://udrepper.livejournal.com/20948.html</guid>
  <pubDate>Sat, 18 Apr 2009 00:42:54 GMT</pubDate>
  <title>glibc 2.10 news</title>
  <author>udrepper</author>
  <link>https://udrepper.livejournal.com/20948.html</link>
  <description>&lt;p&gt;I might need a bit more space to explain the new features in glibc 2.10 than can reasonably be written down in the release notes.  Therefore I’ll take some time to describe them here.&lt;/p&gt;

&lt;h2&gt;POSIX 2008&lt;/h2&gt;

&lt;p&gt;We (= the Austin Group) finished work on the 2008 revision of POSIX some time ago.  In glibc 2.10 I’ve added the necessary feature selection macros and more to glibc to support POSIX 2008.  Most of it, at least.  This was quite easy.  A large part of the work which went into POSIX 2008 was to add functions which have long been in glibc.  The Unix world catches up with Linux.&lt;/p&gt;

&lt;p&gt;I had to implement one new function: &lt;tt&gt;psiginfo&lt;/tt&gt;.  This function is similar to &lt;tt&gt;psignal&lt;/tt&gt; but instead of printing information for a simple signal it prints information for a &lt;q&gt;real-time&lt;/q&gt; signal context.&lt;/p&gt;

&lt;p&gt;A few things are left to be done.  What I know of right now is the implementation of the &lt;tt&gt;O_SEARCH&lt;/tt&gt; and &lt;tt&gt;O_EXEC&lt;/tt&gt; flags, which needs kernel support.&lt;/p&gt;

&lt;h2&gt;C++ compliance&lt;/h2&gt;

&lt;p&gt;The C standard defines functions like &lt;tt&gt;strchr&lt;/tt&gt; in a pretty weak way because C has no function overloading:&lt;/p&gt;

&lt;pre&gt;char *strchr(const char *, int)&lt;/pre&gt;

&lt;p&gt;The string parameter and the return value type are as weak as possible.  Non-constant strings can be passed as parameters and the result can be assigned to a constant string variable.&lt;/p&gt;

&lt;p&gt;The problem with this is that the const-ness of the parameter is not preserved and reflected in the return value.  Preserving it would be the right thing to do since the return value, if not &lt;tt&gt;NULL&lt;/tt&gt;, points somewhere into the parameter string.&lt;/p&gt;

&lt;p&gt;C++ with its function overloading can do better.  This is why C++ 1998 actually defines two functions:&lt;/p&gt;

&lt;pre&gt;char *strchr(char *, int)
const char *strchr(const char *, int)&lt;/pre&gt;

&lt;p&gt;These functions do preserve the const-ness.  This is possible because these functions actually have different names after mangling.  Actually, in glibc we use a neat trick of gcc to avoid defining any function with C++ binding but this is irrelevant here.&lt;/p&gt;

&lt;p&gt;Anyway, the result of this change is that some incorrect C++ programs, which worked before, will now fail to compile.&lt;/p&gt;

&lt;pre&gt;const char *in = &quot;some string&quot;;
char *i = strchr(in, &apos;i&apos;);&lt;/pre&gt;

&lt;p&gt;This code will fail because the &lt;tt&gt;strchr&lt;/tt&gt; version selected by the compiler is the second one, which returns a pointer to a constant string.  It is an error (not only the source of a warning) in C++ when a pointer to const is assigned to a pointer to non-const.&lt;/p&gt;

&lt;p&gt;As I wrote, this is incorrect C++ code.  But it might trip up some people.&lt;/p&gt;


&lt;h2&gt;C++ 201x support&lt;/h2&gt;

&lt;p&gt;There is one interface in the upcoming C++ revision which needs support in the C library, at least to be efficient.  C++ 201x defines yet another set of interfaces to terminate a process and to register handlers which are run when this happens:&lt;/p&gt;

&lt;pre&gt;int at_quick_exit(void (*)(void))
void quick_exit(int)&lt;/pre&gt;

&lt;p&gt;The handlers installed with &lt;tt&gt;at_quick_exit&lt;/tt&gt; will only be run when &lt;tt&gt;quick_exit&lt;/tt&gt; is used and not when &lt;tt&gt;exit&lt;/tt&gt; is used.  No global destructors are run either.  That’s the whole purpose of this new interface.  If the process is in a state where the global destructors cannot be run anymore and the process would crash, &lt;tt&gt;quick_exit&lt;/tt&gt; should be used.&lt;/p&gt;


&lt;h2&gt;DNS NSS improvement&lt;/h2&gt;

&lt;p&gt;In glibc 2.9 I already implemented an improvement to the DNS NSS module which optimizes the lookup of IPv4 and IPv6 addresses for the same host.  This can improve the response time of the lookup due to parallelism.  It also fixes a bug in name lookup where the IPv4 and IPv6 addresses could be returned for different hosts.&lt;/p&gt;

&lt;p&gt;The problem with this change was that there are broken DNS servers and broken firewall configurations which prevented the two results from being received successfully.  Some broken DNS servers (especially those in cable modems etc.) only send one reply.  For this reason Fedora had this change disabled in F10.&lt;/p&gt;

&lt;p&gt;For F11 I’ve added a work-around for broken servers.  The default behavior is the same as described above.  I.e., we get the improved performance for working DNS servers.  In case the program detects a broken DNS server or firewall because it received only one reply the resolver switches into a mode where the second request is sent only after the first reply has been received.  We still get the benefit of the bug fix described above, though.&lt;/p&gt;

&lt;p&gt;The drawback is that a timeout is needed to detect the broken servers or firewalls.  This delay is experienced once per process start and could be noticeable.  But the broken setups of the few people affected must not prevent the far larger group of people with working setups from experiencing the advantage of the parallel lookup.&lt;/p&gt;

&lt;p&gt;There are also ways to avoid the delays, some old, some new:&lt;/p&gt;

&lt;ul&gt;

&lt;li&gt;Install a caching name server on this machine or somewhere on the local network.  bind is known to work correctly.&lt;/li&gt;

&lt;li&gt;Run nscd on the local machine. In this case the delay is incurred once per system start (i.e., at the first lookup nscd performs).&lt;/li&gt;

&lt;li&gt;Add &lt;tt&gt;single-request&lt;/tt&gt; to the options in &lt;tt&gt;/etc/resolv.conf&lt;/tt&gt;.  This selects the compatibility mode from the start.&lt;/li&gt;

&lt;/ul&gt;
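For the last work-around, the line added to /etc/resolv.conf would look like this (alongside any existing options):

```
options single-request
```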

&lt;p&gt;All of these work-arounds are easy to implement.  Therefore there is no reason not to make the fast mode the default; it will work for 99% of the people in any case.&lt;/p&gt;


&lt;h2&gt;Use NSS in libcrypt&lt;/h2&gt;

&lt;p&gt;The NSS I refer to here is the Network Security Services package.  It provides libraries with implementations of crypto and hash functions, among other things.  In RHEL the NSS package is certified and part of the EAL feature set.&lt;/p&gt;

&lt;p&gt;To get compliance for the whole system every implementation of the crypto and hash functions would have to be certified.  This is an expensive and time-consuming process.  The alternative is to use the same implementation everywhere.  This is what a change to libcrypt now allows.&lt;/p&gt;

&lt;p&gt;Since NSS is already certified we can just use the implementation of the hash functions from the NSS libraries in the implementation of crypt(3) in libcrypt.  &lt;a href=&quot;mailto:rrelyea@redhat.com&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Bob Relyea&lt;/a&gt; implemented a set of new interfaces in the libfreebl3 library to allow the necessary low-level access and freed libfreebl3 from dependencies on NSPR.&lt;/p&gt;

&lt;p&gt;By default libcrypt is built as before.  Only with the appropriate configure option is libfreebl3 used.  There are no visible changes (except the dependency on libfreebl3) so users should not have to worry at all.&lt;/p&gt;

&lt;p&gt;Combine this with the &lt;a href=&quot;http://people.redhat.com/drepper/sha-crypt.html&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;new password hashing&lt;/a&gt; I developed almost two years ago and we now have fully certified password handling.&lt;/p&gt;


&lt;h2&gt;printf hooks&lt;/h2&gt;

&lt;p&gt;Certain special interest groups subverted the standardization process (again) and pressed through changes which introduce extensions to the C programming language to support decimal floating point computations.  99.99% of all people will never use this stuff and still we have to live with it.&lt;/p&gt;

&lt;p&gt;I refuse to add support for this to glibc because these extensions are not (yet) in the official language standard.  And maybe even after that we’ll have it separately.&lt;/p&gt;

&lt;p&gt;But the DFP extensions call for support in &lt;tt&gt;printf&lt;/tt&gt;.  The normal floating-point formats cannot be used.  New modifiers are needed.&lt;/p&gt;

&lt;p&gt;The &lt;tt&gt;printf&lt;/tt&gt; in glibc has had for the longest time a way to extend it.  One can install handlers for additional format specifiers.  Unfortunately, this extension mechanism isn’t generic enough for the purpose of supporting DFP.&lt;/p&gt;

&lt;p&gt;After a couple of versions of a patch from Ryan Arnold I finally finished the work and added a generic framework which allows installing additional modifiers and format specifiers.&lt;/p&gt;

&lt;pre&gt;int register_printf_specifier (int, printf_function,
                               printf_arginfo_size_function)
int register_printf_modifier (wchar_t *)
int register_printf_type (printf_va_arg_function)&lt;/pre&gt;

&lt;p&gt;With these interfaces DFP printing functions can live outside glibc and still work as if the support were built in.  For an example see my code &lt;a href=&quot;http://sources.redhat.com/bugzilla/attachment.cgi?id=3874&amp;amp;action=view&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;to print XMM values&lt;/a&gt;.&lt;/p&gt;


&lt;h2&gt;&lt;tt&gt;malloc&lt;/tt&gt; scalability&lt;/h2&gt;

&lt;p&gt;A change which is rather small in the number of lines it touches went in to make &lt;tt&gt;malloc&lt;/tt&gt; more scalable.  Before, &lt;tt&gt;malloc&lt;/tt&gt; tried to emulate a per-core memory pool.  Every time contention for all existing memory pools was detected a new pool was created.  Threads stay with the last used pool if possible.&lt;/p&gt;

&lt;p&gt;This never worked 100% because a thread can be descheduled while executing a &lt;tt&gt;malloc&lt;/tt&gt; call.  When some other thread tries to use the memory pool used in the call it would detect contention.  A second problem is that if multiple threads on multiple cores/sockets happily use &lt;tt&gt;malloc&lt;/tt&gt; without contention, memory from the same pool is used by different cores/on different sockets.  This can lead to false sharing and definitely to additional cross traffic because of the meta information updates.  There are more potential problems not worth going into here in detail.&lt;/p&gt;

&lt;p&gt;The changes which are in glibc now create per-thread memory pools.  This can eliminate false sharing in most cases.  The meta data is usually accessed only in one thread (which hopefully doesn’t get migrated off its assigned core).  To prevent the memory handling from blowing up the address space use too much, the number of memory pools is capped.  By default we create up to two memory pools per core on 32-bit machines and up to eight memory pools per core on 64-bit machines.  The code delays testing for the number of cores (which is not cheap, we have to read &lt;tt&gt;/proc/stat&lt;/tt&gt;) until there are already two or eight memory pools allocated, respectively.&lt;/p&gt;

&lt;p&gt;Using environment variables the implementation can be changed.  If &lt;tt&gt;MALLOC_ARENA_TEST_&lt;/tt&gt; is set the test for the number of cores is only performed once the number of memory pools in use reaches the value specified by this envvar.  If &lt;tt&gt;MALLOC_ARENA_MAX_&lt;/tt&gt; is used it sets the maximum number of memory pools used, regardless of the number of cores.&lt;/p&gt;

&lt;p&gt;While these changes might increase the number of memory pools which are created (and thus increase the address space they use), the number can be controlled.  With the old mechanism a new pool could be created whenever a collision occurred, so the total number could in theory be even higher.  Unlikely but true; the new mechanism is more predictable.&lt;/p&gt;

&lt;p&gt;The important thing to realize, though, is when the old mechanism was developed.  My machine at the time when I added Wolfram’s dlmalloc to glibc back in 1995 (I think) had 64MB of memory.  We’ve come a long way since then.  Memory is not at that much of a premium anymore and most of a memory pool doesn’t actually require memory until it is used, only address space.  We have plenty of that on 64-bit machines.  32-bit machines are a different story.  But this is why I limit the number of memory pools on 32-bit machines so drastically to two per core.&lt;/p&gt;

&lt;p&gt;The changes include a second improvement which allows the &lt;tt&gt;free&lt;/tt&gt; function to avoid locking the memory pool in certain situations.&lt;/p&gt;

&lt;p&gt;We have internally done some measurements of the effects of the new implementation and the improvements can be quite dramatic.&lt;/p&gt;


&lt;h2&gt;Information about &lt;tt&gt;malloc&lt;/tt&gt;&lt;/h2&gt;

&lt;p&gt;There is an obscure SysV interface in glibc called &lt;tt&gt;mallinfo&lt;/tt&gt;.  It allows the caller to get some information about the state of the &lt;tt&gt;malloc&lt;/tt&gt; implementation.  Data like total memory allocated, total address space, etc.  There are multiple problems with this interface, though.&lt;/p&gt;

&lt;p&gt;The first problem is that it is completely unsuitable for 64-bit machines.  The data types required by the SysV spec don’t allow for values larger than 2^31 bytes (all fields in the structure are &lt;tt&gt;int&lt;/tt&gt;s).  The second problem is that the data structure is really specific to the &lt;tt&gt;malloc&lt;/tt&gt; implementation SysV used at that time.&lt;/p&gt;

&lt;p&gt;The implementation details of &lt;tt&gt;malloc&lt;/tt&gt; implementations will change over time.  It is therefore a bad idea to codify a specific implementation in the structures which export statistical information.&lt;/p&gt;

&lt;p&gt;The new &lt;tt&gt;malloc_info&lt;/tt&gt; function therefore does not export a structure.  Instead it exports the information in a self-describing format.  Nowadays the preferred way to do this is via XML.  The format can change over time (it’s versioned); some fields will stay the same, others will change.  No breakage.  The reader just cannot assume that all the information will forever be available in the same form.  There is no reader in glibc.  This isn’t necessary; it’s easy enough to write one outside glibc using one of the many XML libraries.&lt;/p&gt;


&lt;h2&gt;Automatic use of optimized functions&lt;/h2&gt;

&lt;p&gt;Processor vendors these days spend time fine tuning the instruction sets of their products.  Specialized instructions are introduced which can be used to accelerate the implementation of specific functions.  One problem holding back the adoption of such instructions is that people want their binaries to work everywhere.&lt;/p&gt;

&lt;p&gt;One example for such application-specific instructions are the SSE4.2 extensions Intel introduced in their Nehalem core.  This core features special instructions for string handling.  They allow optimized implementations of functions like &lt;tt&gt;strlen&lt;/tt&gt; or &lt;tt&gt;strchr&lt;/tt&gt; etc.&lt;/p&gt;

&lt;p&gt;It would of course be possible to start the implementation of these functions with a test for this feature and then use the old or the new implementation.  For functions where the total time a call takes is just a couple of dozen cycles this overhead is noticeable, though.&lt;/p&gt;

&lt;p&gt;Therefore I’ve designed an ELF extension which allows making the decision about which implementation to use once per process run.  It is implemented using a new ELF symbol type (&lt;tt&gt;STT_GNU_IFUNC&lt;/tt&gt;).  Whenever a symbol lookup resolves to a symbol with this type the dynamic linker does not immediately return the found value.  Instead it interprets the value as a pointer to a function that takes no argument and returns the real function pointer to use.  The code called is under control of the implementer and can choose, based on whatever information the implementer wants to use, which of the two or more implementations to use.&lt;/p&gt;

&lt;p&gt;This feature is not yet enabled in Fedora 11.  There is some more binutils work needed and then prelink has to be changed.  My guess is that F11 will go out without glibc taking advantage of this feature itself.  But we will perhaps enable it after the release, once binutils and prelink have caught up.&lt;/p&gt;</description>
  <comments>https://udrepper.livejournal.com/20948.html?view=comments#comments</comments>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
  </item>
  <item>
  <guid isPermaLink='true'>https://udrepper.livejournal.com/20709.html</guid>
  <pubDate>Sat, 03 Jan 2009 00:21:44 GMT</pubDate>
  <title>Fedora 10 a little bit more secure</title>
  <author>udrepper</author>
  <link>https://udrepper.livejournal.com/20709.html</link>
  <description>Fedora 10 comes with filesystem capability support.  Unfortunately it is not used by default in the packages which can take advantage of it.  I think the excuse is that there are people who build their own kernels and disable it.  That&apos;s nonsense since there are many other options we rely on and which can be compiled out.&lt;br /&gt;&lt;br /&gt;Anyway, you can do the following by hand.  Unfortunately you have to do it every time the program is updated.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;
sudo chmod u-s /bin/ping
sudo /usr/sbin/setcap cap_net_raw=ep /bin/ping
sudo chmod u-s /bin/ping6
sudo /usr/sbin/setcap cap_net_raw=ep /bin/ping6
&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Voilà, ping and ping6 are no longer SUID binaries.  Note that ls still signals (at least when you&apos;re using --color) that there is something special about the file, namely, that it has filesystem attributes.&lt;br /&gt;&lt;br /&gt;These are two easy cases.  Other SUID programs need some research to see whether they can use filesystem capabilities as well and which capabilities they need.</description>
  <comments>https://udrepper.livejournal.com/20709.html?view=comments#comments</comments>
  <category>security</category>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
  </item>
  <item>
  <guid isPermaLink='true'>https://udrepper.livejournal.com/20407.html</guid>
  <pubDate>Fri, 01 Aug 2008 23:25:50 GMT</pubDate>
  <title>Secure File Descriptor Handling</title>
  <author>udrepper</author>
  <link>https://udrepper.livejournal.com/20407.html</link>
  <description>&lt;p&gt;During the 2.6.27 merge window a number of my patches were merged and now we are at the point where we can securely create file descriptors without the danger of possibly leaking information.  Before I go into the details let&apos;s get some background information.&lt;/p&gt;

&lt;p&gt;A file descriptor in the Unix/POSIX world has lots of state associated with it.  One bit of information determines whether the file descriptor is automatically closed when the process executes an exec call to start executing another program.  This is useful, for instance, to establish pipelines.
Traditionally, when a file descriptor is created (e.g., with the default open() mode) this close-on-exec flag is not set and a programmer has to explicitly set it using&lt;/p&gt;

&lt;pre&gt;
   fcntl(fd, F_SETFD, FD_CLOEXEC);
&lt;/pre&gt;

&lt;p&gt;Closing the descriptor is a good idea for two main reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the new program&apos;s file descriptor table might fill up.  For every open file descriptor resources are consumed.&lt;/li&gt;
&lt;li&gt;more importantly, information might be leaked to the second program.  That program might get access to information it normally wouldn&apos;t have access to.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is easy to see why the latter point is such a problem.  Assume this common scenario:&lt;/p&gt;

&lt;blockquote&gt;
A web browser has two windows or tabs open, both loading a new page (maybe triggered through Javascript).  One connection is to your bank, the other some random Internet site.  The latter contains some random object which must be handled by a plug-in.  The plug-in could be an external program processing some scripting language.  The external program will be started through a fork() and exec sequence, inheriting all the file descriptors open and not marked with close-on-exec from the web browser process.
&lt;/blockquote&gt;

&lt;p&gt;The result is that the plug-in can have access to the file descriptor used for the bank connection.  This is especially bad if the plug-in handles a scripting language such as Flash because the descriptor could easily become available to the script.  If the author of the script has malicious intentions you might end up losing money.&lt;/p&gt;

&lt;p&gt;Until not too long ago the best programs could do was to set the close-on-exec flag as quickly as possible after the file descriptor had been created.  Programs would break if the default for new file descriptors were changed to set the bit automatically.&lt;/p&gt;

&lt;p&gt;This does not solve the problem, though.  There is a (possibly brief) period of time between the return of the open() call or other function creating a file descriptor and the fcntl() call to set the flag.  This is problematic because the fork() function is async-signal-safe (i.e., it can be called from a signal handler).  In multi-threaded code a second thread might call fork() concurrently.  It is theoretically possible to avoid these races by blocking all signals and by ensuring through locks that fork() cannot be called concurrently.  This very quickly gets far too complicated to even contemplate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;To block all signals, each thread in the process has to be interrupted (through another signal) and in the signal handler block all the other signals.  This is complicated, slow, possibly unreliable, and might introduce deadlocks.&lt;/li&gt;
&lt;li&gt;Using a lock also means there has to be a lock around fork() itself.  But fork() is async-signal-safe, so this step, too, needs to block all signals.  This by itself requires additional work since child processes inherit signal masks.&lt;/li&gt;
&lt;li&gt;Making all this work in projects which come from different sources (and which non-trivial program doesn&apos;t use system or third-party libraries?) is virtually impossible.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is therefore necessary to find a different solution.  The first set of patches to achieve the goal went into the Linux kernel in 2.6.23, the last, as already mentioned, will be in the 2.6.27 release.  The patches are all rather simple.  They just extend the interface of various system calls so that already existing functionality can be taken advantage of.&lt;/p&gt;

&lt;p&gt;The simplest case is the open() system call.  To create a file descriptor with the close-on-exec flag atomically set all one has to do is to add the O_CLOEXEC flag to the call.  There is already a parameter which takes such flags.&lt;/p&gt;

&lt;p&gt;Slightly more complicated is the solution chosen to extend the socket() and socketpair() system calls.  No flags parameter is available, but the second parameter of these interfaces (the type) has a very limited range requirement.  It was &lt;a href=&quot;http://lwn.net/Articles/282212/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;felt&lt;/a&gt; that overloading this parameter is an acceptable solution.  It definitely makes using the new interfaces simpler.&lt;/p&gt;

&lt;p&gt;The last group are interfaces where the original interface simply doesn&apos;t provide a way to pass additional parameters.  In all these cases a generic flags parameter was added.  This is preferable to using specialized new interfaces (like, for instance, dup2_cloexec) because we do and will need other flags.  O_NONBLOCK is one case.  Hopefully we&apos;ll have non-sequential file descriptors at some point and we can then request them using the flags, too.&lt;/p&gt;

&lt;p&gt;The (hopefully complete) list of interface changes which were introduced is listed below.  Note: these are the &lt;b&gt;userlevel&lt;/b&gt; changes.  Inside the kernel things look different.&lt;/p&gt;

&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Userlevel Interface&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/th&gt;&lt;th&gt;What changed?&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;open&lt;/td&gt;&lt;td&gt;O_CLOEXEC flag added&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;fcntl&lt;/td&gt;&lt;td&gt;F_DUPFD_CLOEXEC command added&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;recvmsg&lt;/td&gt;&lt;td&gt;MSG_CMSG_CLOEXEC flag: file descriptors received over a Unix domain socket have close-on-exec set atomically&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;dup3&lt;/td&gt;&lt;td&gt;New interface taking an additional flags parameter (O_CLOEXEC, O_NONBLOCK)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;pipe2&lt;/td&gt;&lt;td&gt;New interface taking an additional flags parameter (O_CLOEXEC, O_NONBLOCK)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;socket&lt;/td&gt;&lt;td&gt;SOCK_CLOEXEC and SOCK_NONBLOCK flag added to type parameter&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;socketpair&lt;/td&gt;&lt;td&gt;SOCK_CLOEXEC and SOCK_NONBLOCK flag added to type parameter&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;paccept&lt;/td&gt;&lt;td&gt;New interface taking an additional flags parameter (SOCK_CLOEXEC, SOCK_NONBLOCK) and a temporary signal mask&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;fopen&lt;/td&gt;&lt;td&gt;New mode &apos;e&apos; to open file with close-on-exec set&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;popen&lt;/td&gt;&lt;td&gt;New mode &apos;e&apos; to open pipes with close-on-exec set&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;eventfd&lt;/td&gt;&lt;td&gt;Take new flags EFD_CLOEXEC and EFD_NONBLOCK&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;signalfd&lt;/td&gt;&lt;td&gt;Take new flags SFD_CLOEXEC and SFD_NONBLOCK&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;timerfd&lt;/td&gt;&lt;td&gt;Take new flags TFD_CLOEXEC and TFD_NONBLOCK&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;epoll_create1&lt;/td&gt;&lt;td&gt;New interface taking a flag parameter.  Support EPOLL_CLOEXEC and EPOLL_NONBLOCK&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;inotify_init1&lt;/td&gt;&lt;td&gt;New interface taking a flag parameter (IN_CLOEXEC, IN_NONBLOCK)&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;When should these interfaces be used?  The answer is simple: whenever the author cannot be sure that no asynchronous fork()+exec can happen and that no concurrently running thread executes fork()+exec (or posix_spawn(), for that matter).&lt;/p&gt;

&lt;p&gt;Application writers might have control over this.  But I&apos;d say that in all library code one has to play it safe.  In glibc we now open the file descriptor with the close-on-exec flag set in almost all interfaces.  This means a lot of work but it has to be done.  Applications also have to change (see this &lt;a href=&quot;http://bugzilla.redhat.com/show_bug.cgi?id=443321&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;autofs bug&lt;/a&gt;, for instance).&lt;/p&gt;</description>
  <comments>https://udrepper.livejournal.com/20407.html?view=comments#comments</comments>
  <category>programming security linux</category>
  <lj:security>public</lj:security>
  <lj:reply-count>2</lj:reply-count>
  </item>
  <item>
  <guid isPermaLink='true'>https://udrepper.livejournal.com/20187.html</guid>
  <pubDate>Fri, 23 May 2008 17:53:00 GMT</pubDate>
  <title>dual head xrandr configuration</title>
  <author>udrepper</author>
  <link>https://udrepper.livejournal.com/20187.html</link>
  <description>&lt;p&gt;&lt;a href=&quot;mailto:ajax@redhat.com&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;ajax&lt;/a&gt; told me that extra wide screens now work with the latest Fedora 9 binaries for X11.  So I had to try it out and after some experimenting I got it to work.  To save others the work here is what I did.&lt;/p&gt;

&lt;p&gt;Hardware:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;ATI FireGL V3600&lt;/li&gt;
&lt;li&gt;2x Dell 3007FPW&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I use the free driver, of course.  No need for 3D here.&lt;/p&gt;

&lt;p&gt;The old way to get a spanning desktop was to use Xinerama.  Nowadays this has been replaced by xrandr.  xrandr is not just for the external screens of laptops and for changing the resolution.  One can assign the origin of the various screens and therefore display different parts of a bigger virtual desktop.  That is the whole trick here.  The /etc/X11/xorg.conf file I use is this:&lt;/p&gt;

&lt;pre&gt;
Section &quot;ServerLayout&quot;
	Identifier     &quot;dual head configuration&quot;
	Screen      0  &quot;Screen0&quot; 0 0
	InputDevice    &quot;Keyboard0&quot; &quot;CoreKeyboard&quot;
EndSection

Section &quot;InputDevice&quot;
	Identifier  &quot;Keyboard0&quot;
	Driver      &quot;kbd&quot;
	Option	    &quot;XkbModel&quot; &quot;pc105&quot;
	Option	    &quot;XkbLayout&quot; &quot;us+inet&quot;
EndSection

Section &quot;Device&quot;
	Identifier  &quot;Videocard0&quot;
	Driver      &quot;radeon&quot;
	Option	    &quot;monitor-DVI-0&quot; &quot;dvi0&quot;
	Option	    &quot;monitor-DVI-1&quot; &quot;dvi1&quot;
EndSection

Section &quot;Monitor&quot;
	Identifier &quot;dvi0&quot;
	Option &quot;Position&quot; &quot;2560 0&quot;
EndSection

Section &quot;Monitor&quot;
	Identifier &quot;dvi1&quot;
	Option &quot;LeftOf&quot; &quot;dvi0&quot;
EndSection

Section &quot;Screen&quot;
	Identifier &quot;Screen0&quot;
	Device     &quot;Videocard0&quot;
	DefaultDepth     16
	SubSection &quot;Display&quot;
		Viewport   0 0
		Depth     16
		Modes	&quot;2560x1600&quot;
		Virtual	5120 1600
	EndSubSection
EndSection
&lt;/pre&gt;

&lt;p&gt;Fortunately X11 configuration has become much easier since the days when I had to edit the file by hand.  I started from the most basic setup for a single screen, which the installer or system-config-display will happily create for you.  The important changes on top of
this initial version are these:&lt;/p&gt;

&lt;pre&gt;
	Option	    &quot;monitor-DVI-0&quot; &quot;dvi0&quot;
	Option	    &quot;monitor-DVI-1&quot; &quot;dvi1&quot;
&lt;/pre&gt;

&lt;p&gt;These lines in the Device section announce the two screens.  It is unfortunately not well (or at all?) documented that the first
parameter strings are magic.  If you run &lt;tt&gt;xrandr -q&lt;/tt&gt; on a system with two screens attached you&apos;ll see the identifiers
the system assigned to the screens.  In my case:&lt;/p&gt;

&lt;pre&gt;
$ xrandr -q
Screen 0: minimum 320 x 200, current 5120 x 1600, maximum 5120 x 1600
DVI-1 connected 2560x1600+0+0 (normal left inverted right x axis y axis) 646mm x 406mm
...
DVI-0 connected 2560x1600+2560+0 (normal left inverted right x axis y axis) 646mm x 406mm
...
&lt;/pre&gt;

&lt;p&gt;Add the magic prefix &lt;q&gt;monitor-&lt;/q&gt; to the names DVI-0 and DVI-1 and add an arbitrary identifier as the second parameter
string.  Do not drop or change the &lt;q&gt;monitor-&lt;/q&gt; prefix; that&apos;s the main magic which seems to make all this work.
Then create two Monitor sections in the xorg.conf file, one for each screen:&lt;/p&gt;

&lt;pre&gt;
Section &quot;Monitor&quot;
	Identifier &quot;dvi0&quot;
	Option &quot;Position&quot; &quot;2560 0&quot;
EndSection

Section &quot;Monitor&quot;
	Identifier &quot;dvi1&quot;
	Option &quot;LeftOf&quot; &quot;dvi0&quot;
EndSection
&lt;/pre&gt;

&lt;p&gt;The Identifier lines must of course match the identifiers used in the Device section.  The rest are options which determine
what the screens show.  Since the LCDs have a resolution of 2560x1600, since I want a spanning desktop, and since the DVI-0 connector drives the display on the right side, I&apos;m using an x-offset of 2560 and a y-offset of 0 for that screen.  Then just tell the server to place the second screen to the left of it and the server will figure out the rest.&lt;/p&gt;

&lt;p&gt;What remains to be done is to tell the server how large the screen in total is.  That&apos;s done using&lt;/p&gt;

&lt;pre&gt;
		Virtual	5120 1600
&lt;/pre&gt;

&lt;p&gt;The numbers should explain themselves.  Now the two screens show non-overlapping regions of the total desktop, with no area
left undisplayed, since the total screen size and the offsets add up correctly.&lt;/p&gt;
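&lt;p&gt;For comparison, roughly the same layout can usually be set up at runtime with xrandr itself.  A sketch, with the output names DVI-0/DVI-1 taken from the query above; substitute whatever names your system reports:&lt;/p&gt;

```shell
# Sketch: span a 5120x1600 desktop across two 2560x1600 screens,
# left screen on DVI-1, right screen on DVI-0.
xrandr --fb 5120x1600 \
       --output DVI-1 --mode 2560x1600 --pos 0x0 \
       --output DVI-0 --mode 2560x1600 --pos 2560x0
```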

&lt;p&gt; Note: there is only one Screen section.  That&apos;s something which is IIRC different from the last Xinerama setup I did years
ago.&lt;/p&gt;</description>
  <comments>https://udrepper.livejournal.com/20187.html?view=comments#comments</comments>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
  </item>
  <item>
  <guid isPermaLink='true'>https://udrepper.livejournal.com/19751.html</guid>
  <pubDate>Thu, 22 Nov 2007 02:35:21 GMT</pubDate>
  <title>Producing PDFs</title>
  <author>udrepper</author>
  <link>https://udrepper.livejournal.com/19751.html</link>
  <description>I don&apos;t want to throw this in with the announcement of the availability of the paper on memory and cache handling, but I also don&apos;t want to forget it.  So, here we go.&lt;br /&gt;&lt;br /&gt;I write all the text I can using TeX (PDFLaTeX to be exact).  This leads directly to a PDF document without intermediate steps.  The graphics are done using Metapost because I&apos;m better at programming than at drawing.  Metapost produces Postscript-like files which some LaTeX macros then read and integrate directly into the PDF output.&lt;br /&gt;&lt;br /&gt;The result in &lt;a href=&quot;http://people.redhat.com/drepper/cpumemory.pdf&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;this case&lt;/a&gt; is a PDF with 114 pages which is only 934051 bytes in size.  Just about 8kB for each page.  Given the multi-column text and the numerous graphics this is amazingly small.&lt;br /&gt;&lt;br /&gt;I &lt;a href=&quot;http://udrepper.livejournal.com/12663.html&quot; target=&quot;_blank&quot;&gt;mentioned before&lt;/a&gt; how badly OO.org sucks at exporting graphics.  I bet all the other word processors, spreadsheets, etc. suck just as badly.  The PDFs they generate for text are also much, much bigger.&lt;br /&gt;&lt;br /&gt;My guess is that if I&apos;d written the document with OO.org the size would be north of 4MB, probably significantly more.  I cannot understand why people do this to themselves and, more importantly, to others.</description>
  <comments>https://udrepper.livejournal.com/19751.html?view=comments#comments</comments>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
  </item>
  <item>
  <guid isPermaLink='true'>https://udrepper.livejournal.com/19557.html</guid>
  <pubDate>Thu, 22 Nov 2007 02:09:09 GMT</pubDate>
  <title>Memory and Cache Paper</title>
  <author>udrepper</author>
  <link>https://udrepper.livejournal.com/19557.html</link>
  <description>Well, it&apos;s finally done.  I&apos;ve uploaded the PDF of the memory and cache paper to my home page.  You can &lt;a href=&quot;http://people.redhat.com/drepper/cpumemory.pdf&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;download it&lt;/a&gt; but do not re-publish it or make it available in any form to others.  I do not want multiple copies flying around, at least not while I&apos;m still intending to maintain the document.&lt;br /&gt;&lt;br /&gt;With Jonathan Corbet&apos;s help the text should actually be readable.  I had to change some of the text in the end to accommodate line breaks in the PDF, so I might have introduced problems; don&apos;t think badly of Jonathan&apos;s abilities.  Besides, this is a large document; you simply go blind after a while, I know I do.&lt;br /&gt;&lt;br /&gt;Which brings me to the next point.  Even though I intend to maintain the document, don&apos;t expect me to do much in the near future.  I&apos;ve been working on it for far too long and need a break.  Integrating all the editing Jonathan produced plus today&apos;s line breaking has worn me out.  I haven&apos;t even integrated all the comments I&apos;ve received.  I know the structure of the document is a bit weak in a few places, especially section 5, which contains a lot of non-NUMA information.  But it was simply too much work so far.  Maybe some day.</description>
  <comments>https://udrepper.livejournal.com/19557.html?view=comments#comments</comments>
  <category>programming</category>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
  </item>
  <item>
  <guid isPermaLink='true'>https://udrepper.livejournal.com/19395.html</guid>
  <pubDate>Tue, 13 Nov 2007 02:05:25 GMT</pubDate>
  <title>The Evils of pkgconfig and libtool</title>
  <author>udrepper</author>
  <link>https://udrepper.livejournal.com/19395.html</link>
  <description>&lt;p&gt;If you need more proof that this is insane, just look at some of the packages using it.  I was recently looking at krb5-auth-dialog.  The output of &lt;tt&gt;ldd -u -r&lt;/tt&gt; on the original binary shows 26 unused DSOs.&lt;/p&gt;

&lt;p&gt;This can be changed quite easily: add &lt;tt&gt;-Wl,--as-needed&lt;/tt&gt; to the link line.  Doing this for this package makes all but one of the unused dependencies go away.  This has several benefits:&lt;/p&gt;

&lt;p&gt;The binary size is actually measurably reduced.&lt;/p&gt;

&lt;pre&gt;
   text    data     bss     dec     hex filename
  35944    6512      64   42520    a618 src/krb5-auth-dialog-old
  35517    6112      64   41693    a2dd src/krb5-auth-dialog
&lt;/pre&gt;

&lt;p&gt;That’s a 2% improvement.  Note that the saved dependencies are all recursive dependencies.  The runtime is therefore not much affected (only a little).  The saved data is pure overhead.  Multiply the number by the thousands of binaries and DSOs which are shipped and the savings are significant.&lt;/p&gt;
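&lt;p&gt;The effect is easy to reproduce on any machine.  A sketch, using a trivial program and libm as a stand-in for an unused dependency; it assumes gcc and ldd are installed, and on toolchains that already pass --as-needed by default the first link shows no difference:&lt;/p&gt;

```shell
# Sketch: link the same trivial program with and without
# -Wl,--as-needed and compare the recorded DSO dependencies.
printf 'int main(void) { return 0; }\n' > demo.c
gcc demo.c -lm -o demo-old                  # may record libm although unused
gcc demo.c -Wl,--as-needed -lm -o demo-new  # unused libm is dropped
ldd demo-old | grep libm || true
ldd demo-new | grep libm || true            # no libm line here
```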

&lt;p&gt;The second problem to mention here is that not all unused dependencies are gone because somebody thought s/he was clever and used -pthread in one of the pkgconfig files instead of linking with &lt;tt&gt;-lpthread&lt;/tt&gt;.  That’s just stupid when combined with the insanity called libtool.  The result is that &lt;tt&gt;-Wl,--as-needed&lt;/tt&gt; is not applied to the thread library.&lt;/p&gt;

&lt;p&gt;Just avoid libtool and pkgconfig.  At the very least fix up the pkgconfig files to use &lt;tt&gt;-Wl,--as-needed&lt;/tt&gt;.&lt;/p&gt;</description>
  <comments>https://udrepper.livejournal.com/19395.html?view=comments#comments</comments>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
  </item>
  <item>
  <guid isPermaLink='true'>https://udrepper.livejournal.com/19041.html</guid>
  <pubDate>Fri, 09 Nov 2007 04:44:21 GMT</pubDate>
  <title>Energy saving is everybody&apos;s business</title>
  <author>udrepper</author>
  <link>https://udrepper.livejournal.com/19041.html</link>
  <description>&lt;p&gt;With the wide acceptance of laptops and even smaller devices more and more people have been exposed to devices limited by energy consumption.  Still, programmers don&apos;t pay much attention to this aspect.&lt;/p&gt;

&lt;p&gt;This statement is not entirely accurate: there has been a big push towards energy conservation in the kernel world (at least in the Linux kernel).  With the tickless kernels we have the infrastructure to sleep for long times (&lt;q&gt;long&lt;/q&gt; is a relative term here).  Other internal changes avoid unnecessary wakeups.  It is now really up to the userlevel world to do its part.&lt;/p&gt;

&lt;p&gt;The situation is pretty dire here.  There are some projects (e.g., &lt;a href=&quot;http://www.lesswatts.org/projects/powertop/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;PowerTOP&lt;/a&gt;) which highlight the problems.  Still, not much happens.&lt;/p&gt;

&lt;p&gt;I&apos;ve been &lt;i&gt;somewhat&lt;/i&gt; guilty myself.  nscd (part of glibc) was waking up every 5 seconds to clean up its cache, even if there often was nothing to be done.  This program structure had several reasons.  Good ones, but not compelling ones.  So I finally bit the bullet and changed the program structure significantly to avoid unnecessary wakeups.  The result is that nscd now determines at all times when the next cache cleanup is due and sleeps until then.  Cache cleanups might be many hours out, so the code improved from one wakeup every 5 seconds to one wakeup every couple of hours.&lt;/p&gt;

&lt;p&gt;nscd is a &lt;b&gt;very&lt;/b&gt; small drop in the bucket, though.  Just look at your machine and examine the running processes and those which are regularly started.  PowerTOP cannot really help here (Arjan said something will be coming soon, though).&lt;/p&gt;

&lt;p&gt;There is a tool which can help, though: systemtap.  Simply create a small script which traps the syscalls the violators will use and displays process information.  The syscalls to watch include: &lt;tt&gt;open&lt;/tt&gt;, &lt;tt&gt;stat&lt;/tt&gt;, &lt;tt&gt;access&lt;/tt&gt;, &lt;tt&gt;poll&lt;/tt&gt;, &lt;tt&gt;epoll&lt;/tt&gt;, &lt;tt&gt;select&lt;/tt&gt;, &lt;tt&gt;nanosleep&lt;/tt&gt;, &lt;tt&gt;futex&lt;/tt&gt;.  For the latter five, the problem is small timeout values.&lt;/p&gt;

&lt;p&gt;I&apos;ll post a script to do this soon (just not now).  But the guilty parties probably already know who they are.  Just don&apos;t do this quasi busy waiting!&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If a program has to react to a file change, removal, or creation, use &lt;tt&gt;inotify&lt;/tt&gt;.&lt;/li&gt;
&lt;li&gt;For internal cleanups, choose reasonable values and then compute the timeout so that you don&apos;t wake up when nothing has to be done.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to see how &lt;b&gt;not&lt;/b&gt; to do it, look at something like the flash player (the proprietary one).  If you have inadvertently started it, it&apos;ll remain active (even if no flash page is displayed) and it basically busy-waits on something.&lt;/p&gt;

&lt;p&gt;Let&apos;s show the proprietary software world we can do better.&lt;/p&gt;</description>
  <comments>https://udrepper.livejournal.com/19041.html?view=comments#comments</comments>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
  </item>
  <item>
  <guid isPermaLink='true'>https://udrepper.livejournal.com/18882.html</guid>
  <pubDate>Mon, 01 Oct 2007 15:09:36 GMT</pubDate>
  <title>Part 2 released</title>
  <author>udrepper</author>
  <link>https://udrepper.livejournal.com/18882.html</link>
  <description>&lt;p&gt;Jonathan and crew published part 2 of the paper.  If you have an LWN subscription you can read it &lt;a href=&quot;http://lwn.net/Articles/252125/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;</description>
  <comments>https://udrepper.livejournal.com/18882.html?view=comments#comments</comments>
  <category>linux programming</category>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
  </item>
  <item>
  <guid isPermaLink='true'>https://udrepper.livejournal.com/18555.html</guid>
  <pubDate>Thu, 27 Sep 2007 17:45:53 GMT</pubDate>
  <title>Directory Reading</title>
  <author>udrepper</author>
  <link>https://udrepper.livejournal.com/18555.html</link>
  <description>&lt;p&gt;In the last weeks I have seen far too much code which reads directory content in horribly inefficient ways to let this slide.  Programmers really have to learn to do this efficiently.  Some of the instances I&apos;ve seen are in code which runs frequently.  Frequently as in once per second.  Doing it right can make a huge difference.&lt;/p&gt;

&lt;p&gt;The following is an illustrative piece of code.  It is not taken from an actual project but it shows several of the problems quite well, all in one example.  I have dropped the error handling to make the point clearer.&lt;/p&gt;

&lt;pre&gt;
  DIR *dir = opendir(some_path);
  struct dirent *d;
  struct dirent d_mem;
  while (readdir_r(dir, &amp;d_mem, &amp;d) == 0 &amp;&amp; d != NULL) {
    char path[PATH_MAX];
    snprintf(path, sizeof(path), &quot;%s/%s/somefile&quot;, some_path, d-&amp;gt;d_name);
    int fd = open(path, O_RDONLY);
    if (fd != -1) {
      ... do something ...
      close (fd);
    }
  }
  closedir(dir);
&lt;/pre&gt;

&lt;p&gt;How many things are inefficient at best and outright problematic in some cases?&lt;/p&gt;

&lt;p&gt;Let&apos;s enumerate:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Why use &lt;tt&gt;readdir_r&lt;/tt&gt;?&lt;/li&gt;
&lt;li&gt;Even the use of &lt;tt&gt;readdir&lt;/tt&gt; is dangerous.&lt;/li&gt;
&lt;li&gt;Creating a path string might exceed the &lt;tt&gt;PATH_MAX&lt;/tt&gt; limit.&lt;/li&gt;
&lt;li&gt;Using a path like this is racy.&lt;/li&gt;
&lt;li&gt;What if the directory contains entries which are not directories?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;tt&gt;readdir_r&lt;/tt&gt; is only needed if multiple threads are using the &lt;i&gt;same&lt;/i&gt; directory stream.  I have yet to see a program where this really is the case.  In this toy example the stream (variable &lt;tt&gt;dir&lt;/tt&gt;) is definitely not shared between different threads.  Therefore the use of &lt;tt&gt;readdir&lt;/tt&gt; is just fine.  Should this matter?  Yes, it should, since &lt;tt&gt;readdir_r&lt;/tt&gt; has to copy the data into the buffer provided by the user while &lt;tt&gt;readdir&lt;/tt&gt; can avoid that.&lt;/p&gt;

&lt;p&gt;Instead of &lt;tt&gt;readdir&lt;/tt&gt; code should in fact use &lt;tt&gt;readdir64&lt;/tt&gt;.  The definition of the &lt;tt&gt;dirent&lt;/tt&gt; structure comes from an innocent time when hard drives with a couple dozen megabytes of capacity were huge.  Things change and we need larger values for inode numbers etc.  Modern (i.e., 64-bit) ABIs do this by default, but if the code is supposed to be used on 32-bit machines as well the &lt;tt&gt;*64&lt;/tt&gt; variants should always be used.&lt;/p&gt;

&lt;p&gt;Path length limits are becoming an ever-increasing problem.  Linux, like most Unix implementations, imposes a length limit on each filename string which is passed to a system call.  But this does not mean that path names in general have any length limit.  It just means that longer names have to be constructed implicitly through the use of multiple relative path names.  In the example above, what happens if &lt;tt&gt;some_path&lt;/tt&gt; is already close to &lt;tt&gt;PATH_MAX&lt;/tt&gt; bytes in size?  The &lt;tt&gt;snprintf&lt;/tt&gt; call will truncate the output.  This can and should of course be caught, but that doesn&apos;t help the program.  It is crippled.&lt;/p&gt;

&lt;p&gt;Any use of filenames with path components (i.e., with one or more slashes in the name) is racy: an attacker can change any of the contained path components.  This can lead to exploits.  In the example, the &lt;tt&gt;some_path&lt;/tt&gt; string itself might be long and traverse multiple directories.  A change to any of these will lead to the &lt;tt&gt;open&lt;/tt&gt; call not reaching the desired file or directory.&lt;/p&gt;

&lt;p&gt;Finally, while the code above works (the &lt;tt&gt;open&lt;/tt&gt; call will fail if &lt;tt&gt;d-&amp;gt;d_name&lt;/tt&gt; does not name a directory) it is anything but efficient.  In fact, the &lt;tt&gt;open&lt;/tt&gt; system calls are quite expensive.  Before any work is done, the kernel has to reserve a file descriptor.  Since file descriptors are a shared resource this requires coordination and synchronization, which is expensive.  Synchronization also reduces parallelism, which might be a big issue in some code.  The &lt;tt&gt;open&lt;/tt&gt; call then has to follow the path, which also is not free.&lt;/p&gt;

&lt;p&gt;To make a long story short, here is how the code should look like (again, sans error handling):&lt;/p&gt;

&lt;pre&gt;
  DIR *dir = opendir(some_path);
  int dfd = dirfd(dir);
  struct dirent64 *d;
  while ((d = readdir64(dir)) != NULL) {
    if (d-&amp;gt;d_type != DT_DIR &amp;&amp; d-&amp;gt;d_type != DT_UNKNOWN)
      continue;
    char path[PATH_MAX];
    snprintf(path, sizeof(path), &quot;%s/somefile&quot;, d-&amp;gt;d_name);
    int fd = openat(dfd, path, O_RDONLY);
    if (fd != -1) {
      ... do something ...
      close (fd);
    }
  }
  closedir(dir);
&lt;/pre&gt;

&lt;p&gt;This rewrite addresses all the issues.  It uses &lt;tt&gt;readdir64&lt;/tt&gt;, which will do just fine in this case and is safe when it comes to huge disk drives.  It uses the &lt;tt&gt;d_type&lt;/tt&gt; field of the &lt;tt&gt;dirent64&lt;/tt&gt; structure to check whether we already know the file is not a directory.  Most of Linux&apos;s filesystems today fill in the &lt;tt&gt;d_type&lt;/tt&gt; field correctly (including all the pseudo filesystems like &lt;tt&gt;sysfs&lt;/tt&gt; and &lt;tt&gt;proc&lt;/tt&gt;).  Those filesystems which do not have the information handy fill in &lt;tt&gt;DT_UNKNOWN&lt;/tt&gt;, which is why the code above allows this case, too.  In some programs one might also want to allow &lt;tt&gt;DT_LNK&lt;/tt&gt; since a symbolic link might point to a directory.  But more often than not this is not the case, and not following symlinks is a security measure.&lt;/p&gt;

&lt;p&gt;Finally, the new code uses &lt;tt&gt;openat&lt;/tt&gt; to open the file.  This avoids the lengthy path lookup and it closes most of the races of the original &lt;tt&gt;open&lt;/tt&gt; call, since the pathname lookup starts at the directory read by &lt;tt&gt;readdir64&lt;/tt&gt;.  Any change to the path components leading up to this directory has no effect on the &lt;tt&gt;openat&lt;/tt&gt; call.  Also, since the generated path is now very short (just the maximum of 256 bytes for &lt;tt&gt;d_name&lt;/tt&gt; plus 10), we know that the buffer &lt;tt&gt;path&lt;/tt&gt; is sufficient.&lt;/p&gt;

&lt;p&gt;It is easy enough to apply these changes to all the places which read directories.  The result will be smaller, faster, and safer code.&lt;/p&gt;</description>
  <comments>https://udrepper.livejournal.com/18555.html?view=comments#comments</comments>
  <category>programming linux</category>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
  </item>
  <item>
  <guid isPermaLink='true'>https://udrepper.livejournal.com/18193.html</guid>
  <pubDate>Fri, 21 Sep 2007 20:41:39 GMT</pubDate>
  <title>The Series is Underway</title>
  <author>udrepper</author>
  <link>https://udrepper.livejournal.com/18193.html</link>
  <description>Jon Corbet has edited the first two sections of the document I mentioned earlier &lt;a href=&quot;http://udrepper.livejournal.com/17682.html&quot; target=&quot;_blank&quot;&gt;here&lt;/a&gt; and &lt;a href=&quot;http://udrepper.livejournal.com/17280.html&quot; target=&quot;_blank&quot;&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The document will be published in multiple installments, beginning with &lt;a href=&quot;http://lwn.net/Articles/250967/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Sections 1 and 2&lt;/a&gt;, which are available now.  Since LWN is a business, the reasonable limitation is in place that for the first week only subscribers have access.&lt;br /&gt;&lt;br /&gt;So, get a &lt;a href=&quot;https://lwn.net/subscribe/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;subscription&lt;/a&gt; to LWN.&lt;br /&gt;&lt;br /&gt;If you find mistakes in the text let me know directly, either as a comment here or as a personal mail.  Don&apos;t bother Jon with that.</description>
  <comments>https://udrepper.livejournal.com/18193.html?view=comments#comments</comments>
  <category>programming</category>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
  </item>
  <item>
  <guid isPermaLink='true'>https://udrepper.livejournal.com/18038.html</guid>
  <pubDate>Wed, 19 Sep 2007 21:55:10 GMT</pubDate>
  <title>SHA for crypt</title>
  <author>udrepper</author>
  <link>https://udrepper.livejournal.com/18038.html</link>
  <description>Just a short note: I added SHA support to the Unix crypt implementation in glibc.  The reason for all this (including replies to the extended &quot;NIH&quot; complaints) can be found &lt;a href=&quot;http://people.redhat.com/drepper/sha-crypt.html&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;here&lt;/a&gt;.</description>
  <comments>https://udrepper.livejournal.com/18038.html?view=comments#comments</comments>
  <category>security</category>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
  </item>
  <item>
  <guid isPermaLink='true'>https://udrepper.livejournal.com/17682.html</guid>
  <pubDate>Tue, 14 Aug 2007 03:43:33 GMT</pubDate>
  <title>Publishing Update</title>
  <author>udrepper</author>
  <link>https://udrepper.livejournal.com/17682.html</link>
  <description>&lt;p&gt;A few weeks back I asked how I should publish the document on memory and cache handling.  I got quite a lot of feedback.&lt;/p&gt;

&lt;ul&gt;

&lt;li&gt;There was the usual &lt;q&gt;it doesn&apos;t matter but I want it for free&lt;/q&gt; crowd.&lt;/li&gt;

&lt;li&gt;Then there was the &lt;q&gt;even $8 for a book is too much for me&lt;/q&gt; group.  These are people from outside the US, and $8 translated into local currency and income is certainly far too much for many of them.  I do not throw this group in with the first.&lt;/li&gt;

&lt;li&gt;Several people (all or mostly US-based) liked the idea of printed paper.  The price was no issue.&lt;/li&gt;

&lt;li&gt;Most people said a freely available PDF is more important than a printed copy.  Some derogatory comments about lecturers who require books were heard.  Others said editing isn&apos;t important.&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Because of the first, obnoxious group of people I would probably have gone the print-only route.  This attitude that just because somebody works on free software he always has to make everything available for free makes me sick.  These are most probably the same people who never in their life produced anything that others found of value, or they are the criminals working on (mostly embedded) projects exploiting free software.&lt;/p&gt;

&lt;p&gt;But since I really want the document to be widely distributed and available in places where $8 is too much money, I will release the PDF for free.  This won&apos;t happen right away, though.  Unlike some of the people making comments, I do think that editing is important.  Fortunately, professional editing and a free PDF do not exclude each other.&lt;/p&gt;

&lt;p&gt;I&apos;ll not go with a publisher (especially not these $%# at O&apos;Reilly, as several people suggested).  Doing so would in most cases have precluded retaining the copyright and making the text available for free.&lt;/p&gt;

&lt;p&gt;Instead the nice people at &lt;a href=&quot;http://lwn.net/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;LWN&lt;/a&gt;, Jonathan Corbet and crew, will edit the document.  They will then serialize it, I guess, along with the weekly edition.  It&apos;s up to Jon to make this decision.  The document has 8 large sections, including the introduction, which means &lt;i&gt;my&lt;/i&gt; guess is that after 7 installments the whole document will be published.  Once this has happened I&apos;ll make the whole updated and edited PDF available.&lt;/p&gt;

&lt;p&gt;This means that if you think it&apos;s worth it, get a subscription to LWN instead of waiting a week to read each installment for free.&lt;/p&gt;

&lt;p&gt;So in summary, I get professional editing, keep the copyright, and might be able to help get some more subscribers for LWN.  Win, win, win.  If the &lt;q&gt;L&lt;/q&gt; in LWN bothers you, I&apos;ve news for you: the document itself is very Linux-centric.&lt;/p&gt;

&lt;p&gt;I haven&apos;t forgotten the printed version.  I&apos;ve read a bit more of the Lulu documentation.  Apparently there is a model where I don&apos;t have to pay anything.  People ordering the book pay a per-copy price and that&apos;s it (apparently with discounts for larger orders).  If I submit it in letter/A4 format I don&apos;t have to do any reformatting and the price is less (for the color print) since there are fewer pages.&lt;/p&gt;

&lt;p&gt;I&apos;ll probably try to do this after the PDF is freely available.  People who like to have something in their hands will get their wish.  The only problem I see right now is that Lulu has a stupid requirement that the PDF documents must be generated with proprietary tools from Adobe.  Of course I don&apos;t do this; I use pdfTeX.  If this requirement proves to be a problem I guess I&apos;ll have to have a word with Bob Young...&lt;/p&gt;
  <comments>https://udrepper.livejournal.com/17682.html?view=comments#comments</comments>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
  </item>
  <item>
  <guid isPermaLink='true'>https://udrepper.livejournal.com/17577.html</guid>
  <pubDate>Mon, 13 Aug 2007 23:52:58 GMT</pubDate>
  <title>Increasing Virtualization Insanity</title>
  <author>udrepper</author>
  <link>https://udrepper.livejournal.com/17577.html</link>
  <description>&lt;p&gt;People are starting to realize how broken the Xen model is with its privileged Dom0 domain.  But the actions they want to take are simply ridiculous: they want to add the drivers back into the hypervisor.  There are many technical reasons why this is a terrible idea.  You&apos;d have to add (back, mind you, Xen before version 2 did this) all the PCI handling and lots of other lowlevel code which is now maintained as part of the Linux kernel.  This would of course play nicely into Xensource&apos;s (the company&apos;s) pocket.  Their technical people have so far turned this down, but I have no faith in this group: sooner or later they will want to be independent of OS vendors and have their own mini-OS in the hypervisor.  Adios, remaining few advantages of the hypervisor model.  But this is of course also the direction of VMWare, who loudly proclaim that in the future we won&apos;t have OSes as they exist today.  Instead there will only be domains with mini-OSes, ideally mere hooks into the hypervisor, in which single applications run.&lt;/p&gt;

&lt;p&gt;I hope everybody realizes the insanity of this:&lt;/p&gt;

&lt;ul&gt;

&lt;li&gt;If they really mean single application this must also mean single-process.  If not, you&apos;ll have to implement an OS which can provide multi-process services.  But this means that you either have no support to create processes or you rely on a mini-OS which is a front for the hypervisor.  In VMWare&apos;s case this is some proprietary mini-OS and I imagine Xensource would like to do the very same.&lt;/li&gt;

&lt;li&gt;Imagine that you have such application domains.  All nicely separated because everything is replicated.  The result is a maintenance nightmare.  What if a component which is needed in all application domains has to be updated?  In a traditional system you update the one instance per machine/domain.  With application domains you have to update every single one and must not forget any.&lt;/li&gt;

&lt;/ul&gt;

And worst of all:

&lt;ul&gt;

&lt;li&gt;Don&apos;t people realize that this is the KVM model, just implemented &lt;b&gt;much&lt;/b&gt; more poorly and more proprietarily?  If you invite drivers and all the infrastructure into the hypervisor, it is not small enough anymore for a complete code review.  I.e., you end up with a full OS which is too large for that.  Why not use one which already works: Linux.&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;I fear I have to repeat myself over and over again until the last person recognizes that the hypervisor model does not work for the type of virtualization we try to achieve on commodity hardware.  Using a hypervisor was simply the first idea which popped into people&apos;s heads since it had already been done before in quite different environments.  The change from Xen v1 to v2 should have shown how rotten the model is.  Only when you take a step back can you see the whole picture and realize the KVM model is not only better, it&apos;s the only logical choice.&lt;/p&gt;

&lt;p&gt;I know people have invested in Xen and that KVM is not quite there yet, but a) there has been a lot of progress in KVM-land and b) the performance is constantly improving, and especially with next year&apos;s processor updates hardware virtualization costs will go down even further.&lt;/p&gt;

&lt;p&gt;For sysadmin types this means: do what you have to do with Xen for now, but keep the investments small.  For developers this means: don&apos;t let yourself be tied to a platform.  Use an abstraction layer such as libvirt to bridge over the differences.  For architects this means: don&apos;t look to Xen for answers; base your new designs on KVM.&lt;/p&gt;</description>
  <comments>https://udrepper.livejournal.com/17577.html?view=comments#comments</comments>
  <category>virtualization</category>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
  </item>
  <item>
  <guid isPermaLink='true'>https://udrepper.livejournal.com/17280.html</guid>
  <pubDate>Mon, 25 Jun 2007 17:08:04 GMT</pubDate>
  <title>How to publish?</title>
  <author>udrepper</author>
  <link>https://udrepper.livejournal.com/17280.html</link>
  <description>&lt;p&gt;That is meant as a question to the readers.  The problem I have right now is that I have more or less finished the paper accompanying one of the talks I gave at the Red Hat Summit in Nashville last year.  The slides for the talk about &lt;a href=&quot;http://people.redhat.com/drepper/cpucache-slides.pdf&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;CPU Caches&lt;/a&gt; are available.  But quite honestly, like most slide sets, they don&apos;t do the topic any justice.  I had to compress things to &amp;lt; 45 mins, which is of course not enough.  The paper covers everything I can currently think of which makes sense in relation to CPU caches and CPU memory, as far as programmers are concerned (nothing for hardware people).  The title I currently use is&lt;/p&gt;

&lt;blockquote&gt;What Every Programmer Should Know About Memory&lt;/blockquote&gt;

&lt;p&gt;and I think this is adequate.&lt;/p&gt;

&lt;p&gt;For this reason I usually write a paper on the important topics I talk about.  And this topic qualifies.  I consider the topic especially important since it&apos;s almost never treated in the software world &lt;b&gt;at all&lt;/b&gt;.  College grads today in most cases do not have the slightest clue about this topic.  Ideally I&apos;d like the paper to be picked up by some lecturers (as happens with many of my other publications) and used in a course.  Heck, I&apos;m even willing to teach it myself if that is what it takes to get credibility.&lt;/p&gt;

&lt;p&gt;The problem I&apos;m facing is that the document is (using my usual paper style, two columns etc.) around 100 densely packed pages long.  Some of the people I&apos;ve shown it to suggested that it should rather be published as a book.  I&apos;m a bit unsure about this.  I have a few publishers who have been pestering me for a long time about writing something for them (some even prematurely submitted titles to distributors!).  One I talked to would be willing to print it even though it&apos;s thin for a book.  But there are a lot of pluses and minuses all around:&lt;/p&gt;

&lt;dl&gt;
  &lt;dt&gt;My PDF only&lt;/dt&gt;

  &lt;dd&gt;Going this route means the document is easy to change and extend.  The format is exactly as I want it.  But visibility is restricted since it is not in the print market.  No professional review.  Due to the size (and use of color) it is hard to print.&lt;/dd&gt;

  &lt;dt&gt;Go with a publisher&lt;/dt&gt;

  &lt;dd&gt;Professional editing, maybe a college edition, visibility through listing in catalogs etc.  Additionally available as e-book.  But it likely means the color has to go (printing in color is expensive) and there will be no free-of-charge copy.  Getting a revision out will be almost impossible.&lt;/dd&gt;

  &lt;dt&gt;Go with Lulu&lt;/dt&gt;

  &lt;dd&gt;The alternative publishing route: I could submit an appropriately formatted PDF to Lulu and have them publish it.  Demand printing, ISBN available.  B&amp;W and color printing possible.  Even e-books if anybody cares.  No professional editing.&lt;/dd&gt;

&lt;/dl&gt;

&lt;p&gt;Going with Lulu has the advantages I want but it&apos;s quite an effort.  And there are costs associated with it.  I do not plan to make money out of all this but I&apos;d have to recover the costs.  Excess gains would probably go to charity (in my case this is the &lt;a href=&quot;http://www.mbayaq.org/&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;Monterey Bay Aquarium&lt;/a&gt; in case anybody is interested).&lt;/p&gt;

&lt;p&gt;So, the questions I have and would like to get some feedback on are:&lt;/p&gt;

&lt;ul&gt;

&lt;li&gt;Are printed copies wanted at all?  Especially for those teaching, is it a prerequisite?&lt;/li&gt;

&lt;li&gt;If yes, do you prefer a professional, more expensive book?&lt;/li&gt;

&lt;li&gt;Or perhaps an amateur-ish publication which is either B&amp;W and cheap (I guess not much more than $10)...&lt;/li&gt;

&lt;li&gt;... or a colored print for around $30.  The paper has currently around 60 diagrams and color helps.&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;If you have an opinion, send a mail or add a comment to the blog (comments won&apos;t be published).  I know it is not easy to answer given that you haven&apos;t seen the material.  But this is the same for most books, isn&apos;t it?  Look at the slides and assume 100 times more detail.  I doubt I&apos;ll find many people who know all these details now (I had to do research myself).&lt;/p&gt;</description>
  <comments>https://udrepper.livejournal.com/17280.html?view=comments#comments</comments>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
  </item>
  <item>
  <guid isPermaLink='true'>https://udrepper.livejournal.com/17109.html</guid>
  <pubDate>Fri, 01 Jun 2007 13:53:06 GMT</pubDate>
  <title>grep and color</title>
  <author>udrepper</author>
  <link>https://udrepper.livejournal.com/17109.html</link>
  <description>&lt;p&gt;I cannot believe there are still people who are surprised to see me working with the command line on my machine, or who are surprised when I tell them that the output of grep can use highlighting.  Just add &lt;tt&gt;--color&lt;/tt&gt; to the command line (with the optional argument, just like ls).  I implemented that more than six years ago.  In my &lt;tt&gt;.bashrc&lt;/tt&gt; I have the following:&lt;/p&gt;

&lt;pre&gt;
alias egrep=&apos;egrep --color=tty -d skip&apos;
alias egrpe=&apos;egrep --color=tty -d skip&apos;
alias fgrep=&apos;fgrep --color=tty -d skip&apos;
alias fgrpe=&apos;fgrep --color=tty -d skip&apos;
alias grep=&apos;grep --color=tty -d skip&apos;
alias grpe=&apos;grep --color=tty -d skip&apos;
&lt;/pre&gt;

&lt;p&gt;Yes, I mistype &lt;tt&gt;grep&lt;/tt&gt; often enough to warrant the extra aliases.  Using &lt;tt&gt;tty&lt;/tt&gt; as the color mode means that if I pipe the output into another program there won&apos;t be any color escape sequences added which could irritate those programs.&lt;/p&gt;

&lt;p&gt;Just make your life easier and add such aliases, too.&lt;/p&gt;</description>
  <comments>https://udrepper.livejournal.com/17109.html?view=comments#comments</comments>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
  </item>
  <item>
  <guid isPermaLink='true'>https://udrepper.livejournal.com/16844.html</guid>
  <pubDate>Tue, 22 May 2007 18:46:46 GMT</pubDate>
  <title>pthread_t and similar types</title>
  <author>udrepper</author>
  <link>https://udrepper.livejournal.com/16844.html</link>
  <description>&lt;p&gt;People constantly complain that the runtime does not catch their mistakes.  They are hiding behind this requirement in the POSIX specification (for &lt;tt&gt;pthread_join&lt;/tt&gt; in this case; it also applies to &lt;tt&gt;pthread_kill&lt;/tt&gt; and similar functions):&lt;/p&gt;

&lt;pre&gt;
       The pthread_join() function shall fail if:
       [...]

       ESRCH  No thread could be found corresponding to that specified by the given thread ID.
&lt;/pre&gt;

&lt;p&gt;The glibc implementation follows this requirement to the letter: if (and only if) we can detect that the thread descriptor is invalid, we do return &lt;tt&gt;ESRCH&lt;/tt&gt;.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;But&lt;/b&gt;: the above does not mean that all uses of invalid thread descriptors must result in &lt;tt&gt;ESRCH&lt;/tt&gt; errors.  The reason is simple: the standard does not restrict the implementation in any way in the definition of the type &lt;tt&gt;pthread_t&lt;/tt&gt;.  It does not even have to be an arithmetic type.  This means it is valid to use a pointer type and this is just what NPTL does.&lt;/p&gt;

&lt;p&gt;Nobody argues that functions like &lt;tt&gt;strcpy&lt;/tt&gt; should not dump core in case the buffer is invalid.  The same goes for &lt;tt&gt;pthread_attr_t&lt;/tt&gt; references passed to &lt;tt&gt;pthread_attr_init&lt;/tt&gt; etc.  The use of &lt;tt&gt;pthread_t&lt;/tt&gt; when defined as a pointer is no different.  The only complication is in understanding that &lt;tt&gt;pthread_t&lt;/tt&gt; can be a pointer type.  This is obvious for &lt;tt&gt;void*&lt;/tt&gt; etc.&lt;/p&gt;

&lt;p&gt;In the POSIX committee we discussed several times changing the &lt;tt&gt;pthread_join&lt;/tt&gt; and &lt;tt&gt;pthread_kill&lt;/tt&gt; man pages.  The &lt;tt&gt;ESRCH&lt;/tt&gt; errors could be marked as &lt;q&gt;may fail&lt;/q&gt;.  But&lt;/p&gt;

&lt;ol&gt;

&lt;li&gt;this really is not necessary, see above.&lt;/li&gt;

&lt;li&gt;it would mean we have to go through the entire specification and treat every other place where this is an issue the same way.&lt;/li&gt;

&lt;/ol&gt;

&lt;p&gt;If somebody wants to do the work associated with the second step above and we have confidence in the results, we (= Austin Group) might make the change at some later date.  But it is a rather high risk for no real gain.  Programmers have to educate themselves anyway.&lt;/p&gt;

&lt;p&gt;What remains is the question: how can programs avoid these mistakes?  It is actually pretty simple: the program should make sure that no calls to &lt;tt&gt;pthread_kill&lt;/tt&gt;, for instance, can happen when the thread is exiting.  One way to solve this problem is:&lt;/p&gt;

&lt;ol&gt;

&lt;li&gt;Associate a variable &lt;tt&gt;running&lt;/tt&gt; of some sort and a mutex with each thread.&lt;/li&gt;

&lt;li&gt;In the function started by &lt;tt&gt;pthread_create&lt;/tt&gt; (the thread function) set &lt;tt&gt;running&lt;/tt&gt; to true.&lt;/li&gt;

&lt;li&gt;Before returning from the thread function or calling &lt;tt&gt;pthread_exit&lt;/tt&gt; or in a cancellation handler acquire the mutex, set &lt;tt&gt;running&lt;/tt&gt; to false, unlock the mutex, and proceed.&lt;/li&gt;

&lt;li&gt;Any thread trying to use &lt;tt&gt;pthread_kill&lt;/tt&gt; etc. must first acquire the mutex for the target thread, call &lt;tt&gt;pthread_kill&lt;/tt&gt; if &lt;tt&gt;running&lt;/tt&gt; is true, and finally unlock the mutex.&lt;/li&gt;

&lt;/ol&gt;

&lt;p&gt;This ensures that no invalid descriptor is used.  But I can already hear people complain:&lt;/p&gt;

&lt;blockquote&gt;This is too expensive!&lt;/blockquote&gt;

&lt;p&gt;That is ridiculous.  The implementation would have to do something similar if it tried to catch bad thread descriptors.  In fact, it would have to do more.  What is important is to recognize that this price would have to be paid by &lt;i&gt;every&lt;/i&gt; program, not just the buggy ones.  This is wrong.  Only those people who need this extra protection should pay the price.&lt;/p&gt;

&lt;blockquote&gt;But I don&apos;t have control over the code calling &lt;tt&gt;pthread_create&lt;/tt&gt;!&lt;/blockquote&gt;

&lt;p&gt;Boo hoo, cry me a river.  Don&apos;t expect sympathy for using proprietary software.  I will never allow good free software to be shackled because of proprietary code.  If you cannot get this changed in the code you pay good money for, this just means it is time to find a new supplier or, even better, to use free software.&lt;/p&gt;

&lt;p&gt;In summary, this is entirely a problem of the programs which experience it.  Existing Linux systems are proof that it is possible to write complex programs without requiring the implementation to help incompetent programmers.  The next revision of the POSIX specification will have a few more words about this issue.  But I expect they will be ignored anyway and all focus will remain on the &lt;q&gt;shall fail&lt;/q&gt; errors of &lt;tt&gt;pthread_kill&lt;/tt&gt; etc.&lt;/p&gt;</description>
  <comments>https://udrepper.livejournal.com/16844.html?view=comments#comments</comments>
  <category>programming posix</category>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
  </item>
  <item>
  <guid isPermaLink='true'>https://udrepper.livejournal.com/16394.html</guid>
  <pubDate>Sat, 12 May 2007 17:49:32 GMT</pubDate>
  <title>The Growing Importance of Parallel Programming</title>
  <author>udrepper</author>
  <link>https://udrepper.livejournal.com/16394.html</link>
  <description>At the 2007 Red Hat Summit in San Diego, which just wrapped up yesterday, I gave a talk about parallel programming which the marketing folks retitled &lt;q&gt;Programming for tomorrow&apos;s high speed processors, today&lt;/q&gt;.&lt;br /&gt;&lt;br /&gt;The crux of the talk is that programmers in the future cannot always rely on improving hardware to make their programs run faster.  This is summarized nicely in the following graph, which I generated from performance data for x86 processors.&lt;br /&gt;&lt;br /&gt;&lt;img src=&quot;https://imgprx.livejournal.net/984474d5014af1953735608725ffd731d7551f167488cfaddb65e13318f16182/P2WlxyVijxKghGBq98xWWUMdsf-ah7h0z0uNV75WwcLW9xDVgY-mB0dpBFVyDl10pA1Wky_bcwZXG10ekBk1_ENBm3nIevQ:5tudArR5euXHpX8V71fqEg&quot; /&gt;&lt;br /&gt;&lt;br /&gt;The crucial part is the divergence of the two lines going forward and the flattening of the blue line.  This means programs which are not able to take advantage of ever increasing numbers of processing cores simply won&apos;t run (much) faster.&lt;br /&gt;&lt;br /&gt;Parallel programming is hard.  Algorithms have to be changed to allow more than one thread to run in parallel.  Well, not necessarily threads: especially on Linux one should use processes if the sharing requirements between the processes make this feasible.&lt;br /&gt;&lt;br /&gt;Data structures have to be laid out correctly to allow a) vectorization and b) data parallelization.  Vectorization is important if one wants to come even close to the peak performance listed for the processor.  But when you do this you also have to know a lot about CPU design (pipelines etc.), caches, and memory.&lt;br /&gt;&lt;br /&gt;And then there is something people might have heard about but which didn&apos;t really register: co-processors are back.  
Intel&apos;s Geneseo and AMD&apos;s Torrenza are technologies to couple 3rd-party processors tightly to the existing processor-memory mesh.&lt;br /&gt;&lt;br /&gt;In general I think the industry is entirely ill-prepared for these upcoming changes.  Many or most programmers are not able to write code with these requirements.  Companies and other organizations will have to invest in education.  The system providers (like Red Hat) have to find ways to make parallel programming easier.&lt;br /&gt;&lt;br /&gt;One big step in the right direction is OpenMP.  It is officially supported in gcc 4.2, and Red Hat has backported the changes to the gcc 4.1 used in RHEL5 and Fedora Core 6 and later.  Not only does OpenMP allow relatively easy conversion of existing code, it also frees the programmer from dealing with all the details of thread lifetime handling, thread stacks, etc.  Even mutual exclusion happens at a higher level.  All this is good.  It will make programmers more productive, if only it is used more often.&lt;br /&gt;&lt;br /&gt;But there is one more thing: the OpenMP runtime is basically in complete control.  It can decide on using just one thread or many threads.  It can decide where to run threads, and many more things.  All these details are hidden from the programmer.  This is a good thing since it allows the runtime to perform optimizations.  I&apos;ll have more about this at a later date.&lt;br /&gt;&lt;br /&gt;In summary, programmers have to learn, re-learn, or learn for the first time about parallelism.  I think the topic of this talk is very important.  If you are a Red Hat customer you could potentially ask for somebody from Red Hat to come in and talk about these issues.  I&apos;ll give the slides and the details to our consulting organization and possibly also to sales engineers.  I cannot make any promises but I&apos;ll encourage those gals and guys to be willing to talk about this.  
If you&apos;re a big enough customer and demand it, I might (have to) come out myself.  Or somebody can organize gatherings in places I have to go to anyway and have me speak there.</description>
  <comments>https://udrepper.livejournal.com/16394.html?view=comments#comments</comments>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
  </item>
  <item>
  <guid isPermaLink='true'>https://udrepper.livejournal.com/16362.html</guid>
  <pubDate>Sat, 12 May 2007 17:04:35 GMT</pubDate>
  <title>nscd and DNS TTL</title>
  <author>udrepper</author>
  <link>https://udrepper.livejournal.com/16362.html</link>
  <description>Recently some people spread their non-existent knowledge about nscd (the Name Service Cache Daemon) by claiming it ignores the TTL (time-to-live) value a DNS server returns.  As far as I know this rampant ignorance is especially wide-spread in the Ubuntu world.  They claim that for this reason one has to run a local, caching DNS server.  This is complete nonsense.  nscd has handled TTLs for a long time now (committed to the public CVS on 2004-9-15).  All reasonable requests are handled, i.e., all &lt;tt&gt;getaddrinfo&lt;/tt&gt; requests.&lt;br /&gt;&lt;br /&gt;As I have pointed out many times before (&lt;a href=&quot;http://udrepper.livejournal.com/16116.html&quot; target=&quot;_blank&quot;&gt;here&lt;/a&gt; and &lt;a href=&quot;http://people.redhat.com/drepper/userapi-ipv6.html&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;here&lt;/a&gt; and in other places), it is completely unacceptable today to use &lt;tt&gt;gethostbyname&lt;/tt&gt; etc.  These functions simply don&apos;t work.  Which is why I found it unnecessary to make the implementation of nscd more complicated and add more compatibility and maintenance problems just to fix one of the many problems these interfaces have.  Just don&apos;t use them; convert all your programs (e.g., I think we&apos;ve done just that for all of RHEL and Fedora nowadays).  Also don&apos;t use&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;
  getent hosts some.host
&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;You have to use&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;
  getent ahosts some.host
&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;For all &lt;tt&gt;getaddrinfo&lt;/tt&gt; lookups the TTL value from DNS replies takes precedence over the TTL value from &lt;tt&gt;/etc/nscd.conf&lt;/tt&gt;.  The latter is used for services which do not provide a TTL themselves (today all other services).</description>
  <comments>https://udrepper.livejournal.com/16362.html?view=comments#comments</comments>
  <lj:security>public</lj:security>
  <lj:reply-count>1</lj:reply-count>
  </item>
</channel>
</rss>
