statx() v3
LWN.net needs you!Some developments just take a long time to truly come to fruition. That has proved to be the case for the proposed statx() system call — at least, the "long time" part has, even if we may still be waiting for "fruition". By most accounts, though, this extension to the stat() system call would appear to be getting closer to being ready. Recent patches show the current state of statx() and where the remaining sticking points are.Without subscribers, LWN would simply not exist. Please consider signing up for a subscription and helping to keep LWN publishing.
The stat() system call, which returns metadata about a file, has a long history, having made its debut in the Version 1 Unix release in 1971. It has changed little in the following 45 years, even though the rest of the operating system has changed around it. Thus, it's unsurprising that stat() tends to fall short of current requirements. It is unable to represent much of the information relevant to files now, including generation and version numbers, file creation time, encryption status, whether they are stored on a remote server, and so on. It gives the caller no choice about which information to obtain, possibly forcing expensive operations to obtain data that the application does not need. The timestamp fields have year-2038 problems. And so on.
David Howells has been sporadically working on replacing stat() since 2010; his version 3 patch (counting since he restarted the effort earlier this year) came out on November 23. While the proposed statx() system call looks much the same as it did when we looked at it in May, there have been a few changes.
The prototype for statx() is still:
int statx(int dfd, const char *filename, unsigned atflag, unsigned mask,
struct statx *buffer);
Normally, dfd is a file descriptor identifying a directory, and filename is the name of the file of interest; that file is expected to be found relative to the given directory. If filename is passed as NULL, then dfd is interpreted as referring directly to the file being queried. Thus, statx() supersedes the functionality of both stat() and fstat().
The atflag argument modifies the behavior of the system call. It handles a couple of flags that already exist in current kernels: AT_SYMLINK_NOFOLLOW to return information about a symbolic link rather than following it, and AT_NO_AUTOMOUNT to prevent the automounting of remote filesystems. A set of new flags just for statx() controls the synchronization of data with remote servers, allowing applications to adjust the balance between I/O activity and accurate results. AT_STATX_FORCE_SYNC will force a synchronization with a remote server, even if the local kernel thinks its information is current, while AT_STATX_DONT_SYNC inhibits queries to the remote server, yielding fast results that may be out-of-date or entirely unavailable.
The atflag parameter, thus, controls what statx() will do to obtain the data; mask, instead, controls which data is obtained. The available flags here allow the application to request file permissions, type, number of links, ownership, timestamps, and more. The special value STATX_BASIC_STATS returns everything stat() would, while STATX_ALL returns everything available. Reducing the amount of information requested might reduce the amount of I/O required to execute the system call, but some reviewers worry that developers will just use STATX_ALL to avoid the need to think about it.
The final argument, buffer, contains a structure to be filled with the relevant information; in this version of the patch this structure looks like:
struct statx {
__u32 stx_mask; /* What results were written [uncond] */
__u32 stx_blksize; /* Preferred general I/O size [uncond] */
__u64 stx_attributes; /* Flags conveying information about the file [uncond] */
__u32 stx_nlink; /* Number of hard links */
__u32 stx_uid; /* User ID of owner */
__u32 stx_gid; /* Group ID of owner */
__u16 stx_mode; /* File mode */
__u16 __spare0[1];
__u64 stx_ino; /* Inode number */
__u64 stx_size; /* File size */
__u64 stx_blocks; /* Number of 512-byte blocks allocated */
__u64 __spare1[1];
struct statx_timestamp stx_atime; /* Last access time */
struct statx_timestamp stx_btime; /* File creation time */
struct statx_timestamp stx_ctime; /* Last attribute change time */
struct statx_timestamp stx_mtime; /* Last data modification time */
__u32 stx_rdev_major; /* Device ID of special file [if bdev/cdev] */
__u32 stx_rdev_minor;
__u32 stx_dev_major; /* ID of device containing file [uncond] */
__u32 stx_dev_minor;
__u64 __spare2[14]; /* Spare space for future expansion */
};
Here, stx_mask indicates which fields are actually valid; it will be the intersection of the information requested by the application and what the filesystem is able to provide. stx_attributes contains flags describing the state of the file; they indicate whether the file is compressed, encrypted, immutable, append-only, not to be included in backups, or an automount point.
The timestamp fields contain this structure:
struct statx_timestamp {
__s64 tv_sec;
__s32 tv_nsec;
__s32 __reserved;
};
The __reserved field was added in the version 3 patch as the result of one of the strongest points of disagreement in recent discussions about statx(). Dave Chinner suggested that, at some point in the future, nanosecond resolution may no longer be adequate; he said that the interface should be able to handle femtosecond timestamps. He was mostly alone on that point; other participants, such as Alan Cox, said that the speed of light will ensure that we never need timestamps below nanosecond resolution. Chinner insisted, though, so Howells added the __reserved field with the idea that it can be pressed into service should the need arise in the future.
Chinner had a number of other objections about the interface, some of which have not yet been addressed. These include the definition of the STATX_ATTR_ flags, which shadow a set of existing flags used with the FS_IOC_GETFLAGS and FS_IOC_SETFLAGS ioctl() calls. Reusing the flags allows a micro-optimization of the statx() code but, Chinner says, it perpetuates some interface mistakes made in the past. Ted Ts'o offered similar advice when reviewing a 2015 version of the patch set, but version 3 retains the same flag definitions.
The largest of Chinner's objections, though, may well be the absence of a comprehensive set of tests for statx(). This code, he said, should not go in until those tests are provided:
This position has been echoed by others (Michael Kerrisk, for example) recently. The kernel does have a long history of merging new system calls that do not work as advertised, with corresponding pain resulting later on. Howells will likely end up providing such tests, but not yet:
The rate of change of the patch set does seem to be slowing so, perhaps,
its final form is beginning to come into focus. The history of this work
suggests that it would not be wise to predict its merging in the near
future, though. The stat() system call has been with us for a
long time; it's reasonable to expect that statx() will last for
just as long. A bit of extra "bikeshedding" to get the interface right
seems understandable in that context.
| Index entries for this article | |
|---|---|
| Kernel | Filesystems/stat() |
| Kernel | System calls |
