As
The Linux Programming Interface
went to press in August 2010, it was up to date with the then-current
versions of the Linux kernel (2.6.35),
glibc (2.12),
and the POSIX.1/Single UNIX Specification standard (POSIX.1-2008/SUSv4).
Because the developers of both the Linux kernel and
glibc
are committed to maintaining
ABI
compatibility,
virtually all of the details provided in TLPI should
remain accurate in the future.
However, (a few) new features are added to the kernel and
glibc
with each release.
As each new release of the Linux kernel and
glibc occurs,
this page will attempt to note new interface features that are
relevant to the subject area of the book.
In addition, this page provides links to information
about subsequent updates to the POSIX/SUS standard.
See also:
LWN
articles on the kernel 6.13 merge window
(...)
and the Kernel Newbies
kernel 6.13 summary.
Linux 6.12 (17 November 2024)
API changes include the following:
…
See also:
LWN
articles on the kernel 6.12 merge window
(1,
2)
and the Kernel Newbies
kernel 6.12 summary.
Linux 6.11 (15 Sep 2024)
API changes include the following:
…
See also:
LWN
articles on the kernel 6.11 merge window
(1,
2)
and the Kernel Newbies
kernel 6.11 summary.
Linux 6.10 (14 July 2024)
API changes include the following:
…
See also:
LWN
articles on the kernel 6.10 merge window
(1,
2)
and the Kernel Newbies
kernel 6.10 summary.
Linux 6.9 (12 May 2024)
API changes include the following:
…
See also:
LWN
articles on the kernel 6.9 merge window
(1,
2)
and the Kernel Newbies
kernel 6.9 summary.
Linux 6.8 (10 March 2024)
API changes include the following:
…
See also:
LWN
articles on the kernel 6.8 merge window
(1,
2)
and the Kernel Newbies
kernel 6.8 summary.
Linux 6.7 (7 January 2024)
API changes include the following:
…
See also:
LWN
articles on the kernel 6.7 merge window
(1,
2)
and the Kernel Newbies
kernel 6.7 summary.
Linux 6.6 (29 October 2023)
API changes include the following:
…
See also:
LWN
articles on the kernel 6.6 merge window
(1,
2)
and the Kernel Newbies
kernel 6.6 summary.
Linux 6.5 (27 August 2023)
API changes include the following:
…
See also:
LWN
articles on the kernel 6.5 merge window
(1,
2)
and the Kernel Newbies
kernel 6.5 summary.
Linux 6.4 (25 June 2023)
API changes include the following:
…
See also:
LWN
articles on the kernel 6.4 merge window
(1,
2)
and the Kernel Newbies
kernel 6.4 summary.
Linux 6.3 (23 April 2023)
API changes include the following:
…
See also:
LWN
articles on the kernel 6.3 merge window
(1,
2)
and the Kernel Newbies
kernel 6.3 summary.
Linux 6.2 (19 February 2023)
API changes include the following:
…
See also:
LWN
articles on the kernel 6.2 merge window
(1,
2)
and the Kernel Newbies
kernel 6.2 summary.
Linux 6.1 (11 December 2022)
API changes include the following:
…
See also:
LWN
articles on the kernel 6.1 merge window
(1,
2)
and the Kernel Newbies
kernel 6.1 summary.
Linux 6.0 (2 October 2022)
API changes include the following:
…
See also:
LWN
articles on the kernel 6.0 merge window
(1,
2)
and the Kernel Newbies
kernel 6.0 summary.
Linux 5.19 (31 July 2022)
API changes include the following:
…
See also:
LWN
articles on the kernel 5.19 merge window
(1,
2)
and the Kernel Newbies
kernel 5.19 summary.
Linux 5.18 (22 May 2022)
API changes include the following:
…
See also:
LWN
articles on the kernel 5.18 merge window
(1,
2)
and the Kernel Newbies
kernel 5.18 summary.
Linux 5.17 (20 March 2022)
API changes include the following:
…
See also:
LWN
articles on the kernel 5.17 merge window
(1,
2)
and the Kernel Newbies
kernel 5.17 summary.
Linux 5.16 (9 January 2022)
API changes include the following:
…
See also:
LWN
articles on the kernel 5.16 merge window
(1,
2)
and the Kernel Newbies
kernel 5.16 summary.
Linux 5.15 (31 October 2021)
API changes include the following:
…
See also:
LWN
articles on the kernel 5.15 merge window
(1,
2)
and the Kernel Newbies
kernel 5.15 summary.
Linux 5.14 (29 August 2021)
API changes include the following:
The
futex()
system call adds a new operation,
FUTEX_LOCK_PI2.
This operation is similar to the existing
FUTEX_LOCK_PI
operation, but differs in that the caller can select
the clock against which the operation sleeps.
Details can be found in the
futex()
manual page.
The seccomp user-space notification mechanism adds
a new flag,
SECCOMP_ADDFD_FLAG_SEND,
that provides functionality equivalent to an atomic
combination of the existing
SECCOMP_IOCTL_NOTIF_ADDFD
and
SECCOMP_IOCTL_NOTIF_SEND ioctl(2)
operations.
Details can be found in the
seccomp_unotify(2)
manual page.
…
See also:
LWN
articles on the kernel 5.14 merge window
(1,
2)
and the Kernel Newbies
kernel 5.14 summary.
Linux 5.13 (27 June 2021)
API changes include the following:
…
See also:
LWN
articles on the kernel 5.13 merge window
(1,
2)
and the Kernel Newbies
kernel 5.13 summary.
Linux 5.12 (25 April 2021)
API changes include the following:
The
openat2()
system call adds a new flag,
RESOLVE_CACHED,
which causes the open to fail unless all path components
are already present in the kernel's lookup cache.
Details can be found in the
openat2()
manual page, and in Jonathan Corbet's LWN.net article,
Avoiding blocking file-name lookups.
The
mount_setattr(2)
system call, which changes the properties of an existing
mount or mount tree, has been added.
Details can be found in the
mount_setattr()
manual page.
The kernel adds support for a new feature, ID-mapped mounts,
which makes it possible to change the ownership (UIDs and GIDs)
of files when accessed via a specific mount.
Details can be found in the
mount_setattr()
manual page and in Jon Corbet's LWN.net article,
ID mapping for mounted filesystems.
…
See also:
LWN
articles on the kernel 5.12 merge window
(1,
2)
and the Kernel Newbies
kernel 5.12 summary.
Linux 5.11 (14 February 2021)
API changes include the following:
The
epoll_pwait2(2)
system call,
which (by contrast with
epoll_pwait(2))
provides nanosecond precision for its
timeout
argument, has been added.
Further details can be found in the
epoll_pwait2(2)
manual page.
The Syscall User Dispatch mechanism has been added.
This mechanism allows an application to selectively
intercept system calls so that they can be handled within the
application itself.
Further details can be found in the
prctl(2)
manual page (look for
PR_SET_SYSCALL_USER_DISPATCH)
and in Jon Corbet's LWN.net article,
Emulating Windows system calls, take 2.
The
close_range()
system call adds a new flag,
CLOSE_RANGE_CLOEXEC,
to set the close-on-exec flag on the specified
set of file descriptors.
Details can be found in the
close_range(2)
manual page.
The
sigaction(2)
system call adds two new flags:
SA_UNSUPPORTED,
which can be used to probe for flag support, and
SA_EXPOSE_TAGBITS,
which causes architecture-specific tag bits to be preserved in the
si_addr
field of the
siginfo_t
structure.
Details can be found in the
sigaction(2)
manual page.
…
See also:
LWN
articles on the kernel 5.11 merge window
(1,
2)
and the Kernel Newbies
kernel 5.11 summary.
Linux 5.10 (13 December 2020)
API changes include the following:
The
PIDFD_NONBLOCK
flag can now be used when calling
pidfd_open(2)
in order to create a nonblocking PID file descriptor.
When such a file descriptor is passed to
waitid(2),
the call will return the error
EAGAIN,
rather than blocking,
if the corresponding process has not yet terminated.
Further details can be found in the
pidfd_open(2)
and
waitid(2)
manual pages.
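The nonblocking behavior described above can be sketched roughly as follows. This is a minimal illustration rather than code from the manual pages: the helper name nonblocking_wait_reports_eagain() and the ID_PIDFD macro are invented for the example, the raw system call is used because older glibc versions have no pidfd_open() wrapper, and the fallback syscall number and the value 3 for P_PIDFD are assumptions that hold on common architectures.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434              /* assumed; same on most architectures */
#endif
#ifndef PIDFD_NONBLOCK
#define PIDFD_NONBLOCK O_NONBLOCK
#endif
#define ID_PIDFD ((idtype_t)3)          /* value of P_PIDFD; older glibc lacks the name */

/* Fork a child that stays alive, open a nonblocking pidfd for it, and call
 * waitid() while the child is still running.
 * Returns 1 if waitid() failed with EAGAIN, 0 if it behaved differently,
 * or -1 with errno set if the nonblocking pidfd could not be created
 * (for example EINVAL before Linux 5.10, ENOSYS before Linux 5.3). */
int nonblocking_wait_reports_eagain(void)
{
    pid_t child = fork();
    if (child == -1)
        return -1;
    if (child == 0)
        for (;;)
            pause();                    /* child: never terminates by itself */

    int result = -1, saved = 0;
    int pidfd = (int)syscall(SYS_pidfd_open, child, (unsigned int)PIDFD_NONBLOCK);
    if (pidfd == -1) {
        saved = errno;
    } else {
        siginfo_t info;
        int r = waitid(ID_PIDFD, (id_t)pidfd, &info, WEXITED);
        result = (r == -1 && errno == EAGAIN);
        close(pidfd);
    }

    kill(child, SIGKILL);               /* clean up the helper child */
    waitpid(child, NULL, 0);
    errno = saved;
    return result;
}
```

An event loop would typically add such a descriptor to its poll() or epoll set and call waitid() only once the descriptor is reported as readable.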
The
mount(2)
system call supports a new flag,
MS_NOSYMFOLLOW,
which prevents dereferencing of symbolic links when resolving pathnames.
Further details can be found in the
mount(2)
manual page.
…
See also:
LWN
articles on the kernel 5.10 merge window
(1,
2)
and the Kernel Newbies
kernel 5.10 summary.
Linux 5.9 (11 October 2020)
API changes include the following:
The
close_range(2)
system call,
which allows a set of file descriptors to be closed using a single call,
has been added.
Further details can be found in the
close_range(2)
manual page.
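As a rough sketch of how the call might be used, the hypothetical helper below closes a range of descriptors with close_range(2) and falls back to a close() loop where the system call is missing. The name close_fd_range() and the fallback syscall number are assumptions made for the example; the raw system call is used because only newer glibc versions provide a wrapper.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_close_range
#define SYS_close_range 436             /* assumed; same on most architectures */
#endif

/* Close every file descriptor in the inclusive range [first, last].
 * Returns 0 on success, or -1 with errno set. */
int close_fd_range(unsigned int first, unsigned int last)
{
    if (syscall(SYS_close_range, first, last, 0U) == 0)
        return 0;
    /* ENOSYS: kernel older than 5.9. EPERM: some sandboxes reject
     * system calls they do not know about with this error. */
    if (errno != ENOSYS && errno != EPERM)
        return -1;                      /* for example EINVAL if first > last */

    /* Fallback: close one descriptor at a time, bounded by the fd limit. */
    long open_max = sysconf(_SC_OPEN_MAX);
    if (open_max <= 0)
        open_max = 1024;                /* arbitrary guess if the limit is unknown */
    if ((unsigned long)last >= (unsigned long)open_max)
        last = (unsigned int)(open_max - 1);
    for (unsigned long long fd = first; fd <= last; fd++)
        close((int)fd);                 /* EBADF for unused slots is expected */
    return 0;
}
```

A typical use is close_fd_range(3, ~0U) in a child process just before execve(2), so that only the standard descriptors are inherited.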
The
setns(2)
system call now supports moving into a time namespace.
Further details can be found in the
setns(2)
manual page.
A new capability has been added:
CAP_CHECKPOINT_RESTORE.
Further details can be found in the
capabilities(7)
manual page and Jonathan Corbet's LWN.net article,
A crop of new capabilities
(written at a time when the proposed name for the capability was
CAP_RESTORE).
The seccomp user-space notification facility adds a new
ioctl(2)
operation,
SECCOMP_IOCTL_NOTIF_ADDFD,
which allows a supervisor process to allocate a file descriptor
and then install that file descriptor in the target process.
This allows the supervisor to emulate system calls in the target
that allocate file descriptors.
Further details can be found in the
seccomp_unotify(2)
manual page, and in Christian Brauner's blog post,
The
Seccomp Notifier - New Frontiers in Unprivileged Container
Development.
See also:
LWN
articles on the kernel 5.9 merge window
(1,
2)
and the Kernel Newbies
kernel 5.9 summary.
Linux 5.8 (2 August 2020)
API changes include the following:
The
setns(2)
system call can now take a PID file descriptor as its
file descriptor argument,
to allow the caller to move into the same namespace(s)
as the process referred to by the file descriptor.
Further details can be found in the
setns(2)
manual page.
Two new capabilities have been added:
CAP_BPF
and
CAP_PERFMON.
Further details can be found in the
capabilities(7)
manual page and Jonathan Corbet's LWN.net article,
A crop of new capabilities.
A new
faccessat2()
system call allows for a correct implementation of
faccessat()
in the GNU C library.
Further details can be found in the
faccessat(2)
manual page.
…
See also:
LWN
articles on the kernel 5.8 merge window
(1,
2)
and the Kernel Newbies
kernel 5.8 summary.
Linux 5.7 (31 May 2020)
API changes include the following:
The
mremap()
system call adds the
MREMAP_DONTUNMAP
flag, which, used in conjunction with the
MREMAP_MAYMOVE
flag,
can be used to remap a mapping to a new address
while at the same time not removing the original mapping.
Details can be found in the
mremap(2)
manual page.
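The effect of MREMAP_DONTUNMAP (combined with MREMAP_MAYMOVE) on a private anonymous mapping can be sketched as below. The helper name remap_keep_source() is invented for this example, and the fallback flag value is an assumption for systems whose headers predate the flag. The optional fifth argument of mremap() is passed explicitly as NULL as a precaution, since glibc versions differ in whether they read it when MREMAP_FIXED is absent.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MREMAP_DONTUNMAP
#define MREMAP_DONTUNMAP 4              /* assumed value for older headers */
#endif

/* Move the pages of a private anonymous mapping to a new address while
 * leaving the old address range mapped.
 * Returns 1 if the data arrived at the new address and the old range now
 * reads back as zeros, 0 if not, or -1 with errno set (EINVAL is expected
 * on kernels older than 5.7). */
int remap_keep_source(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    char *old = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (old == MAP_FAILED)
        return -1;
    strcpy(old, "payload");

    /* old_size and new_size must be equal when MREMAP_DONTUNMAP is used. */
    char *moved = mremap(old, len, len,
                         MREMAP_MAYMOVE | MREMAP_DONTUNMAP, NULL);
    if (moved == MAP_FAILED) {
        int saved = errno;
        munmap(old, len);
        errno = saved;
        return -1;
    }

    int ok = moved != old && strcmp(moved, "payload") == 0 && old[0] == '\0';
    munmap(moved, len);
    munmap(old, len);
    return ok;
}
```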
The
clone3()
system call now provides the
CLONE_INTO_CGROUP
flag, which allows the parent process to place a new child
process directly into a specified version 2 cgroup.
Details can be found in the
clone3(2)
manual page.
…
See also:
LWN
articles on the kernel 5.7 merge window
(1,
2)
and the Kernel Newbies
kernel 5.7 summary.
Linux 5.6 (29 March 2020)
API changes include the following:
A new namespace has been added to the kernel: time namespaces.
Details can be found in the
time_namespaces(7)
manual page
and in Jonathan Corbet's LWN.net article
Time namespaces.
A new
openat2()
system call has been added.
This system call extends the functionality of the existing
openat()
system call (which itself was an extension of the traditional
open()
system call).
The notable new feature provided with
openat2()
is the ability to restrict how untrusted paths are resolved.
Details can be found in the
openat2()
manual page
and in Jonathan Corbet's LWN.net article
Restricting path name lookup with openat2().
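As an illustration of restricted path resolution, the following sketch opens a path relative to a directory and refuses any resolution that would escape that directory. The helper name open_beneath() is hypothetical, the raw system call is used because glibc provides no openat2() wrapper, and the fallback definitions (structure layout, flag value, syscall number) are assumptions for build systems whose kernel headers predate Linux 5.6.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifdef __has_include
# if __has_include(<linux/openat2.h>)
#  include <linux/openat2.h>           /* struct open_how, RESOLVE_* flags */
# endif
#endif
#ifndef RESOLVE_BENEATH                 /* older headers: spell out the ABI */
struct open_how { uint64_t flags, mode, resolve; };
# define RESOLVE_BENEATH 0x08
#endif
#ifndef SYS_openat2
# define SYS_openat2 437                /* assumed; same on most architectures */
#endif

/* Open path read-only, relative to dirfd, failing with EXDEV if resolution
 * would leave the directory tree rooted at dirfd (via "..", an absolute
 * path, or an absolute symbolic link).
 * Returns a file descriptor, or -1 with errno set (ENOSYS before 5.6). */
int open_beneath(int dirfd, const char *path)
{
    struct open_how how;

    memset(&how, 0, sizeof(how));       /* unused fields must be zero */
    how.flags = O_RDONLY | O_CLOEXEC;
    how.resolve = RESOLVE_BENEATH;
    return (int)syscall(SYS_openat2, dirfd, path, &how, sizeof(how));
}
```

With a descriptor for an upload directory, for instance, open_beneath(dirfd, "../etc/passwd") would be expected to fail rather than open a file outside that directory.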
A new
pidfd_getfd()
system call can be used to obtain a copy of
a file descriptor held by another process.
Details can be found in the
pidfd_getfd(2)
manual page
and in Jonathan Corbet's LWN.net article
Grabbing file descriptors with pidfd_getfd().
The
prctl()
system call adds new
PR_GET_IO_FLUSHER
and
PR_SET_IO_FLUSHER
commands to get and set the IO_FLUSHER state of the calling process.
Details can be found in the
prctl(2)
manual page.
Cgroups version 2 now has a HugeTLB controller.
…
See also:
LWN
articles on the kernel 5.6 merge window
(1,
2)
and the Kernel Newbies
kernel 5.6 summary.
Linux 5.5 (26 January 2020)
API changes include the following:
The
CLONE_CLEAR_SIGHAND
flag can be used when calling
clone3()
in order to create a child process where all signals
that were handled in the parent are reset to their
default dispositions in the child.
Details can be found in the
clone3(2)
manual page.
When creating a new child process with
clone3(),
a privileged caller can now choose
which PID will be assigned to the new process
in each of the PID namespaces in which it resides.
Details can be found in the
clone3(2)
manual page.
The seccomp user-space notification mechanism adds a new flag,
SECCOMP_USER_NOTIF_FLAG_CONTINUE,
which allows the supervisor process to tell
the kernel that it may execute the target process's
system call.
Details can be found in the
seccomp_unotify(2)
manual page.
…
See also:
LWN
articles on the kernel 5.5 merge window
(1,
2)
and the Kernel Newbies
kernel 5.5 summary.
Linux 5.4 (24 November 2019)
API changes include the following:
The
waitid(2)
system call can now be used to wait on a child referred to
by a PID file descriptor.
Details can be found in the
waitid(2)
manual page.
The
waitid(2)
system call adds functionality that has long been present in
waitpid(2):
the ability to wait on a child in the same process group
as the parent without having to first discover the parent's
process group ID.
Details can be found in the
waitid(2)
manual page.
The
prctl(2)
system call adds new ARM64-specific operations,
PR_SET_TAGGED_ADDR_CTRL
and
PR_GET_TAGGED_ADDR_CTRL,
for setting and getting the tagged address mode.
Details can be found in the
prctl(2)
manual page.
The Kernel Lockdown feature has been merged.
Further information can be found in Jon Corbet's LWN.net article
and in the
kernel_lockdown(7)
manual page.
The
madvise()
system call adds two new flags:
MADV_COLD,
which deactivates a given range of pages, and
MADV_PAGEOUT,
which tells the kernel to reclaim pages.
Further details can be found in the
madvise(2)
manual page,
and Jon Corbet's LWN.net article,
process_madvise(), pidfd capabilities, and the revenge of the PIDs.
…
See also:
LWN
articles on the kernel 5.4 merge window
(1,
2)
and the Kernel Newbies
kernel 5.4 summary.
Linux 5.3 (15 September 2019)
API changes include the following:
A new
pidfd_open(2)
system call can be used to obtain a PID file descriptor
that refers to the process whose PID is specified as
an argument to the call.
This file descriptor can be used to send a signal to the process
using the
pidfd_send_signal(2)
system call added in Linux 5.2
and (starting in Linux 5.4) to wait on the process using
waitid(2).
Further details can be found in the
pidfd_open(2)
manual page.
See also the next point.
The PID file descriptors returned by
clone(2) (with the CLONE_PIDFD flag)
and
pidfd_open(2)
can now be monitored with
poll(2),
select(2),
and
epoll(7).
When the process referred to by the file descriptor terminates,
the file descriptor is marked as readable.
Further details can be found in the
pidfd_open(2)
manual page.
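The pieces described in the two points above (obtaining a PID file descriptor, signaling through it, and monitoring it for termination) might be combined as in the following sketch. The function name signal_child_via_pidfd(), the five-second timeout, and the fallback syscall numbers are assumptions made for the example; raw system calls are used because glibc wrappers for these calls arrived only in later releases.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef SYS_pidfd_send_signal
#define SYS_pidfd_send_signal 424       /* assumed; same on most architectures */
#endif
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434              /* assumed; same on most architectures */
#endif

/* Fork a child that just sleeps, send it `sig` through a PID file
 * descriptor, and use poll() to learn that it has terminated.
 * Returns the number of the signal that killed the child,
 * or -1 with errno set. */
int signal_child_via_pidfd(int sig)
{
    pid_t child = fork();
    if (child == -1)
        return -1;
    if (child == 0)
        for (;;)
            pause();                    /* child: wait to be signaled */

    int saved = 0, status;
    int pidfd = (int)syscall(SYS_pidfd_open, child, 0U);
    if (pidfd == -1) {
        saved = errno;
    } else if (syscall(SYS_pidfd_send_signal, pidfd, sig, NULL, 0U) == -1) {
        saved = errno;
    } else {
        /* The pidfd is reported as readable once the process terminates. */
        struct pollfd pfd = { .fd = pidfd, .events = POLLIN };
        int n = poll(&pfd, 1, 5000);
        if (n == 0)
            saved = ETIMEDOUT;
        else if (n == -1)
            saved = errno;
    }
    if (pidfd != -1)
        close(pidfd);
    if (saved != 0)
        kill(child, SIGKILL);           /* make sure waitpid() below returns */

    if (waitpid(child, &status, 0) == -1)
        return -1;
    if (saved == 0 && WIFSIGNALED(status))
        return WTERMSIG(status);
    errno = saved != 0 ? saved : EINVAL;
    return -1;
}
```

Because the signal is directed at the descriptor rather than at a numeric PID, it cannot reach an unrelated process that happens to reuse the PID.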
A new
clone3(2)
system call
is added, providing a number of API improvements over the older
clone(2)
system call.
The new system call provides for additional flag bits
(thus allowing for future extensions); cleaner separation
in the use of various arguments;
and the ability to specify the size of the child's stack area.
Details can be found in the
clone3(2)
manual page.
A new
ptrace(2)
option,
PTRACE_GET_SYSCALL_INFO,
can be used to retrieve information about the system call
that caused a ptrace stop.
Details can be found in the
ptrace(2)
manual page.
…
See also:
LWN
articles on the kernel 5.3 merge window
(1, 2)
and the Kernel Newbies
kernel 5.3 summary.
Linux 5.2 (7 July 2019)
API changes include the following:
The cgroup v2 freezer
controller is added.
The new
CLONE_PIDFD
flag can be specified when calling
clone(2)
to have the call return a "PID file descriptor"
that refers to the new child process.
This file descriptor can be used to send a signal to the
process (using
pidfd_send_signal(2))
and (starting in Linux 5.4) to wait on the process using
waitid(2).
Further details can be found in the
clone(2)
manual page and in Jonathan Corbet's LWN.net article,
Rethinking race-free process signaling.
…
See also:
LWN
articles on the kernel 5.2 merge window
(1,
2)
and the Kernel Newbies
kernel 5.2 summary.
Linux 5.1 (5 May 2019)
API changes include the following:
A process's
/proc/PID
directory can now be opened to obtain a file descriptor
that refers to that process.
This file descriptor can then be passed to the
pidfd_send_signal(2)
system call in order to send a signal to the process.
The use of a file descriptor for this purpose allows
the avoidance of race conditions that can occur with
traditional APIs (such as
kill(2))
where a signal may be sent to the wrong process if the original
target process has already terminated and its PID has been recycled.
Details can be found in the
pidfd_send_signal(2)
manual page and Marta Rybczyńska's LWN.net article,
Toward race-free process signaling.
As noted in the
execve(2)
manual page,
the limit on the size of the interpreter string that may
follow the
#!
string at the start of an interpreted file specified to
execve(2)
has been increased from 127 to 255 characters.
A new
F_SEAL_FUTURE_WRITE
file seal allows the calling process to continue writing to
the memfd file using existing writable mappings,
but prevents the creation of new writable mappings
and subsequent write(2) calls on the memfd file.
Details can be found in the
fcntl(2)
and
memfd_create(2)
manual pages.
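The following sketch probes the three behaviors just described on a freshly created memfd file. The function name future_write_seal_demo() and the bit-mask return convention are invented for this example, and the fallback value of F_SEAL_FUTURE_WRITE is an assumption for older headers.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef F_SEAL_FUTURE_WRITE
#define F_SEAL_FUTURE_WRITE 0x0010      /* assumed value for older headers */
#endif

/* Returns a bit mask describing what was observed after sealing:
 *   bit 0: a store through the mapping created before the seal succeeded
 *   bit 1: write() on the descriptor was refused
 *   bit 2: creating a new shared writable mapping was refused
 * or -1 with errno set if the setup, including adding the seal, failed
 * (EINVAL is expected on kernels older than 5.1). */
int future_write_seal_demo(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    volatile char *before = MAP_FAILED;
    void *after;
    int result = -1, saved;

    int fd = memfd_create("seal-demo", MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (fd == -1)
        return -1;
    if (ftruncate(fd, (off_t)len) == -1)
        goto out;
    before = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (before == MAP_FAILED)
        goto out;
    if (fcntl(fd, F_ADD_SEALS, F_SEAL_FUTURE_WRITE) == -1)
        goto out;

    result = 0;
    before[0] = 'x';                    /* existing mapping stays writable */
    if (before[0] == 'x')
        result |= 1;
    if (write(fd, "y", 1) == -1)
        result |= 2;
    after = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (after == MAP_FAILED)
        result |= 4;
    else
        munmap(after, len);
out:
    saved = errno;
    if (before != MAP_FAILED)
        munmap((void *)before, len);
    close(fd);
    errno = saved;
    return result;
}
```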
See also:
LWN
articles on the kernel 5.1 merge window
(1,
2)
and the Kernel Newbies
kernel 5.1 summary.
Linux 5.0 (3 March 2019)
API changes include the following:
The cgroup v2 cpuset controller is added
(with a restricted subset of features).
The cgroup_no_v1=named
kernel boot option can be used to disable the creation of
v1 named hierarchies.
Details can be found in the
cgroups(7)
manual page.
The seccomp mechanism now provides a user-space
notification feature.
Using this feature, a seccomp filter can defer handling
of the system call to another user-space process.
To do this, the filter specifies a return value of
SECCOMP_RET_USER_NOTIF.
Further details can be found in Jon Corbet's LWN.net article
Deferring seccomp decisions to user space,
in the kernel documentation file
Documentation/userspace-api/seccomp_filter.rst,
and the
seccomp_unotify(2)
manual page.
Support has been added for the use of the
MSG_ZEROCOPY
option with UDP sockets.
The new
fanotify FAN_OPEN_EXEC
and
FAN_OPEN_EXEC_PERM
flags can be used to obtain events when a file
is opened for execution.
Details can be found in the
fanotify_mark(2)
and
fanotify(7)
manual pages.
See also:
LWN
articles on the kernel 5.0 merge window
(1,
2)
and the Kernel Newbies
kernel 5.0 summary.
Linux 4.20 (23 Dec 2018)
API changes include the following:
A new
FAN_MARK_FILESYSTEM fanotify
flag allows an entire filesystem to be marked for
monitoring.
Details can be found in the
fanotify_mark(2)
and
fanotify(7)
manual pages.
A new
FAN_REPORT_TID fanotify
flag can be used to request that instead of reporting
the process ID of the triggering process,
the thread ID of the triggering thread is reported.
Details can be found in the
fanotify_init(2)
and
fanotify(7)
manual pages.
See also:
LWN
articles on the kernel 4.20 merge window
(1,
2)
and the Kernel Newbies
kernel 4.20 summary.
Linux 4.19 (22 Oct 2018)
API changes include the following:
A new
IN_MASK_CREATE
flag can be used when creating an inotify watch to prevent
clobbering an existing watch mask on an inode.
Further details can be found in the
inotify(7)
manual page.
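A small wrapper along the following lines shows how the flag might be used. The name add_watch_once() is hypothetical, and the fallback flag value is an assumption for older headers.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <sys/inotify.h>
#include <unistd.h>

#ifndef IN_MASK_CREATE
#define IN_MASK_CREATE 0x10000000       /* assumed value for older headers */
#endif

/* Add a watch for `events` on `path` only if this inotify instance is not
 * already watching the same inode.
 * Returns the new watch descriptor, or -1 with errno set; on Linux 4.19
 * and later, errno is EEXIST if a watch already exists. Older kernels
 * appear to ignore the flag and modify the existing watch instead. */
int add_watch_once(int inotify_fd, const char *path, uint32_t events)
{
    return inotify_add_watch(inotify_fd, path, events | IN_MASK_CREATE);
}
```

This matters when two pathnames may refer to the same inode (for example, hard links): without the flag, the second inotify_add_watch() call silently replaces the mask of the first watch.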
See also:
LWN
articles on the kernel 4.19 merge window
(1,
2)
and the Kernel Newbies
kernel 4.19 summary.
Linux 4.18 (12 Aug 2018)
API changes include the following:
A new polling interface and associated system call,
io_pgetevents(2),
are added.
Some information can be found in
Jonathan Corbet's LWN.net article,
A new kernel polling interface.
A new
rseq(2)
system call is added.
This system call allows the implementation of restartable sequences,
a technique that permits the implementation of update operations on
per-CPU data without requiring the use of locking primitives.
Some information can be found in
Jonathan Corbet's LWN.net article,
Restartable sequences restarted.
See also:
LWN
articles on the kernel 4.18 merge window
(1,
2)
and the Kernel Newbies
kernel 4.18 summary.
Linux 4.17 (3 June 2018)
API changes include the following:
The
mmap()
system call adds the
MAP_FIXED_NOREPLACE
flag that performs a similar task to
MAP_FIXED,
but won't clobber a preexisting mapping.
Details can be found in the
mmap(2)
manual page
and Jonathan Corbet's LWN.net article,
MAP_FIXED_SAFE,
which discussed an earlier version of the patch that
added this feature (then with a different proposed name).
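A minimal sketch of how a program might use the flag, while also coping with older kernels, is shown here. The helper name map_exactly_at() is invented for the example, and the fallback flag value is an assumption that holds on most (not all) architectures.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_FIXED_NOREPLACE
#define MAP_FIXED_NOREPLACE 0x100000    /* assumed; differs on a few architectures */
#endif

/* Map `len` bytes of anonymous memory at exactly `addr`, refusing to
 * replace anything that is already mapped there.
 * Returns addr on success, or MAP_FAILED with errno set
 * (EEXIST if the range is already in use). */
void *map_exactly_at(void *addr, size_t len)
{
    void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE, -1, 0);

    if (p != MAP_FAILED && p != addr) {
        /* A kernel that does not know the flag treats addr as a hint and
         * may place the mapping elsewhere; report that as a collision. */
        munmap(p, len);
        errno = EEXIST;
        return MAP_FAILED;
    }
    return p;
}
```

Comparing the returned address with the requested one is the compatibility check suggested for software that must also run on kernels older than 4.17.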
The
msgctl()
system call adds a
MSG_STAT_ANY
command which performs the same task as
MSG_STAT,
but does not require read permission on the message queue,
so that any user can employ this operation
(just as any user may read
/proc/sysvipc/msg
to obtain the same information).
Analogous operations are added for
semctl()
(SEM_STAT_ANY)
and
shmctl()
(SHM_STAT_ANY).
Documentation can be found in the
msgctl(2),
semctl(2),
and
shmctl(2)
manual pages.
The
prctl()
system call adds new
PR_GET_SPECULATION_CTRL
and
PR_SET_SPECULATION_CTRL
commands to get and set the state of speculation misfeatures.
Details can be found in the
prctl(2)
manual page.
See also:
LWN
articles on the kernel 4.17 merge window
(1,
2)
and the Kernel Newbies
kernel 4.17 summary.
Linux 4.16 (1 April 2018)
API changes include the following:
The PowerPC architecture now supports the memory-protection keys
feature that first appeared in Linux 4.9
(which provided support only on the Intel x86 architecture).
The
pwritev2(2)
system call now supports the
RWF_APPEND
flag, which allows data to be appended to a file on
a per-call basis.
For further details, see the
pwritev2(2)
manual page.
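The per-call append behavior could be used roughly as follows. The helper name append_record() is hypothetical, and the fallback flag value is an assumption for older headers. Note that the file is deliberately opened without O_APPEND.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_APPEND
#define RWF_APPEND 0x00000010           /* assumed value for older headers */
#endif

/* Append `len` bytes from `buf` to the file at `path`, creating the file
 * if necessary. The descriptor is opened WITHOUT O_APPEND; the append
 * semantics come from the RWF_APPEND flag on this one call.
 * Returns the number of bytes written, or -1 with errno set. */
ssize_t append_record(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_CLOEXEC, 0600);
    if (fd == -1)
        return -1;

    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    /* With RWF_APPEND, the offset argument (0 here) does not affect
     * where the data is placed: it always goes to the end of the file. */
    ssize_t n = pwritev2(fd, &iov, 1, 0, RWF_APPEND);

    int saved = errno;
    close(fd);
    errno = saved;
    return n;
}
```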
The
membarrier()
system call adds support for the following new commands:
MEMBARRIER_CMD_GLOBAL_EXPEDITED,
MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED,
MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE,
and
MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE.
Details can be found in the
membarrier(2)
manual page.
The
sched_setattr()
system call adds a new flag,
SCHED_FLAG_DL_OVERRUN,
that allows an application to get informed about
run-time overruns in
SCHED_DEADLINE
threads.
Documentation can be found in the
sched_setattr(2)
manual page.
See also:
LWN
articles on the kernel 4.16 merge window
(1,
2)
and the Kernel Newbies
kernel 4.16 summary.
Linux 4.15 (28 Jan 2018)
API changes include the following:
The limit on the number of lines that can be written to the
/proc/PID/uid_map
and
/proc/PID/gid_map
files has been increased from 5 to 340.
Details can be found in the
user_namespaces(7)
manual page.
A
cpu
cgroups controller is now available for cgroups version 2.
A devices
cgroups controller is now available for cgroups version 2.
The
/sys/kernel/cgroup/delegate
file exports a list of the files that must be made writable
when doing delegation in the cgroups v2 hierarchy.
Details can be found in the
cgroups(7)
manual page.
The
/sys/kernel/cgroup/features
file exports a list of the features supported by cgroups v2.
Details can be found in the
cgroups(7)
manual page.
The
mmap()
system call supports two new flags,
MAP_SHARED_VALIDATE
and
MAP_SYNC.
Details can be found in the
mmap(2)
manual page.
The
prctl(2)
system call adds new ARM64-specific operations,
PR_SVE_SET_VL
and
PR_SVE_GET_VL,
for setting and getting SVE vector length.
Details can be found in the
prctl(2)
manual page and in the kernel source file
Documentation/arm64/sve.txt.
The
fanotify_init()
system call supports a new flag,
FAN_ENABLE_AUDIT,
to enable generation of audit log records about access
mediation performed by permission events.
Details can be found in the
fanotify_init(2)
manual page.
See also:
LWN
articles on the kernel 4.15 merge window
(1,
2)
and the Kernel Newbies
kernel 4.15 summary.
Linux 4.14 (12 Nov 2017)
API changes include the following:
The new
memfd_create() MFD_HUGETLB
flag allows the creation of anonymous files
in the RAM-based hugetlbfs filesystem.
For details, see the
memfd_create(2)
manual page.
The new
madvise() MADV_WIPEONFORK
and
MADV_KEEPONFORK
operations allow a process to set or clear the
"wipe on fork" attribute
for the pages in a specified private anonymous address range.
If this attribute is set,
then the pages in this range are cleared in
a child process created by
fork().
For details, see the
madvise(2)
manual page.
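The "wipe on fork" behavior can be demonstrated with a sketch such as the following. The function name child_sees_wiped_secret() and the sample string are invented for this example, and the fallback value of MADV_WIPEONFORK is an assumption for older headers.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MADV_WIPEONFORK
#define MADV_WIPEONFORK 18              /* assumed value for older headers */
#endif

/* Mark a private anonymous page as "wipe on fork", fork, and let the child
 * report what it sees there.
 * Returns 1 if the child saw zeros while the parent kept its data, 0 if the
 * child saw the parent's data, or -1 with errno set (EINVAL is expected on
 * kernels older than 4.14). */
int child_sees_wiped_secret(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    int result = -1, saved = 0, status;
    pid_t child;

    char *secret = mmap(NULL, len, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (secret == MAP_FAILED)
        return -1;
    strcpy(secret, "key material");

    if (madvise(secret, len, MADV_WIPEONFORK) == -1) {
        saved = errno;
        goto out;
    }
    child = fork();
    if (child == -1) {
        saved = errno;
        goto out;
    }
    if (child == 0)                     /* child: exit status 0 means "wiped" */
        _exit(secret[0] == '\0' ? 0 : 1);

    if (waitpid(child, &status, 0) == -1) {
        saved = errno;
        goto out;
    }
    if (WIFEXITED(status) && strcmp(secret, "key material") == 0)
        result = (WEXITSTATUS(status) == 0);
    else
        saved = ECHILD;
out:
    munmap(secret, len);
    errno = saved;
    return result;
}
```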
There are multiple additions to the seccomp facility,
all of which are documented in the
seccomp(2)
manual page:
The kernel now provides the ability to log the
actions returned by seccomp filters to the audit log.
All actions other than
SECCOMP_RET_ALLOW
can be logged.
The new
/proc/sys/kernel/seccomp/actions_logged
file can be used to limit the set of actions that are logged
to the audit log.
The new
seccomp() SECCOMP_FILTER_FLAG_LOG
flag allows a BPF filter to request that all return
actions (except
SECCOMP_RET_ALLOW)
are logged to the audit log.
The new
SECCOMP_RET_LOG
filter return action permits the system call (like
SECCOMP_RET_ALLOW),
but logs the action to the audit log.
The new
SECCOMP_RET_KILL_PROCESS
filter return action causes the kernel to terminate
all of the threads in a multithreaded process.
This contrasts with the preexisting
SECCOMP_RET_KILL_THREAD
filter return action, which terminates only the thread
that made the system call.
To clearly distinguish the new
SECCOMP_RET_KILL_PROCESS
filter return action from the older
SECCOMP_RET_KILL
action, the name
SECCOMP_RET_KILL_THREAD
has been added as a synonym for
SECCOMP_RET_KILL.
The default treatment for an unrecognized filter
action return value changes from
SECCOMP_RET_KILL_THREAD
to
SECCOMP_RET_KILL_PROCESS.
The new
/proc/sys/kernel/seccomp/actions_avail
file shows a list of the seccomp filter actions
that are supported by the kernel.
The new
seccomp() SECCOMP_GET_ACTION_AVAIL
operation allows a program to ask the kernel whether it
supports a specified filter return action.
A range of new features appear in the cgroups version 2
implementation, all of which are documented in the
cgroups(7)
manual page:
Support is added for the so-called "thread mode",
whereby some restrictions that hitherto existed in
cgroups v2 are relaxed.
The implementation now allows for the creation of
"threaded subtrees", within which the threads of
a multithreaded process may be spread across
different cgroups. Within a threaded subtree,
the "no internal processes" rule is relaxed,
so that a cgroup inside a threaded subtree can both
have member processes and exercise control over
child cgroups.
Only so-called threaded controllers (currently,
cpu,
perf_event,
and
pids)
can be employed within the cgroups
of a threaded subtree.
A new
cgroup.type
file, which appears in each nonroot cgroup and
which was added to support the "thread mode" concept,
can be used to view and change the
"type" of a cgroup.
A new
cgroup.threads
file is used with "thread mode" to view the threads that
are members of a cgroup and to move threads to new cgroups.
Two new files that appear in each cgroup,
cgroup.max.depth
and
cgroup.max.descendants,
can be used to limit the depth of a cgroup subtree and
the number of descendant cgroups in the subtree.
A new
cgroup.stat
file exports information about the number of
cgroups under a cgroup subtree.
Version 3 file capabilities were added, in order to allow
the implementation of namespaced file capabilities.
Namespaced file capabilities are a mechanism that
allows a process that has capabilities inside a
noninitial user namespace (but which has no
capabilities in the initial user namespace) to
attach capabilities to an executable file in a way that
means those capabilities will be conferred to a process that
executes the file only if the process resides inside
that user namespace.
Further information can be found in the
capabilities(7)
manual page.
The
membarrier()
system call adds an expedited option
(the
MEMBARRIER_CMD_PRIVATE_EXPEDITED
command).
For further details, see the
membarrier(2)
manual page
and Jonathan Corbet's LWN.net article
Expediting membarrier().
The
preadv2(2)
system call adds support for a new flag,
RWF_NOWAIT,
which can be used to avoid blocking for data that is
not immediately available.
For further details, see the
preadv2(2)
manual page.
See also:
LWN
articles on the kernel 4.14 merge window
(1,
2)
and the Kernel Newbies
kernel 4.14 summary.
Linux 4.13 (3 Sep 2017)
API changes include the following:
The new
kcmp() KCMP_EPOLL_TFD
request can be used to discover whether a specified file descriptor
is present in an epoll instance.
Further details can be found in the
kcmp(2)
manual page.
A set of new
fcntl()
requests
(F_GET_RW_HINT,
F_SET_RW_HINT,
F_GET_FILE_RW_HINT,
F_SET_FILE_RW_HINT)
can be used to get and set file read/write hints
that are associated with open file descriptions or inodes.
Details can be found in the
fcntl(2)
manual page, in the section
"File read/write hints".
Given a file descriptor that refers to a pseudoterminal master,
the new
TIOCGPTPEER ioctl()
operation opens and returns a new file descriptor that
refers to the peer pseudoterminal slave device.
This operation can be performed regardless of whether
the pathname of the slave device is accessible through the
calling process's mount namespace.
Details can be found in the
ioctl_tty(2)
manual page.
A new cgroups v2 mount option
nsdelegate
causes cgroup namespaces to automatically become delegation
boundaries.
Details can be found in the
cgroups(7)
manual page.
The
sched_setattr()
system call adds a new flag,
SCHED_FLAG_RECLAIM,
that allows a
SCHED_DEADLINE
thread to reclaim bandwidth that is unused by other
real-time threads.
Documentation can be found in the
sched_setattr(2)
manual page.
See also:
LWN
articles on the kernel 4.13 merge window
(1,
2)
and the Kernel Newbies
kernel 4.13 summary.
Linux 4.12 (2 Jul 2017)
API changes include the following:
The new
/proc/PID/ns/pid_for_children
file provides a handle that shows which PID namespace the children
of a process will be created in.
For details, see the
namespaces(7)
and
pid_namespaces(7)
manual pages.
The new
ioctl() GETFSMAP
operation retrieves physical extent mappings for a filesystem.
For details, see the
ioctl_getfsmap(2)
manual page.
The
keyctl()
system call adds a new
KEYCTL_RESTRICT_KEYRING
operation
to apply a key-linking restriction to a specified keyring.
Details can be found in the
keyctl(2)
manual page.
The
arch_prctl(2)
system call adds
new options,
ARCH_SET_CPUID
and
ARCH_GET_CPUID,
that can be used to modify or fetch the setting
of a flag that enables the
cpuid
instruction for the calling thread.
Further details can be found in the
arch_prctl(2)
manual page.
A new socket option is added,
SO_INCOMING_NAPI_ID.
Documentation can be found in the
socket(7)
manual page.
See also:
LWN
articles on the kernel 4.12 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 4.12 summary.
Linux 4.11 (30 April 2017)
API changes include the following:
A new
statx()
system call has been added.
This system call provides a range of extensions to
the functionality of the older
stat()
system call.
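One of the extensions is the ability to request specific fields, such as a file's birth (creation) time, and to learn from the returned mask which of them the filesystem actually supplied. A minimal sketch, assuming a glibc that provides the statx() wrapper (2.28 or later); the helper name size_and_birth_time() is invented for the example:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/stat.h>

/* Fetch the size of `path` and, where the filesystem records it, the birth
 * time in seconds since the Epoch. *has_btime is set to 1 if the kernel
 * filled in the birth time, and to 0 otherwise.
 * Returns 0 on success, or -1 with errno set. */
int size_and_birth_time(const char *path, uint64_t *size,
                        int64_t *btime_sec, int *has_btime)
{
    struct statx stx;

    if (statx(AT_FDCWD, path, AT_SYMLINK_NOFOLLOW,
              STATX_SIZE | STATX_BTIME, &stx) == -1)
        return -1;

    *size = stx.stx_size;
    /* stx_mask tells us which of the requested fields are valid. */
    *has_btime = (stx.stx_mask & STATX_BTIME) != 0;
    *btime_sec = *has_btime ? stx.stx_btime.tv_sec : 0;
    return 0;
}
```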
Various enhancements have been made to the
userfaultfd mechanism that was added in Linux 4.3.
Details can be found in the
userfaultfd(2)
and
ioctl_userfaultfd(2)
manual pages.
Two new namespace
ioctl()
operations make it possible to
discover details of the namespace set-up on the system:
NS_GET_NSTYPE
can be used to discover the type of namespace referred to
by a file descriptor, and
NS_GET_OWNER_UID
can be used to discover the user ID of the owner of
a user namespace that is referred to by a file descriptor.
Details can be found in the
ioctl_ns(2)
manual page.
A new RDMA cgroups resource controller has been added
(for both version 1 and version 2 cgroups).
(RDMA stands for remote direct memory access,
a technique to copy data directly from the memory of
one computer to the memory of another computer.
RDMA can be used to implement zero-copy networking;
that is, no kernel-user-space buffer copying.)
See also:
LWN
articles on the kernel 4.11 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 4.11 summary.
Linux 4.10 (19 Feb 2017)
API changes include the following:
It is now possible to attach a BPF filter to a cgroup in order to
perform network filtering for all processes within the cgroup.
For further information, see Jonathan Corbet's LWN.net article,
Network filtering
for control groups.
Support for POSIX timers is now configurable.
Support is enabled by default, but can be disabled via the
CONFIG_POSIX_TIMERS option.
A process's "No new privileges" setting, set via the
prctl() PR_SET_NO_NEW_PRIVS
operation added in Linux 3.5,
is now exposed in the
/proc/PID/status
file.
See also:
LWN
articles on the kernel 4.10 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 4.10 summary.
Linux 4.9 (11 Dec 2016)
API changes include the following:
The memory protection keys interface has been added.
Further details can be found in the following manual pages:
pkeys(7),
mprotect(2),
and
pkey_alloc(2).
See also the LWN.net articles
(1,
2)
by Jon Corbet.
Two new ioctl(2) operations,
NS_GET_USERNS
and
NS_GET_PARENT,
can be used to discover the relationships
between non-user namespaces and their associated user namespaces
and to find the parents of PID and user namespaces.
Details can be found in the
ioctl_ns(2)
manual page and in my blog post
Introspecting namespace relationships.
A set of files added in the
/proc/sys/user
directory can be used to view and modify limits
on the number of namespaces of each type that
can be created by each user inside a user namespace.
Details can be found in the
namespaces(7)
manual page.
The list of locks shown in
/proc/locks
is now filtered to show just the locks for the processes in
the PID namespace for which the
/proc
filesystem was mounted.
See also:
LWN
articles on the kernel 4.9 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 4.9 summary.
Linux 4.8 (2 Oct 2016)
API changes include the following:
A new
pids.events
interface file for the
pids
cgroup controller allows notification of events for this cgroup.
This is a key-value file that currently supports one key, named
max,
which shows the number of times that
fork()
failed because the
pids.max
limit for this cgroup was encountered.
This file can be monitored with
inotify(7)
(changes produce
IN_MODIFY
events)
and
poll()
(changes produce
POLLPRI
readiness notifications).
See also:
LWN
articles on the kernel 4.8 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 4.8 summary.
Linux 4.7 (24 July 2016)
API changes include the following:
The
sigaltstack(2)
system call adds a flag,
SS_AUTODISARM,
that disables the alternate signal stack while the signal handler
is running.
This allows the application to safely call
swapcontext(3)
from within the signal handler without
corrupting the stack when subsequent signals are delivered.
Details can be found in the
sigaltstack(2)
manual page.
The
waitid(2)
system call adds support for the
__WCLONE,
__WALL,
and
__WNOTHREAD
flags.
A new Umask field in the
/proc/PID/status
file can be used to inspect a process's umask.
Details can be found in the
umask(2)
manual page.
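A minimal sketch of reading this field for the calling process follows. The helper name umask_from_proc() is hypothetical; the sketch assumes that /proc is mounted and that the field has the form "Umask:" followed by an octal value.

```c
#include <stdio.h>

/* Hypothetical helper: return the umask shown in /proc/self/status,
   or -1 if the field is absent (kernels before 4.7) or the file
   cannot be read. */
int
umask_from_proc(void)
{
    FILE *fp = fopen("/proc/self/status", "r");
    char line[256];
    unsigned int mask;
    int result = -1;

    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof(line), fp) != NULL) {
        /* The value is displayed in octal */
        if (sscanf(line, "Umask: %o", &mask) == 1) {
            result = (int) mask;
            break;
        }
    }

    fclose(fp);
    return result;
}
```

Unlike calling umask(2) twice to read and restore the value, this approach does not modify the umask, and the same file can be read for other processes.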
The
preadv2(2)
and
pwritev2(2)
system calls add support for two new flags,
RWF_SYNC
and
RWF_DSYNC,
although the flags are meaningful only for
pwritev2(2).
These flags provide the per-I/O equivalent of the
O_SYNC
and
O_DSYNC
file status flags (described in the
open(2)
manual page).
For further details, see the
pwritev2(2)
manual page.
See also:
LWN
articles on the kernel 4.7 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 4.7 summary.
Linux 4.6 (15 May 2016)
API changes include the following:
A new
clone()
flag,
CLONE_NEWCGROUP,
can be used to create a new process in a new control-group
namespace.
Further details can be found in the
commit message
for the patch that added this feature, as well as the
cgroup_namespaces(7),
clone(2),
unshare(2), and
setns(2)
manual pages.
Two new system calls,
preadv2()
and
pwritev2(),
are like
preadv()
and
pwritev(),
but add a flags argument.
For further information, see the
preadv2(2)
manual page, and Jon Corbet's LWN.net article,
The return of preadv2()/pwritev2().
See also:
LWN
articles on the kernel 4.6 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 4.6 summary.
Linux 4.5 (14 Mar 2016)
API changes include the following:
A new
copy_file_range(2)
system call has been added, permitting fast in-kernel copying
of data between two files without the need to shift data
through user-space buffers.
Details can be found in the
copy_file_range(2)
manual page.
A new flag for the
madvise()
system call,
MADV_FREE,
allows a process to advise the kernel that
it no longer needs the pages in a specified address range.
The kernel is then at liberty to (destructively)
free these pages for reuse.
Further details can be found in the
madvise(2)
manual page.
A new event flag for use with the
epoll_ctl()
system call,
EPOLLEXCLUSIVE,
can be used in some circumstances
to avoid thundering herd problems
when multiple processes are monitoring the same file.
Further details can be found in the
epoll_ctl(2)
manual page.
The unified-hierarchy ("version 2") control-group interface,
which has been in development since Linux 3.16
but was hitherto marked as experimental,
is now considered to be officially released.
However, not all controllers support the new interface yet.
Information about the new interface can be found in
the kernel source file
Documentation/cgroup-v2.txt
and in the
cgroups(7)
manual page.
Mandatory file locking is now an optional feature, governed
by a kernel configuration option
(CONFIG_MANDATORY_FILE_LOCKING).
This is the first step toward eventually removing a feature
that is buggy and believed to be little used or completely unused.
See also:
LWN
articles on the kernel 4.5 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 4.5 summary.
Linux 4.4 (10 Jan 2016)
API changes include the following:
A new
mlock2()
system call has been added, and a related
MCL_ONFAULT
flag has been added for the
mlockall()
system call.
Details can be found in the
mlock(2) manual page.
The new
ptrace() PTRACE_SECCOMP_GET_FILTER
operation can be used to dump a process's seccomp filters.
Details can be found in the
ptrace(2)
manual page.
See also:
LWN
articles on the kernel 4.4 merge window
(1,
2)
and the Kernel Newbies
kernel 4.4 summary.
Linux 4.3 (1 Nov 2015)
API changes include the following:
A new membarrier()
system call has been added.
Information about this system call can be found in the
membarrier(2)
manual page and in the
commit message.
The motivation for adding this system call,
which has been under development for several years,
is discussed in Jon Corbet's LWN.net article,
sys_membarrier().
A new userfaultfd()
system call and some associated
ioctl()
operations have been added.
Further information can be found in Jon Corbet's LWN.net article,
Page faults
in user space
and in the
userfaultfd(2)
and
ioctl_userfaultfd(2)
manual pages.
The ambient capabilities feature has been merged.
Details can be found in the
capabilities(7) manual page
and the description of
PR_CAP_AMBIENT
in the
prctl(2) manual page.
Direct system calls are now provided for the sockets API on x86-32,
rather than multiplexing via the
socketcall(2)
system call
(which continues to be provided for backward compatibility).
This change facilitates
seccomp(2) filtering
of sockets system calls.
(In order to employ such filters, the filtered program
must have been compiled so as to employ the new system calls.)
A new
pids
cgroups controller can be used to limit the number of tasks
in a cgroup.
A new ptrace option,
PTRACE_O_SUSPEND_SECCOMP,
allows a tracee's seccomp protections to be suspended.
Details can be found in the
ptrace(2)
manual page.
See also:
LWN
articles on the kernel 4.3 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 4.3 summary.
Linux 4.2 (30 Aug 2015)
API changes include the following:
The
splice()
system call now supports UNIX domain stream sockets.
The
ext4
and
f2fs
filesystems now support the
fallocate() FALLOC_FL_INSERT_RANGE
operation (which first appeared in Linux 4.1).
The limit of 8 recursions while resolving a pathname
containing symbolic links has been lifted.
The only limit now imposed is the maximum of 40 dereferences
while resolving the entire pathname.
Further information can be found in the
path_resolution(7)
manual page.
See also:
LWN
articles on the kernel 4.2 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 4.2 summary.
Linux 4.1 (21 June 2015)
API changes include the following:
The
fallocate()
system call adds the
FALLOC_FL_INSERT_RANGE
command for inserting a hole into the middle of a file.
(The bytes past the point of insertion are shifted in order to
make room for the hole.)
In the initial implementation, this operation is supported
only by the XFS filesystem.
Details can be found in the
fallocate(2)
manual page.
The
/proc/PID/status
file adds four new fields:
NStgid,
NSpid,
NSpgid,
and
NSsid.
These fields show respectively the
process ID, (kernel) thread ID, process group ID, and session ID
in each of the PID namespaces of which the process is a member.
The leftmost entry shows the value with respect to the PID namespace
of the reading process,
followed by the value in successively nested inner namespaces.
Details can be found in the
proc(5)
manual page.
The XFS filesystem adds support for the
renameat2() RENAME_WHITEOUT
flag.
See also:
LWN
articles on the kernel 4.1 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 4.1 summary.
Linux 4.0 (12 April 2015)
API changes include the following:
The
mount(2)
system call adds a new
MS_LAZYTIME
option that
minimizes the number of updates to
file timestamps in the on-disk i-node.
This can provide greatly improved performance in some circumstances.
Further details can be found in the
mount(2)
manual page.
The implementation of the
remap_file_pages(2)
system call, which had already been deprecated in Linux 3.16,
has been replaced by a slower in-kernel implementation.
For information on why this change was made,
see the LWN.net article,
The possible demise
of remap_file_pages().
The
prctl(2)
system call adds new MIPS-specific operations,
PR_SET_FP_MODE
and
PR_GET_FP_MODE,
for setting and getting the floating-point mode.
Details can be found in the
prctl(2)
manual page.
See also:
LWN
articles on the kernel 4.0 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 4.0 summary.
Linux 3.19 (9 Feb 2015)
API changes include the following:
A new
execveat()
system call has been added.
This system call is to
execve()
what
openat()
is to
open().
The primary motivation for adding this system call is
to allow an implementation of the
fexecve()
library function that does not rely on the
/proc
filesystem being mounted.
Further information can be found in the
execveat(2)
manual page.
The default values for the System V semaphore limits,
SEMMSL,
SEMMNI,
and
SEMOPM,
have been increased.
Details can be found in the
semget(2)
and
semop(2)
manual pages.
A new /proc/PID/setgroups
file has been added, and the behavior of
setgroups(2)
has been changed in order to close a security loophole
concerning the interaction of
setgroups(2)
and user namespaces.
The background story can be read in Jon Corbet's LWN.net article,
User namespaces and setgroups(),
and the full details on
/proc/PID/setgroups
and why it was needed can be found in the
user_namespaces(7)
manual page.
The
prctl(2)
system call adds new x86-specific operations,
PR_MPX_ENABLE_MANAGEMENT
and
PR_MPX_DISABLE_MANAGEMENT,
to enable or disable kernel management of Memory Protection eXtensions (MPX)
bounds tables.
Details can be found in the
prctl(2)
manual page and Jonathan Corbet's LWN.net article,
Supporting Intel MPX in Linux.
A new socket option,
SO_INCOMING_CPU,
can be used to set or get the CPU affinity of a socket.
Details can be found in the
socket(7)
manual page.
A new socket option,
SO_ATTACH_BPF,
can be used to attach an extended BPF
program to a socket for use as a filter of incoming packets.
A corresponding
SO_DETACH_BPF
option, which is added as a synonym for the already existing
SO_DETACH_FILTER
option,
can be used to detach the extended BPF program from a socket.
Details can be found in the
socket(7)
manual page.
See also:
LWN
articles on the kernel 3.19 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 3.19 summary.
Linux 3.18 (7 Dec 2014)
API changes include the following:
The
renameat2()
system call adds a new flag,
RENAME_WHITEOUT,
that is used to support "whiteouts" when renaming
files on overlay/union filesystems.
Details can be found in the
renameat2(2)
manual page.
This operation requires filesystem support,
which is provided by the
ext4
and
shmem
filesystems in the initial implementation.
The
prctl()
system call adds new
PR_SET_MM_MAP
and
PR_SET_MM_MAP_SIZE
flags for the
PR_SET_MM
operation.
Details can be found in the
prctl(2)
manual page.
See also:
LWN
articles on the kernel 3.18 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 3.18 summary.
Linux 3.17 (5 October 2014)
API changes include the following:
A new getrandom()
system call has been added.
This system call returns randomness from the entropy pool.
Some details can be found in Jake Edge's LWN.net article,
A system call
for random numbers: getrandom()
and in the
getrandom(2)
manual page.
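A minimal sketch of using this system call to fill a buffer is shown below. It assumes the glibc wrapper declared in <sys/random.h> (provided since glibc 2.25); the helper name fill_random() is hypothetical.

```c
#include <errno.h>
#include <sys/random.h>   /* getrandom() */
#include <sys/types.h>    /* ssize_t */

/* Hypothetical helper: fill 'buf' with 'len' random bytes.
   Returns 0 on success, or -1 on error. */
int
fill_random(void *buf, size_t len)
{
    unsigned char *p = buf;

    while (len > 0) {
        /* Flags of 0: draw from the same source as /dev/urandom,
           blocking only until that source has been initialized */
        ssize_t n = getrandom(p, len, 0);

        if (n == -1) {
            if (errno == EINTR)
                continue;       /* Interrupted by a signal handler; retry */
            return -1;
        }

        /* A call may return fewer bytes than requested */
        p += n;
        len -= (size_t) n;
    }

    return 0;
}
```

Because no file descriptor is involved, this works even when /dev/urandom is unavailable or the process has run out of file descriptors.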
A new seccomp()
system call has been added,
for controlling seccomp filters.
For further information, see the
seccomp(2)
manual page.
A new file-sealing API is implemented for files
inside shared memory filesystems;
this API consists of two
new fcntl() operations,
F_GET_SEALS
and
F_ADD_SEALS,
which get and add seals to a file
(the current seals are
F_SEAL_SHRINK,
F_SEAL_GROW,
F_SEAL_WRITE,
and
F_SEAL_SEAL).
In addition, a new memfd_create()
system call has been added.
This system call can be used to
create anonymous shared memory mappings referred
to via a file descriptor; that file descriptor can be used
with the file-sealing API.
More information can be found in Jon Corbet's LWN.net article
that discusses an earlier version of this API,
Sealed files
and in the
memfd_create(2)
and
fcntl(2)
manual pages.
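The interaction of memfd_create() and the sealing operations can be sketched as follows. This is an illustration only: it assumes the glibc wrapper for memfd_create() (provided since glibc 2.27), and the function name sealed_memfd_rejects_shrink() is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>      /* F_ADD_SEALS, F_SEAL_* */
#include <sys/mman.h>   /* memfd_create(), MFD_ALLOW_SEALING */
#include <unistd.h>

/* Hypothetical demonstration: returns 1 if shrinking a sealed memfd
   is refused with EPERM, 0 if it is not, or -1 on a setup error. */
int
sealed_memfd_rejects_shrink(void)
{
    int fd, result;

    /* Sealing must be requested when the file is created */
    fd = memfd_create("demo", MFD_ALLOW_SEALING);
    if (fd == -1)
        return -1;

    /* Give the file a size, then forbid any further size changes */
    if (ftruncate(fd, 4096) == -1 ||
            fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK | F_SEAL_GROW) == -1) {
        close(fd);
        return -1;
    }

    /* With F_SEAL_SHRINK in place, truncation should fail with EPERM */
    result = (ftruncate(fd, 0) == -1 && errno == EPERM) ? 1 : 0;

    close(fd);
    return result;
}
```

A process receiving such a file descriptor can call fcntl() with F_GET_SEALS to confirm which seals are set before trusting that the file will not change size underneath it.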
A new kexec_file_load()
system call has been added.
This provides the ability to load a kexec kernel
and initrd filesystem
specified as file descriptors.
For details, see the
kexec_load(2)
manual page.
A new
/proc/thread-self
directory is added in the
/proc
filesystem.
This directory is the thread analog of
/proc/self;
in other words, it is a synonym for
/proc/self/task/TID,
where
TID
is the thread ID of the calling thread.
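The following sketch checks this equivalence for the calling thread. It assumes that /proc is mounted and that the symbolic link's target has the form PID/task/TID; the function name thread_self_matches() is hypothetical.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>   /* SYS_gettid */
#include <unistd.h>

/* Hypothetical check: returns 1 if /proc/thread-self points at
   PID/task/TID for the calling thread, 0 if not, -1 on error. */
int
thread_self_matches(void)
{
    char target[64], expected[64];
    ssize_t n;

    /* /proc/thread-self is a symbolic link; read its target */
    n = readlink("/proc/thread-self", target, sizeof(target) - 1);
    if (n == -1)
        return -1;
    target[n] = '\0';       /* readlink() does not null-terminate */

    /* Use syscall() for gettid, since older glibc has no wrapper */
    snprintf(expected, sizeof(expected), "%ld/task/%ld",
             (long) getpid(), (long) syscall(SYS_gettid));

    return strcmp(target, expected) == 0;
}
```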
A new
TFD_IOC_SET_TICKS ioctl(2)
operation can be used to adjust the number of expirations
that have occurred on a
timerfd
timer.
Details can be found in the
timerfd_create(2)
manual page.
See also:
LWN
articles on the kernel 3.17 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 3.17 summary.
Linux 3.16 (3 Aug 2014)
API changes include the following:
A new capability,
CAP_AUDIT_READ,
allows reading the audit log via a multicast netlink socket.
Btrfs now supports the
open() O_TMPFILE
flag that was added in Linux 3.11.
The default values for the System V shared memory limits,
SHMALL
and
SHMMAX,
have been increased.
Details can be found in the
shmget(2)
manual page.
A new
/proc/sys/kernel/sysctl_writes_strict
file determines the behavior when an application
tries to write into a
/proc/sys
file at a nonzero offset.
For details, see the
proc(5)
manual page.
See also:
LWN
articles on the kernel 3.16 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 3.16 summary.
Linux 3.15 (8 Jun 2014)
API changes include the following:
A new
renameat2()
system call has been added.
This system call extends the existing
renameat()
system call to allow two filenames to be swapped in an
atomic operation.
Further information can be found in Jon Corbet's LWN.net article,
Exchanging two files,
and also in my LWN.net article,
Flags as a system call API design pattern.
Documentation can be found in the
rename(2)
manual page.
Open file description (OFD) locks
(formerly known
as "file private locks") have been added.
This feature improves on some significant deficiencies
in the traditional byte-range locking API.
(That API is described in
Chapter 55 of TLPI, and the limitations are described in
Section 55.3.5.)
Further information on OFD locks can be found in
Jeffrey T. Layton's LWN.net article,
File-private POSIX locks,
and in the
fcntl(2)
manual page.
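One of the differences from traditional record locks can be sketched as follows: OFD locks belong to an open file description rather than to a process, so two descriptors obtained from separate open() calls in the same process conflict with one another. The function name ofd_locks_conflict() and the use of a temporary file under /tmp are assumptions made for this illustration.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>      /* F_OFD_SETLK, struct flock */
#include <stdlib.h>     /* mkstemp() */
#include <string.h>
#include <unistd.h>

/* Hypothetical demonstration: returns 1 if a second open file
   description in the same process is denied a conflicting OFD lock,
   0 if the lock was granted, or -1 on a setup error. */
int
ofd_locks_conflict(void)
{
    char path[] = "/tmp/ofd-demo-XXXXXX";
    struct flock fl;
    int fd1, fd2, result;

    fd1 = mkstemp(path);
    if (fd1 == -1)
        return -1;
    fd2 = open(path, O_RDWR);   /* A second open file description */
    unlink(path);
    if (fd2 == -1) {
        close(fd1);
        return -1;
    }

    memset(&fl, 0, sizeof(fl)); /* l_pid must be 0 for OFD locks */
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;     /* l_start = 0, l_len = 0: whole file */

    if (fcntl(fd1, F_OFD_SETLK, &fl) == -1)
        result = -1;
    else                        /* Same process, different description */
        result = (fcntl(fd2, F_OFD_SETLK, &fl) == -1 &&
                  (errno == EAGAIN || errno == EACCES)) ? 1 : 0;

    close(fd1);
    close(fd2);
    return result;
}
```

With traditional F_SETLK locks, the second request would succeed, because both requests come from the same process.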
The
ext4
and
XFS
filesystems implement two new flags for the
fallocate()
system call:
FALLOC_FL_ZERO_RANGE
and
FALLOC_FL_COLLAPSE_RANGE.
Further information on the
FALLOC_FL_COLLAPSE_RANGE
flag can be found in Jon Corbet's LWN.net article
Finding the proper scope of a file collapse operation.
Both new flags are documented in the
fallocate(2)
manual page.
Two new
prctl()
operations,
PR_SET_THP_DISABLE
and
PR_GET_THP_DISABLE,
set and get the value of the calling process's
"THP disable" flag.
Details can be found in the
prctl(2)
manual page.
XFS now supports the
open() O_TMPFILE
flag that was added in Linux 3.11.
timerfd_create()
adds support for the
CLOCK_BOOTTIME
clock.
Details can be found in the
timerfd_create(2)
manual page.
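As a minimal sketch, a one-shot timer measured against CLOCK_BOOTTIME can be created and waited on as follows. The function name boottime_timer_expirations() and the 10-millisecond interval are arbitrary choices for this example.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/timerfd.h>
#include <time.h>       /* CLOCK_BOOTTIME */
#include <unistd.h>

/* Hypothetical helper: arm a one-shot 10 ms timer against
   CLOCK_BOOTTIME and wait for it. Returns the number of expirations
   read from the file descriptor, or -1 on error. */
long
boottime_timer_expirations(void)
{
    struct itimerspec its = {
        .it_value = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 }
    };                          /* it_interval of zero: one-shot */
    uint64_t count;
    int fd;

    fd = timerfd_create(CLOCK_BOOTTIME, 0);
    if (fd == -1)
        return -1;

    /* read() blocks until the timer has expired at least once */
    if (timerfd_settime(fd, 0, &its, NULL) == -1 ||
            read(fd, &count, sizeof(count)) != sizeof(count)) {
        close(fd);
        return -1;
    }

    close(fd);
    return (long) count;
}
```

Unlike CLOCK_MONOTONIC, the CLOCK_BOOTTIME clock continues to advance while the system is suspended.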
See also:
LWN
articles on the kernel 3.15 merge window
(1,
2)
and the Kernel Newbies
kernel 3.15 summary.
Linux 3.14 (31 Mar 2014)
API changes include the following:
A new deadline scheduling policy
(SCHED_DEADLINE)
has been added.
In order to control the scheduling of processes under
this policy, two new system calls have been added:
sched_setattr()
and
sched_getattr().
These are more generalized versions of the
sched_setscheduler()
and
sched_getscheduler()
system calls: they allow setting scheduling policy and
parameters for all of the previously existing scheduling policies
as well as the new
SCHED_DEADLINE
policy.
Documentation can be found in the
sched_setattr(2)
and
sched(7)
manual pages.
See also Jonathan Corbet's LWN.net article,
Deadline scheduling: coming soon?,
and the kernel source file
Documentation/scheduler/sched-deadline.txt.
The user-space lockdep feature has been added.
See Jonathan Corbet's LWN.net article,
User-space lockdep
for details.
TCP has a new "autocorking" feature, controlled via
/proc/sys/net/ipv4/tcp_autocorking.
Documentation can be found in the
tcp(7)
manual page.
The
HARD_QUEUESMAX
ceiling (added in Linux 3.5) on the
/proc/sys/fs/mqueue/queues_max
limit is removed.
See also:
LWN
articles on the kernel 3.14 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 3.14 summary.
Linux 3.13 (20 Jan 2014)
API changes include the following:
The TCP Fast Open feature that was added in Linux 3.7
is now enabled by default.
The
keyctl()
system call adds a new
KEYCTL_GET_PERSISTENT
operation
to get the persistent keyring for a specified user and
link it to a specified keyring.
Details can be found in the
keyctl(2)
manual page.
Other new features (yet to be detailed):
SO_MAX_PACING_RATE
socket option.
See also:
LWN
articles on the kernel 3.13 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 3.13 summary.
Linux 3.12 (3 Nov 2013)
API changes include the following:
A new per-socket option,
TCP_NOTSENT_LOWAT,
and a system-wide setting,
/proc/sys/net/ipv4/tcp_notsent_lowat,
can be used to limit the number of unsent bytes in TCP sockets,
in order to reduce usage of kernel memory.
Some details can be found in the
commit message
and the kernel source file
Documentation/networking/ip-sysctl.txt.
The
/proc/sys/kernel/core_pattern
file adds a new specifier, %P.
This specifier is replaced by the PID of the process
as seen in the initial PID namespace
(whereas the existing %p specifier
is replaced by the PID in the PID namespace where the
process resides).
Details can be found in the
core(5)
manual page.
See also:
LWN
articles on the kernel 3.12 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 3.12 summary.
Linux 3.11 (2 Sep 2013)
API changes include the following:
A new socket option,
SO_BUSY_POLL,
and a
poll()
flag,
POLL_BUSY_LOOP,
allow for low-latency busy polling on sockets.
Some details can be found in the
socket(7)
manual page, the kernel source file
Documentation/sysctl/net.txt,
and Jonathan Corbet's LWN.net articles,
Ethernet polling and patch-pulling latency
and
Low-latency Ethernet device polling.
A new
open()
flag,
O_TMPFILE,
provides an improved method for race-free creation of temporary files.
Details can be found in the
open(2)
manual page.
As of kernel 3.11, support for this flag is provided in the
ext2,
ext3,
ext4,
UDF,
and
minix
filesystems.
Two new ptrace() commands,
PTRACE_GETSIGMASK
and
PTRACE_SETSIGMASK,
can be used to get and set a process's signal mask.
Details can be found in the
ptrace(2)
manual page.
timerfd_create()
adds support for the
CLOCK_BOOTTIME_ALARM
and
CLOCK_REALTIME_ALARM
clocks.
Details can be found in the
timerfd_create(2)
manual page.
See also:
LWN
articles on the kernel 3.11 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 3.11 summary.
Linux 3.10 (30 Jun 2013)
API changes include the following:
POSIX clocks and timers now support a new clock,
CLOCK_TAI,
that measures against International Atomic Time.
Details can be found in the
clock_getres(2),
clock_nanosleep(2),
and
timer_create(2)
manual pages.
The new
ptrace() PTRACE_PEEKSIGINFO
request can be used to nondestructively retrieve pending
signals.
Signals can be retrieved either from the process-wide queue,
or from the per-thread queue.
Details can be found in the
ptrace(2)
manual page.
A test case for this feature can be found in the kernel source file
tools/testing/selftests/ptrace/peeksiginfo.c.
POSIX timer IDs are no longer guaranteed to be unique system-wide.
Each process's timers are now visible via the
/proc/PID/timers
file.
This change was made so that the checkpoint/restore
facility can restore a process's timers with the same IDs.
Details of
/proc/PID/timers
can be found in the
proc(5)
manual page.
Two new files,
/proc/sys/vm/admin_reserve_bytes
and
/proc/sys/vm/user_reserve_bytes
influence the behavior of memory overcommitting.
For details, see the
proc(5)
manual page.
A new
SO_SELECT_ERR_QUEUE
socket option is added.
Details can be found in the
socket(7)
manual page.
See also:
LWN
articles on the kernel 3.10 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 3.10 summary.
Linux 3.9 (29 Apr 2013)
API changes include the following:
A new
SO_REUSEPORT
socket option allows multiple sockets to be bound to a
UDP or TCP port.
The option improves performance in some
network server designs.
More information can be found in my LWN.net article,
The SO_REUSEPORT socket option
and in the
socket(7)
manual page.
A new
/proc/sys/kernel/sched_rr_timeslice_ms
file can be used to view and set the
SCHED_RR
(realtime scheduling round-robin)
quantum as a millisecond value.
The default value is 100.
Writing 0 to this file resets the quantum to the default value.
A new socket option,
SO_LOCK_FILTER,
can be used to prevent changing the filters associated
with a socket.
Details can be found in the
socket(7)
manual page.
Other new features (yet to be detailed):
TCP_TIMESTAMP
socket option.
See also:
LWN
articles on the kernel 3.9 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 3.9 summary.
Linux 3.8 (19 Feb 2013)
API changes include the following:
The
ptrace()
system call supports a new flag,
PTRACE_O_EXITKILL.
If a tracing process sets this flag, a
SIGKILL
signal will be sent to every traced process
if the tracing process exits.
Details can be found in the
ptrace(2)
manual page.
On systems that provide multiple huge page sizes,
shmget()
and
mmap()
can now select the desired page size for an allocation.
For more information, see my LWN.net article,
Supporting variable-sized huge pages.
The user namespaces implementation
has been completed,
allowing unprivileged processes to create and work with
user namespaces via
clone(CLONE_NEWUSER),
unshare(CLONE_NEWUSER), and
setns().
For more information, see the manual pages of those system calls,
and the documentation of the
/proc/PID/ns/*,
/proc/PID/uid_map,
and
/proc/PID/gid_map
files in the
user_namespaces(7)
manual page.
See also
my series of LWN.net articles on namespaces
(you can find a directory of the article series
at the end of the first article).
The
setns()
and
unshare()
system calls add support for PID, mount, and user namespaces.
Details can be found in the manual pages.
A new
finit_module()
system call is added.
This system call is like
init_module(),
but loads the module from an open file descriptor.
In addition, the new system call has a
flags
argument that can be used to modify the behavior of the system call.
For more information, see my LWN.net article,
Loading modules from file descriptors,
and the
init_module(2)
manual page.
The
msgrcv()
system call adds a new, Linux-specific flag,
MSG_COPY.
This flag causes the
msgtyp
argument to be interpreted as an ordinal position within the message queue,
and causes the call to nondestructively fetch a copy of the
message at that position in the queue.
Details can be found in the
msgrcv(2)
manual page.
The
ext4
and
tmpfs
filesystems add support for the
lseek() SEEK_HOLE
and
SEEK_DATA
operations.
Reads from
inotify(7)
file descriptors are now restarted if the
SA_RESTART
flag was specified when establishing the signal handler.
In addition, reads from
inotify(7)
file descriptors no longer demonstrate
the Linux-specific oddity of failing with the error
EINTR
when the calling process is resumed after a
stop signal plus
SIGCONT
(see page 445 of TLPI).
Other new features (yet to be detailed):
MPOL_LOCAL
and
MPOL_MF_LAZY
memory policy flags;
SO_GET_FILTER
socket option.
See also:
LWN
articles on the kernel 3.8 merge window
(1,
2)
and the Kernel Newbies
kernel 3.8 summary.
Linux 3.7 (11 Dec 2012)
API changes include the following:
The server-side implementation of the TCP Fast Open feature was merged.
This complements the implementation of the client-side functionality
that was merged in 3.6.
To enable server-side (i.e., passive) TCP Fast Open,
a TCP server must use
setsockopt()
to set the
TCP_FASTOPEN
socket option.
For more information, see my LWN.net article,
TCP Fast Open: expediting web services.
The
Btrfs
filesystem adds support for hole punching
(the fallocate(2) FALLOC_FL_PUNCH_HOLE
operation added in Linux 2.6.38).
The
/proc/sys/kernel/core_pattern
file adds a new specifier, %d.
This specifier is replaced by the "dumpable" mode of the process
(the same value as is returned by the
prctl(2) PR_GET_DUMPABLE
operation).
Details can be found in the
core(5)
manual page.
See also:
LWN
articles on the kernel 3.7 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 3.7 summary.
Linux 3.6 (1 Oct 2012)
API changes include the following:
The client-side implementation of the TCP Fast Open feature
was merged. This implements a new flag,
MSG_FASTOPEN,
used with either
sendto()
or
sendmsg()
to initiate a TCP fast open.
The new /proc/sys/net/ipv4/tcp_fastopen
file can be set to enable or disable client (and server) TCP Fast Open
functionality.
For more information, see my LWN.net article,
TCP Fast Open: expediting web services.
Some restrictions on the creation of hard and soft links were added,
in order to improve security.
For more information, see Jonathan Corbet's LWN.net article,
Tightening security: not for the impatient,
and the documentation of
/proc/sys/fs/protected_hardlinks
and
/proc/sys/fs/protected_symlinks
in the
proc(5)
manual page.
The
fcntl()
system call adds support for a new command,
F_GETOWNER_UIDS,
that can be used to retrieve the real and effective user IDs
associated with a previous call to
F_SETOWN.
(Those UIDs determine the rules for sending
a signal to another process for signal-driven I/O.)
The third argument of the call is of type
uid_t *, and should point
to a two-element array that stores the real user ID and
effective user ID.
This feature is intended for use by the checkpoint/restore
facility and is only provided if the kernel was configured with the
CONFIG_CHECKPOINT_RESTORE option.
A new
hugetlb
cgroups controller can be used to limit HugeTLB usage per cgroup.
See also:
LWN
articles on the kernel 3.6 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 3.6 summary.
Linux 3.5 (21 Jul 2012)
API changes include the following:
Seccomp filter mode is added. This feature
is designed to allow security-conscious applications
to limit the set of system calls that they can make.
For further information, see the
seccomp(2)
manual page,
Jonathan Corbet's LWN.net article,
Yet another new approach to seccomp,
and the kernel source file
Documentation/prctl/seccomp_filter.txt.
The
ptrace(2)
system call adds a new
PTRACE_O_TRACESECCOMP
option for use with the
seccomp(2) SECCOMP_RET_TRACE
filter action.
The
PTRACE_O_TRACESECCOMP
option is described in the
ptrace(2)
manual page.
A new
kcmp()
system call can be used to determine whether various kernel objects are
shared between tasks.
This is useful for the checkpoint-restore facility.
Some information can be found in Jonathan Corbet's LWN.net article,
Preparing for user-space checkpoint/restore,
and in the
kcmp(2)
manual page.
Some additional
PR_SET_MM_* flags have been added for use with the
prctl() PR_SET_MM
operation that was added in Linux 3.3.
A new
epoll EPOLLWAKEUP
flag prevents system suspend while
epoll
events are ready.
Use of this flag requires that the caller have the newly added
CAP_BLOCK_SUSPEND
capability
(if the caller does not have this capability, then the
EPOLLWAKEUP flag is
silently ignored).
Details can be found in the
epoll_ctl(2)
and
epoll(7)
manual pages.
The
tmpfs
filesystem adds support for hole punching
(the fallocate(2) FALLOC_FL_PUNCH_HOLE
operation added in Linux 2.6.38).
The
XFS
filesystem adds support for the
lseek() SEEK_HOLE
and
SEEK_DATA
operations.
The new
prctl() PR_SET_NO_NEW_PRIVS
operation prevents
execve()
from granting privileges.
For example,
a process will not be able to execute a set-user-ID binary to
change its UID or GID if this flag is set.
The same is true for file capabilities.
A corresponding
PR_GET_NO_NEW_PRIVS
operation can be used to retrieve the state of this
attribute for the caller.
Details can be found in the
prctl(2)
manual page.
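A minimal sketch of setting and then reading back this attribute follows. The function name enable_no_new_privs() is hypothetical. Note that once the attribute is set, it cannot be unset, and it is inherited by children and preserved across execve().

```c
#include <sys/prctl.h>   /* prctl(), PR_SET_NO_NEW_PRIVS */

/* Hypothetical helper: set the no_new_privs attribute for the calling
   thread, then read it back. Returns the attribute value (1 when set),
   or -1 on error. */
int
enable_no_new_privs(void)
{
    /* The unused arguments must be specified as zero */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1)
        return -1;

    return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}
```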
The new
prctl() PR_GET_TID_ADDRESS
operation allows the caller to retrieve its
clear_child_tid
address
(see set_tid_address(2)).
Details can be found in the
prctl(2)
manual page.
If the kernel was configured with
CONFIG_CHECKPOINT_RESTORE,
then a new
/proc/PID/children
file lists the children of a process.
(In Linux 4.2,
the kernel option governing the presence
of this file was changed to
CONFIG_PROC_CHILDREN.)
Documentation can be found in the
proc(5)
manual page.
Two new
/proc
files can be used to read and modify the
values that are used to provide defaults when
a POSIX message queue is created using an
mq_open()
call in which the
attr
argument is specified as
NULL.
The
/proc/sys/fs/mqueue/msg_default
file defines the default value used for a new queue's
mq_maxmsg
attribute.
The default value in this file is 10.
The
/proc/sys/fs/mqueue/msgsize_default
file defines the default value used for a new queue's
mq_msgsize
attribute.
The default value in this file is 8192.
In addition,
Linux 3.5 changed the interpretation of various files in the
/proc/sys/fs/mqueue/
directory that specify message queue limits.
Full details can be found in the
mq_overview(7)
manual page.
The
keyctl()
system call adds a new
KEYCTL_INVALIDATE
operation to mark a key as invalid.
Details can be found in the
keyctl(2)
manual page.
Other new features (yet to be detailed):
TCP_REPAIR,
TCP_REPAIR_OPTIONS,
TCP_REPAIR_QUEUE,
and
TCP_QUEUE_SEQ
socket options.
See also:
LWN
articles on the kernel 3.5 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 3.5 summary.
Linux 3.4 (21 May 2012)
API changes include the following:
The
pipe2()
system call permits a new flag,
O_DIRECT,
that creates a pipe that operates in "packet" mode.
Each
write()
(of less than
PIPE_BUF
bytes) to the pipe creates a packet,
and each
read()
reads exactly one packet
(discarding excess bytes if the supplied buffer is too small).
Details can be found in the
pipe(2)
manual page.
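The packet-mode behavior can be sketched as follows: two small writes are kept as separate packets, so that a read with a large buffer returns only the first. The function name first_packet_size() is hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>      /* O_DIRECT */
#include <unistd.h>     /* pipe2() */

/* Hypothetical demonstration: write a 3-byte packet and a 5-byte
   packet to a packet-mode pipe, then return the number of bytes
   delivered by the first read(), or -1 on error. */
long
first_packet_size(void)
{
    int pfd[2];
    char buf[64];
    ssize_t n;

    if (pipe2(pfd, O_DIRECT) == -1)
        return -1;

    if (write(pfd[1], "abc", 3) != 3 || write(pfd[1], "defgh", 5) != 5) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }

    /* The buffer could hold all 8 bytes, but in packet mode each
       read() yields a single packet */
    n = read(pfd[0], buf, sizeof(buf));

    close(pfd[0]);
    close(pfd[1]);
    return (long) n;
}
```

On a pipe created without O_DIRECT, the same read() would return all 8 bytes, since an ordinary pipe is a byte stream without message boundaries.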
The
PR_SET_CHILD_SUBREAPER prctl()
operation allows
a "service manager" process to mark itself as a sort of
'sub-init', able to stay as the parent for all orphaned processes
created by the started services.
All
SIGCHLD
signals will be delivered to the service manager.
There is a corresponding
PR_GET_CHILD_SUBREAPER prctl()
operation.
Details can be found in the
prctl(2)
manual page.
Planned users of this feature include
D-Bus
and
systemd.
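A minimal sketch of how a service manager might set and query the subreaper attribute follows. The two helper names are hypothetical; only the prctl() operations themselves come from the text above.

```c
#include <assert.h>
#include <sys/prctl.h>

/* Mark (on != 0) or unmark (on == 0) the caller as a child subreaper.
   Returns 0 on success, -1 on error. */
int set_child_subreaper(int on)
{
    return prctl(PR_SET_CHILD_SUBREAPER, (unsigned long) (on != 0),
                 0UL, 0UL, 0UL);
}

/* Returns 1 if the caller is a subreaper, 0 if not, -1 on error. */
int get_child_subreaper(void)
{
    int flag = 0;

    /* The current setting is returned via the int pointed to by arg2 */
    if (prctl(PR_GET_CHILD_SUBREAPER, (unsigned long) &flag,
              0UL, 0UL, 0UL) == -1)
        return -1;
    assert(flag == 0 || flag == 1);
    return flag;
}
```

Once the attribute is set, orphaned descendants are reparented to this process rather than to init, so the process should be prepared to wait() for them.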
The
madvise() MADV_DONTDUMP
operation can be used to specify that an address range should
be excluded from core dumps.
The
MADV_DODUMP
operation
reverses the effect of
MADV_DONTDUMP.
Details can be found in the
madvise(2)
manual page.
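One plausible use is keeping a buffer of sensitive data out of core dumps. The sketch below assumes an anonymous private mapping; the helper names alloc_undumpable() and make_dumpable() are made up for this example.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map len bytes of anonymous memory and exclude the region from core dumps.
   Returns the address of the mapping, or NULL on error. */
void *alloc_undumpable(size_t len)
{
    void *p;

    assert(len > 0);
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    if (madvise(p, len, MADV_DONTDUMP) == -1) {
        munmap(p, len);
        return NULL;
    }
    return p;
}

/* Reverse the effect of MADV_DONTDUMP. Returns 0 on success, -1 on error. */
int make_dumpable(void *p, size_t len)
{
    return madvise(p, len, MADV_DODUMP);
}
```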
The
setsockopt() SO_PEEK_OFF
option allows controlling the offset for
peeking at data queued in a socket.
(Currently supported for UNIX domain sockets only.)
Details can be found in the
socket(7)
manual page.
A new
prctl() operation,
PR_SET_PTRACER,
is used with the Yama Linux Security Module to control
which processes can ptrace()
the calling process.
Details can be found in the
prctl(2)
manual page
and in the kernel source file
Documentation/security/Yama.txt.
UNIX domain sockets now support the use of the
MSG_TRUNC
flag for
recv(2)
and related system calls.
Other new features (yet to be detailed):
SO_NOFCS
socket option.
See also:
LWN
articles on the kernel 3.4 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 3.4 summary.
Linux 3.3 (19 Mar 2012)
API changes include the following:
A new
prctl() operation,
PR_SET_MM,
intended for use by the checkpoint/restart facility,
allows text, data, and heap sizes to be set
to the values in effect at checkpoint time
when a process is restored.
The caller must have the
CAP_SYS_RESOURCE
capability.
This operation is only supported if the kernel is configured with the
CONFIG_CHECKPOINT_RESTORE
option.
Details can be found in the
prctl(2)
manual page.
Two changes related to the /proc
filesystem:
A new
/proc/PID/map_files
directory contains symbolic links
describing the file mappings of the process identified by PID;
documentation can be found in the
proc(5) manual page.
Two new mount options for the
/proc filesystem
(hidepid=
and gid=)
can be used to control the visibility of
/proc/PID
directories.
Documentation can be found in the
proc(5) manual page.
A new
net_prio
cgroups controller allows control of the priority of a cgroup's
outgoing network traffic.
Other new features (yet to be detailed):
SO_WIFI_STATUS
socket option.
See also:
LWN
articles on the kernel 3.3 merge window
(1,
2)
and the Kernel Newbies
kernel 3.3 summary.
Linux 3.2 (5 Jan 2012)
API changes include the following:
The
process_vm_readv()
and
process_vm_writev()
functions, which provide a technique for fast message passing.
Some information can be found in the LWN.net article
Fast interprocess messaging
(describes an early version of the API),
and in the
process_vm_readv(2)
manual page.
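The calling convention can be sketched as a single-buffer copy. The wrapper read_remote() is a hypothetical helper; reading another process's memory is subject to the same permission checks as ptrace(), so the simplest way to try it is on the caller's own PID.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Copy len bytes starting at remote_addr in process pid into buf.
   Returns the number of bytes copied, or -1 on error. */
ssize_t read_remote(pid_t pid, const void *remote_addr, void *buf, size_t len)
{
    struct iovec local = { .iov_base = buf, .iov_len = len };
    struct iovec remote = { .iov_base = (void *) remote_addr, .iov_len = len };

    assert(buf != NULL);
    /* One local and one remote iovec; the flags argument must be 0 */
    return process_vm_readv(pid, &local, 1, &remote, 1, 0);
}
```

The data moves directly between the two address spaces in one copy, which is the basis of the "fast message passing" described above.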
Files under
/proc/sys
are now pollable, meaning
that applications can use
poll(),
select(),
and
epoll
to check for changes to
sysctl
parameters.
A new
/proc/sys/kernel/cap_last_cap
file exposes the numerical value of the highest capability
supported by the running kernel;
this can be used to determine the highest bit
that may be set in a capability set.
Extensions to the
cpu
cgroup controller
(governed by CONFIG_CFS_BANDWIDTH)
make it possible to impose a quota on the amount of CPU time
that the processes in a cgroup may consume in each
scheduling period.
Unlike the "shares" mechanism already provided by the
cpu
controller, these quotas apply regardless of whether
there is competition for the CPU.
Within each cgroup, the allocation of the CPU to
processes scheduled under the
SCHED_OTHER
policy can be further controlled using the
nice values of the processes.
For further information, see the kernel source files
Documentation/scheduler/sched-bwc.txt
and
Documentation/scheduler/sched-design-CFS.txt,
the
cgroups(7)
manual page,
and Jonathan Corbet's LWN.net article,
CFS bandwidth control.
See also:
LWN
articles on the kernel 3.2 merge window
(1,
2)
and the Kernel Newbies
kernel 3.2 summary.
Linux 3.1 (24 Oct 2011)
API changes include the following:
Three new operations are added for the
ptrace()
system call:
PTRACE_SEIZE,
PTRACE_INTERRUPT,
and
PTRACE_LISTEN.
Details can be found in the
ptrace(2)
manual page.
Some further information can be found in the LWN.net article,
3.1 merge window part 1.
Two new flags for the
lseek()
system call,
SEEK_HOLE
and
SEEK_DATA,
provide the ability to search for holes in sparsely allocated files.
Some further information can be found in
Jonathan Corbet's LWN.net article
The return of SEEK_HOLE,
and the
lseek(2)
manual page.
For the 3.1 release,
only the Btrfs filesystem supports these operations.
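As a small sketch, the following hypothetical helper locates the first hole in a file. For a file with no holes, or on a filesystem that does not track holes, the result is the file size, since the end of the file is treated as an implicit hole.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the offset of the first hole at or after start in the file
   referred to by fd, or -1 on error. */
off_t first_hole(int fd, off_t start)
{
    assert(fd >= 0);
    return lseek(fd, start, SEEK_HOLE);
}
```

An analogous call with SEEK_DATA finds the next region that contains data, so a copy program can alternate between the two to skip over holes.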
A new
/proc/sys/kernel/shm_rmid_forced
file can be used to control the handling of System V
shared memory segments that have no attached process.
The default value in this file is 0,
which provides the traditional behavior:
unattached segments remain in existence and
can be reattached at a later point in time by another process.
If the value in
shm_rmid_forced
is 1, then the effect is as though an
IPC_RMID
operation is performed on all shared memory segments
that currently exist and that are created in the future.
This means that those segments will be destroyed as soon
as the last process detaches from them.
This can be useful to ensure that shared memory segments
are counted against the resource usage and limits
of at least one process,
but it is nonstandard and has the potential to break
applications that depend on the traditional behavior.
Further details can be found in the
proc(5)
manual page.
See also:
LWN
articles on the kernel 3.1 merge window
(1,
2)
and the Kernel Newbies
kernel 3.1 summary.
Linux 3.0 (22 Jul 2011)
API changes include the following:
A new
setns()
system call allows its caller to join the namespace
specified by its two arguments—a namespace type
(one of a subset of the
CLONE_*
constants given to
clone(2))
and a file descriptor referring to one of the files in a
/proc/PID/ns
directory.
Some further info can be found in Jake Edge's LWN.net article
Namespace file descriptors,
and in the
setns(2)
manual page contributed by Eric Biederman.
A new
sendmmsg()
system call provides multiple message sending facilities
(the analog of the
recvmmsg(2)
system call added in Linux 2.6.33).
For more information, see the
sendmmsg(2)
manual page.
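A minimal sketch of sending two datagrams with one system call follows. The helper send_two() is hypothetical and assumes fd is a connected (or socketpair) datagram socket, so that no destination address is needed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send the strings a and b as two separate datagrams on fd using a single
   sendmmsg() call. Returns the number of messages sent, or -1 on error. */
int send_two(int fd, const char *a, const char *b)
{
    struct iovec iov[2];
    struct mmsghdr msgs[2];

    assert(a != NULL && b != NULL);
    memset(msgs, 0, sizeof(msgs));

    iov[0].iov_base = (void *) a;
    iov[0].iov_len = strlen(a);
    iov[1].iov_base = (void *) b;
    iov[1].iov_len = strlen(b);

    /* Each mmsghdr wraps an ordinary msghdr describing one message */
    msgs[0].msg_hdr.msg_iov = &iov[0];
    msgs[0].msg_hdr.msg_iovlen = 1;
    msgs[1].msg_hdr.msg_iov = &iov[1];
    msgs[1].msg_hdr.msg_iovlen = 1;

    return sendmmsg(fd, msgs, 2, 0);
}
```

On return, the msg_len field of each element holds the number of bytes sent for that message.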
The
timerfd_settime()
system call adds a
TFD_TIMER_CANCEL_ON_SET
flag.
If this flag is set for a
CLOCK_REALTIME
absolute
(TFD_TIMER_ABSTIME)
timer, then the timer is expired if the clock is reset.
For more information, see the
timerfd_create(2)
manual page.
Two new POSIX clocks are added:
CLOCK_BOOTTIME_ALARM
and
CLOCK_REALTIME_ALARM.
According to the commit message,
these clocks behave identically to
CLOCK_REALTIME
and
CLOCK_BOOTTIME,
but the
_ALARM
suffixed clocks will wake the system if it is suspended.
Some further details can be found in John Stultz's LWN.net article,
Waking systems from suspend,
and in the
clock_getres(2),
clock_nanosleep(2),
and
timer_create(2)
manual pages.
A new
CAP_WAKE_ALARM capability
governs the use of the
CLOCK_BOOTTIME_ALARM
and
CLOCK_REALTIME_ALARM
clocks.
The
/proc/sys/kernel/core_pattern
file adds a new specifier, %E.
This specifier is replaced by the pathname of the executable,
with slashes replaced by exclamation marks
(so that the basename of the resulting core dump filename
does not contain slashes).
Details can be found in the
core(5)
manual page.
The ext4 filesystem adds support for the
fallocate() FALLOC_FL_PUNCH_HOLE
operation.
See also:
LWN
articles on the kernel 3.0 merge window
(1,
2)
and the Kernel Newbies
kernel 3.0 summary.
Linux 2.6.39 (19 May 2011)
API changes include the following:
New
name_to_handle_at() and
open_by_handle_at()
system calls.
These system calls provide functionality that is useful for
file-system servers that run in user space.
Details can be found in the
open_by_handle_at(2)
manual page that I wrote.
Some details can be found in the LWN.net article,
Open by handle.
A new
O_PATH
flag is added for
open(2).
Some details can be found in the LWN.net article,
2.6.39 merge window part 1.
O_PATH
descriptors can be obtained for symbolic links,
and can be passed via
SCM_RIGHTS
datagrams.
Details can be found in the
open(2)
manual page.
A new
AT_EMPTY_PATH
flag allows empty relative pathnames for
linkat(2),
fchownat(2),
fstatat(2),
and
name_to_handle_at(),
in which case the calls operate on
their directory file descriptor argument.
In addition, an empty pathname can now be supplied to
readlinkat(2),
to produce the same behavior for that call.
Details can be found in the respective manual pages.
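The O_PATH and AT_EMPTY_PATH flags can be combined, as in the sketch below: a descriptor is obtained for the file itself (a symbolic link is not followed), and fstatat() is then applied to that descriptor using an empty pathname. The helper name is_directory() is made up for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if path refers to a directory, 0 if it refers to something
   else, and -1 on error. A symbolic link in the final component is
   examined itself rather than followed. */
int is_directory(const char *path)
{
    struct stat sb;
    int fd, rc;

    assert(path != NULL);
    fd = open(path, O_PATH | O_NOFOLLOW);
    if (fd == -1)
        return -1;

    /* Empty pathname + AT_EMPTY_PATH: operate on fd itself */
    rc = fstatat(fd, "", &sb, AT_EMPTY_PATH);
    close(fd);
    if (rc == -1)
        return -1;
    return S_ISDIR(sb.st_mode) ? 1 : 0;
}
```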
A new
clock_adjtime()
system call, analogous to
adjtimex(2),
permits adjustments to POSIX clocks.
Details can be found in the
clock_adjtime(2)
manual page.
A new
syncfs()
system call, which is similar to
sync(2),
but flushes only the filesystem containing the file
referred to by its file-descriptor argument.
Details can be found in the
syncfs(2)
manual page.
A new POSIX clock,
CLOCK_BOOTTIME,
is identical to
CLOCK_MONOTONIC,
but includes time that the system has been suspended.
This clock is intended for applications that want a
monotonically increasing clock and also want to be aware of
time the system has been suspended.
Details can be found in the
timer_create(2)
manual page;
some background can be found in John Stultz's LWN.net article
Waking systems from suspend.
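Because the two clocks differ only in whether suspend time is counted, subtracting one from the other gives an estimate of how long the system has been suspended since boot. The following is a sketch under that assumption; suspended_seconds() is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <time.h>

/* Store in *secs the approximate time (in seconds) that the system has
   spent suspended since boot. Returns 0 on success, -1 on error. */
int suspended_seconds(double *secs)
{
    struct timespec mono, boot;

    assert(secs != NULL);
    if (clock_gettime(CLOCK_MONOTONIC, &mono) == -1 ||
        clock_gettime(CLOCK_BOOTTIME, &boot) == -1)
        return -1;

    *secs = (double) (boot.tv_sec - mono.tv_sec) +
            (boot.tv_nsec - mono.tv_nsec) / 1e9;
    return 0;
}
```

The two clocks are read one after the other, so the result includes a tiny positive error on a system that has never been suspended.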
A thread operating under the
SCHED_IDLE policy
is now allowed to upgrade itself to the
SCHED_BATCH
or
SCHED_OTHER
policy if its nice value falls within the range permitted by its
RLIMIT_NICE
resource limit.
The
keyctl()
system call adds two new operations:
KEYCTL_REJECT,
to mark a key as negatively instantiated and set an expiration
timer on the key, and
KEYCTL_INSTANTIATE_IOV,
to instantiate an uninstantiated key with a payload specified
via a vector of buffers.
Details can be found in the
keyctl(2)
manual page.
A new inode flag,
FS_NOCOW_FL,
can be used to disable copy-on-write semantics on a filesystem
(such as Btrfs) that supports copy-on-write.
For details, see the
ioctl_iflags(2)
manual page.
A new
perf_event
cgroups controller makes it possible to do
perf
monitoring per cgroup.
See also:
LWN
articles on the kernel 2.6.39 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 2.6.39 summary.
Linux 2.6.38 (15 Mar 2011)
API changes include the following:
A new
AT_NO_AUTOMOUNT
flag for
fstatat(2),
which can be used to suppress automounting of the terminal
component of the pathname argument.
Further information can be found in the
fstatat(2)
manual page.
A new
CAP_SYSLOG capability,
used (instead of
CAP_SYS_ADMIN)
to govern privileged
syslog(2)
operations.
Details can be found in the capabilities(7) and syslog(2) manual pages.
A new
FALLOC_FL_PUNCH_HOLE
operation for
fallocate(2).
This operation creates a hole (see page 83 of TLPI) in the file
in the byte range indicated by the
offset
and
len
arguments.
(The file data in the specified range is lost.)
Filesystem support is required for the
FALLOC_FL_PUNCH_HOLE
operation.
In the initial implementation, support is provided by just the
XFS filesystem.
As currently implemented,
FALLOC_FL_PUNCH_HOLE
must be specified with
FALLOC_FL_KEEP_SIZE,
which means that the size of a file can't change,
even if a hole is punched at the end of the file.
Further information can be found in the
fallocate(2)
manual page.
The new
/proc/sys/kernel/kptr_restrict
file can be used to prevent exposure of kernel pointers via
/proc
files and other interfaces.
(This affects how pointers are printed when using the new
%pK
specifier
for the kernel-internal
printf()
function.)
See the
proc(5)
manual page for further details.
The addition of the autogroup feature significantly changed
the semantics of the nice value.
For details, see the
sched(7)
manual page.
See also:
LWN
articles on the kernel 2.6.38 merge window
(1,
2)
and the Kernel Newbies
kernel 2.6.38 summary.
Linux 2.6.37 (5 Jan 2011)
API changes include the following:
The permissions on /proc/PID/limits
changed from readable for the owner only to readable for all users
on the system.
The
fanotify_init()
and
fanotify_mark()
system calls were added.
These system calls are designed for use in virus-scanning tools,
but may also serve other more general uses.
They provide functionality that is in some ways similar to
inotify(7).
Note, however, that the
fanotify
interface is not a superset of
inotify.
(The existence of two APIs with heavily overlapping functionality,
rather than a new API that is a superset of the earlier API,
is unfortunate.)
These two system calls were added in Linux 2.6.36,
but disabled while concerns about the API were resolved.
In Linux 2.6.37, the system calls have been enabled.
Documentation for these system calls can be found in the
fanotify_init(2)
and
fanotify_mark(2)
manual pages.
The
fanotify(7)
manual page provides an overview of the API.
The
TCP_USER_TIMEOUT
socket option specifies the maximum amount of time
(in milliseconds) that transmitted
data may remain unacknowledged before TCP will forcibly
close the corresponding connection and return
ETIMEDOUT
to the application.
Details can be found in the
tcp(7)
manual page.
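Setting and reading back the option might look like the sketch below. The helper names are hypothetical; the option value is an unsigned int count of milliseconds.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Set the user timeout (in milliseconds) on TCP socket fd.
   Returns 0 on success, -1 on error. */
int set_user_timeout(int fd, unsigned int ms)
{
    assert(fd >= 0);
    return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &ms, sizeof(ms));
}

/* Returns the current user timeout in milliseconds, or -1 on error. */
long get_user_timeout(int fd)
{
    unsigned int ms = 0;
    socklen_t len = sizeof(ms);

    if (getsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &ms, &len) == -1)
        return -1;
    return (long) ms;
}
```

A value of 0 (the default) means that the system-wide retransmission settings apply.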
See also:
LWN
articles on the kernel 2.6.37 merge window
(1,
2)
and the Kernel Newbies
kernel 2.6.37 summary.
Linux 2.6.36 (20 Oct 2010)
API changes include the following:
The new
prlimit()
system call is an enhancement of
setrlimit()
and
getrlimit().
It allows the caller to both set and retrieve
its own resource limits
(including retrieving the old limit at the same time
as a new limit is set), and (with suitable permissions)
perform the same task for other processes.
This system call does not suffer from
this kernel bug,
which affects
getrlimit()/setrlimit().
(See pages 759 and 760 of TLPI.)
Indeed, starting with version 2.13,
glibc provides library implementations
for
setrlimit()
and
getrlimit()
that employ
prlimit()
to work around the kernel bug.
I've added documentation of this system call to the
getrlimit(2)
manual page.
The
inotify
API adds a new flag,
IN_EXCL_UNLINK,
that prevents events from being generated for children
of a watched directory after they have been
unlinked from that directory.
I've added documentation of this flag to the
inotify(7)
manual page.
The OOM killer was rewritten (again).
In the process, the
/proc/PID/oom_adj
file became obsolete, in favor of the new
/proc/PID/oom_score_adj
file.
For further information, see the
proc(5)
manual page.
As originally designed and implemented, the
inotify IN_ONESHOT
flag did not cause an
IN_IGNORED
event to be generated when a watch was dropped
after an event was triggered.
Starting with Linux 2.6.36, an
IN_IGNORED
event is generated in this case.
(This was almost certainly an unintended consequence of some
code reworking during the 2.6.36 development cycle.)
The
statfs
structure returned by
statfs()
adds a field,
f_flags,
that returns a bit mask indicating various mount options
that a filesystem was mounted with.
Details can be found in the
statfs(2)
manual page.
This allows the
statvfs()
library function to more efficiently populate the information
returned in the
f_flag
field, as described in the
statvfs(3)
manual page.
See also:
LWN
articles on the kernel 2.6.36 merge window
(1,
2)
and the Kernel Newbies
kernel 2.6.36 summary.
glibc API changes
glibc 2.38 (31 July 2023)
API changes include the following:
…
glibc 2.37 (31 January 2023)
API changes include the following:
…
glibc 2.36 (30 July 2022)
API changes include the following:
…
glibc 2.35 (3 February 2022)
API changes include the following:
…
glibc 2.34 (2 August 2021)
API changes include the following:
A new library function,
closefrom(3),
has been added.
This function closes all file descriptors greater than or equal
to a specified value.
Details can be found in the
closefrom(3)
manual page.
A new library function,
_Fork(3),
has been added.
This function performs the same task as
fork(2)
but is async-signal-safe.
Details can be found in the
_Fork(3)
manual page.
The
mallinfo2()
function has been added.
This function performs the same task as the (now deprecated)
mallinfo()
function, but employs a structure with larger field widths
to allow reporting of values that are larger than can fit in an
int.
Details can be found in the
mallinfo2(3)
manual page.
The dynamic linker supports two new options,
--argv0
and
--list-tunables.
Details can be found in the
ld.so(8)
manual page.
…
glibc 2.32 (5 August 2020)
API changes include the following:
Two new error-diagnostic functions are added:
strerrorname_np()
and
strerrordesc_np().
Given an error number argument
(e.g.,
EPERM),
strerrorname_np()
returns a string containing the name (e.g., "EPERM")
of that argument.
Like
strerror(),
strerrordesc_np()
returns a string describing the error number given as its argument,
but differs in that the string is not translated
according to locale settings.
Details can be found in the
strerror(3)
manual page.
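The two functions combine naturally in diagnostic messages. The sketch below assumes glibc 2.32 or later; describe_errno() is a hypothetical helper, and both library functions return NULL for an unknown error number.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Format errnum as "NAME (description)" in buf. Returns buf. */
char *describe_errno(int errnum, char *buf, size_t size)
{
    const char *name = strerrorname_np(errnum);
    const char *desc = strerrordesc_np(errnum);

    assert(buf != NULL && size > 0);
    snprintf(buf, size, "%s (%s)",
             name != NULL ? name : "?",
             desc != NULL ? desc : "unknown error");
    return buf;
}
```

For example, describe_errno(EPERM, buf, sizeof(buf)) yields "EPERM (Operation not permitted)" regardless of the locale.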
Following on from the previous change, the arrays
sys_errlist,
_sys_errlist,
sys_nerr,
and
_sys_nerr
are deprecated: their declarations have been removed from
<stdio.h>
and they are no longer available to newly linked binaries
although the symbols remain available to binaries linked
against earlier versions of the library.
Two new signal-description functions are added:
sigabbrev_np()
and
sigdescr_np().
Given a signal number argument
(e.g.,
SIGHUP),
sigabbrev_np()
returns a string containing the name (e.g., "HUP")
of that argument.
Like
strsignal(),
sigdescr_np()
returns a string describing the signal given as its argument,
but differs in that the string is not translated
according to locale settings.
Details can be found in the
strsignal(3)
manual page.
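A short sketch of the signal-description functions follows, assuming glibc 2.32 or later. The helper describe_signal() is hypothetical; note that the abbreviation omits the "SIG" prefix, so the sketch adds it back.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Format sig as "SIGNAME (description)" in buf. Returns buf. */
char *describe_signal(int sig, char *buf, size_t size)
{
    const char *abbrev = sigabbrev_np(sig);   /* e.g., "HUP" */
    const char *desc = sigdescr_np(sig);      /* e.g., "Hangup" */

    assert(buf != NULL && size > 0);
    snprintf(buf, size, "SIG%s (%s)",
             abbrev != NULL ? abbrev : "?",
             desc != NULL ? desc : "unknown signal");
    return buf;
}
```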
Following on from the previous change, the arrays
sys_siglist,
_sys_siglist,
and
sys_sigabbrev
are deprecated: their declarations have been removed from
<string.h>
and they are no longer available to newly linked binaries
although the symbols remain available to binaries linked
against earlier versions of the library.
Two new GNU-specific functions are added:
pthread_attr_setsigmask_np()
and
pthread_attr_getsigmask_np().
These functions allow a program to set and get the
signal mask attribute in a POSIX threads attributes object.
For further information, see the
pthread_attr_setsigmask_np(3)
manual page.
glibc 2.31 (1 February 2020)
API changes include the following:
The
pthread_clockjoin_np()
function is added.
This GNU-specific function is similar to
pthread_timedjoin_np()
but provides a
clockid
argument that allows the caller to choose which
clock to wait against.
Currently, the new function supports the
CLOCK_MONOTONIC
and
CLOCK_REALTIME
clocks.
By contrast,
pthread_timedjoin_np()
can be used to measure only against the
CLOCK_REALTIME
clock.
glibc 2.30 (1 Aug 2019)
API changes include the following:
The
twalk_r(3)
function has been added.
Details can be found in the
twalk_r(3)
manual page.
Wrapper functions are added for
gettid(2),
tgkill(2),
and
getdents64(2).
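Code that must also build against older glibc versions can fall back to syscall() when the wrapper is absent. The sketch below shows one way to do this for gettid(); the helper name my_tid() is made up for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the caller's thread ID, using the glibc wrapper where it
   exists (glibc 2.30 and later) and a raw system call otherwise. */
pid_t my_tid(void)
{
#if defined(__GLIBC__) && \
    (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 30))
    pid_t tid = gettid();
#else
    pid_t tid = (pid_t) syscall(SYS_gettid);
#endif
    assert(tid > 0);
    return tid;
}
```

In the main thread of a process, the thread ID is the same as the process ID.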
The
pthread_cond_clockwait(),
pthread_mutex_clocklock(),
pthread_rwlock_clockrdlock(),
pthread_rwlock_clockwrlock(),
and
sem_clockwait()
functions are added.
The functions, which are proposed for POSIX.1,
are similar to
pthread_cond_timedwait(),
pthread_mutex_timedlock(),
pthread_rwlock_timedrdlock(),
pthread_rwlock_timedwrlock(),
and
sem_timedwait(),
but provide a
clockid
argument that allows the caller to choose which
clock to wait against.
Currently, the new functions support the
CLOCK_MONOTONIC
and
CLOCK_REALTIME
clocks.
By contrast, most of the "timed" functions can be used to measure
only against the
CLOCK_REALTIME
clock, with the exception that the clock used by
pthread_cond_timedwait()
can be selected at the time of the initialization of
the condition variable (using
pthread_condattr_setclock()).
Other changes include the following:
The dynamic linker
adds a new command-line option,
--preload,
that can be used to preload shared libraries.
Details can be found in the
ld.so(8)
manual page.
The dynamic linker permits the loading of shared objects that
refer to versioned symbols whose definition has moved to a
different soname than that where the symbol was found at static
link time.
Formerly, this situation led to the error
"relocation error: ... symbol SYM version VER not defined
in file X with link time reference".
The former behavior was an intentional design decision,
but that behavior prevented certain useful possibilities,
such as migrating the implementation of a symbol
to a dependency of the original soname.
A longstanding (several years)
bug
in
pldd(1)
has now been fixed.
glibc 2.29 (31 Jan 2019)
API changes include the following:
A wrapper function is added for
getcpu(2).
glibc 2.28 (1 Aug 2018)
API changes include the following:
Wrapper functions are added for
renameat2(2)
and
statx(2).
Support for C11 (ISO/IEC 9899:2011) threads is added,
consisting of the following functions:
thrd_current(3),
thrd_equal(3),
thrd_sleep(3),
thrd_yield(3),
thrd_create(3),
thrd_detach(3),
thrd_exit(3),
thrd_join(3),
mtx_init(3),
mtx_lock(3),
mtx_timedlock(3),
mtx_trylock(3),
mtx_unlock(3),
mtx_destroy(3),
call_once(3),
cnd_broadcast(3),
cnd_destroy(3),
cnd_init(3),
cnd_signal(3),
cnd_timedwait(3),
and
cnd_wait(3).
A large number of new math functions, specified in
ISO/IEC 18661-1:2014 and ISO/IEC TS 18661-3:2015, are added.
glibc 2.27 (1 Feb 2018)
API changes include the following:
Wrapper functions are added for
copy_file_range(2),
memfd_create(2),
and
mlock2(2).
Support is added for memory keys, including wrapper functions for
pkey_alloc(2),
pkey_free(2),
and
pkey_mprotect(2),
as well as two supporting library functions,
pkey_set(3)
and
pkey_get(3).
glibc 2.26 (2 Aug 2017)
API changes include the following:
A new
reallocarray()
function can be used to reallocate an array buffer.
This function is to
calloc()
what
realloc()
is to
malloc().
The purpose of this function (as opposed to the use of
realloc(ptr, nmemb*size))
is to safely handle the condition
where reallocating an array of
nmemb
items of
size
bytes would lead to an overflow when calculating the value
nmemb*size.
Details can be found in the
reallocarray(3)
manual page.
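The overflow check can be seen in a small sketch. The helper resize_ints() is hypothetical; the point is that a request whose total size cannot be represented fails cleanly with ENOMEM rather than allocating a too-small buffer.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Resize arr to hold n ints. Returns the new pointer, or NULL on failure,
   in which case the original array is left untouched. */
int *resize_ints(int *arr, size_t n)
{
    assert(n > 0);
    /* Fails with ENOMEM if n * sizeof(int) would overflow size_t */
    return reallocarray(arr, n, sizeof(int));
}
```

With realloc(arr, n * sizeof(int)), a huge n could wrap around to a small product, and the call would then succeed with a buffer far smaller than the caller expects.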
Wrapper functions are added for
preadv2(2)
and
pwritev2(2).
glibc 2.25 (5 Feb 2017)
API changes include the following:
A new
getentropy(3)
function, which is implemented on top of
getrandom(2),
can be used to obtain a buffer of random data.
This function is nonstandard,
but is also present on at least OpenBSD.
Further details can be found in the manual page.
A new
explicit_bzero(3)
function performs the same task as
bzero(3),
but the call is guaranteed never to be optimized away
by the compiler.
Details can be found in the manual page.
A large number of new math functions and macros, specified in
ISO/IEC TR 24731-2:2010,
ISO/IEC TS 18661-1:2014,
and
ISO/IEC TS 18661-4:2015, are added.
glibc 2.24 (2 Aug 2016)
API changes include the following:
With the exception of 32-bit and 64-bit Intel architectures,
the minimum kernel version required for glibc 2.24 is Linux 3.2.
For Intel architectures, the minimum kernel version is 2.6.32.
The
LD_POINTER_GUARD environment variable
can no longer be used to disable pointer guarding, which is now
always enabled.
Details can be found in the
ld.so(8)
manual page.
glibc 2.22 (5 Aug 2015)
API changes include the following:
Numerous bugs in the implementation of
fmemopen(3)
were fixed.
glibc 2.21 (6 Feb 2015)
The obsolete
sigvec()
function is removed.
glibc 2.20 (7 Sep 2014)
Note: the minimum Linux kernel version to run
with this and later glibc versions is Linux 2.6.32.
API changes include the following:
The
_BSD_SOURCE
and
_SVID_SOURCE
feature test macros are deprecated.
They now have the same effect as
_DEFAULT_SOURCE,
but generate a compile-time warning if used.
For further information, see the
feature_test_macros(7)
manual page.
glibc 2.19 (7 Feb 2014)
API changes include the following:
The
_BSD_SOURCE
feature test macro no longer causes BSD
definitions to be favored in a few cases where
standards conflict.
The affected APIs here include
getpgrp(),
setpgrp(),
sigpause(),
and
setjmp(),
and source code changes may be needed to maintain historical
behavior in applications that use these APIs.
For further information, see the
feature_test_macros(7)
manual page.
A new feature test macro,
_DEFAULT_SOURCE,
has been added.
Defining this macro provides an effect similar to
the feature test macros that are defined by default—that is,
_BSD_SOURCE,
_SVID_SOURCE, and
_POSIX_C_SOURCE=200809.
This macro can be defined to ensure that the "default"
definitions are provided even when the defaults would otherwise
be disabled,
as happens when individual macros are explicitly defined,
or the compiler is invoked in one of its "standard" modes (e.g.,
cc -std=c99).
For further information, see the
feature_test_macros(7)
manual page.
glibc 2.18 (10 Aug 2013)
API changes include the following:
New (nonstandard)
pthread_getattr_default_np()
and
pthread_setattr_default_np()
functions are added.
These functions permit the caller to
get and set the default attributes that are used to create
new threads (i.e., the attributes used when the
attr
argument of
pthread_create()
is
NULL).
For further information, see the
pthread_getattr_default_np(3)
manual page.
glibc 2.17 (25 Dec 2012)
Note: the minimum Linux kernel version to run
with this and later glibc versions is Linux 2.6.16.
API changes include the following:
A new
secure_getenv()
function allows secure access to the environment.
It is similar to
getenv(3),
but returns
NULL
if running in a set-user-ID/set-group-ID process.
Documentation can be found in the
secure_getenv(3)
manual page.
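A typical use is in library code that honors an environment variable but must not trust it in a privileged program. The sketch below uses a hypothetical helper, config_value(), that falls back to a built-in default.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>

/* Return the value of environment variable name, or fallback if the
   variable is unset or the process is running in secure-execution mode
   (for example, as a set-user-ID program). */
const char *config_value(const char *name, const char *fallback)
{
    const char *v;

    assert(name != NULL);
    v = secure_getenv(name);
    return v != NULL ? v : fallback;
}
```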
The functions
clock_getres(),
clock_gettime(),
clock_settime(),
clock_getcpuclockid(),
and
clock_nanosleep(),
moved from the realtime library
(librt) to the main C library.
Consequently, it is no longer necessary to link against
the realtime library
(cc -lrt)
when using these functions.
The rationale for this change is explained in
glibc bug 14743
("clock_gettime et al from -lrt always bring in libpthread").
glibc 2.16 (30 Jun 2012)
Note: this and subsequent glibc versions
are not expected to work with any Linux kernel older than version 2.6.
API changes include the following:
The glibc header files now handle the
_ISOC11_SOURCE
feature test macro,
as a mechanism for exposing declarations conforming to the
C11
standard.
A new
getauxval(3)
function allows retrieval of auxiliary vector
(AT_*)
key-value pairs passed from the Linux kernel.
Further information can be found in my LWN.net article
"getauxval() and the auxiliary vector"
and in the
getauxval(3)
manual page that I wrote.
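As a brief sketch, the system page size can be read from the auxiliary vector via the AT_PAGESZ key. The helper page_size_from_auxv() is a hypothetical name; getauxval() returns 0 when the requested key is not present.

```c
#include <assert.h>
#include <sys/auxv.h>

/* Return the system page size as recorded in the auxiliary vector,
   or -1 if the AT_PAGESZ entry is not present. */
long page_size_from_auxv(void)
{
    unsigned long v = getauxval(AT_PAGESZ);

    assert((v & (v - 1)) == 0);     /* Page sizes are powers of two */
    return v != 0 ? (long) v : -1;
}
```

The value should match the result of sysconf(_SC_PAGESIZE).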
glibc 2.15 (tagged 25 Dec 2011)
API changes include the following:
A new
scandirat()
function, which is to
scandir()
as
openat(2)
is to
open().
Documentation can be found in the
scandirat(3)
manual page.
A new
pldd
command lists the dynamic shared objects that are linked into
a process.
For further information, see the
pldd(1)
manual page.
glibc 2.14 (tagged 31 May 2011)
API changes include the following:
Wrapper functions are added for
clock_adjtime(2),
name_to_handle_at(2),
open_by_handle_at(2),
syncfs(2),
setns(2),
and
sendmmsg(2).
glibc 2.13 (tagged 17 Jan 2011)
API changes include the following:
Newly added library implementations of
setrlimit()
and
getrlimit()
bypass the system calls of the same name, instead using the
prlimit()
system call to avoid the bug described above
in the API changes for Linux 2.6.36.
POSIX/Single UNIX Specification
Since the last major release of the POSIX/SUS standard (Issue 7) in 2008,
there have been some Technical Corrigenda—essentially
bug fix releases to the standard.
In addition, work proceeds on the next release (Issue 8).
The Austin Group Defect Tracker can be found
here.
Issues marked for the next POSIX release (in progress)
The issues
tagged
for the next release of the standard (Issue 9) can be found
here.
POSIX.1-2024 Technical Corrigendum 1 (in progress)
Work on POSIX.1-2024 Technical Corrigendum 1 is in progress.