\$\begingroup\$

This code needs to be 100% POSIX-compliant, because I want people to be able to run it in virtually any shell. You see correctly: there is no shebang (the default interpreter is unset, i.e. the first line is blank). I spent two hours getting it into better shape than it was in years ago. It now prints the iteration number on Ctrl+C by using trap, disables Ctrl+Z suspend, accepts the number of iterations as a command-line argument, prints the name of the interpreter, and includes representative results from my laptop (see the very bottom). Primarily, I rewrote this script of mine as a final touch that shows, in a fun way, the speed of the interpreter's built-in variables versus calling an external program (id, in this case).


# It is important you do not use the `errexit` option for benchmarking purposes!
set -o nounset

# Disable suspend functionality, usually invoked by pressing Ctrl+Z.
trap '' TSTP

# Print the current iteration when Ctrl+C is pressed; the quoting avoids
# a test error if $i has not been set yet.
trap '[ "${i+set}" = set ] && echo "$i"' INT


# ~~~~~~~~CORRECT USAGE~~~~~~~~
# 1. Intentionally, the file does not need to have an executable bit, nor a shebang!
# 2. To use/run it, call the file directly with your shell interpreter, for example:
#
# On modern shells, this should take a fraction of one second
# (reading $EUID built-in shell variable is extremely fast)
# bash is_user_root__benchmark
#
# POSIX could take couple seconds
# (depends on a CPU, as calling an external program /usr/bin/id here is very expensive)
# dash is_user_root__benchmark


# ~~~~~~~~THE CORE FUNCTION~~~~~~~~
# 1. It is completely safe to disable the ShellCheck SC3028 warning,
# because if $EUID variable is undefined (or empty), id is called.
# 2. Additionally, if using errexit shell option, we must not return
# false value, which ordinary user ID would trigger here, expensive
# workaround on this issue would be e.g. setting `echo TRUE` in case
# the script is running as root, and in the below `while` loop compare
# the returned value, but as said it is very expensive operation,
# if repeated thousands of times... Hence my recommendation on 1st line
is_user_root() {
    # shellcheck disable=SC3028
    [ "${EUID:-$(id -u)}" -eq 0 ]
}


# ~~~~~~~~BENCHMARK / SHOW-OFF~~~~~~~~
# Invoke the is_user_root() function $iterations times.

# Default number of is_user_root() function calls, you can customize if taking too long.
# You may also call the script with a number argument indicating iteration number.

if [ "$#" -eq 0 ]
then
    readonly iterations=10000
else
    readonly iterations=$1
fi

interpreter=$(readlink -f /proc/$$/exe)

printf '%s\n' "${iterations}x is_user_root() in $(basename "$interpreter")"

print_time() {
    date +"%T.%2N"
}

printf '%s' 'Start :  '; print_time

i=1; while [ "$i" -le "$iterations" ]; do

    is_user_root
    i=$((i + 1))

done

printf '%s' 'Finish:  '; print_time


# ~~~Bash results on AMD Ryzen 9 7845HX~~~

# $ bash is_user_root__benchmark 100000
# 100000x is_user_root() in bash
# Start :  00:49:06.46
# Finish:  00:49:07.17

# $ bash is_user_root__benchmark 1000000
# 1000000x is_user_root() in bash
# Start :  00:49:21.69
# Finish:  00:49:28.63


# ~~~Dash results on AMD Ryzen 9 7845HX~~~

# $ dash is_user_root__benchmark 1000
# 1000x is_user_root() in dash
# Start :  00:54:33.83
# Finish:  00:54:34.48

# $ dash is_user_root__benchmark 10000
# 10000x is_user_root() in dash
# Start :  00:54:44.00
# Finish:  00:54:50.48
\$\endgroup\$
  • \$\begingroup\$ EUID=0 dash is_user_root__benchmark \$\endgroup\$ Commented Apr 13 at 2:28
  • \$\begingroup\$ @Joshua the sole purpose of the bench is to call /usr/bin/id repeatedly... \$\endgroup\$ Commented Apr 13 at 7:23

1 Answer

\$\begingroup\$

Why is performance important?

It seems unlikely that is_user_root would be the bottleneck of any script. When we perform this test, the most likely use is to guard operations that are likely to fail for non-root users - those operations will normally dwarf execution time for id.

If performance really is so important, then the benchmark ought to account for the overhead of the loop - measure the execution time of the loop without is_user_root, and subtract that from the result.

What are we detecting?

This implementation can be spoofed if the user assigns EUID, or executes the script with a PATH containing their own id program. If we are just guarding against mistakes, that's fine, but if this is intended as a security check then it's seriously flawed.
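
As a minimal sketch of the PATH variant of that spoof: the helper name naive_is_root and the throwaway directory are made up for illustration, and mktemp -d is assumed to be available even though POSIX does not require it.

```shell
# Hypothetical helper: the question's test without the EUID shortcut,
# so that id is looked up via PATH in every shell.
naive_is_root() {
    [ "$(id -u)" -eq 0 ]
}

# Throwaway directory holding a fake id that always claims to be root.
# (mktemp -d is an assumption; it is widespread but not in POSIX.)
fake_dir=$(mktemp -d) || exit 1
printf '%s\n' '#!/bin/sh' 'echo 0' > "$fake_dir/id"
chmod +x "$fake_dir/id"

# Run the check in a subshell so the modified PATH does not leak out.
if ( PATH="$fake_dir:$PATH"; naive_is_root )
then
    spoofed=yes
else
    spoofed=no
fi
printf 'With a fake id first in PATH, root detected: %s\n' "$spoofed"

rm -rf "$fake_dir"
```

Provided the temporary directory allows execution, the check is fooled even for an ordinary user. Calling /usr/bin/id by its full path sidesteps this particular trick.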

Why not set EUID on first execution?

If EUID is unset, we always execute id. We could execute just once, using ${EUID:=$(id -u)} (= rather than - in the substitution).
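
A minimal sketch of that variant follows. In bash, EUID is always set (and read-only), so the assignment never happens; in shells that leave EUID unset, such as dash, the first call stores the output of id -u and later calls reuse it. The answer variable is only there for the demonstration.

```shell
is_user_root() {
    # := assigns the default to EUID when it is unset or empty,
    # so id runs at most once per shell process.
    # shellcheck disable=SC3028
    [ "${EUID:=$(id -u)}" -eq 0 ]
}

# First call: may run id and cache the result in EUID.
if is_user_root
then
    answer=yes
else
    answer=no
fi

# EUID is now guaranteed to be set, even under `set -o nounset`.
printf 'root: %s, cached EUID: %s\n' "$answer" "$EUID"
```

Note that this caches the value only as a shell variable in the current process; child shells that do not set EUID themselves will call id again.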

More radically, we could completely redefine the function on first execution:

is_user_root() {
    if [ "$(/usr/bin/id -u)" = 0 ]
    then
        is_user_root() { echo true; }
    else
        is_user_root() { echo false; }
    fi
    is_user_root
}

The benchmark program

The set and trap commands at the beginning are really part of the benchmark - it could be intrusive to insist on these in actual programs using the function, which probably already have requirements for their shell options and signal handlers.

We're not using printf effectively here:

printf '%s\n' "${iterations}x is_user_root() in $(basename "$interpreter")"

I would write the format string with the two substitutions:

printf '%d✕ is_user_root() in %s\n' \
    $iterations \
    "$(basename "$(readlink -f /proc/$$/exe)")"

Assigning iterations could be simplified: iterations=${1-10000}. It might be better to make it read-write, so we can decrement it until zero rather than introducing extra variable i:

while [ $iterations -gt 0 ]
do
    is_user_root
    : $((iterations -= 1))
done

The print_time function isn't really necessary. I'd replace the call sites (including the printf prefix) with these instead:

date '+Start: %T.%2N'
date '+Finish: %T.%2N'

(Actually, I'd use %s and %N instead, and present the information the user actually wants, namely the duration, directly.)


Improved

is_user_root() {
    # Run id only once, then redefine the function to return the cached answer.
    if [ "$(/usr/bin/id -u)" = 0 ]
    then
        is_user_root() { true; }
    else
        is_user_root() { false; }
    fi
    is_user_root
}
iterations=${1-1000000}

interpreter=$(readlink -f /proc/$$/exe)

printf "%d✕ is_user_root() in %s\n" \
    "$iterations" \
    "$(basename "$interpreter")"

start=$(date +%s.%N)
while [ $iterations -gt 0 ]
do
    is_user_root
    : $((iterations -= 1))
done
end=$(date +%s.%N)

# Calibrate the empty loop
iterations=${1-1000000}
cstart=$(date +%s.%N)
while [ $iterations -gt 0 ]
do
    : $((iterations -= 1))
done
cend=$(date +%s.%N)

printf '%.3g seconds\n' $(dc -e "$end $start- $cend $cstart- - p")

Typical results on my system:

$ for sh in ksh dash bash; do $sh 301833.sh 1000000; done
1000000✕ is_user_root() in ksh93
1.64 seconds
1000000✕ is_user_root() in dash
-0.146 seconds
1000000✕ is_user_root() in bash
3.99 seconds

Yes, the dash result is negative - the code under test actually executed faster than the control loop. That's fairly consistent in my testing.

\$\endgroup\$
