Reproducible SCI-F Apps¶
Why do we need SCI-F?¶
The Scientific Filesystem (SCI-F) provides a consistent and modular method to create
containers. The SCI-F makes some assumptions about how the container will be created. By adhering to with the SCI-F format containers can bemore more reproducible.
For example, installing a set of
libraries, defining environment variables, or adding labels that
belong to application foo
makes a strong assertion that those dependencies belong
to foo
. When I run foo
, I can be confident that the container is running
in this context, meaning with foo
’s custom environment, and with foo
’s libraries
and executables on the path. This approach is different from
serving many executables in a single container that may have no way to know which application is associated with the container’s
intended functions. This documentation will walk through some
rationale, background, and examples of the SCI-F integration for
Singularity containers. For other examples (and a client that works
across container technologies) see the the scientific filesystem.
To start, let’s take a look at this series of steps to install dependencies for software foo and bar.
%post
# install dependencies 1
# install software A (foo)
# install software B (bar)
# install software C (foo)
# install software D (bar)
The creator may know that A and C were installed for foo
and B and D for bar
,
but down the road, when someone discovers the container, if they can
find the software at all, the intention of the container creator would
be lost. As many are now, containers without any form of internal
organization and predictability are black boxes. We don’t know if some
software installed to /opt
, or to /usr/local/bin
, or to their custom favorite folder /code
. We
could assume that the creator added important software to the path and
look in these locations, but that approach is still akin to fishing in a
swamp. We might only hope that the container’s main function, the
Singularity runscript, is enough to make the container perform as
intended.
Mixed up Modules¶
If your container truly runs one script, the traditional model of a
runscript fits well. Even in the case of having two functions like foo
and bar
you probably have something like this.
%runscript
if some logic to choose foo:
check arguments for foo
run foo
else if some logic to choose bar:
run bar
and maybe your environment looks like this:
%environment
BEST_GUY=foo
export BEST_GUY
%environment
BEST_GUY=foo
BEST_GUY=bar
export BEST_GUY
You obviously can’t have them at separate times. You would have to source some custom environment file (that you make on your own) and it gets confusing with issues of using shell and sourcing for the container. We don’t know who the best guy is! You probably get the general idea. Without consistent internal organization and modularity the following may result:
You have to do a lot of manual work to expose the different software to the user via a custom runscript (and be a generally decent programmer).
All software must share the same metadata, environment, and labels.
Under these conditions, containers are at best black boxes with unclear
delineation between software provided, and only one context of running
anything. The container creator shouldn’t need to spend inordinate
amounts of time writing custom runscripts to support multiple functions
and inputs. Each of foo
and bar
should be easy to define, and have its own
runscript, environment, labels, tests and help section.
Container Transparency¶
Applications that use the SCI-F make foo
and bar
transparent, and solve the problem of mixed up
modules. Our simple issue of mixed up modules could be solved if we
could do this:
Bootstrap:docker
From: ubuntu:16.04
%appenv foo
BEST_GUY=foo
export BEST_GUY
%appenv bar
BEST_GUY=bar
export BEST_GUY
%apprun foo
echo The best guy is $BEST_GUY
%apprun bar
echo The best guy is $BEST_GUY
and generate the container as follows:
$ sudo singularity build foobar.simg Singularity
and finally, run the container with the context of foo
and then bar
$ singularity run --app bar foobar.simg
The best guy is bar
$ singularity run --app foo foobar.simg
The best guy is foo
Using SCI-F apps, a user can easily discover both foo
and bar
without knowing
anything about the container:
singularity apps foobar.simg
bar
foo
Each application can be inspected individually:
singularity inspect --app foo foobar.simg
{
"SCIF_APP_NAME": "foo",
"SCIF_APP_SIZE": "1MB"
}
Container Modularity¶
This behavior is made possible by a simple, clean organization that
is tied to a set of sections in the build recipe relevant to each app.
For example, I can specify custom install procedures (and they are
relevant to each app’s specific base defined under /scif/apps
), labels, tests, and
help sections. Before we examine the sections, consider what the organization looks like, for each app:
/scif/apps/
foo/
bin/
lib/
scif/
runscript.help
runscript
env/
01-base.sh
90-environment.sh
bar/
....
If you are familiar with Singularity, the above should be very similar to other environments.
It mirrors the Singularity (main container) metadata folder, except
instead of .singularity.d
we have scif
. The name and base scif
is chosen intentionally to be
something short, and likely to be unique. On the level of organization
and metadata, these internal apps are like little containers. You need to remember all the path details because you can use environment variables in your runscripts,
etc. Here we are looking at the environment active for lolcat:
singularity exec --app foo foobar.simg env | grep foo
Consider the output of the above in sections, you will notice
some interesting things. First, notice that the app’s bin
has been added to
the path, and its lib
folder added to the LD_LIBRARY_PATH
. This means that anything you drop in
either will automatically be added. You don’t need to make these folders
either, they are created for you.
LD_LIBRARY_PATH=/scif/apps/foo/lib::/.singularity.d/libs
PATH=/scif/apps/foo/bin:/scif/apps/foo:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
Next, notice the environment variables relevant to the active app’s (foo) data and metadata. They look like the following:
SCIF_APPOUTPUT=/scif/data/foo/output
SCIF_APPDATA=/scif/data/foo
SCIF_APPINPUT=/scif/data/foo/input
SCIF_APPMETA=/scif/apps/foo/scif
SCIF_APPROOT=/scif/apps/foo
SCIF_APPNAME=foo
We also have foo’s environment variables defined under %appenv foo
, and
importantly, we don’t see bar’s.
BEST_GUY=foo
Also provided are more global paths for data and apps:
SCIF_APPS=/scif/apps
SCIF_DATA=/scif/data
Note, each application has its own modular location. When you do an %appinstall foo
,
the commands are all done in context of that base. The bin
and lib
directories are
also automatically generated. Consider a simple application.
Just add a script and name it:
%appfiles foo
runfoo.sh bin/runfoo.sh
and then maybe for install I’d make it executable
%appinstall foo
chmod u+x bin/runfoo.sh
You don’t even need files! You could just do this.
%appinstall foo
echo 'echo "Hello Foo."' >> bin/runfoo.sh
chmod u+x bin/runfoo.sh
We can summarize these observations about using apps:
the specific environment (
%appenv_foo
) is active becauseBEST_APP
is foothe lib folder in foo’s base is added to the LD_LIBRARY_PATH
the bin folder is added to the path
locations for input, output, and general data are exposed. It’s up to you how you use these, but you can predictably know that a well made application will look for inputs and outputs in it’s specific folder.
environment variables are provided for the app’s root, it’s data, and it’s name
Sections¶
Finding the section %appinstall
, %apphelp
, or %apprun
is indication of an application command.
The following string is parsed as the name of the application, and
this folder is created, in lowercase, under /scif/apps
if it doesn’t exist. A
singularity metadata folder, .singularity.d, equivalent to the
container’s main folder, is generated inside the application. An
application thus is like a smaller image inside of it’s parent.
Specifically, SCI-F defines the following new sections for the build
recipe, where each is optional for zero or more apps:
%appinstall corresponds to executing commands within the folder to install the application. These commands would previously belong in %post, but are now attributable to the application.
%apphelp is written as a file called runscript.help in the application’s metadata folder, where the Singularity software knows where to find it. If no help section is provided, the software simply will alert the user and show the files provided for inspection.
%apprun is also written as a file called runscript.exec in the application’s metadata folder, and again looked for when the user asks to run the software. If not found, the container should default to shelling into that location.
%applabels will write a labels.json in the application’s metadata folder, allowing for application specific labels.
%appenv will write an environment file in the application’s base folder, allowing for definition of application specific environment variables.
%apptest will run tests specific to the application, with present working directory assumed to be the software module’s folder
%appfiles will add files to the app’s base at /scif/apps/<app>
Interaction¶
The complete output of a grep
to the environment when
running foo
in the first example was not shown because the remainder of variables
are more germane to a discussion about application interaction. Essentially, when
any application is active, we also have named variable that can explicitly
reference the environment file, labels file, runscript, lib
and bin
folders for
all app’s in the container. For our above Singularity Recipe, we would
also find:
SCIF_APPDATA_bar=/scif/data/bar
SCIF_APPRUN_bar=/scif/apps/bar/scif/runscript
SCIF_APPROOT_bar=/scif/apps/bar
SCIF_APPLIB_bar=/scif/apps/bar/lib
SCIF_APPMETA_bar=/scif/apps/bar/scif
SCIF_APPBIN_bar=/scif/apps/bar/bin
SCIF_APPENV_bar=/scif/apps/bar/scif/env/90-environment.sh
SCIF_APPLABELS_bar=/scif/apps/bar/scif/labels.json
SCIF_APPENV_foo=/scif/apps/foo/scif/env/90-environment.sh
SCIF_APPLABELS_foo=/scif/apps/foo/scif/labels.json
SCIF_APPDATA_foo=/scif/data/foo
SCIF_APPRUN_foo=/scif/apps/foo/scif/runscript
SCIF_APPROOT_foo=/scif/apps/foo
SCIF_APPLIB_foo=/scif/apps/foo/lib
SCIF_APPMETA_foo=/scif/apps/foo/scif
SCIF_APPBIN_foo=/scif/apps/foo/bin
This design means that we can have apps interact with one another internally. For example, let’s modify the recipe a bit:
Bootstrap:docker
From: ubuntu:16.04
%appenv cow
ANIMAL=COW
NOISE=moo
export ANIMAL NOISE
%appenv bird
NOISE=tweet
ANIMAL=BIRD
export ANIMAL
%apprun moo
echo The ${ANIMAL} goes ${NOISE}
%appenv moo
. ${APPENV_cow}
In the above example, we have three apps. One for a cow, one for a bird,
and a third that depends on the cow. We can’t define global functions or
environment variables (in %post
or /environment
, respectively) because they would
interfere with the third app, bird, that has equivalently named
variables. What we do then, is source the environment for “cow” in the
environment for “moo” and the result is what we would want:
$ singularity run --app moo /tmp/one.simg
The COW goes moo
The same is true for each of the labels, environment, runscript, bin, and lib. The following variables are available to you, for each application in the container, whenever any application is run:
**SCIF_APPBIN_*: the path to the bin folder, if you want to add an application that isn’t active to your
PATH
**SCIF_APPLIB_*: the path to the lib folder, if you want to add an application that isn’t active to your ‘LD_LIBRARY_PATH‘
**SCIF_APPRUN_*: the app’s runscript (so you can call it from elsewhere)
**SCIF_APPMETA_*: the path to the metadata folder for the app
**SCIF_APPENV_*: the path to the primary environment file (for sourcing) if it exists
**SCIF_APPROOT_*: the application’s install folder
**SCIF_APPDATA_*: the application’s data folder
**SCIF_APPLABELS_*: The path to the label.json in the metadata folder, if it exists
Singularity containers are already reproducible in that they package dependencies. The basic format of SCI-F adds to that by making the software inside of containers modular, predictable, and programmatically accessible. By pre-setting some set of steps, labels, or variables in the runscript is associated with a particular action of the container, users can better encapsulate how dependencies relate to each step in a scientific work-flow. Making containers can be challenging. When a scientist starts to write a recipe for his set of tools, she probably doesn’t know where to put various tags and data. The SCI-F file system makes it easy to build consistent maintainable containers.
SCI-F Example: Cowsay Container¶
As an example, we will use the `cowsay container`_. cowsay
is a program that generates ASCII pictures of a cow with a message.
It also uses the fortune
program to produce random “fortunes” and the lolcat
application that adds rainbow color to an ASCII string.
Warning
Important! This example has been developed for Singularity 2.4.
Download the recipe, and save it to your present working directory.
wget https://raw.githubusercontent.com/sylabs/singularity/master/examples/apps/Singularity.cowsay
sudo singularity build moo.simg Singularity.cowsay
Determine what apps are installed using the following command:
singularity apps moo.simg
cowsay
fortune
lolcat
Ask for help for a specific application as follows:
singularity help --app fortune moo.simg
fortune is the best app
A simple loop can be used to ask for help from all apps (without knowing in advance what they are):
for app in $(singularity apps moo.simg)
do
singularity help --app $app moo.simg
done
cowsay is the best app
fortune is the best app
lolcat is the best app
To run the fortune
application enter the following (the actual fortune is random, so your display may differ):
singularity run --app fortune moo.simg
My dear People.
My dear Bagginses and Boffins, and my dear Tooks and Brandybucks,
and Grubbs, and Chubbs, and Burrowses, and Hornblowers, and Bolgers,
Bracegirdles, Goodbodies, Brockhouses and Proudfoots. Also my good
Sackville Bagginses that I welcome back at last to Bag End. Today is my
one hundred and eleventh birthday: I am eleventy-one today!"
-- J. R. R. Tolkien
Next, pipe the output of fortune
into lolcat
to add color to the fortune.
singularity run --app fortune moo.simg | singularity run --app lolcat moo.simg
You will be surrounded by luxury.
Send the output of fortune
to the cowsay
application.
singularity run --app fortune moo.simg | singularity run --app cowsay moo.simg
________________________________________
/ Executive ability is prominent in your \
\ make-up. /
----------------------------------------
\ ^__^
\ (oo)\_______
(__)\ )\/\
||----w |
|| ||
Finally use all three applications and demonstrate how to use an environment variable for the command:
CMD="singularity run --app"
$CMD fortune moo.simg | $CMD cowsay moo.simg | $CMD lolcat moo.simg
_________________________________________
/ Ships are safe in harbor, but they were \
\ never meant to stay there. /
-----------------------------------------
\ ^__^
\ (oo)\_______
(__)\ )\/\
||----w |
|| ||
The application can be inspected with the following command:
singularity inspect --app fortune moo.simg
{
"SCIF_APP_NAME": "fortune",
"SCIF_APP_SIZE": "1MB"
}
If you want to see the full specification or create your own Scientific Filesystem integration (doesn’t have to be Singularity, or Docker, or containers!) see the full documentation.
Also, you can follow along with this example by going to: take a look at these examples