formal verification in R?
Does anyone know of work applying formal methods to R?
I'm particularly interested in work attempting to formally verify R's machine learning algorithms: when a big ML system generates a counterintuitive answer, what assurances are there that it has correctly implemented the underlying algorithms?
Thank you in advance,
Colin
See also questions close to this topic

How to combine two vectors in a data frame
I am not able to combine two vectors like these:

Height Sex
178    M
167    M
183    M
...

Height Sex
165    F
161    F
174    F
...

I would like to combine them into a data frame like this:

Height Sex
178    M
161    M
174    M
Thanks
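In R, stacking two data frames with the same columns row-wise is what `rbind(df1, df2)` does. As a language-neutral illustration of the same idea, here is a small Python sketch treating each table as a dict of columns (values copied from the question):

```python
# Two small tables mirroring the data frames in the question,
# stored as column dictionaries.
males = {"Height": [178, 167, 183], "Sex": ["M", "M", "M"]}
females = {"Height": [165, 161, 174], "Sex": ["F", "F", "F"]}

# Row-binding (what R's rbind(df1, df2) does) amounts to
# concatenating the two tables column by column.
combined = {col: males[col] + females[col] for col in males}

print(combined["Height"])  # [178, 167, 183, 165, 161, 174]
```

The same one-liner idea applies in R: both frames must have identical column names, and `rbind` appends the rows of the second below the first.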

smoothing spatial data on map (shapefile)
I am trying to plot data with (Lat, Lon) and a value at each location. What I need to do is to plot the data over a shapefile, represent each value with a different color, and smooth the values, so no gaps occur in the map.
The data has the following structure: (Longitude, Latitude, Value). Could you help me, please?
I know how to plot the data over a shapefile with different colors, but the issue is in smoothing the values.
The data for the question is available in the following link:
https://drive.google.com/open?id=1bOpWjPoYc0Ix1cQqA73iFrby0J1wU5l
where:
1. The (Lon, Lat, Value) data is in the csv file "cluster__groups"
2. The shapefile files are in the folder "shapefile_question"
the output should be something similar to the following image:
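One common way to fill the gaps between scattered (Lon, Lat, Value) points is inverse distance weighting: estimate each grid cell as a distance-weighted average of the known points, then color the filled grid. A minimal sketch (the sample coordinates below are made up, not taken from the linked CSV):

```python
import math

def idw(points, values, target, power=2, eps=1e-12):
    """Inverse-distance-weighted estimate at `target`:
    each known point contributes with weight 1 / distance**power."""
    weights = [1.0 / (math.dist(target, p) ** power + eps) for p in points]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Made-up sample: three stations and one grid cell between them.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
vals = [10.0, 20.0, 30.0]
est = idw(pts, vals, (0.5, 0.5))
# All three points are equidistant from (0.5, 0.5), so est is their mean.
```

For the actual map, one would evaluate `idw` over a regular lon/lat mesh clipped to the shapefile and feed the grid to the plotting layer; kriging is the usual heavier-weight alternative when the spatial correlation structure matters.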

how to use the PCs (resulting from PCA) on my dataset in R?
I am an R learner. I am working on the 'Human Activity Recognition' dataset from the internet. It has 563 variables, the last variable being the class variable 'Activity', which has to be predicted.
I am trying to use the KNN algorithm here, from the caret package of R.
I have created another dataset with the 561 numeric variables, excluding the last two: subject and activity.
I ran the PCA on that and decided that I will use the top 20 PCs.
pca1 <- prcomp(human2, scale = TRUE)
I saved the data of those PCs in another dataset called 'newdat'
newdat <- pca1$x[ ,1:20]
Now I am trying to run the code below, but it gives me an error because newdat doesn't have my class variable:
trctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
set.seed(3333)
knn_fit <- train(Activity ~ ., data = newdat, method = "knn", trControl = trctrl, preProcess = c("center", "scale"), tuneLength = 10)
I tried to extract the last column 'activity' from the raw data and append it using cbind() with 'newdat' to use that in knn_fit (above), but it's not getting appended.
Any suggestions on how to use the PCs?
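In R, the missing step is to put the scores and the class back into one data frame before calling train(), e.g. something like `newdat <- data.frame(pca1$x[, 1:20], Activity = human$Activity)` (cbind() on a numeric matrix and a factor coerces everything to character, which is why it appears not to work). The same idea as a small NumPy sketch, with invented array sizes and label names:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 5))            # 10 samples, 5 numeric features (invented)
y = np.array(["walk", "sit"] * 5)       # class labels, kept aside during PCA

# Centre and scale, like prcomp(..., scale = TRUE), then project via SVD.
Xc = (X - X.mean(axis=0)) / X.std(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                      # the analogue of pca1$x
top = scores[:, :2]                     # keep the top 2 PCs (20 in the question)

# The missing step: reunite scores and labels in one table before training.
train_tbl = {f"PC{i + 1}": top[:, i] for i in range(top.shape[1])}
train_tbl["Activity"] = y
```

A classifier is then fit on the PC columns with "Activity" as the target, exactly as in the caret call in the question.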

Which First Order theorem provers are guaranteed to halt on monadic inputs?
Monadic First-Order Logic, where all predicates take exactly one argument, is a known decidable fragment of first-order logic. Testing whether a formula is satisfiable in this logic is decidable, and there exist resolution-based methods for deciding this.
I'm in a situation where I need to test some monadic first-order logic statements for satisfiability. I realize I will hit theoretical limits for complexity, but I'm hoping I can get reasonable performance in common cases.
Right now, there exist a bunch of theorem provers which provide fast methods for solving first-order logic problems; these include Vampire, SPASS, E, as well as the quantifier extensions of Z3 and CVC4. However, because of undecidability, they're not guaranteed to halt.
My Question
Of the existing theorem provers, are there any that are guaranteed to halt when given a monadic formula as input? Or is there a way to use them to (somewhat) efficiently test monadic formulas for satisfiability?
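For what it's worth, monadic FOL without equality has the small-model property: a formula over k unary predicates is satisfiable iff it has a model with at most 2^k elements, so a naive search over small models is already a (exponential, but terminating) decision procedure. A toy sketch of that idea (the formula encoding is mine, not any prover's API):

```python
from itertools import product

def subsets(domain):
    """All subsets of the domain, as frozensets."""
    elems = list(domain)
    for bits in range(2 ** len(elems)):
        yield frozenset(e for i, e in enumerate(elems) if bits >> i & 1)

def satisfiable(formula, num_preds):
    """Decide satisfiability of a monadic formula without equality.
    `formula(preds, domain)` evaluates the formula, given one frozenset
    per unary predicate. By the small-model property it suffices to
    try domains of size up to 2**num_preds."""
    for size in range(1, 2 ** num_preds + 1):
        domain = range(size)
        for preds in product(list(subsets(domain)), repeat=num_preds):
            if formula(preds, domain):
                return True
    return False

# Example: exists x. P(x)  /\  forall x. (P(x) -> Q(x))  /\  exists x. ~Q(x)
def phi(preds, domain):
    P, Q = preds
    return (any(x in P for x in domain)
            and all(x not in P or x in Q for x in domain)
            and any(x not in Q for x in domain))
```

Here `satisfiable(phi, 2)` finds the two-element model P = {0}, Q = {0}. This is of course only a baseline to compare real provers against, not a practical tool.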

correct use of ``progress`` label
According to the man pages,
Progress labels are used to define correctness claims. A progress label states the requirement that the labeled global state must be visited infinitely often in any infinite system execution. Any violation of this requirement can be reported by the verifier as a non-progress cycle.
and
Spin has a special mode to prove absence of non-progress cycles. It does so with the predefined LTL formula:

(<>[] np_)

which formalizes non-progress as a standard Büchi acceptance property.
But let's take a look at this very primitive Promela specification:

bool initialised = 0;

init {
progress:
	initialised++;
	assert(initialised == 1);
}
In my understanding, the `assert` should hold, but verification should fail, because `initialised++` is executed exactly once, whereas the `progress` label claims it should be possible to execute it arbitrarily often. However, even with the above LTL formula, this verifies just fine in ispin (see below).
How do I correctly test whether a statement can be executed arbitrarily often (e.g. for a locking scheme)?
(Spin Version 6.4.7 -- 19 August 2017)
+ Partial Order Reduction
Full statespace search for:
	never claim         	+ (:np_:)
	assertion violations	+ (if within scope of claim)
	non-progress cycles 	+ (fairness disabled)
	invalid end states	- (disabled by never claim)
State-vector 28 byte, depth reached 7, errors: 0
        6 states, stored (8 visited)
        3 states, matched
       11 transitions (= visited+matched)
        0 atomic steps
hash conflicts: 0 (resolved)
Stats on memory usage (in Megabytes):
    0.000	equivalent memory usage for states (stored*(State-vector + overhead))
    0.293	actual memory usage for states
   64.000	memory used for hash table (-w24)
    0.343	memory used for DFS stack (-m10000)
   64.539	total actual memory usage
unreached in init
(0 of 3 states)
pan: elapsed time 0.001 seconds
No errors found -- did you verify all claims?
UPDATE
Still not sure how to use this ...
bool initialised = 0;

init {
	initialised = 1;
}

active [2] proctype reader() {
	assert(_pid >= 1);
	(initialised == 1);
	do
	:: else ->
progress_reader:
		assert(true);
	od
}

active [2] proctype writer() {
	assert(_pid >= 1);
	(initialised == 1);
	do
	:: else ->
		(initialised == 0);
progress_writer:
		assert(true);
	od
}
And let's select testing for non-progress cycles. Then ispin runs this as
spin -a test.pml
gcc -DMEMLIM=1024 -O2 -DXUSAFE -DNP -DNOCLAIM -w -o pan pan.c
./pan -m10000 -l
Which verifies without error.
So let's instead try this with ltl properties ...
/* _pid: 0 = init, 1-2 = reader, 3-4 = writer */
ltl progress_reader1 { []<> reader[1]@progress_reader }
ltl progress_reader2 { []<> reader[2]@progress_reader }
ltl progress_writer1 { []<> writer[3]@progress_writer }
ltl progress_writer2 { []<> writer[4]@progress_writer }

bool initialised = 0;

init {
	initialised = 1;
}

active [2] proctype reader() {
	assert(_pid >= 1);
	(initialised == 1);
	do
	:: else ->
progress_reader:
		assert(true);
	od
}

active [2] proctype writer() {
	assert(_pid >= 1);
	(initialised == 1);
	do
	:: else ->
		(initialised == 0);
progress_writer:
		assert(true);
	od
}
Now, first of all,
the model contains 4 never claims: progress_writer2, progress_writer1, progress_reader2, progress_reader1
only one claim is used in a verification run
choose which one with ./pan -a -N name (defaults to -N progress_reader1)
or use e.g.: spin -search -ltl progress_reader1 test.pml
Fine, I don't care, I just want this to finally run, so let's just keep `progress_writer1` and worry about how to stitch it all together later:

/* _pid: 0 = init, 1-2 = reader, 3-4 = writer */
/* ltl progress_reader1 { []<> reader[1]@progress_reader } */
/* ltl progress_reader2 { []<> reader[2]@progress_reader } */
ltl progress_writer1 { []<> writer[3]@progress_writer }
/* ltl progress_writer2 { []<> writer[4]@progress_writer } */

bool initialised = 0;

init {
	initialised = 1;
}

active [2] proctype reader() {
	assert(_pid >= 1);
	(initialised == 1);
	do
	:: else ->
progress_reader:
		assert(true);
	od
}

active [2] proctype writer() {
	assert(_pid >= 1);
	(initialised == 1);
	do
	:: else ->
		(initialised == 0);
progress_writer:
		assert(true);
	od
}
ispin runs this with
spin -a test.pml
ltl progress_writer1: [] (<> ((writer[3]@progress_writer)))
gcc -DMEMLIM=1024 -O2 -DXUSAFE -DSAFETY -DNOCLAIM -w -o pan pan.c
./pan -m10000
Which does not yield an error, but instead reports
unreached in claim progress_writer1
	_spin_nvr.tmp:3, state 5, "(!((writer[3]._p==progress_writer)))"
	_spin_nvr.tmp:3, state 5, "(1)"
	_spin_nvr.tmp:8, state 10, "(!((writer[3]._p==progress_writer)))"
	_spin_nvr.tmp:10, state 13, "-end-"
	(3 of 13 states)
Yeah? Splendid! I have absolutely no idea what to do about this.
How do I get this to run?

Testing polynomial definition (from natural numbers to integers)
For my first formalization, I want to define polynomials in Lean. The first attempt is shown below:

def polynomial (f : ℕ → ℤ) (p : ∃ m : ℕ, ∀ n : ℕ, implies (n ≥ m) (f n = (0:ℤ))) := f
Now I want to test my definition using:

def test : ℕ → ℤ
| 0 := (2:ℤ)
| 1 := (3:ℤ)
| 2 := (4:ℤ)
| _ := (0:ℤ)
But I am having trouble constructing the proof term:
def prf : ∃ m : ℕ, ∀ n : ℕ, implies (n ≥ m) (test n = (0:ℤ)) :=
  exists.intro (3:ℕ) (assume n : ℕ, nat.rec_on (n:ℕ) () ())
Also feedback on the definition itself is welcome.
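One possible shape for the proof term: nat.rec_on is awkward here, because only the cases n = 0, 1, 2 need the hypothesis, while everything from 3 upward holds by computation. An untested sketch in Lean 3 syntax (written with → instead of implies; details may need adjustment):

```lean
-- Untested sketch (Lean 3). After peeling off three successors,
-- `test (n+3)` reduces to 0 by the catch-all equation, so `refl`
-- closes it; the small cases contradict h : n ≥ 3.
def prf : ∃ m : ℕ, ∀ n : ℕ, n ≥ m → test n = (0:ℤ) :=
⟨3, λ n h, begin
  cases n with n, { exact absurd h dec_trivial },
  cases n with n, { exact absurd h dec_trivial },
  cases n with n, { exact absurd h dec_trivial },
  refl
end⟩
```

On the definition itself: as written, `polynomial` just returns `f` and forgets the proof `p`; bundling the function and the proof in a structure (or a subtype `{f : ℕ → ℤ // ∃ m, ∀ n ≥ m, f n = 0}`) is the more idiomatic encoding of "finitely supported".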

Creating a calculator having add and division functions in Z specification
I am given a task to create a calculator having the functionalities of addition and division in Z specification. I know nothing about this. It's our first class and we got an assignment. Please, anybody, help me.
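To give a flavour of what such a specification can look like: in Z, operations are written as schemas, with inputs decorated `?`, outputs decorated `!`, and preconditions (like a non-zero divisor) stated as predicates below the dividing line. A hedged sketch in the LaTeX markup of Spivey's zed.sty (the names are illustrative, not from the assignment):

```latex
% Illustrative Z schemas for a two-operation calculator.
\begin{schema}{Add}
  x?, y?, result! : \num
\where
  result! = x? + y?
\end{schema}

\begin{schema}{Divide}
  x?, y?, result! : \num
\where
  y? \neq 0 \\
  result! = x? \div y?
\end{schema}
```

The precondition `y? ≠ 0` is the part the assignment is really testing: division is a partial operation, and Z makes that explicit.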

Why can't Coq figure out symmetry of the equality by itself?
Suppose we're trying to formalize some (semi)group-theoretic properties, like this:

Section Group.
  Variable A: Type.
  Variable op: A -> A -> A.

  Definition is_left_neutral (e: A) := forall x: A, (op e x) = x.
  Definition is_right_neutral (e: A) := forall x: A, x = (op x e).

  Lemma uniqueness_of_neutral:
    forall a b: A, (is_left_neutral a) -> (is_right_neutral b) -> (a = b).
  Proof.
    intro; intro.
    intros lna rnb.
    elim lna with b; elim rnb with a.
    reflexivity.
  Qed.
End Group.
It works just fine, but, if we reverse the equation in either of the above definitions, i.e. replace the definitions with
Definition is_left_neutral (e: A) := forall x: A, x = (op e x).
and
Definition is_right_neutral (e: A) := forall x: A, (op x e) = x.
respectively, the proof fails at `reflexivity`, since one or both of the `elim` applications do nothing. Sure, there is a workaround for it, based on `assert`, but that's... too much effort and simply annoying...

Is there a reason why the involved Coq tactics (`elim`, `case`, etc.) are so sensitive to the order? I suppose it shouldn't slow down the tactics any noticeably (<< 2 times).

Is there a way to make them apply `symmetry` automatically, where needed, without bothering me about it every time? I couldn't find any mention of this issue in the manual.

Where to get hardware model data?
I have a task which consists of 3 concurrent, self-defined (mutually recursive) processes. I need somehow to make it execute on a computer, but any attempt to convert the requirement to program code with just my brain fails, since the first iteration already produces 3^3 entities with 27^2 cross-relations, yet I need to implement at least several iterations to see whether the program works at all.
So I decided to give up on trying to understand the whole system; I formalized the problem and now want to map it to hardware to generate an algorithm and run it. The language doesn't matter (maybe even directly machine/assembly language?).
I have never done anything like this before, so all the topics I searched through, like algorithm synthesis and software/hardware co-design, mention a hardware model as the second half (in addition to the problem model) of solution generation, but I have never seen one. The whole work is supposed to look like this:
I don't know yet at what level the hardware model is described, so I can't decide how the problem model must be formalized to fit the hardware-model layer.
For example, the target system may contain a CPU and a GPGPU, with, let's say, a target solution having 2 concurrent processes. The system must decide which process to run on the CPU and which on the GPGPU. The highest-level solution may come from comparing the computational intensity of the processes with the target hardware, which is ~300 for CPUs and ~50 for GPGPUs.
But a normal model has to be much more complete, with at least the cache hierarchy, memory access batch size, etc.
Another example is implementing k-ary trees. A synthesized algorithm could address parents and children by computing `k * i + c` / `(i - 1) / k`, or store direct pointers, depending on the computations-per-memory-latency ratio.
Where can I get a hardware model or data to use? Any hardware would suffice for now, just to see what it can look like; later it would be awesome to get models of modern processors, GPGPUs and common heterogeneous clusters.
Do manufacturers supply such kinds of models, i.e. descriptions of how their systems work in some formal language?
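The k-ary tree addressing mentioned above can be made concrete: with the root at index 0 in a flat array, child c (1 ≤ c ≤ k) of node i lives at k*i + c, and the parent of i at (i - 1) / k with integer division, so no pointers need to be stored. A quick Python sketch of those two formulas:

```python
def children(i, k):
    """Array indices of the k children of node i (root at index 0)."""
    return [k * i + c for c in range(1, k + 1)]

def parent(i, k):
    """Array index of the parent of node i (undefined for the root)."""
    return (i - 1) // k

# Ternary tree: node 1's children are 4, 5, 6, and each points back to 1.
print(children(1, 3))                          # [4, 5, 6]
print([parent(c, 3) for c in children(1, 3)])  # [1, 1, 1]
```

The pointer-based alternative trades this per-access arithmetic for extra memory traffic, which is exactly the computations-per-memory-latency trade-off the question describes.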