Capacitated Slugging Dynamic Programming
Def 1: A slugging graph G = (V, E) is a directed acyclic graph where V = {T1, T2, ..., Tm} is a set of trips and E is a set of directed edges that is transitive, i.e. if (Ti, Tj) in E and (Tj, Tk) in E, then (Ti, Tk) in E.
Def 2: Given a slugging graph G = (V, E), a slugging plan Gs = (V, Es), Es ⊆ E, is a subgraph of G that satisfies the following conditions: (i) for all (Ti, Tj) in Es, there is no k != j such that (Ti, Tk) in Es; (ii) for all (Ti, Tj) in Es, there does not exist k such that (Tk, Ti) in Es.
Basically, (i) says that Ti can be merged into at most one other trip, and (ii) says that Ti can be merged into another trip only if no other Tk has already been merged into Ti.
Def 3: Given a slugging graph G = (V, E), a benefit is a value associated with each edge (Ti, Tj) in E. The value of the benefit is given by a function B(Ti, Tj) in R+. The value of B depends only on the source trip, i.e. B(Ti, Tj) = B(Ti, Tk) = B(Ti, Tm) and so on and so forth.
Def 4: The total benefit of a slugging plan Gs = (V, Es) is the sum of the benefits of its edges, i.e. B(Gs) = sum of B(Ti, Tj) over all (Ti, Tj) in Es.
Def 5: Each trip Ti has a number of passengers and a number of available seats. If Ti is merged into Tj, the passengers of Ti occupy some of the available seats of Tj, so the number of available seats of Tj decreases accordingly. If Ti has more passengers than Tj has available seats, Ti cannot be merged into Tj and, therefore, (Ti, Tj) is not in Es.
Problem: Given a slugging graph G = (V, E) and definitions 1-5, find a slugging plan Gs = (V, Es), Es ⊆ E, that has maximal total benefit B(Gs).
For a detailed description of the problem, visit this link and see subsection 3.3.
I have to solve this problem using dynamic programming. It seems to me related to 0-1 knapsack to some extent. For example, deciding whether or not to merge Ta, Tb and Tc into Tj is similar to having objects a, b and c and trying to fit them into a bag j so that profit is maximized and the bag's capacity is respected. However, Ta, Tb and Tc may become bags rather than objects in this problem. Because of that, I have not been able to find the optimal substructure needed to start working out the dynamic programming details. Could anyone lend me a hand on this?
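To make the analogy concrete, here is a minimal sketch of the classic 0-1 knapsack DP that the question compares against. The function name and the numbers are illustrative only: trips Ta, Tb, Tc play the role of items whose "weight" is their passenger count and whose "value" is the merge benefit, while Tj plays the role of a bag whose capacity is its available seats.

```python
# Minimal 0-1 knapsack sketch (illustrative names/numbers, not the full
# slugging solution): each item is (weight, value), i.e. (passengers, benefit).
def knapsack(items, capacity):
    """items: list of (weight, value); returns the best achievable value."""
    best = [0] * (capacity + 1)          # best[c] = max value with capacity c
    for weight, value in items:
        # iterate capacities downwards so each item is used at most once
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]

# e.g. three candidate trips with (passengers, benefit) merged into a trip
# that has 4 free seats
print(knapsack([(2, 3), (3, 4), (1, 2)], 4))  # 6: take (3, 4) and (1, 2)
```

The complication the question points out is exactly what plain knapsack does not capture: an unmerged trip is simultaneously a candidate "object" for every other bag, so the choice of bags and the packing must be optimized jointly.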
See also questions close to this topic

callback in scipy.optimize does not call first value
I've got this simple problem with callbacks in scipy.
I'm using optimize.minimize with values (func, x0, callback=callbackFunc). The callback function does work, BUT only returns the values after step 1.
x0 = (240., 220.)
Nfeval = 0
interim_res = []

def callbackFunc(X):
    global Nfeval, interim_res
    print('{0:4d} {1: 3.6f} {2: 3.6f}'.format(Nfeval, X[0], X[1]))
    Nfeval += 1
    interim_res.append([X[0], X[1]])

optimize.minimize(func, x0, callback=callbackFunc)
So in my opinion it should start with
0 240 220 ...
BUT first return is:
0 173.345 159.56
Is this the usual way minimize works, or am I misinterpreting something?
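This is indeed how `scipy.optimize.minimize` behaves: the callback is invoked only after each completed iteration, so x0 itself is never passed to it. A minimal sketch of a workaround, assuming a simple quadratic stand-in objective (the real `func` is not shown in the question): log the starting point manually before calling `minimize`.

```python
# Sketch, assuming a made-up quadratic objective: scipy calls `callback` only
# *after* each iteration, never with x0, so record the initial guess yourself.
import numpy as np
from scipy.optimize import minimize

def func(x):                           # stand-in objective, minimum at (1, -2)
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

interim_res = []

def callbackFunc(X):
    interim_res.append([X[0], X[1]])

x0 = np.array([240.0, 220.0])
interim_res.append([float(v) for v in x0])   # log the initial guess manually
res = minimize(func, x0, callback=callbackFunc)
print(interim_res[0])                        # [240.0, 220.0]
```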

SQLite DB Optimization for Large Dataset
I am writing a disk catalogue application in PHP. My script scans through a filesystem, gathering information about each file/dir/link, and stores that information in an SQLite database. A web application then queries the DB to present a navigable representation of the filesystem. I am using a variety of tools to gather metadata based on filetype. I am somewhat familiar with SQL, but this is my first SQLite app.
My application does not update records, ever. Once a catalogue is generated, it does not change. My test filesystem contains over 900,000 files, and each results in a row in the DB.
My first attempt at this was several tables with different types of information, including thumbnail image blobs (bad idea), each with an autoincrement id. I did not use foreign keys. The main files table contained about 10 fields, one of which was parent. In order to create a directory view, I query the files table for records whose parent is a given ID, then pull the records with each matching ID from the various tables. Performance was horrible.
CREATE TABLE files (
    id INTEGER PRIMARY KEY, parent INTEGER, Pathname TEXT, Path TEXT,
    Filename TEXT, Extension TEXT, Type TEXT, items INTEGER, newest INTEGER,
    stat TEXT, LinkTarget TEXT, RealPath TEXT, Inode INTEGER, Size INTEGER,
    Perms INTEGER, Owner TEXT, ATime INTEGER, MTime INTEGER, CTime INTEGER,
    gfi_type TEXT, gfi_attr TEXT, gfi_created TEXT, hash TEXT, tinfo TEXT
);

CREATE TABLE mdls (
    id INTEGER PRIMARY KEY, hasmeta INTEGER, DateAdded TEXT, ContentType TEXT,
    Creator TEXT, Kind TEXT, UserTags TEXT, FSInvisible INTEGER,
    PixelWidth INTEGER, PixelHeight INTEGER, spotlight TEXT
);

CREATE TABLE metadata (
    id INTEGER PRIMARY KEY, duration TEXT, mediainfo TEXT, exiftool TEXT
);

CREATE TABLE thumbs (
    id INTEGER PRIMARY KEY, thumb BLOB
);
I did some reading here and on tutorial sites, and learned autoincrement keys could be a performance issue. So, I redesigned the DB and rewrote the app. Thumbs are now stored externally. I am now using an md5 hash of the filename as a key. The parent/child relationships are stored as serialized arrays, so creating a dir view is as simple as retrieving the array of children, and then looping over the array and selecting the rows by pid from the files table. The files table is a very large table that contains all the metadata on each file. Most are short strings but some are largeish text blobs.
CREATE TABLE family (
    pid TEXT, fid TEXT, children TEXT
);

CREATE TABLE files (
    pid TEXT, fid TEXT, Pathname TEXT, Path TEXT, Filename TEXT,
    Extension TEXT, Type TEXT, Inode INTEGER, Perms INTEGER, Owner TEXT,
    ATime INTEGER, CTime INTEGER, MTime INTEGER, LinkTarget TEXT,
    RealPath TEXT, stat TEXT, items INTEGER, newest INTEGER, gfi_type TEXT,
    gfi_attr TEXT, gfi_created TEXT, Size INTEGER, Title TEXT,
    PixelWidth INTEGER, PixelHeight INTEGER, Duration INTEGER,
    DateTimeOriginal INTEGER, Origin TEXT, GPS TEXT, Author TEXT,
    spotlight TEXT, kMDItemDateAdded INTEGER, kMDItemLastUsedDate INTEGER,
    kMDItemUseCount INTEGER, kMDItemContentModificationDate INTEGER,
    kMDItemContentType TEXT, kMDItemCreator TEXT, kMDItemFSCreatorCode TEXT,
    kMDItemKind TEXT, kMDItemFSTypeCode TEXT, kMDItemUserTags TEXT,
    kMDItemFSInvisible INTEGER, kMDItemNumberOfPages INTEGER,
    kMDItemPageHeight INTEGER, kMDItemPageWidth INTEGER,
    kMDItemWhereFroms TEXT, kMDItemEncodingApplications TEXT,
    has_exif INTEGER, has_mediainfo INTEGER, has_hash INTEGER,
    thumb_filename TEXT, thumb_width INTEGER, thumb_height INTEGER,
    ProfileDescription TEXT, BitDepth INTEGER, Compression TEXT,
    Orientation INTEGER, LensType TEXT, VideoFormat TEXT, AudioFormat TEXT,
    Tracks INTEGER, Profile TEXT, Bitrate INTEGER
);
Performance is as bad as, if not worse than, before. I've read up on best practices for large relational databases, but my data is not a complex interrelated set of information. It is just 900,000 rows of file metadata. All I need to do is retrieve around 1-2000 rows (average around 20) at a time. Why does it take 30 seconds? Is SQLite the best tool? I can't use MySQL because I need to embed the DB file.
I understand that part of the issue is the size of the DB (1.5 GB). Yes, there is a lot of redundant data. I could break out things like file extension into separate tables. Then rather than storing "jpg" over and over I would store ExtID and query the Ext table to get the string. But that would make my select statements much more complex. From what I understand I would use foreign keys or an inner join? Would that really help with performance? Here is an example query:
$dir = unserialize($dbo->query("SELECT children FROM family WHERE (pid='".$pid."')")->fetch()['children']);
foreach ($children as $child) {
    $list[] = $dbo->query("SELECT * FROM files WHERE (pid='".$thing."')")->fetchAll();
}
Any help is very much appreciated!
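For what it's worth, a common first step with a table this shape is an index on the lookup column plus a single `IN (...)` query instead of one query per child. A minimal sketch using Python's built-in sqlite3 module; the table and column names mirror the question, but the rows are invented:

```python
import sqlite3

# Sketch only: index the lookup column and fetch all children in one query.
# Table/column names mirror the question; the data is made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (pid TEXT, Filename TEXT)")
conn.executemany("INSERT INTO files VALUES (?, ?)",
                 [("a", "one.jpg"), ("b", "two.jpg"), ("c", "three.jpg")])
# Without an index, every "WHERE pid = ..." is a full table scan; over
# 900,000 rows that alone can explain multi-second directory views.
conn.execute("CREATE INDEX idx_files_pid ON files (pid)")

children = ["a", "c"]
placeholders = ",".join("?" * len(children))
rows = conn.execute(
    "SELECT * FROM files WHERE pid IN (%s)" % placeholders, children
).fetchall()
print(len(rows))   # 2 matching rows
```

The parameterized query also avoids the string concatenation in the PHP snippet above, which is an SQL-injection risk as well as a performance issue.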

code efficiency in python
I'm coding for a Project Euler question, and every now and then the question demands a program that is efficient even when doing brute force, which I struggle with.
Below is a piece of code for problem 35, which I'm fairly certain works correctly with numbers under 10,000; however, when I set it to 1 million it takes far too long to run. I still haven't got an answer yet.
If anyone could give me general tips on efficiency that would be awesome!
def rwh_primes(n):
    sieve = [True] * n
    for i in range(3, int(n**0.5) + 1, 2):
        if sieve[i]:
            sieve[i*i::2*i] = [False] * int((n - i*i - 1) / (2*i) + 1)
    return [2] + [i for i in range(3, n, 2) if sieve[i]]

def is_prime(n):
    for i in range(3, n):
        if n % i == 0:
            return False
    return True

def f7(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]

primes = rwh_primes(1000000)
lisp = []
working = []
count = 0
counter = 0
for x in primes:
    z = 1
    y = (len(str(x)) + 1)
    if len(str(x)) == 2:
        thingy = list(str(x))
        number = int(thingy[1] + thingy[0])
        if is_prime(number) == True:
            lisp.append(x)
    elif len(str(x)) < 2:
        lisp.append(x)
    else:
        while count < len(sorted(str(x))) - 1:
            num = list((str(x) + str(x)))
            new = num[z:y]
            newest = ''.join(new)
            verynew = int(newest)
            working.append(verynew)
            count += 1
            z += 1
            y += 1
        if count == len(sorted(str(x))) - 1:
            for a in working:
                if is_prime(a) == True:
                    counter += 1
            if counter == len(working):
                lisp.append(x)
    lisp = f7(lisp)
    count = 0
    counter = 0
    working = []
print(len(lisp))
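For problem 35 specifically, the big efficiency wins are (a) testing primality by set membership against the sieve output instead of the trial-division `is_prime` above, and (b) generating each rotation directly with string slicing. A sketch under those assumptions (the function `circular_primes` and its structure are mine, not from the question):

```python
def rwh_primes(n):
    # same sieve as in the question, with the stripped minus signs restored
    sieve = [True] * n
    for i in range(3, int(n ** 0.5) + 1, 2):
        if sieve[i]:
            sieve[i * i::2 * i] = [False] * ((n - i * i - 1) // (2 * i) + 1)
    return [2] + [i for i in range(3, n, 2) if sieve[i]]

def circular_primes(limit):
    prime_set = set(rwh_primes(limit))     # O(1) membership test
    count = 0
    for p in prime_set:
        s = str(p)
        # every rotation of p, produced by slicing the digit string
        if all(int(s[k:] + s[:k]) in prime_set for k in range(len(s))):
            count += 1
    return count

print(circular_primes(100))   # 13 circular primes below 100
```

The trial-division `is_prime` in the original code is O(n) per call and is invoked for every rotation of every prime, which is what makes the 1-million run crawl; the set lookup makes each rotation check constant time.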

Algorithm: How to find sub set of a set when new elements are added?
I have a subset algorithm that finds all the subsets of a given set. The thing is, the original set grows: when elements are added to it, I need to recompute its subsets again.
Is there a way to optimize the subset algorithm so that it can resume from the last computed point, instead of computing the whole thing again and again?
public static IEnumerable<IEnumerable<T>> SubSetsOf<T>(IEnumerable<T> source)
{
    if (!source.Any())
        return Enumerable.Repeat(Enumerable.Empty<T>(), 1);

    var element = source.Take(1);
    var haveNots = SubSetsOf(source.Skip(1));
    var haves = haveNots.Select(set => element.Concat(set));

    return haves.Concat(haveNots);
}

private static bool Valid(IEnumerable<int> set)
{
    bool flag = false;
    foreach (var element in set)
    {
        var f = element > 0;
        if (f == flag)
        {
            return false;
        }
        flag = f;
    }
    return true;
}
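The incremental structure the question asks for does exist: the subsets of S ∪ {e} are exactly the old subsets plus a copy of each old subset with e appended, so each new element only doubles the previously computed family rather than forcing a recomputation. A sketch of that idea (written in Python for brevity; the C# version would have the same shape):

```python
# Incremental subset generation:
#   subsets(S + [e]) = subsets(S) + [s + [e] for s in subsets(S)]
# so adding one element reuses the whole previously computed family.
def extend_subsets(subsets, element):
    """Given all subsets of S, return all subsets of S plus one new element."""
    return subsets + [s + [element] for s in subsets]

family = [[]]                    # the single subset of the empty set
for e in [1, 2, 3]:
    family = extend_subsets(family, e)
print(len(family))               # 2**3 == 8 subsets
```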

Can MDOLLS on SPOJ be solved using the LIS technique?
The question is:
Dilworth is the world's most prominent collector of Russian nested dolls: he literally has thousands of them! You know, the wooden hollow dolls of different sizes of which the smallest doll is contained in the second smallest, and this doll is in turn contained in the next one and so forth. One day he wonders if there is another way of nesting them so he will end up with fewer nested dolls? After all, that would make his collection even more magnificent! He unpacks each nested doll and measures the width and height of each contained doll. A doll with width w1 and height h1 will fit in another doll of width w2 and height h2 if and only if w1 < w2 and h1 < h2. Can you help him calculate the smallest number of nested dolls possible to assemble from his massive list of measurements? Link with other information about input, output and samples: http://www.spoj.com/problems/MDOLLS/
I know this problem can be solved by the maximum bipartite matching technique, but I want to know: is there any way to solve it using LIS?
If we have w1 <= w2 and h1 <= h2 instead of w1 < w2 and h1 < h2, I have an O(n^2 log n) solution using LIS as follows: initially we sort the dolls by their w values and then repeatedly remove an LIS from the new order of dolls. The number of times we can remove an LIS from our dolls is the answer.
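For the strict-inequality version, a standard approach goes through Dilworth's theorem: the minimum number of nesting chains equals the size of the largest antichain, and after sorting by width ascending (with height descending on ties), every non-increasing run of heights is an antichain. So the answer is the longest non-increasing subsequence of the heights, which patience sorting finds in O(n log n), a single LIS-style pass instead of repeated LIS removals. A sketch (the function name and sample dolls are mine):

```python
import bisect

# Minimum number of nested-doll chains via Dilworth's theorem: sort dolls by
# (w ascending, h descending on ties); then the largest antichain is the
# longest non-increasing subsequence of heights, found by patience sorting.
def min_nested_dolls(dolls):
    dolls = sorted(dolls, key=lambda d: (d[0], -d[1]))
    tails = []                    # patience piles, on negated heights
    for _, h in dolls:
        x = -h                    # longest non-increasing run of h ==
        i = bisect.bisect_right(tails, x)   # longest non-decreasing run of -h
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(tails)

# (10,10) -> (20,30) -> (40,50) forms one chain; (30,20) needs its own
print(min_nested_dolls([(10, 10), (20, 30), (30, 20), (40, 50)]))  # 2
```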

Dynamic Programming R Code Annotation
For an assignment, I was asked to annotate R code that finds the optimal path through a directed network using dynamic programming. As you can see, I'm on my way, but I am stuck on the step marked by a bunch of question marks. Could someone please clarify what is happening there?
Many thanks!
mincostnetwork = function(data) {
  N = dim(data)[1]   # we set N to be the number of ROWS of our matrix
  f = rep(Inf, N)    # f is a repetition of "infinities" N many times - however many rows we've got
  optarc = f         # optimal arc is the f array above

  # RECURSION
  for (i in N:1) {   # cycle through each row of the matrix, from the last to the first
    # TERMINAL CONDITION
    if (i == N) {    # for the first recursion, which is the terminal node
      f[i] = 0       # the last value in the array 'f' turns from infinity to 0
      optarc[i] = N  # optimal arc for N = N
    } else {         # ITERATE OVER ARCS: for every other node
      arc = which(data[i, ] != Inf)  # 'arc' holds every column whose value IS NOT infinity
      if (length(arc) > 0) {         # if the row has any arcs to other nodes
        costs = rep(Inf, length(arc))  # 'costs' has as many "infinities" as there are arcs for this row
        for (j in 1:length(arc)) {     # iterate as many times as there are arcs
          costs[j] = data[i, arc[j]] + f[arc[j]]  # ??????? HAVING TROUBLE FOLLOWING arc[j]
          f[i] = min(costs)            # replace the 'infinity' for row i in 'f' with the minimum of the costs
          # record optimal path
          optarc[i] = arc[which(costs == f[i])[1]]
        }
      } else {
        f[i] = Inf
        optarc[i] = NA
      }
    }
  }

  # determine optimal path
  end = 1
  path = end
  while (end < N) {
    end = optarc[end]
    path = c(path, end)
  }

  # return min cost and optimal path
  list(f[1], f, path)
}
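The line with the question marks is the Bellman backward recursion: `data[i, arc[j]]` is the cost of the arc from node i to node arc[j], and `f[arc[j]]` is the already-computed minimum cost from arc[j] to the terminal node, so their sum is the total cost of leaving i via that arc; taking the minimum over j gives f[i]. The same recursion in Python, on a made-up 3-node network with 0-based indices:

```python
INF = float("inf")

# Backward induction on a DAG cost matrix, mirroring the R code:
# f[i] = min over arcs (i, j) of data[i][j] + f[j], with f at the
# terminal node fixed to 0.
def min_cost_network(data):
    n = len(data)
    f = [INF] * n
    opt = [None] * n
    f[n - 1] = 0                        # terminal condition
    for i in range(n - 2, -1, -1):      # sweep nodes from last to first
        for j in range(n):
            if data[i][j] != INF:
                # cost of arc (i, j) plus the best cost onward from j
                if data[i][j] + f[j] < f[i]:
                    f[i] = data[i][j] + f[j]
                    opt[i] = j
    return f, opt

# 3-node example: 0->1 costs 1, 1->2 costs 2, 0->2 costs 4
data = [[INF, 1, 4], [INF, INF, 2], [INF, INF, INF]]
f, opt = min_cost_network(data)
print(f[0])    # 3: the cheap route is 0 -> 1 -> 2, not the direct 0 -> 2
```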

Is this an NPC?
Is this problem an NPC problem?
Instance: Graph G = (V, E) and an integer k.
Question: Does G contain a spanning tree T in which every node has degree at most k?

Reduction of 0,1 knapsack to subset sum
I know how to reduce subset sum to 0,1 knapsack. But is it possible to reduce knapsack to subset sum? How?

Difference between CSAT and SAT?
What exactly is the difference between these two NP-complete problems? It seems to me that they both ask whether a boolean formula can be satisfied (i.e. output 1), but one is in the context of a circuit and the other just a formula. However, couldn't one write a boolean formula from a boolean circuit?
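One can, and the standard way to do it without exponential blowup is the Tseitin transformation: introduce a fresh variable for each gate and add short clauses tying it to its inputs. For a single AND gate g = a AND b the clauses are (NOT g OR a), (NOT g OR b), (NOT a OR NOT b OR g). A brute-force sanity check (illustrative sketch, variable names are mine) that these clauses say exactly "g equals a AND b":

```python
from itertools import product

# Tseitin clauses for one AND gate g = a AND b, evaluated with Python bools.
def and_gate_clauses(a, b, g):
    return (not g or a) and (not g or b) and (not a or not b or g)

# The three clauses hold exactly when g really equals (a and b).
for a, b, g in product([False, True], repeat=3):
    assert and_gate_clauses(a, b, g) == (g == (a and b))
print("Tseitin AND-gate clauses verified over all 8 assignments")
```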

Shortest Path from Node A to B by going through all other Nodes (NP-Hard?)
My problem: find the shortest path from node A to node B that passes through all other nodes of the unweighted, directed graph. I know that such a path exists.
I believe this is NP-hard, but I can't explain why. My Prof. would like the algorithm to run in O(V + E), where V is the set of nodes and E is the set of edges.
It seems similar to this problem, but the graph's properties are different; does that make a difference?

Misunderstanding of NP-hard
I know that NP-hard means: every X in NP reduces to H; such an H is NP-hard. But in the NP diagram, NP and NP-hard are disjoint; they have nothing in common.
But by the definition of NP, an NP problem is hard to solve and easy to verify. And by the definition of NP-hard, H is also hard to solve and easy to verify, because every NP problem reduces to H. So the result is X = H (NP = NP-hard). But why do they have nothing in common?
Reduction from 3SAT
I'm solving a problem which I have to prove is NP-hard.
The problem is:
There're n people who will be positioned in a big line.
Each person must either point both their arms up or both their arms down.
There's a list of forbidden arrangements. Each forbidden arrangement is a subset of the people paired with arm positions. For example, A may not point his arms down while B, C, D point their arms up.
I'm thinking of reducing 3SAT to this problem in order to prove that it's NP-hard. The variables would correspond to the n people and the clauses to the forbidden arrangements.
But I'm stuck here. I'm not sure what to do next.
Any help/suggestions would be appreciated.
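The mapping described above does work if "arms up" encodes true and "arms down" encodes false: a 3SAT clause is violated by exactly one assignment of its three variables (all literals false), so the corresponding forbidden arrangement forbids precisely that pattern. A brute-force sanity check on one clause (the clause and the people's names are illustrative, not from the question):

```python
from itertools import product

# Encode arms-up as True, arms-down as False. For the clause (x OR NOT y OR z),
# the only violating assignment is x=False, y=True, z=False, so the forbidden
# arrangement is: x arms down, y arms up, z arms down.
forbidden = {"x": False, "y": True, "z": False}

def clause(x, y, z):
    return x or (not y) or z

def allowed(assignment):
    # an arrangement is allowed iff it differs from the forbidden pattern
    return any(assignment[p] != v for p, v in forbidden.items())

for x, y, z in product([False, True], repeat=3):
    a = {"x": x, "y": y, "z": z}
    assert allowed(a) == clause(x, y, z)
print("clause and forbidden arrangement agree on all 8 assignments")
```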