Processing Large Tables
CapableObjects Forums
Posted 2008-10-23 19:02:46


Junior Member


Group: Forum Members
Last Login: 2011-07-06 23:38:03
Posts: 22, Visits: 74
Howdy, I have a system that is capturing the ACL rules from every file on a file server into the following structure:

File  0 -- * Rule

There are 218,233 File objects and 915,013 Rule objects.

Once my system "loads" these rows into the database, I need to do some post-processing on each of the Rule objects.

Each Rule object has a state machine with "Loaded", "RoleDiscovery" and "Ready" states, stored in the RootState member.

I am trying to find a (relatively) quick way to pull 10-50 Rule objects in the RoleDiscovery state to do this post-processing.

I have tried the following:

1)

IObjectList olRules = EcoSpace.OclPs.Execute("NTFSAccessRule.allInstances->select(RootState=\'RoleDiscovery\')");

This, of course, takes a millennium to execute and uses almost 3GB of RAM.

2)

IElement ieRule = EcoSpace.Ocl.Evaluate("NTFSAccessRule.allInstances->select(RootState=\'RoleDiscovery\')->first");

I tried this to get the first object in that state... it still takes forever and a day.

3)

IElementCollection ieRuleX = EcoSpace.OclPs.Execute("NTFSAccessRule.allInstances->select(RootState=\'RoleDiscovery\')").GetAsCollection();

Then there was this approach; of course, it takes forever too, and it eats so many resources that my system becomes unusable.

 

In SQL, I can "SELECT TOP 10"; how do I do the same in OCL / code? EnsureRange doesn't seem to do what I need: even if I query an IObjectList and then call EnsureRange on, say, 100 records, the system still queries all 900k rows to get the IDs.

Suggestions?
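A note on why the attempts above behave the way they do: applying ->first or a limit after allInstances->select(...) has been materialized still pays for every row, whereas SQL's TOP stops the scan once enough matches are found. The plain C# sketch below does not use ECO or model its internals; it only counts how many rows each strategy reads from a simulated 1,000-row table. The names ReadCounter and TopNDemo and the table contents are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Counts how many rows a strategy pulls from the simulated table.
public sealed class ReadCounter
{
    public int Value;
}

public static class TopNDemo
{
    // Simulated table scan: yields rows lazily and counts each one read.
    static IEnumerable<string> Scan(IList<string> table, ReadCounter reads)
    {
        foreach (string state in table)
        {
            reads.Value++;
            yield return state;
        }
    }

    // Analogous to allInstances->select(...)->first: everything is loaded before filtering.
    public static List<string> MaterializeThenTake(IList<string> table, string state, int n, ReadCounter reads)
    {
        List<string> all = Scan(table, reads).ToList();
        return all.Where(s => s == state).Take(n).ToList();
    }

    // Analogous to SELECT TOP n ... WHERE: the scan stops once n matches have been found.
    public static List<string> TakeWhileScanning(IList<string> table, string state, int n, ReadCounter reads)
    {
        return Scan(table, reads).Where(s => s == state).Take(n).ToList();
    }

    // 1,000 rows, every 10th in RoleDiscovery; returns (rows read eagerly, rows read lazily).
    public static Tuple<int, int> Demo()
    {
        List<string> table = Enumerable.Range(0, 1000)
            .Select(i => i % 10 == 9 ? "RoleDiscovery" : "Ready")
            .ToList();
        var eager = new ReadCounter();
        var lazy = new ReadCounter();
        List<string> a = MaterializeThenTake(table, "RoleDiscovery", 10, eager);
        List<string> b = TakeWhileScanning(table, "RoleDiscovery", 10, lazy);
        if (a.Count != 10 || b.Count != 10)
            throw new InvalidOperationException("Both strategies should find 10 rows.");
        return Tuple.Create(eager.Value, lazy.Value);
    }
}
```

With every tenth row in RoleDiscovery, materializing first reads all 1,000 rows, while the limited scan stops after 100.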

 



Thanks in advance, Tim McKay

// ECO VI (6.0.0.5610) Microsoft Visual Studio.NET 2010 (10.0.40219.1 SP1Rel)

// Host OS : Mac OS X 10.6.8 running Parallels for Mac Build 6.0.12090 (Revision 660720; May 26, 2011); Guest OS : Windows XP SP3 (32-bit)
Post #641
Posted 2008-10-23 21:35:48


Junior Member


Group: Forum Members
Last Login: 2011-07-06 23:38:03
Posts: 22, Visits: 74
Status update:

I've optimized my code to fetch 1,000 records at a time and process them in a reasonable time frame. It takes an average of 4.105 seconds to process 1,000 records. Here is my code:

private void button2_Click(object sender, EventArgs e)
{
    DateTime st = DateTime.Now;

    // Pass the state as an OCL variable instead of embedding it in the expression.
    IModifiableVariableList vars;
    vars = EcoSpace.VariableFactory.CreateVariableList();
    vars.AddConstant("_state", "RoleDiscovery");

    // The trailing 1000, 0 cap the result at 1000 rules, starting from offset 0.
    IObjectList olRules = EcoSpace.OclPs.Execute(null, vars,
        "NTFSAccessRule.allInstances->select(RootState=_state)", 1000, 0);
    olRules.EnsureObjects();

    // Load every DirectoryEntryBase once; FindDEB searches this collection for each rule.
    IElementCollection iecDEB = EcoSpace.Ocl.Evaluate("DirectoryEntryBase.allInstances").GetAsCollection();

    if (iecDEB.Count > 0)
    {
        for (int i = 0; i < olRules.Count; i++)
        {
            NTFSAccessRule rule = olRules[i].GetValue<NTFSAccessRule>();
            rule.Role = FindDEB(iecDEB, rule.SID);
            if (rule.Role != null)
            {
                rule.SetReady();
            }
            else
            {
                rule.SetUnknown();
            }
        }
    }

    EcoSpace.UpdateDatabase();

    DateTime en = System.DateTime.Now;
    this.Text = en.Subtract(st).TotalMilliseconds.ToString();
}
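Since both SetReady and SetUnknown move a rule out of RoleDiscovery, re-running the same capped query at offset 0 should return the next unprocessed batch, so the whole table could be drained by repeating the button's work until a query comes back empty. A minimal sketch of that loop in plain C#, assuming hypothetical fetchBatch and processAndSave delegates that stand in for the OclPs.Execute call and for the processing loop plus UpdateDatabase; FakeRule is a made-up stand-in for NTFSAccessRule:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for NTFSAccessRule; only the state matters here.
public sealed class FakeRule
{
    public string State = "RoleDiscovery";
}

public static class BatchDrain
{
    // Calls fetchBatch(batchSize) and processes the result until a fetch returns nothing.
    // This only terminates because processing moves every fetched item out of the
    // queried state, so fetching again at offset 0 yields the next unprocessed items.
    public static int Run<T>(Func<int, IList<T>> fetchBatch, Action<IList<T>> processAndSave, int batchSize)
    {
        int batches = 0;
        while (true)
        {
            IList<T> batch = fetchBatch(batchSize);
            if (batch.Count == 0)
                return batches;
            processAndSave(batch);
            batches++;
        }
    }

    // Simulates 2,500 rules drained in batches of 1,000; returns (batches, rules still pending).
    public static Tuple<int, int> Demo()
    {
        List<FakeRule> table = Enumerable.Range(0, 2500).Select(_ => new FakeRule()).ToList();
        int batches = Run<FakeRule>(
            max => table.Where(r => r.State == "RoleDiscovery").Take(max).ToList(),
            batch => { foreach (FakeRule r in batch) r.State = "Ready"; },
            1000);
        int pending = table.Count(r => r.State == "RoleDiscovery");
        return Tuple.Create(batches, pending);
    }
}
```

The design relies on processing always changing the state; if a rule could stay in RoleDiscovery, the loop would keep fetching it, so a real version would want a maximum batch count as a safety net.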

--------------

My holdup now is the FindDEB subroutine: 1,000 calls take about 3.75 seconds. Any thoughts on how I could optimize it? It would tickle me pink to improve on this.

private DirectoryEntryBase FindDEB(IElementCollection inDEBCollection, string inSID)
{
    DirectoryEntryBase rc = null;
    foreach (IElement ie in inDEBCollection)
    {
        DirectoryEntryBase deb = ie.GetValue<DirectoryEntryBase>();
        if ((deb.SID == inSID) && (rc == null))
        {
            rc = deb;
        }
    }
    return rc;
}



Thanks in advance, Tim McKay

Post #642
Posted 2008-10-24 00:12:34
Supreme Being


Group: Administrators
Last Login: 2010-11-30 12:17:13
Posts: 1 230, Visits: 1 382
Since the list you do lookups on in FindDEB is the same every time, the easiest solution would probably be to add all the objects in that list to a Dictionary (keyed on SID) once and use that for the lookups instead.
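A minimal sketch of that idea in plain C#, using a hypothetical Entry class in place of DirectoryEntryBase and a hypothetical SidIndex helper. Building the dictionary costs one pass over the entries; each lookup after that is an average O(1) hash lookup instead of a scan of the whole list. One detail worth preserving: FindDEB returns the first entry with a matching SID, while Dictionary.Add throws on a duplicate key, so the sketch keeps the first entry and skips later duplicates.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for DirectoryEntryBase.
public sealed class Entry
{
    public string Sid;
    public string Name;
    public Entry(string sid, string name) { Sid = sid; Name = name; }
}

public static class SidIndex
{
    // One pass up front; keeps the first entry per SID, like FindDEB.
    // Null SIDs are skipped because Dictionary keys cannot be null.
    public static Dictionary<string, Entry> Build(IEnumerable<Entry> entries)
    {
        var bySid = new Dictionary<string, Entry>();
        foreach (Entry e in entries)
        {
            if (e.Sid != null && !bySid.ContainsKey(e.Sid))
                bySid.Add(e.Sid, e);
        }
        return bySid;
    }

    // Returns null when no entry has this SID, mirroring FindDEB's contract.
    public static Entry Find(Dictionary<string, Entry> bySid, string sid)
    {
        Entry found;
        return sid != null && bySid.TryGetValue(sid, out found) ? found : null;
    }
}
```

Unlike FindDEB, this returns null for a null SID rather than matching an entry whose SID is also null.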



/Jonas Hogstrom [CapableObjects]
Post #646
Posted 2008-10-25 13:47:23
Supreme Being


Group: Forum Members
Last Login: 2011-07-14 17:05:00
Posts: 290, Visits: 2 617
Hiya

Here are the changes I would make:

01: I would make the state an enumeration

public enum RuleState
{
Loaded = 1,
Discovered = 2,
Ready = 3
}


In the model, declare the attribute's type as RuleState and set its persistence mapper to GenericEnumAsInteger.

Your OCL would then be
->select(RootState = #Loaded)


I think an integer will be marginally faster in the DB, and at the very least it will take less space :-)
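To illustrate why the explicit values in RuleState are worth keeping: with an integer mapping, the number is what ends up in the column, so pinning Loaded = 1 and so on keeps stored rows meaningful even if members are later reordered or added. This plain C# sketch (RuleStateStorage is a hypothetical helper, not part of ECO) shows the round trip and validates a raw value before casting, since a bare cast to an enum accepts any integer.

```csharp
using System;

// Mirrors the RuleState enum suggested above.
public enum RuleState
{
    Loaded = 1,
    Discovered = 2,
    Ready = 3
}

public static class RuleStateStorage
{
    // The integer that an integer-based mapping would store.
    public static int ToStored(RuleState state)
    {
        return (int)state;
    }

    // Rejects column values that do not correspond to a declared member.
    public static RuleState FromStored(int stored)
    {
        if (!Enum.IsDefined(typeof(RuleState), stored))
            throw new ArgumentOutOfRangeException(nameof(stored), stored, "Not a RuleState value.");
        return (RuleState)stored;
    }
}
```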


02: You are using .GetValue<> a lot; you could make your code easier to follow by doing this instead:

foreach (NTFSAccessRule currentRule in olRules.GetAsIList<NTFSAccessRule>())
{
  //currentRule is an NTFSAccessRule now, not an IObject
}


03: Your loop to find the DEB reads like this...

if ((deb.SID == inSID) && (rc == null))
  rc = deb;


So it keeps searching for the DEB even after it has found it. You could return as soon as you find a match instead (and return null after the loop):

if (deb.SID == inSID)
  return deb;
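Filled out into a complete method, the early-exit version might look like the sketch below (DirEntry and LinearLookup are hypothetical stand-ins for DirectoryEntryBase and FindDEB). It returns the same result as the original, but it only helps when a match exists: a rule whose SID is not found, which is exactly the SetUnknown case, still scans the entire list, which is why point 04 matters more.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for DirectoryEntryBase.
public sealed class DirEntry
{
    public string SID;
    public DirEntry(string sid) { SID = sid; }
}

public static class LinearLookup
{
    // First match or null, like the original FindDEB, but stops at the first hit.
    // A miss still compares against every entry, so the worst case remains O(n) per call.
    public static DirEntry FindFirst(IEnumerable<DirEntry> entries, string sid)
    {
        foreach (DirEntry entry in entries)
        {
            if (entry.SID == sid)
                return entry;
        }
        return null;
    }
}
```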


04: The biggest bottleneck here (as Jonas pointed out) is that you are doing a linear search for the DEB for every single rule. Put the DirectoryEntryBase list into a dictionary instead.

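DateTime st = DateTime.Now;

IModifiableVariableList vars;
vars = EcoSpace.VariableFactory.CreateVariableList();
// Plain string state, as in your current model; with the RuleState enum you would filter on the enum value.
vars.AddConstant("_state", "RoleDiscovery");

var olRules = EcoSpace.OclPs.Execute(null, vars,
  "NTFSAccessRule.allInstances->select(RootState=_state)", 1000, 0);
olRules.EnsureObjects();

// Build the SID lookup once (needs using System.Collections.Generic).
// Keep the first entry per SID, as FindDEB did; Dictionary.Add throws on a duplicate key,
// and Dictionary keys cannot be null.
var entryBaseBySID = new Dictionary<string, DirectoryEntryBase>();
foreach (var currentBase in EcoSpace.Ocl.Evaluate("DirectoryEntryBase.allInstances").GetAsCollection().GetAsIList<DirectoryEntryBase>())
{
  if (currentBase.SID != null && !entryBaseBySID.ContainsKey(currentBase.SID))
    entryBaseBySID.Add(currentBase.SID, currentBase);
}

foreach (var currentRule in olRules.GetAsIList<NTFSAccessRule>())
{
  DirectoryEntryBase foundRole;
  if (currentRule.SID != null && entryBaseBySID.TryGetValue(currentRule.SID, out foundRole))
  {
    currentRule.Role = foundRole;
    currentRule.SetReady();
  }
  else
    currentRule.SetUnknown();
}

// Persist the batch, as the original code did.
EcoSpace.UpdateDatabase();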


05: Finally, if you want to save memory, do all of this in a temporary EcoSpace instance instead, so the objects it loads are released when it is disposed:

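using (var tempEcoSpace = new MyEcoSpaceType()) //MyEcoSpaceType = your own EcoSpace class
{
  tempEcoSpace.Active = true;
  //Rest of code here using tempEcoSpace instead of EcoSpace,
  //including tempEcoSpace.UpdateDatabase() before the block ends
}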
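The same idea can be applied per batch: open a short-lived space, load and process one batch, save, and dispose, so at most one batch of objects is held at a time. The sketch below only models the scoping; Workspace is a hypothetical IDisposable stand-in for a temporary EcoSpace, not an ECO class, and its Cache list stands in for the objects such a space would hold.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a temporary EcoSpace: holds loaded objects until disposed.
public sealed class Workspace : IDisposable
{
    public static int Open;                       // workspaces currently alive
    public readonly List<object> Cache = new List<object>();

    public Workspace() { Open++; }

    public void Dispose()
    {
        Cache.Clear();                            // drop references so the objects can be collected
        Open--;
    }
}

public static class ScopedBatches
{
    // Processes each batch in its own workspace; returns the peak number open at once.
    public static int Run(int batchCount, int itemsPerBatch)
    {
        int peak = 0;
        for (int b = 0; b < batchCount; b++)
        {
            using (var ws = new Workspace())
            {
                for (int i = 0; i < itemsPerBatch; i++)
                    ws.Cache.Add(new object());   // "load" the batch
                peak = Math.Max(peak, Workspace.Open);
                // ... process and save the batch here ...
            }
        }
        return peak;
    }
}
```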


Warning: I wrote this in Notepad, so it probably won't compile :-)



====
Pete
Post #664
Forum Moderators: HansKarlsen, Jonas Hogstrom, PeterMorris


All times are GMT +1:00.
