Delivering Project & Product Management as a Service

Blog


Between Software and Real Estate projects

I recently attended a conference that was partly about real estate policy, and since I have been involved in projects across the spectrum, from software development through data analysis and optimization up to land management and real estate development, I guess I have a say on the matter.

Software engineering is a young discipline. When I learned it, it was more of an art form, where guru artisans built their own object libraries and carried them from project to project. There were no QA teams, and information security standards meant that you had to have a magnetic card to get access to the computer room. Now, of course, things are different, and when planning a software or data project you have to choose between various ecosystems and navigate between industry standards. Still, while no longer an infant, software engineering is rapidly evolving. Real estate, on the other hand, has been practiced since the Sumerians and Egyptians discovered geometry to measure plots and orient buildings. So one major difference is historical depth.

The other issue is physics. When dealing with software and data you're not dealing with physics; you're actually working with applied math. Real estate is all about classical physics, mainly statics, though you get some dynamic modelling once you reach extreme size, height or elevation. When you're not bounded by physics (at least not by classical physics), you have the responsibility to control entropy, and so to keep things as simple as possible, either by reducing the system's variables or its dynamic behavior.

Design & Planning

From a use-case perspective, real estate projects deal with several simple use cases that revolve around placing and moving physical objects through structural bindings. In software projects the use cases tend to be more involved, though in modern IT projects you actually use IT products as large building blocks. In both environments the project architect is responsible for the design stage. Approval of the design stage in real estate is usually given by the local municipal authority, since buildings reside in a shared space. With software, modern architecture enables isolation from neighboring processing, but in some ecosystems (like Apple's) you do have to get some kind of approval from the monarch.

Execution

Execution of the planned design is usually very straightforward in the building industry: you dig foundations, bring building materials in various stages of completion onto the site, and stack them up using various machines and manual labor. There isn't much automation going on, and innovation is mainly focused on materials and prefabrication methods. In software and IT, by contrast, there is a great deal of RPA (Robotic Process Automation), because the main cost driver there is labor, while in the building industry it's the cost of materials.

Monitoring and process control

In both industries, process control is performed at the production site during fabrication (real or virtual), and at the end of each phase there

Read More »

The contextless project manager

In its earliest uses (documented in the 15th century), context meant "the weaving together of words in language." This sense, now obsolete, developed logically from the word's source in Latin, contexere, "to weave or join together." Context now most commonly refers to the environment or setting in which something (whether words or events) exists. When we say that something is contextualized, we mean that it is placed in an appropriate setting, one in which it may be properly considered (Merriam-Webster dictionary).

People are driven by context. When your spouse asks you to drive the kids to the city, you react differently than if you get the same request from a bloodied bully who suddenly opens your car's door while pushing a half-dressed kid inside.

Project management exists within a certain context. There are real estate project managers and software project managers, and if you sit them together you'll see that although they deal with the same problems, there is hardly any transferability of skills. You don't see software project managers applying for jobs in the building industry, and vice versa. This specificity is common in various fields that require training and a domain language. If you train for a marathon, fit as you are, you'll seldom be considered fit if required to swim a mile in competitive time. Similarly, jargon developed in one "tribal" domain may have a totally different meaning in another. Take for example the acronym IM, which means Incident Management in the infosec tribe and Instant Messaging in the software development tribe. PMI's PMBOK is a nice try at distilling the common knowledge required for project management; however, reading a general car repair manual will not make you a mechanic. Beyond the metadata, you need specific documentation for the car's model, the tools of the trade, and a good amount of grease.

The real question I'm pondering is whether there can be a General Project Manager position, similar to the general management profession in which various CEOs get handsome compensation for their efforts. Or does the project management discipline, being tactical in nature, require you to choose whether to be a marathon runner or a swimmer? This question can be answered by examining CEOs who actually shift between industries. Those individuals have two attributes in common. The first: they've got the talent. You usually have to be an above-average player to get to this position, unless your daddy owns the company. The second is that those who jump between industries are actually expert decision makers in money management or people management, and have thus disconnected themselves from domain-specific constraints and jargon.

The bottom line is that you can be a General Project Manager (GPM), provided you're exceptionally talented and dealing with projects that are big enough to isolate your decision making to people behavior and money. But if you are competent enough to play this field, why not call yourself a CEO and get a better pay grade?

Read More »

AI Reducing False Positives in the InfoSec SOC

Some algebraic background

A false positive is when your system alerts the users to some event or anomaly when there is none. In other words, it's a classification error that causes extra, unneeded effort, and thus needs to be reduced. In a Security Operations Center, the analysts work on security events created by various IS controls, and as the number of false positive events grows, the SOC's efficiency drops, because the analysts end up resolving false events instead of working on the true positives.

False positives are inherently problematic in environments where the events have an incidence rate lower than the false positive rate. In that case the false positives will always outnumber the true positives! This is not intuitive at all, as we can see in this example. Let's assume we have an organization with 10K computers, a virus infection in 1% of the population, and an antivirus reporting on it with a false positive rate of 5%. In this case the number of infected computers is 0.01 × 10,000 = 100 (true positives). The number of false positive computers (uninfected but flagged with a virus) will be 0.99 × 10,000 × 0.05 = 495 (the number of non-infected machines multiplied by the false positive rate). So we got more false positives than true positives, and if we look at the probability that a case flagged as positive to the SOC is truly infected, it is only 100/(100+495) ≈ 17%, and this from an antivirus that is expected to give 95% accurate results! (See the short sketch below.)

Using supervised learning to reduce workload

SOC analysts record case decisions after the analysis. This data should be mandatory and chosen from a closed list, so that one can easily use it as training data. We had to make do with an unstructured text field, using regexes and other methods to extract whether the case was indeed a true or false positive. The next step was to choose an analytical environment. If the organization has a data analytics platform, you can live within it; most by now have ML modules that are accessed from the UI and wrap tried and tested libraries like scikit-learn or Torch, embedded in products like Splunk or Tableau. In our case the available environment was Splunk, so we worked with it and avoided a tool selection process. Choosing the model was rather trivial: you basically select from the various classification algorithms available in Splunk, run them on the test data, and choose the best one according to the confusion matrix. In our case the most accurate one was random forest. Once we had the model, we used it to provide the SIEM (Security Information and Event Management) with a prediction of case truthfulness, which is basically the expected probability of the case being a true positive. This information was presented to the analysts so that they could
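To make the base-rate arithmetic above easy to replay with your own numbers, here is a minimal Python sketch (nothing organization-specific; it just generalizes the worked example):

```python
def soc_precision(population: int, incidence: float, fpr: float, tpr: float = 1.0) -> float:
    """Share of flagged cases that are actually true positives."""
    true_pos = population * incidence * tpr          # infected and detected
    false_pos = population * (1 - incidence) * fpr   # clean but flagged anyway
    return true_pos / (true_pos + false_pos)

# The example from the text: 10K hosts, 1% infected, 5% false positive rate
print(soc_precision(10_000, 0.01, 0.05))  # ~0.168, i.e. only ~17% of alerts are real
```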
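And since Splunk's MLTK wraps scikit-learn under the hood, a rough open-source equivalent of the workflow described above might look like the sketch below. The file name, field names and the label-extraction regex are illustrative assumptions, not our actual schema:

```python
import re
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

def label_from_notes(notes: str) -> int:
    """Hypothetical regex labeling: 0 = false positive, 1 = true positive."""
    return 0 if re.search(r"false\s*positive|benign", str(notes), re.I) else 1

cases = pd.read_csv("soc_cases.csv")                      # assumed export of closed cases
y = cases["resolution_notes"].map(label_from_notes)       # labels mined from free text
X = cases[["severity", "event_count", "src_reputation"]]  # illustrative features

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)
model = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)
print(confusion_matrix(y_test, model.predict(X_test)))    # compare candidates on this

# The predicted probability becomes the "truthfulness score" pushed back to the SIEM:
truthfulness = model.predict_proba(X_test)[:, 1]
```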

Read More »

From Insider Trading to AI & Fraud analysis

I recently listened to Ira Lee Sorkin – this guy was Bernard Madoff's lawyer, and he sure can tell a story! Those old lawyers have a way with words. Anyway, he talked about the history of insider trading and fraud schemes, practically from the beginning of his career, and what I took from it is that you have to establish a gain for the one giving the information (even if it is merely friendship), as well as some other breach of agreement between the owner of the information and the leaker. He also talked about the Wolf of Wall Street and how controlling the float allows stock dumping, as well as about the Ponzi scheme.

The phrase "unfair advantage" was never mentioned in the presentation, and this got me thinking about the following. Let's say I have devised an AI predictor for a certain stock or commodity with low liquidity. This predictor forecasts the stock's movement with a probability of, let's say, 60%. Now I publish my forecasts anonymously, for free, on a trading forum, and with time people will start to follow them, since they allow a profit over simple gambling. The fact that I'll have a growing crowd of followers will do two things. One is that I will gain control over the virtual float (I will not own the stock, but I can control its movement). The second is that this mechanism has a positive feedback loop: the more people believe the prediction, the more the prediction comes true. Now, all one has to do is reverse the prediction once in a while and go against the market. Since the prediction is generating reality, one can earn from the phase shift, i.e., short the stock just before the price drop that one's own prediction will cause.

In the last part of his talk, Ira discussed the lack of resources regulatory agencies suffer from relative to the amount of data, and the fact that it's hard to identify fraud at such a scale. On this I beg to differ: Google and Facebook are doing quite nicely identifying trends and preferences at a larger scale. Here is a nice article on fraud identification using the PageRank principle. It's perfectly possible, just not feasible for them.
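To illustrate why scale by itself is not the blocker, here is a toy sketch of the PageRank idea applied to a transaction graph, using networkx and made-up accounts (real fraud-detection systems are of course far more involved):

```python
import networkx as nx

# Toy directed graph: an edge A -> B means money flowed from account A to B.
transfers = [("alice", "bob"), ("bob", "carol"), ("carol", "alice"),
             ("alice", "bob"), ("dave", "bob")]           # made-up data
G = nx.DiGraph()
for src, dst in transfers:
    weight = G[src][dst]["weight"] + 1 if G.has_edge(src, dst) else 1
    G.add_edge(src, dst, weight=weight)

# Accounts accumulate "importance" from the accounts that pay them,
# the same way pages accumulate link authority; outliers stand out.
scores = nx.pagerank(G, weight="weight")
for account, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(account, round(score, 3))
```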

Read More »

Dealing with a security event as a mini project

Prologue

I'm usually more inclined to a regular routine: one builds an MRD, agrees with the stakeholders on the roadmap, does the planning, some HLD, gets the message and priorities down to the software mines, some LLD, tests... and lives with the results. Yet sometimes life gives cashews to those accustomed to peanuts. Several years ago, while leading a cyber defense project for a large multinational, I got a late-night text message telling me to come to the office early the next morning for an urgent meeting. This was somewhat correlated with me having uploaded some sniffing software (for learning purposes, of course). So I had a sleepless night, thinking about how I could stuff all my belongings into a small cardboard box on such short notice. Anyway, morning came, and when I entered the meeting, bleary-eyed and all, it was explained to me that I should put my comfortable routine on hold and follow up on a security incident that had developed overnight. ISO 27002, NIST, CERT/CC and a few other standards define step-by-step processes for incident handling, so I'll try to describe them in order, with some sugar added.

Detection

As with most large companies, a large part of networked corporate activity comes from M&A or other vertical integration efforts, even with vendors. There is always attention to assimilating those entities into the security infrastructure of the enterprise, or to mitigating the risks that relate to perimeter movement. In this case the firm operated several franchises, and in one of those semi-independent branches a user experienced unusual activity on a computer. People are used to some automation done by IT, but that has a regular pattern, and here it looked as if some script was running in a non-transparent way. This vigilant person called the company's SOC, and some inspection showed that there was PowerShell-based malware on his computer. PowerShell is actually a good sysadmin tool that unfortunately can be used, and is used, as a malware engine. An analyst was awakened at night to review the logs, which showed that the malware existed on other computers as well. Next it was verified that the script had already posted an encrypted payload to one of the known botnet C&C centers. Hysteria was building, and the CEO was notified (I guess his sleep wasn't good either).

The Triage

Taken from medical jargon, triage is a severity assessment of the patient, and this one seemed very ill, especially since a payload had been extracted from the company's premises and the detection had happened only by chance. It was decided that we go into ER routine, which brought back old memories from the armed forces. People act differently when they are told an enemy has breached the camp's fences. Suddenly you have management attention, and business people change their priorities, including availability on short notice and accepting instructions that are usually ignored. Nothing like good old FUD (Fear, Uncertainty & Doubt) to motivate the crowds.

Analysis & Incident response

The usual process for treatment is linear: you collect the data,

Read More »

My two cents on anomaly detection, AI neural networks, and the future of it all

"All models are wrong but some of them are useful" (George Box). Data science is just a new name for some old math with new computing power. Yet the hype is not totally unjustified.

The market

On the supply side: cheap computing power, open source software, the availability of huge data sets and the accessibility of knowledge make it a whole new ball game. On the demand side of the curve: organizations are dealing with growing amounts of data as machine instrumentation grows and IoT gets established, as are the ecosystems that generate revenue directly from data collection and analysis (Google, Facebook). Data-driven decision making has a well-established track record, yet the immense quantity, diversity and rate of data aggregation create a growing need for solutions. The end result is a "Blue Ocean" that should restructure the information revolution into a knowledge revolution. There is simply no choice; market dynamics are pulling the equilibrium to the right.

The technological gap

Data analysts have customarily been the ones who bridged the gap between data availability and the demand for meaning. Data analysis can be simply described as providing meaning to data. Meaning is our model of reality, and the data analyst's job is to choose the model/hypothesis wisely and test it against the available data/evidence. If the model can't be rejected, then there is a good "probability" that the model is useful. However, data analysts are human (well, most of the time, since this profession has a tendency to attract the tails of the normal distribution). And humans have limited processing ability. More importantly, they don't scale well. This can be demonstrated by the "Mythical Man-Month" paradigm, as well as by a quick observation of the efficiency of large organizations. One can't really scale reasoning without getting stuck in the Condorcet paradox or Arrow's impossibility theorem. So bridging the gap is a question of replacing the human intellect used in problem solving. Most of the talk now is led by aggressive pessimists like Elon Musk or Stephen Hawking, who are top work dogs by nature, and the idea of Skynet controlling their existence is intimidating to them. But replacing an incompetent or corrupt government with an unbiased, predictable entity, and getting UBI while watching reality shows instead of sweating our lives away in the job trenches, is a promising future for most of the population.

The process automation

Currently, data science is composed of various algorithms that are useful in different situations. Analysts first create the algorithms, usually in academia. Then they choose which algorithm to implement in each context, which is more of an engineering problem, and finally they do the implementation on actual data, including dealing with the problems of messy and missing data. Lastly they review the results and change the model accordingly, until in the end they reach a satisfactory delivery. This is no different from "regular science", but can we implement this process, less the scientist? It turns out we can – see the sketch below.
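As a minimal sketch of "the process, less the scientist": automated model selection over a few candidate algorithms, which is essentially what today's AutoML tooling industrializes (the dataset and the candidate list here are arbitrary placeholders):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)   # placeholder data set

candidates = {
    "logistic": LogisticRegression(max_iter=5000),
    "forest": RandomForestClassifier(n_estimators=100),
    "boosting": GradientBoostingClassifier(),
}

# "Choose the model wisely and test it against the evidence" - automated:
results = {name: cross_val_score(model, X, y, cv=5).mean()
           for name, model in candidates.items()}
best = max(results, key=results.get)
print(results, "-> best:", best)
```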

Read More »

Data quality revisited

When I was at school there was lots of hype around the acronym TQM, "Total Quality Management". It seemed that quality improvement was the way to get more revenue, because the fewer failures you had in your product line, the less likely customers were to stop buying your stuff and start buying Japanese products. These were the days when American cars were slow and under-engineered, and Japanese cars were less likely to leave you stranded in the middle of a junction due to some electrical malfunction. Lots of effort and money went into Six Sigma programs, quality circles, and relearning from the Japanese the lessons they originally got from W. E. Deming. Deming was a professor of statistics who was brought to Japan as part of Gen. Douglas MacArthur's post-WWII initiative to rebuild Japan's economy, and he was the one who taught Japanese industrialists SPC (Statistical Process Control), which the Americans later relearned from them.

Somehow, things have changed. Now we live in a universe where quality does not seem to play such a relevant part. Short TTM (Time To Market) rules the VCs' point of view: you get the MVP (Minimum Viable Product) to the customers as fast as you can, in order to get market feedback and make the required changes. Customers, in turn, are not heavily invested in an application, since pricing models are built according to usage, and most got used to marginal quality in applications to begin with. Hardware prices are dropping exponentially, so equipment investment is basically disposable. As for Statistical Process Control: who cares about sampling when one has Big Data, deep learning and Hadoop-like technologies to process all of it? Well, it's all about the ratio of data to measurement capacity. Back in the fifties, when engineers used slide rules for calculations, it made sense to sample. Now that data is being gathered at an increasing rate while Moore's law is diminishing, we may soon get back to the old tactics of sifting through the piles.

The basics

Data quality is by definition target oriented. You invest in quality only as far as it reduces negative effects, and only up to a certain cost, due to diminishing returns. That means you have to work from the target backwards. For example, if your target is invoicing and you get 10% returned mail, you might invest considerable effort in correcting addresses when a large debt is involved (dunning up to the cost of the debt), and only a slight effort if it's "just" a regulatory requirement – see the back-of-the-envelope sketch below.

Logical entities or subsystems

Once we have established that the DQ business is target oriented, we have to define the target. A target can be loosely defined in the same way a product is defined: a desired outcome that can be further segmented into subsystems, as in a PBS. For example, if your product is a CRM, you have to consider subsystems like contact-center automation, marketing campaigns, order dispatching, etc. The problem with this type of segmentation is
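The invoicing example lends itself to a back-of-the-envelope rule: fix a record only while the expected recovered value exceeds the cost of fixing it. A minimal sketch, with all figures as placeholders:

```python
def worth_fixing(debt: float, p_recover_after_fix: float, fix_cost: float) -> bool:
    """Target-oriented data quality: correct an address only if the
    expected recovered debt exceeds the cost of the correction effort."""
    return debt * p_recover_after_fix > fix_cost

print(worth_fixing(debt=5_000, p_recover_after_fix=0.6, fix_cost=40))  # True: large debt
print(worth_fixing(debt=15, p_recover_after_fix=0.6, fix_cost=40))     # False: not worth dunning
```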

Read More »

A history lesson: how to increase the land supply, and why the British left too early

Lately, much attention has been focused on the role of the Housing Cabinet as a tool for increasing the land supply in the State of Israel, and on streamlining governance processes. An interesting example can be found in the modern history of colonialism. In the 1920s, the governor of British Ceylon (today Sri Lanka) wondered how the British managed to rule and administer the Indian subcontinent, with its three hundred million inhabitants, using five hundred officials, while the French failed to manage Cambodia with two hundred officials, when its population numbered only about a million and a half.

An example of managing a densely populated environment can be found in the British administration of Hong Kong and the New Territories to its north. Just as a reminder: the island of Hong Kong was taken by the British during the First Opium War, in the mid-nineteenth century, when it was almost empty of inhabitants, and within about fifty years it became, through economic migration, one of the most densely populated places in the world. The rising density, combined with fear of growing French influence, led the British to negotiate with the Chinese over annexing additional land to Hong Kong, on the order of a thousand square kilometers (almost twelve times the area of the original island). The Chinese government of the period (the Qing dynasty) signed a ninety-nine-year lease with the British, but the Chinese clans living in the New Territories felt abandoned and resisted. In 1899 the villages in the area, numbering about a hundred thousand inhabitants, organized a local militia that attacked the British, unsuccessfully.

The British pragmatically chose to avoid evacuating the civilian population and overcame the resistance in several ways. They committed not to interfere with the residents' polygamous tradition, and they anchored the local residents' rights to the land, under agricultural designation, in land legislation and an orderly cadastre, including inheritance laws. These laws differed from the legislation on the island of Hong Kong itself and effectively constituted two separate land administration systems, one for the island and one for the New Territories. Over the years, as the population of greater Hong Kong grew, a housing shortage emerged, and with it the need to rezone land from agricultural to urban use. The British issued the residents of the New Territories letters of option that offered them either to sell their agricultural land at a known price, or to receive a commitment that, in exchange for land used for urban development, they would receive alternative land at a certain ratio in an urban area to be built – sometime in the future! Even though this commitment was not bounded in time, most landowners chose the second option, enabling rapid urban development of agricultural areas without any government cash flow. Clearly this method is not without flaws, among them its exposure to speculative exploitation, and the creation of a future government liability to supply land, which requires a future solution. But it enabled rapid urbanization without unnecessary conflicts with the population.

Over the years, as agricultural land sources dwindled, the Hong Kong lands authority also took upon itself the responsibility for recycling land for urban-renewal (evacuation and reconstruction) projects of leaseholders in urban areas, replacing old three-to-six-story buildings with denser construction. This responsibility is backed by legislation and makes it possible to overcome the barriers of fragmented ownership and objections that private developers struggle to surmount. Land administration in the region is considered relatively successful, and the understanding that continuity in this matter is one of the necessary conditions for proper governance meant that even after the handover of Hong Kong to China, the administration of land, leaseholders and land-release processes continues to be managed by a joint British-Chinese committee, until 2047.

Population density in Hong Kong: about seven thousand people per square kilometer. Population density in Israel: about three hundred people per square kilometer. There is something to learn.

Read More »