Data Science & Advanced Analytics

  • 1.  Machine Learning Model for Cost Estimate

    This message was posted by a user wishing to remain anonymous
    Posted 6 days ago

    Dear All,

    I have been developing a machine learning (ML) model for cost estimating, particularly for oil and gas projects, built around feature importance, correlation factors and cost curves.

    The model was developed in Python and is kept in a GitHub repository. It uses modern statistical learning methods, e.g. Random Forest, Gradient Boosting, SVR, Ridge, etc.

    The outcome is a predictive model that produces estimates with +/- error ranges.

    Can I treat the errors derived from the model as the Class Accuracy, or should I simply use (P90-P50)/P50 for the high range and (P10-P50)/P50 for the low range?
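
    For reference, the percentile-based range mentioned above can be sketched as follows. This is only an illustration: the `accuracy_range` helper and the sample costs are hypothetical, and it assumes a set of predicted costs is already available (for example from cross-validation or bootstrap runs).

```python
from statistics import quantiles

def accuracy_range(samples):
    """Return (low, high) ranges as fractions of P50 for a list of cost samples."""
    # n=10 yields nine cut points: P10, P20, ..., P90
    deciles = quantiles(samples, n=10, method="inclusive")
    p10, p50, p90 = deciles[0], deciles[4], deciles[8]
    low = (p10 - p50) / p50    # negative: downside range
    high = (p90 - p50) / p50   # positive: upside range
    return low, high

# Hypothetical predicted total costs, in $MM (illustrative values only)
predicted_costs = [82, 90, 95, 100, 104, 110, 118, 127, 140, 160, 185]
low, high = accuracy_range(predicted_costs)
print(f"low range: {low:+.1%}, high range: {high:+.1%}")
```

    Note that the two ranges come out asymmetric whenever the cost distribution is skewed, which is one reason to work from percentiles rather than a single +/- figure.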

    Thanks.



    -------------------------------------------


  • 2.  RE: Machine Learning Model for Cost Estimate

    Posted 5 days ago

    Unknown publisher,

    Thank you for sharing this. I'm very interested in, and intrigued by, the thinking behind it. This model requires cost patterns learned from data, so if the data used is specific to one industry, how can it be used for another one? I'm trying to learn more about data analysis, Tableau and Python. How can I learn more about your idea? Thank you, Ruth



    ------------------------------
    Ruth Gonzalez Rodriguez
    Cost Engineer
    CNS, LLC
    Powell TN
    ruthmet111@yahoo.com
    ------------------------------



  • 3.  RE: Machine Learning Model for Cost Estimate

    Posted 5 days ago

    I would advise caution before jumping straight to P90/P50 calculations. Using advanced algorithms like Gradient Boosting, Random Forest or the others you mentioned requires that the underlying data support the complexity of the model.

    Before interpreting the output as a valid 'Class Accuracy' under AACE guidance, you should consider these steps:

    1. Data Suitability: Ensure your features actually have predictive power. For regression models (Ridge), check p-values; you could also use the Ljung-Box test to check for autocorrelation at lag 1 or lag 2. For tree-based models (RF/GBM), check feature importance to ensure the model isn't fitting noise.

    2. Residual Diagnostics: As you are proposing a range, you must validate the error distribution. A Q-Q plot of standardized residuals is essential here to check whether the errors follow a normal distribution. Independence: also check for autocorrelation if your project data is time-dependent (time series).
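
    As a rough sketch of the autocorrelation check, the Ljung-Box Q statistic can be computed directly from the residuals. The residuals below are simulated with deliberate lag-1 dependence purely for illustration, and the function names are my own; in practice a statistics package would normally be used for this test.

```python
import random

def autocorrelation(x, lag):
    """Sample autocorrelation of series x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom

def ljung_box_q(x, max_lag):
    """Ljung-Box statistic: Q = n(n+2) * sum_k r_k^2 / (n - k), k = 1..max_lag."""
    n = len(x)
    return n * (n + 2) * sum(
        autocorrelation(x, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )

# 5% critical value of the chi-square distribution with 2 degrees of freedom
CHI2_95_DF2 = 5.991

# Simulated residuals with AR(1) dependence (illustrative data, not real project data)
random.seed(1)
residuals = [0.0]
for _ in range(199):
    residuals.append(0.7 * residuals[-1] + random.gauss(0.0, 1.0))

q = ljung_box_q(residuals, max_lag=2)
print(f"Q = {q:.1f}; autocorrelated at lags 1-2: {q > CHI2_95_DF2}")
```

    A Q value above the critical value suggests the residuals are not independent, so ranges computed as if they were would be too optimistic.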

    If the residuals are not normal, a P10/P90 calculated simply from the mean and standard deviation will give you a false sense of accuracy.
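
    To illustrate the point, here is a small sketch comparing normal-theory percentiles with empirical ones. The residuals are simulated from a right-skewed (lognormal) distribution as a stand-in for cost overruns; this is an assumption for demonstration, not a claim about any real dataset.

```python
import random
from statistics import NormalDist, mean, quantiles, stdev

random.seed(0)
# Simulated relative residuals, (actual - predicted) / predicted, right-skewed
residuals = [random.lognormvariate(0.0, 0.5) - 1.0 for _ in range(2000)]

# P10/P90 assuming normality: derived from the mean and standard deviation only
fitted = NormalDist(mean(residuals), stdev(residuals))
normal_p10 = fitted.inv_cdf(0.10)
normal_p90 = fitted.inv_cdf(0.90)

# P10/P90 taken directly from the observed residuals
deciles = quantiles(residuals, n=10)
emp_p10, emp_p90 = deciles[0], deciles[8]

print(f"normal-theory P10/P90: {normal_p10:+.2f} / {normal_p90:+.2f}")
print(f"empirical     P10/P90: {emp_p10:+.2f} / {emp_p90:+.2f}")
```

    With skewed residuals like these, the normal-theory P10 falls well below the empirical P10, so the low side of the range would be misstated if normality were simply assumed.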

    Applying a set of supervised learning algorithms without a thorough statistical analysis first would be unproductive, because you would not know whether the data you are using is relevant for prediction.

    Also, keep in mind that AACE is only guidance; what ultimately prevails is whether the dataset is relevant.



    ------------------------------
    Brahim Seddiki
    Ottawa ON
    seddikib@gmail.com
    ------------------------------