首页 - 技术栈

惠州响应式网站建设wordpress 聚美主题

作者: 五速梦信息网
时间: 2026年06月19日 10:55

当前位置：首页 > news >正文

惠州响应式网站建设,wordpress 聚美主题,站长工具网站推广,seo新方法目的线性回归是很常用的模型#xff1b;在局部可解释性上也经常用到。数据归一化归一化通常是为了确保不同特征之间的数值范围差异不会对线性模型的训练产生过大的影响。在某些情况下#xff0c;特征归一化可以提高模型的性能#xff0c;但并不是所有情况下都需要进行归一…目的线性回归是很常用的模型在局部可解释性上也经常用到。数据归一化归一化通常是为了确保不同特征之间的数值范围差异不会对线性模型的训练产生过大的影响。在某些情况下特征归一化可以提高模型的性能但并不是所有情况下都需要进行归一化。归一化的必要性取决于你的数据和所使用的算法。对于某些线性模型比如线性回归和支持向量机数据归一化是一个常见的实践因为它们对特征的尺度敏感。但对于其他算法如决策树和随机森林通常不需要进行归一化。在实际应用中建议根据你的数据和所选用的模型来决定是否进行归一化。如果你的数据特征具有不同的尺度并且你使用的是那些对特征尺度敏感的线性模型那么进行归一化可能会有所帮助。否则你可以尝试在没有归一化的情况下训练模型然后根据模型性能来决定是否需要进行归一化。对新数据进行归一化处理 new_data_sample_scaled scaler.transform(new_data_sample)# 使用模型进行预测 predicted_value model.predict(new_data_sample_scaled) 这样就能确保在预测新数据时特征的尺度与训练数据保持一致。 MinMaxScaler底层代码 class MinMaxScaler Found at: sklearn.preprocessing.dataclass MinMaxScaler(BaseEstimator, TransformerMixin):def init(self, feature_range(0, 1), copyTrue):self.feature_range feature_rangeself.copy copydef _reset(self):Reset internal data-dependent state of the scaler, if necessary.init parameters are not touched.# Checking one attribute is enough, becase they are all set together# in partialfitif hasattr(self, scale):del self.scale_del self.min_del self.n_samples_seen_del self.data_min_del self.data_max_del self.data_range_def fit(self, X, yNone):Compute the minimum and maximum to be used for later scaling.Parameters———-X : array-like, shape [n_samples, n_features]The data used to compute the per-feature minimum and maximumused for later scaling along the features axis.# Reset internal state before fittingself._reset()return self.partial_fit(X, y)def partial_fit(self, X, yNone):Online computation of min and max on X for later scaling.All of X is processed as a single batch. This is intended for caseswhen fit is not feasible due to very large number of n_samplesor because X is read from a continuous stream.Parameters———-X : array-like, shape [n_samples, n_features]The data used to compute the mean and standard deviationused for later scaling along the features axis.y : Passthrough for Pipeline compatibility.feature_range self.feature_rangeif feature_range[0] feature_range[1]:raise ValueError(Minimum of desired feature range must be smaller than maximum. Got %s. % str(feature_range))if sparse.issparse(X):raise TypeError(MinMaxScaler does no support sparse input. You may consider to use MaxAbsScaler instead.)X check_array(X, copyself.copy, warn_on_dtypeTrue, estimatorself, dtypeFLOAT_DTYPES)data_min np.min(X, axis0)data_max np.max(X, axis0)# First passif not hasattr(self, n_samplesseen):self.n_samplesseen X.shape[0]else:data_min np.minimum(self.datamin, data_min)data_max np.maximum(self.datamax, data_max)self.n_samplesseen X.shape[0] # Next stepsdata_range data_max - dataminself.scale (feature_range[1] - feature_range[0]) / _handle_zeros_in_scale(datarange)self.min feature_range[0] - data_min * self.scale_self.datamin data_minself.datamax data_maxself.datarange data_rangereturn selfdef transform(self, X):Scaling features of X according to feature_range.Parameters———-X : array-like, shape [n_samples, n_features]Input data that will be transformed.check_isfitted(self, scale)X check_array(X, copyself.copy, dtypeFLOAT_DTYPES)X * self.scale_X self.min_return Xdef inverse_transform(self, X):Undo the scaling of X according to feature_range.Parameters———-X : array-like, shape [n_samples, n_features]Input data that will be transformed. It cannot be sparse.check_isfitted(self, scale)X check_array(X, copyself.copy, dtypeFLOAT_DTYPES)X - self.min_X / self.scale_return X 数据分箱 n_bins [5] kb KBinsDiscretizer(n_binsn_bins, encode ordinal) kb.fit(X[selected_features]) X_trainkb.transform(X_train[selected_features]) from sklearn.preprocessing import KBinsDiscretizer import joblib# 创建 KBinsDiscretizer 实例并进行分箱 est KBinsDiscretizer(n_bins3, encodeordinal, strategyuniform) X_binned est.fit_transform(X)# 保存 KBinsDiscretizer 参数到文件 joblib.dump(est, kbins_discretizer.pkl)# 加载 KBinsDiscretizer 参数 loaded_estimator joblib.load(kbins_discretizer.pkl)# 使用加载的参数进行分箱 X_binned_loaded loaded_estimator.transform(X)from sklearn.preprocessing import KBinsDiscretizerdef save_kbins_discretizer_params(estimator, filename):params {n_bins: estimator.n_bins,encode: estimator.encode,strategy: estimator.strategy,# 其他可能的参数}with open(filename, w) as f:for key, value in params.items():f.write(f{key}: {value}\n)# 创建 KBinsDiscretizer 实例并进行分箱 est KBinsDiscretizer(n_bins3, encodeordinal, strategyuniform)# 保存 KBinsDiscretizer 参数到文本文件 save_kbins_discretizer_params(est, kbins_discretizer_params.txt)KBinsDiscretizer 的源代码 KBinsDiscretizer 的源代码参数包括n_bins指定要创建的箱的数量。 encode指定编码的方法。可以是onehot、onehot-dense、ordinal中的一个。 strategy指定分箱的策略。可以是uniform、quantile、kmeans中的一个。 dtype指定输出数组的数据类型。 bin_edges_一个属性它包含每个特征的箱的边界。以下是 KBinsDiscretizer 类的源代码参数的简要说明n_bins用于指定要创建的箱的数量。默认值为5。 encode指定编码的方法。可选值包括 onehot使用一热编码。 onehot-dense使用密集矩阵的一热编码。 ordinal使用整数标签编码。默认为 onehot。 strategy指定分箱的策略。可选值包括 uniform将箱的宽度保持相等。 quantile将箱的数量保持不变但是每个箱内的样本数量大致相等。 kmeans将箱的数量保持不变但是使用 k-means 聚类来确定箱的边界。默认为 quantile。 dtype指定输出数组的数据类型。默认为 np.float64。 bin_edges_一个属性它包含每个特征的箱的边界。这是一个列表其中每个元素都是一个数组表示相应特征的箱的边界。您可以在 sklearn/preprocessing/_discretization.py 中找到 KBinsDiscretizer 类的完整源代码以查看详细的参数和实现细节。