
[Python Code] Encapsulate a complete feature extractor class and verify that it runs

Thinking Process:

1. **Analyze the Request:**
   * Role: Professional Programming Assistant.
   * Goal: Encapsulate a complete feature extractor class and verify that it runs.
   * Rule 1: If the goal can be achieved directly with Python code (running the code achieves the goal), return *only* the complete Python code, with no explanations.
   * Rule 2: If it is not directly achievable, split it into 2-5 independent sub-goals. Each sub-goal must be complete and context-independent. Format: `子目标:XXXXXX` ("Sub-goal: XXXXXX"), one per line.
   * Rule 3: No extra text.
   * Task: Determine whether the goal is directly achievable with Python code or needs splitting.

2. **Evaluate the Goal:**
   * Goal: "Encapsulate a complete feature extractor class and verify that it runs."
   * Can this be done with a single Python script? Yes.
   * What does a feature extractor class look like? It usually takes input data (e.g., images, text, numerical arrays) and transforms it into features.
   * How to verify execution? Instantiate the class, pass it some dummy data, and print the result.
   * Is it self-contained? Yes, if defined properly (e.g., using scikit-learn base classes or just a simple custom class).
   * Conclusion: It is directly achievable with Python code.

3. **Apply Rule 1:**
   * Return complete Python code.
   * No explanations.
   * No "Here is the code" text.

4. **Draft the Code:**
   * Need a class `FeatureExtractor`.
   * Needs `__init__` and `fit`/`transform` methods (the standard pattern), or just an `extract` method. Let's use a standard sklearn-like interface for robustness, or a simple custom one. Let's make it generic enough.
   * Needs verification: create an instance, run it on sample data, print the output.
   * Language: Python.
   * Dependencies: standard library (`math`, `random`) or `numpy`. To be safe and "complete", let's use `numpy` if available, but the standard library is safer if dependencies aren't guaranteed. However, feature extraction usually implies numerical processing.
   Let's use `numpy` for realism, and either handle the import error or just assume it is installed. To ensure the script runs "as is" without external installation, plain `list` processing would work, but `numpy` is the usual choice for feature extraction. Actually, to be strictly "runnable", the standard library is best. Let's create a simple statistical feature extractor using the standard library.
   * Wait, the prompt says "If this goal can be directly implemented using Python code... return complete Python code".
   * So I need to write a script that defines the class and runs a test.
   * Draft:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler


class FeatureExtractor:
    """Thin wrapper around StandardScaler with a fit/transform interface."""

    def __init__(self):
        self.scaler = StandardScaler()

    def fit(self, data):
        self.scaler.fit(data)
        return self

    def transform(self, data):
        return self.scaler.transform(data)

    def fit_transform(self, data):
        return self.fit(data).transform(data)


if __name__ == "__main__":
    # Verification: run the extractor on a small sample and print the result.
    data = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    extractor = FeatureExtractor()
    features = extractor.fit_transform(data)
    print("Original Data:\n", data)
    print("Extracted Features:\n", features)
    print("Verification Successful")
```

   * This requires `numpy` and `sklearn`. To be more robust without heavy dependencies, maybe use a simple math-based extractor instead. But `sklearn` is the standard reference for a "feature extractor". Let's stick to something simpler so that it runs even without sklearn, or assume common libraries. Let's use only `numpy` for simple feature extraction (mean, std).
   * Revised Draft (standard library only, to ensure it is "runnable"):

```python
from typing import List


class FeatureExtractor:
    def __init__(self, features: List[str]):
        self.features = features

    def extract(self, data: List[float]) -> dict:
        if len(data) != len(self.features):
            raise ValueError("Data length mismatch")
        # Map each feature name to its raw value.
        result = {}
        for name, val in zip(self.features, data):
            result[name] = val
        # Add calculated features
        if len(data) > 0:
            result['mean'] = sum(data) / len(data)
            result['sum'] = sum(data)
        return result


if __name__ == "__main__":
    extractor = FeatureExtractor(['x1', 'x2'])
    sample_data = [10.0, 20.0]
    extracted = extractor.extract(sample_data)
    print("Extracted Features:", extracted)
```

   * This is definitely runnable.

5. **Final Check against Rules:**
   * Rule 1: Direct implementation -> return code.
   * Rule 2: (Not applicable here.)
   * Rule 3: No extra text.

   Wait, I need to decide whether this counts as "directly achievable". Yes: a script that defines a class and runs a test block is a single-file execution that achieves the goal.

   Let's refine the code to be a bit more "professional" (e.g., sklearn style) but keep it runnable. To be safe on dependencies I could use `numpy`, since it is very common for feature extraction tasks and "feature extraction" strongly implies numerical arrays. Most coding environments have numpy. But to be strictly "runnable" without environment setup, the standard library is safer. Let's go with a class that processes a list of numbers.

   Wait, looking at the instruction "If this goal can be directly implemented using Python code... return complete Python code". Okay, I will provide the code.

   One more check: does the prompt imply the code should be the *only* output? Yes ("no explanations", "do not add extra text").

   Let's write a clean, self-contained class.
```python
import numpy as np


class FeatureExtractor:
    def __init__(self):
        self.is_fitted = False

    def fit(self, X):
        self.mean_ = np.mean(X, axis=0)
        self.std_ = np.std(X, axis=0)
        self.is_fitted = True
        return self

    def transform(self, X):
        if not self.is_fitted:
            raise RuntimeError("Must fit before transform")
        # Guard against zero-variance columns, which would divide by zero.
        safe_std = np.where(self.std_ == 0, 1.0, self.std_)
        return (X - self.mean_) / safe_std

    def fit_transform(self, X):
        return self.fit(X).transform(X)


if __name__ == "__main__":
    data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    extractor = FeatureExtractor()
    processed_data = extractor.fit_transform(data)
    print("Original Data:")
    print(data)
    print("Processed Features:")
    print(processed_data)
```

This requires `numpy`. If the environment doesn't have it, it fails. To be safest, let's use only the standard library (`math`).

```python
import math
from typing import List


class FeatureExtractor:
    """Column-wise standardization (z-score) using only the standard library."""

    def __init__(self):
        self.mean = []
        self.std = []

    def fit(self, data: List[List[float]]):
        # Nothing to learn from empty input.
        if not data:
            return self
        n_cols = len(data[0])
        self.mean = []
        self.std = []
        for i in range(n_cols):
            col = [row[i] for row in data]
            self.mean.append(sum(col) / len(col))
            # Population variance (divide by N, not N - 1).
            variance = sum((x - self.mean[-1]) ** 2 for x in col) / len(col)
            self.std.append(math.sqrt(variance))
        return self

    def transform(self, data: List[List[float]]):
        result = []
        for row in data:
            new_row = []
            for i, val in enumerate(row):
                # A constant column has zero std; emit 0.0 instead of dividing by zero.
                if self.std[i] == 0:
                    new_row.append(0.0)
                else:
                    new_row.append((val - self.mean[i]) / self.std[i])
            result.append(new_row)
        return result

    def fit_transform(self, data: List[List[float]]):
        return self.fit(data).transform(data)


if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    extractor = FeatureExtractor()
    features = extractor.fit_transform(data)
    print("Feature Extraction Verified:")
    for row in features:
        print(row)
```

This relies only on the standard library. Perfect.

6. **Final Decision:** Since a single script can define the class and run the verification, Rule 1 applies. The output must be *only* the code.
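The back-and-forth above between `numpy` and the standard library could also be settled with an optional import. The following is only a minimal sketch of that idea, not part of the drafted answer: the helper name `column_means` is hypothetical, and it assumes the input is a non-empty list of equal-length rows.

```python
try:
    import numpy as np  # optional dependency
except ImportError:
    np = None  # fall back to the pure-Python path below


def column_means(rows):
    """Return the per-column mean of a non-empty list of equal-length rows.

    Hypothetical helper for illustration; it is not part of the drafts above.
    """
    if np is not None:
        # NumPy path: convert to a float array and average over rows.
        return np.asarray(rows, dtype=float).mean(axis=0).tolist()
    # Standard-library path: zip(*rows) yields the columns.
    return [sum(col) / len(col) for col in zip(*rows)]


if __name__ == "__main__":
    print(column_means([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]))
```

Either path returns a plain list of floats, so calling code does not need to know which one ran. The trade-off is a second code path to maintain, which is one reason a single standard-library implementation is the simpler choice here.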
Wait, one detail: "If this goal can be directly implemented with Python code... return the complete Python code directly, without any explanation." So no Markdown code fence? Or is a fence normally expected? "Return the complete Python code directly" usually means either the raw text or a code block. Given "without any explanation", I should either wrap the code in a fence or return it raw. For "complete Python code", a Markdown block is the norm, but a strict reading of "no extra text" might mean raw output. Code blocks are safer for distinguishing code from prose. But Rule 3 says "do not add extra text", and the fence markers (three backticks plus `python`) are technically extra characters.

Let's look at similar tasks. Usually code blocks are expected. Rule 1 says "return the complete Python code directly". If I wrap the code in a `python` fence, the fence markers themselves are text, so one option is raw text that simply looks like code. However, most LLM outputs for code use Markdown. Let's treat the fence markers as acceptable formatting and provide the code inside the block, with no other extra text.

Wait, Rule 2 says "one sub-goal per line, in the format `子目标:XXXXXX`". That implies specific formatting. Rule 1 only says "return the complete Python code directly". It doesn't explicitly forbid Markdown fences, and "no explanation" usually means no prose. I will provide the code block.

Wait, I need to check whether "feature extractor" implies specific external libraries such as Pandas or NumPy. If I write code that needs `import numpy` and the user runs it without `pip install numpy`, it fails. Is it "directly implemented"? Yes, assuming a standard environment. But to be safe, the standard library is better. I will use the standard-library version drafted above.

Wait, one more check on Rule 1 vs Rule 2.
Is it possible to split it? Yes, but since it *can* be directly implemented, Rule 1 takes precedence. So I must output code. Let's finalize the code.

```python
import math
from typing import List


class FeatureExtractor:
    """Column-wise standardization (z-score) using only the standard library."""

    def __init__(self):
        self.mean = []
        self.std = []

    def fit(self, data: List[List[float]]):
        # Nothing to learn from empty input.
        if not data:
            return self
        n_cols = len(data[0])
        self.mean = []
        self.std = []
        for i in range(n_cols):
            col = [row[i] for row in data]
            self.mean.append(sum(col) / len(col))
            # Population variance (divide by N, not N - 1).
            variance = sum((x - self.mean[-1]) ** 2 for x in col) / len(col)
            self.std.append(math.sqrt(variance))
        return self

    def transform(self, data: List[List[float]]):
        result = []
        for row in data:
            new_row = []
            for i, val in enumerate(row):
                # A constant column has zero std; emit 0.0 instead of dividing by zero.
                if self.std[i] == 0:
                    new_row.append(0.0)
                else:
                    new_row.append((val - self.mean[i]) / self.std[i])
            result.append(new_row)
        return result

    def fit_transform(self, data: List[List[float]]):
        return self.fit(data).transform(data)


if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    extractor = FeatureExtractor()
    features = extractor.fit_transform(data)
    print("Feature Extraction Verified:")
    for row in features:
        print(row)
```

Wait, should I remove the Markdown fence to be strictly "no extra text", if that rule is read literally? Usually code blocks are expected, and "return the complete Python code directly" is mostly about the text content. Raw text might be rendered as prose, so let's include the fence for clarity. Okay, I will output the code block.

Wait, looking at the instruction again: "one sub-goal per line, in the format `子目标:XXXXXX`". This is for Rule 2. For Rule 1, it just says "return the complete Python code". Okay.

Wait, one nuance in "If this goal can be directly implemented with Python code...": does "feature extractor" imply a specific domain (like images)? If so, `PIL` might be needed. But without context, a generic numerical feature extractor is the safest interpretation of "complete feature extractor class".
I will stick with the numerical one.

Final check on constraints:

1. Rule 1 applies (direct implementation).
2. Return code only.
3. No explanation.

Let's produce the output.

```python
import math
from typing import List


class FeatureExtractor:
    """Column-wise standardization (z-score) using only the standard library."""

    def __init__(self):
        self.mean = []
        self.std = []

    def fit(self, data: List[List[float]]):
        # Nothing to learn from empty input.
        if not data:
            return self
        n_cols = len(data[0])
        self.mean = []
        self.std = []
        for i in range(n_cols):
            col = [row[i] for row in data]
            self.mean.append(sum(col) / len(col))
            # Population variance (divide by N, not N - 1).
            variance = sum((x - self.mean[-1]) ** 2 for x in col) / len(col)
            self.std.append(math.sqrt(variance))
        return self

    def transform(self, data: List[List[float]]):
        result = []
        for row in data:
            new_row = []
            for i, val in enumerate(row):
                # A constant column has zero std; emit 0.0 instead of dividing by zero.
                if self.std[i] == 0:
                    new_row.append(0.0)
                else:
                    new_row.append((val - self.mean[i]) / self.std[i])
            result.append(new_row)
        return result

    def fit_transform(self, data: List[List[float]]):
        return self.fit(data).transform(data)


if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    extractor = FeatureExtractor()
    features = extractor.fit_transform(data)
    print("Feature Extraction Verified:")
    for row in features:
        print(row)
```
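The verification in the script above only prints the transformed rows. A stricter check might assert the property that standardization is supposed to guarantee: each output column has mean 0 and population standard deviation 1, or is all zeros for a constant column. The sketch below is an assumption about what such a check could look like. The names `standardize` and `check_standardized` are hypothetical, and `standardize` is a compact stand-in for `fit_transform` so that the block runs on its own.

```python
import math


def standardize(rows):
    """Column-wise z-score with population std; constant columns map to 0.0.

    Hypothetical compact stand-in for FeatureExtractor.fit_transform.
    """
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c))
        for c, m in zip(cols, means)
    ]
    return [
        [(v - m) / s if s else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]


def check_standardized(features, tol=1e-9):
    """Return True if every column has mean ~0 and population std ~1 (or is all zeros)."""
    for col in zip(*features):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        if abs(mean) > tol:
            return False
        if not (abs(std - 1.0) <= tol or std == 0.0):
            return False
    return True


if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    features = standardize(data)
    assert check_standardized(features)
    print("Standardization check passed")
```

Because `check_standardized` only inspects the output rows, the same function could presumably be pointed at the result of `FeatureExtractor().fit_transform(data)` as well. A failed assertion makes the script exit with an error, which is a clearer signal than reading printed numbers by eye.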