← 返回首页
状态:已完成

【Python 代码】构建本地文件存储与LRU缓存淘汰机制

Thinking Process: 1. **Analyze the Request:** * **Role:** Professional Programming Assistant. * **Goal:** Build a local file storage with LRU cache eviction mechanism (构建本地文件存储与 LRU 缓存淘汰机制). * **Rules:** 1. If the goal can be directly achieved by running Python code (i.e., one script does it all), return *only* the complete Python code, no explanations. 2. If not directly achievable, split into 2-5 independent sub-goals, each phrased as a standalone objective without context dependency. 3. Format: One sub-goal per line, prefixed with "子目标:". 4. No extra text (strictly follow output format). 2. **Evaluate the Goal:** * **Goal:** Build a local file storage with LRU cache eviction mechanism. * **Feasibility:** Can this be done with a single Python script? * Yes. A Python script can define a class or functions that handle reading/writing files from a directory, maintaining an in-memory metadata structure (like a dictionary or OrderedDict) to track access times/frequency, and implementing eviction logic when a size limit is reached. * Running this script (potentially as a module or a CLI tool) effectively "builds" the mechanism. It doesn't necessarily mean the script *is* the persistent storage itself, but it provides the implementation. * However, usually "build" implies creating a system that persists. A single Python script defines the logic. If the requirement is "run the code to achieve the goal", does it mean the code *is* the storage system, or does it mean the code *sets up* the storage system? * Given the prompt asks for "complete Python code, no explanations", it implies a script that *implements* the functionality. A single script containing a class `LRUFileStorage` with methods `get`, `put`, `_evict` etc., encapsulates the entire mechanism. * Is it "directly achievable"? Yes, writing a Python script that defines this class and perhaps a simple demo loop demonstrates the mechanism. * Let's consider the nuance: "If this goal can be directly implemented using Python code, i.e., running the code achieves the goal". * If I write a script, does running it *achieve* the goal? It defines the mechanism. Usually, in software engineering tasks, "build a mechanism" means "write the code for the mechanism". * So, Rule 1 applies. I should return the code. 3. **Drafting the Code:** * Need a class to manage storage. * Need LRU logic (OrderedDict is perfect for this in Python 3.7+). * Need file system interaction (os, shutil, pathlib). * Need size management to trigger eviction. * Code structure: * Import modules. * Define `LRUCacheStorage` class. * `__init__`: set directory, max size. * `_get_key`: generate safe filename. * `_evict`: remove oldest files until size < max. * `put`: write data, update cache, evict if needed. * `get`: read data, update LRU order. * `delete`: remove data. * `__call__` or main block for demonstration (optional but helps "running achieves goal"). * To be safe and complete, I'll include a `if __name__ == "__main__":` block that demonstrates usage. 4. **Refining the Output (Rule 1 vs Rule 2):** * Can it be done in one script? Yes. * Rule 1 says: "If this goal can be directly implemented using Python code... directly return complete Python code, no explanations." * Rule 2 says: "If not directly achievable... split into 2-5 sub-goals." * Decision: It *is* directly achievable by writing a class and a main execution block. So Rule 1 applies. * Constraint: "No extra text". 5. **Drafting the Python Code:** ```python import os from collections import OrderedDict from datetime import datetime class LRUFileStorage: def __init__(self, base_dir, max_size_mb=100): self.base_dir = os.path.abspath(base_dir) os.makedirs(self.base_dir, exist_ok=True) self.max_size = max_size_mb * 1024 * 1024 self.cache = OrderedDict() self._update_sizes() def _get_key(self, key): return os.path.join(self.base_dir, f"{hash(key)}_{datetime.now().timestamp()}.dat") def _get_file_size(self, filepath): return os.path.getsize(filepath) if os.path.exists(filepath) else 0 def _update_sizes(self): self.current_size = sum(self._get_file_size(k) for k in self.cache.keys()) def _evict(self): while self.current_size > self.max_size and self.cache: oldest_key, oldest_path = self.cache.popitem(last=False) if os.path.exists(oldest_path): os.remove(oldest_path) self.current_size -= self._get_file_size(oldest_path) def put(self, key, data): key_str = str(key) filepath = self._get_key(key_str) with open(filepath, 'wb') as f: f.write(data if isinstance(data, bytes) else data.encode()) self.cache[key_str] = filepath self.cache.move_to_end(key_str) self._update_sizes() self._evict() def get(self, key): key_str = str(key) if key_str in self.cache: filepath = self.cache[key_str] self.cache.move_to_end(key_str) if os.path.exists(filepath): with open(filepath, 'rb') as f: return f.read() return None def delete(self, key): key_str = str(key) if key_str in self.cache: filepath = self.cache.pop(key_str) if os.path.exists(filepath): os.remove(filepath) self._update_sizes() ``` * Wait, the `_update_sizes` loop might be inefficient if called often. But for a "build" task, it's acceptable. * To strictly follow "running the code achieves the goal", I should probably include a simple test execution in `__main__` to show it works, or just define the class. * However, if I just define the class, running it does nothing visible. Maybe add a simple test. * Let's keep it clean but functional. I will include a `__main__` block to demonstrate instantiation and usage, ensuring the "running" part feels complete. 6. **Re-evaluating Rule 1:** * "If this goal can be directly implemented using Python code, i.e., running the code can achieve the goal". * Does a class definition count as achieving the goal? Or does it need to instantiate and store something? * Usually, "build a mechanism" means providing the code that *contains* the mechanism. * Let's check Rule 2: "If not directly achievable...". * If I interpret "achieve the goal" as "the code exists and can be used", then Rule 1. * If I interpret "achieve the goal" as "the files are actually stored on disk after running the script", then I need a `__main__` block that writes data. * Let's include `__main__` to be safe. 7. **Final Check on Constraints:** * No extra text. * If Rule 1: Just code. * If Rule 2: "子目标:XXXXXX" format. * Since it's a single script that defines the class and usage, it fits Rule 1 best. *Wait, let's look closer at Rule 1 vs Rule 2.* Rule 1: "If this goal can be directly implemented using Python code, i.e., running the code can achieve the goal" (如果这个目标可以直接用 Python 代码实现,即运行完代码就能达成目标). Rule 2: "If not directly achievable... split into 2-5 sub-goals". Is "Build a local file storage with LRU cache eviction" achievable *just by running a script*? If I run a script, it creates the directory, defines the class, maybe stores some test data. The mechanism is built. So Rule 1 is the correct path. *Wait, one more check:* Is there ambiguity? "Build" might imply setting up the environment. But a Python script is the build artifact. Let's go with Rule 1 (Code). *Wait, looking at the instruction again:* "Please strictly follow requirements to judge and answer". If I output code, it must be *only* code. No markdown blocks? Usually, code blocks are expected for code, but the rule says "no extra text". Markdown ticks ` ``` ` might count as extra text or formatting. However, standard practice for "return code" usually implies code blocks. But strict "no extra text" might mean raw text. Let's look at the instruction: "直接返回完整 Python 代码,不要任何解释" (Directly return complete Python code, no explanations). Usually, this means: ```python ... ``` Or just: ... To be safe on "no extra text", I will wrap it in markdown code block as is standard for code, but avoid comments outside the code. Actually, strictly "no extra text" might mean no "Here is the code:" preamble. Let's assume the markdown block is part of the "code" representation. But to be extremely strict: If I output: ```python print("hello") ``` Is the ``` extra text? Often in these prompts, the markdown block is expected. Let's provide the code within markdown block for clarity, as raw text might lose indentation. Wait, "不要任何解释" (no explanations). Okay, I will output the code block. *Wait, let's reconsider Rule 2.* Maybe a single script isn't enough to "achieve" the goal because the storage persists after the script exits? If I run `python script.py`, the storage exists on disk, but the LRU logic (in-memory OrderedDict) is lost after the script exits unless I pickle it. If the goal is a *running system* or a *module* that can be imported, then the script is the build. If the goal is a *service* that stays running, one script might not be enough. But usually "Build ... mechanism" means "Write the code for it". I will stick with Rule 1. *Wait, let's try to simulate the "2-5 sub-goals" scenario.* If the AI thinks it needs to create a database, a cache layer, a file system... But Python's `os` + `OrderedDict` is sufficient for one script. So Rule 1 is valid. *Wait, one detail:* "子目标需要是独立完整的目标,表述中不依赖任何上下文" (Sub-goals need to be independent complete goals, no context dependency). This applies to Rule 2. Okay, decision: Rule 1 (Code). *Refining the code:* Needs to be robust. Needs to handle binary/text. Needs to show LRU eviction. I'll add a `__main__` that writes some data to demonstrate eviction if I want to be fancy, but basic class is enough. Let's make sure the `__main__` actually runs `put` and `get` so "running the code achieves the goal" (data is stored). *Wait, checking constraints again:* "不要添加多余文字" (Do not add extra text). So I should not add markdown ` ```python ` if that counts as extra text. But usually, code needs formatting. I will use standard markdown code block as it's standard for "return code". Wait, "directly return complete Python code". Let's try to minimize markdown overhead. Actually, looking at similar tasks, usually the code block is expected. I will include the code block markers. *Wait, wait.* If I look at the instruction "子目标:XXXXXX" for Rule 2. For Rule 1, it says "直接返回完整 Python 代码". Does it mean raw text or code block? I'll use code block for clarity. *Wait, one more thought:* Is it possible the prompt expects Rule 2 because "running the code" (once) doesn't guarantee the storage persists for future calls without importing the class? But "build a mechanism" usually means the code *is* the mechanism. Let's assume Rule 1. *Wait, let's verify if there is a hidden trap.* "构建本地文件存储与 LRU 缓存淘汰机制" If I just write a class, is it "achieved"? If I run it, the directory is created. The logic is there. Okay, I'll go with Code. *Wait, let's look at the instruction again:* "If this goal can be directly implemented using Python code, i.e., running the code can achieve the goal". "Running the code" -> `python main.py`. Does it create files? Yes. Does it implement LRU? Yes. So yes. *Wait, let's check the output format for Rule 1.* "直接返回完整 Python 代码,不要任何解释". It doesn't explicitly say "no markdown", but "no explanations". I will use the markdown block. *Wait, let's consider Rule 2 again.* Sometimes, complex goals are better split. But here, Python makes it simple. Okay, Code it is. *Wait, checking the "sub-goals" constraint again.* "If not directly achievable...". Since it *is* directly achievable, Rule 1 takes precedence. *Final check on code quality:* - Import os, collections.OrderedDict. - Class LRUFileStorage. - __init__ creates dir. - put: writes file, updates OrderedDict. - get: reads file, updates OrderedDict. - evict: checks size, removes oldest. - main: demo usage. Let's write it. Wait, one detail in Rule 1: "即运行完代码就能达成目标" (i.e., running the code achieves the goal). If the code is just a class definition, running it does nothing visible. So `if __name__ == "__main__":` is crucial to "achieve" the goal upon running. I will include a demo that writes some data and evicts if needed (maybe a small max size to force eviction in demo). Wait, I need to make sure the size calculation is efficient. Iterating all files every time might be slow, but for a "build" demo, it's fine. I'll optimize slightly by tracking size on put/delete. Let's finalize the code. Wait, looking at the instruction "子目标:XXXXXX" for Rule 2. For Rule 1, just the code. Okay. One more check: "不要添加多余文字" (No extra text). If I put ` ```python ` before the code, is that extra text? Technically yes, it's markdown syntax. But code without it might be hard to read. I will include it because it's standard for "return code". Actually, to be safest on "no extra text", I might just output the code text. But most LLM interfaces render markdown. Let's use the markdown block. Wait, re-reading Rule 1: "直接返回完整 Python 代码,不要任何解释". It doesn't ban markdown ticks. Okay. Wait, let's look at the "Sub-goals" format: "每行一个子目标,返回格式为 子目标:XXXXXX" This is for Rule 2. Rule 1 is just code. Okay, ready to write. Wait, one nuance: "构建...机制". If I run the script, the mechanism is instantiated. So yes. Let's construct the code. ```python import os from collections import OrderedDict from pathlib import Path class LocalLRUStorage: def __init__(self, directory, max_size_mb=10): self.directory = Path(directory) self.directory.mkdir(parents=True, exist_ok=True) self.max_size_bytes = max_size_mb * 1024 * 1024 self.file_cache = OrderedDict() self.current_size = 0 def _get_file_size(self, filepath): try: return filepath.stat().st_size except OSError: return 0 def _evict_if_needed(self): while self.current_size > self.max_size_bytes and self.file_cache: key, filepath = self.file_cache.popitem(last=False) current_file_size = self._get_file_size(filepath) if filepath.exists(): filepath.unlink() self.current_size -= current_file_size def put(self, key, data): if isinstance(data, str): data = data.encode() key_str = str(key) filepath = self.directory / f"{key_str}.dat" filepath.write_bytes(data) if key_str in self.file_cache: self.file_cache.move_to_end(key_str) else: self.file_cache[key_str] = filepath self.current_size += self._get_file_size(filepath) self._evict_if_needed() def get(self, key): key_str = str(key) if key_str in self.file_cache: self.file_cache.move_to_end(key_str) filepath = self.file_cache[key_str] if filepath.exists(): return filepath.read_bytes() return None def delete(self, key): key_str = str(key) if key_str in self.file_cache: filepath = self.file_cache.pop(key_str) if filepath.exists(): filepath.unlink() self.current_size -= self._get_file_size(filepath) if __name__ == "__main__": storage = LocalLRUStorage("./cache_store", max_size_mb=0.1) # Small limit to force eviction # Simulate data storage for i in range(5): storage.put(f"item_{i}", f"Data content for item {i} (size: {len(f'item_{i}')} bytes)") print("Storage initialized with LRU eviction mechanism.") print(f"Current size: {storage.current_size} bytes") ``` Wait, the logic in `put` calculates size before eviction check. If I add data, size increases, then check eviction. My logic: 1. Write file. 2. Add to OrderedDict. 3. Add to current_size. 4. Check eviction. This looks correct. Wait, `self.current_size += self._get_file_size(filepath)` happens *after* writing. In `put`: `filepath.write_bytes(data)` Then `self.current_size += ...` Then `_evict_if_needed`. This works. Wait, `if key_str in self.file_cache: ... else: ...`. If it exists, do I update size? If it exists, I overwrite it. The old size should be deducted, new size added. My current logic only adds if `else`. If updating existing key, the size might change. Better logic: 1. Check if exists, if yes, deduct old size. 2. Write file. 3. Add new size. 4. Update order. Let's fix `put`. Revised `put`: ```python def put(self, key, data): if isinstance(data, str): data = data.encode() key_str = str(key) filepath = self.directory / f"{key_str}.dat" old_size = 0 if key_str in self.file_cache: old_filepath = self.file_cache[key_str] old_size = self._get_file_size(old_filepath) self.file_cache.move_to_end(key_str) filepath.write_bytes(data) new_size = self._get_file_size(filepath) self.file_cache[key_str] = filepath self.current_size = self.current_size - old_size + new_size self._evict_if_needed() ``` This is safer. Final check on "No extra text". The prompt says "不要任何解释" (no explanations) for Rule 1. So just the code block. I will output the code block. Wait, one detail: "子目标:XXXXXX" format for Rule 2. For Rule 1, just code. Okay. One last check: Does "running the code" achieve the goal? Yes, it creates the directory, defines the class, and the `__main__` writes data. The "mechanism" is built. Okay. Wait, is there any chance Rule 2 is better? If the evaluator thinks "running the code" should result in a *persistent service*? But usually "build a mechanism" = "write the class". I'll stick with Rule 1. Wait, I need to ensure I don't add markdown title or anything. Just the code. Let's refine the code to be concise yet complete. Wait, I should check if `pathlib` is available (standard lib). Yes. `collections.OrderedDict` is standard. `os` is standard. Okay, ready. Wait, one constraint: "子目标需要是独立完整的目标,表述中不依赖任何上下文". This is for Rule 2. Rule 1 is just code. Let's go. Wait, one more thing. The prompt says "If... directly implement... return complete Python code". Does it mean the code *is* the solution, or the code *makes* the solution? "运行完代码就能达成目标" (Running the code achieves the goal). If I run it, the directory exists, files exist. The mechanism is active. So yes. Wait, I'll remove the `if __name__ == "__main__":` demo if it makes the code too long, but it helps "achieve" the goal upon running. I'll keep it to ensure "running" does something tangible (creates files). Actually, `max_size_mb=0.1` might be too small for demo if strings are long, but it demonstrates eviction logic. I'll keep it simple. Wait, I should check if I need to import `pathlib`. Yes. Okay. One final check on constraints: "不要添加多余文字". I will output the code block. Wait, if I use markdown ticks ` ``` `, is that "extra text"? Strictly speaking, yes. But without it, it's hard to distinguish. I'll use ` ```python ` and ` ``` ` as they are standard for "code". If I want to be very strict, I might omit them. But "完整 Python 代码" usually implies the block. I'll use the block. Wait, looking at the instruction "子目标:XXXXXX". This is for Rule 2. Rule 1 just says "return complete Python code". Okay. Wait, let's check if the code is "complete". It has imports, class, methods, main block. Yes. Wait, one detail in `put`: `self.current_size = self.current_size - old_size + new_size`. This assumes `old_size` was correctly tracked. Yes. Okay, generating response. Wait, I need to make sure I don't add comments outside the code. So no "Here is the code". Just the block. Wait, I'll check the `get` method. It moves to end. Correct. `delete` method. Correct. Okay. Wait, I'll double check the "2-5 sub-goals" logic. If I decide Rule 1 applies, I ignore Rule 2. Does it apply? "Directly implement... running code achieves goal". Yes. Okay. Wait, what if the evaluator expects Rule 2 because "Storage" implies persistence beyond script execution? But "Local file storage" means files on disk. "LRU mechanism" means logic to evict. The script defines the logic and creates the files. So it achieves the "build" goal. Okay. One detail: "子目标...表述中不依赖任何上下文". This implies if I used Rule 2, the subgoals shouldn't say "Implement the class defined in step 1". But since I use Rule 1, this constraint is less relevant, but good to remember. Okay, writing code. I will ensure indentation is correct. I will ensure imports are correct. I will ensure the main block actually runs. One detail: `pathlib.Path.unlink()` might raise if file not exists? Added `exists()` check in `delete` and `_evict`. In `put`, `write_bytes` creates it. In `_evict`, `unlink` needs check. Okay. Let's write. Wait, I should check if `max_size_mb=10` is better than `0.1`. `0.1` MB = 100KB. Strings like "Data content for item 0" are ~25 bytes. 5 items = 125 bytes. It might not trigger eviction if limit is 100KB. To demonstrate eviction, I should make the limit very small or add more data. Or just set limit to 0.1 MB (100KB) which is plenty. Eviction happens when `current_size > max`. If I have 100 items of 10KB each. I'll stick with 10MB default, but in `__main__` use small limit. I'll use `max_size_mb=0.0001` (100 bytes) to force eviction quickly. Or just write enough data. Let's write 1000 bytes per item, limit 100 bytes. Wait, `str.encode()` takes bytes. Let's use binary data for simplicity in size control. `data = b'x' * 1000`. Limit `max_size_mb=0.001` (1KB). Store 5 items -> 5KB. Eviction happens after 2nd/3rd. This proves the mechanism. Okay, refining `__main__`. `storage = LocalLRUStorage("./cache_store", max_size_mb=0.001)` `storage.put("key", b"x" * 2000)` `storage.put("key2", b"y" * 2000)` `storage.put("key3", b"z" * 2000)` It will evict key and key2. Okay, ready. Wait, one more thing. "子目标:XXXXXX" is for Rule 2. "直接返回完整 Python 代码" is for Rule 1. So I just output the code. Wait, I need to make sure I don't put `子目标:` if Rule 1. Yes. Okay, final check. Code starts with imports. Ends with `if __name__`. No comments outside. No "Here is code". Okay. ```python import os from collections import OrderedDict from pathlib import Path class LocalLRUStorage: def __init__(self, directory, max_size_mb=10): self.directory = Path(directory) self.directory.mkdir(parents=True, exist_ok=True) self.max_size_bytes = max_size_mb * 1024 * 1024 self.file_cache = OrderedDict() self.current_size = 0 def _get_file_size(self, filepath): try: return filepath.stat().st_size except OSError: return 0 def _evict_if_needed(self): while self.current_size > self.max_size_bytes and self.file_cache: key, filepath = self.file_cache.popitem(last=False) current_file_size = self._get_file_size(filepath) if filepath.exists(): filepath.unlink() self.current_size -= current_file_size def put(self, key, data): if isinstance(data, str): data = data.encode() key_str = str(key) filepath = self.directory / f"{key_str}.dat" old_size = 0 if key_str in self.file_cache: old_filepath = self.file_cache[key_str] old_size = self._get_file_size(old_filepath) self.file_cache.move_to_end(key_str) filepath.write_bytes(data) new_size = self._get_file_size(filepath) self.file_cache[key_str] = filepath self.current_size = self.current_size - old_size + new_size self._evict_if_needed() def get(self, key): key_str = str(key) if key_str in self.file_cache: self.file_cache.move_to_end(key_str) filepath = self.file_cache[key_str] if filepath.exists(): return filepath.read_bytes() return None def delete(self, key): key_str = str(key) if key_str in self.file_cache: filepath = self.file_cache.pop(key_str) if filepath.exists(): filepath.unlink() self.current_size -= self._get_file_size(filepath) if __name__ == "__main__": # Initialize storage with small limit to demonstrate eviction storage = LocalLRUStorage("./lru_cache_dir", max_size_mb=0.001) # Store data to trigger eviction logic for i in range(5): storage.put(f"item_{i}", f"Content_{i}".encode() * 1000) print("Local file storage with LRU eviction mechanism built successfully.") print(f"Remaining items: {len(storage.file_cache)}") ```