CHRONOS: News Timeline Summarization
What is the project about?
CHRONOS is a retrieval-based approach for Timeline Summarization (TLS) of news articles. It generates chronological summaries by iteratively asking questions about the topic and retrieved documents.
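To make "chronological summary" concrete, here is a minimal sketch of how dated event summaries might be assembled into a timeline. It is not taken from the CHRONOS codebase: the function name build_timeline and the (date, summary) input format are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def build_timeline(events):
    """Group (date, summary) pairs by date and return them in chronological order.

    Hypothetical helper for illustration; not part of the CHRONOS codebase.
    """
    by_date = defaultdict(list)
    for day, summary in events:
        # Skip exact duplicate summaries recorded for the same date.
        if summary not in by_date[day]:
            by_date[day].append(summary)
    # Sorting the dates gives the chronological ordering of the timeline.
    return [(day, by_date[day]) for day in sorted(by_date)]


events = [
    (date(2024, 3, 2), "Talks resume."),
    (date(2024, 1, 15), "Initial announcement."),
    (date(2024, 3, 2), "Talks resume."),
    (date(2024, 3, 2), "A deal is signed."),
]
timeline = build_timeline(events)
for day, summaries in timeline:
    print(day.isoformat(), "; ".join(summaries))
```

The input events here are invented sample data; in a real pipeline they would come from the summarization step.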
What problem does it solve?
CHRONOS addresses the challenge of creating timelines from a large collection of news articles, particularly in open-domain scenarios where the topic is not predefined. Existing methods often struggle with efficiency and scalability in open-domain settings. CHRONOS also addresses the lack of large, up-to-date datasets for open-domain TLS.
What are the features of the project?
- Iterative Self-Questioning: The core of CHRONOS is its ability to iteratively refine its understanding of the topic by posing questions to itself and using the answers to guide further retrieval and summarization.
- Retrieval-Based: CHRONOS relies on retrieving relevant news articles from the web, making it suitable for open-domain scenarios.
- Open-Domain TLS Dataset (Open-TLS): The project introduces a new, large-scale dataset for open-domain timeline summarization, addressing the scarcity of such resources.
- Efficiency and Scalability: CHRONOS is designed to be more efficient and scalable than previous approaches, especially for open-domain TLS.
- Comparable Performance: Achieves results comparable to state-of-the-art methods in closed-domain TLS.
- Chinese Web Demo: A Chinese-language web demo is available for users to interact with the system.
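The iterative self-questioning feature listed above can be sketched as a simple loop. This is a simplified illustration under stated assumptions, not the project's actual implementation: the callables ask, retrieve, and summarize, the max_rounds parameter, and the toy corpus are all hypothetical stand-ins for the LLM and web-search components.

```python
def iterative_self_questioning(topic, ask, retrieve, summarize, max_rounds=3):
    """Alternate between question generation and retrieval, then summarize.

    All three callables are hypothetical placeholders:
      ask(topic, docs)       -> list of questions given what is known so far
      retrieve(question)     -> list of documents for one question
      summarize(topic, docs) -> timeline built from the collected documents
    """
    docs = []
    asked = set()
    for _ in range(max_rounds):
        progressed = False
        for question in ask(topic, docs):
            if question in asked:
                continue  # never issue the same question twice
            asked.add(question)
            progressed = True
            for doc in retrieve(question):
                if doc not in docs:
                    docs.append(doc)
        if not progressed:
            break  # no new questions, so further rounds would add nothing
    return summarize(topic, docs)


# Toy stand-ins so the sketch runs without any LLM or search API.
CORPUS = {
    "What happened first?": ["2024-01-15: Initial announcement."],
    "What happened after 2024-01-15?": ["2024-03-02: A deal is signed."],
}


def toy_ask(topic, docs):
    if not docs:
        return ["What happened first?"]
    latest = max(doc[:10] for doc in docs)  # ISO dates sort as strings
    return [f"What happened after {latest}?"]


def toy_retrieve(question):
    return CORPUS.get(question, [])


def toy_summarize(topic, docs):
    return sorted(docs)


timeline = iterative_self_questioning(
    "example topic", toy_ask, toy_retrieve, toy_summarize
)
print(timeline)
```

Each round uses the documents gathered so far to decide what to ask next, which is why later questions can target gaps that earlier retrieval exposed.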
What are the technologies used in the project?
- Python: The primary programming language.
- PyTorch: The deep learning framework used.
- Large Language Models (LLMs): Uses LLMs (specifically Qwen or GPT models, configurable via API keys) for question generation, answering, and summarization.
- Bing Web Search API: Used for retrieving news articles from the internet.
- Jina API (Optional): Used for full-page content extraction when enabled.
- DashScope API: Used for calling Qwen models.
- Streamlit (Optional): Used for creating the web demo.
- Dependencies managed via requirements.txt: Indicates use of standard Python libraries for NLP and web interaction.
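Because the services above are configured through API keys, one plausible way to collect them is from environment variables. The sketch below is an assumption for illustration only: the variable names LLM_API_KEY, BING_SEARCH_KEY, and JINA_API_KEY and the ApiConfig class are hypothetical and may not match the names the project's scripts expect.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApiConfig:
    """Hypothetical container for the keys described above."""
    llm_key: str                    # key for the Qwen or GPT backend
    bing_key: str                   # key for web search
    jina_key: Optional[str] = None  # optional full-page extraction


def load_config(env=None):
    """Read keys from a mapping (defaults to os.environ) and validate them."""
    env = os.environ if env is None else env
    required = ("LLM_API_KEY", "BING_SEARCH_KEY")  # hypothetical names
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return ApiConfig(
        llm_key=env["LLM_API_KEY"],
        bing_key=env["BING_SEARCH_KEY"],
        jina_key=env.get("JINA_API_KEY") or None,  # absent means disabled
    )


config = load_config({"LLM_API_KEY": "llm-demo", "BING_SEARCH_KEY": "bing-demo"})
print(config.jina_key is None)
```

Treating the extraction key as optional mirrors the description above, where full-page content extraction is only used if enabled.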
What are the benefits of the project?
- Improved Open-Domain TLS: Provides a more effective and scalable solution for summarizing news timelines in open-domain settings.
- New Dataset: Offers a valuable new resource (Open-TLS) for researchers working on timeline summarization.
- Efficient and Scalable: The retrieval-based approach and iterative questioning make it more efficient than methods that process all documents directly.
- Easy to Use: Provides clear instructions and scripts for running the model.
What are the use cases of the project?
- News Aggregation: Creating timelines of events for news aggregators and readers.
- Historical Research: Summarizing historical events from news archives.
- Trend Analysis: Identifying and tracking trends over time based on news data.
- Event Monitoring: Tracking the development of specific events or crises.
- Information Retrieval: Providing concise summaries of events related to a specific query.
