Getting different responses! 🤔
Answered
Dwarf Crocodile posted this in #help-forum
Dwarf CrocodileOP
So, earlier, I was doing data fetching from S3 directly on the frontend:
const params = {
  Bucket: "front-end1",
  Key: selectedOption,
};
const data = await s3.getObject(params).promise();
console.log("ensemble_data----", data);
And then I would do parsing or other processing of this 'data'.
(I have fetched the raw data here, I guess; I am using "data.Body" later in data processing, like:
const workbook = XLSX.read(data.Body, { type: "array" });
)
But, to make it more secure, I moved my data fetching from S3 to backend.
Flask API endpoint:
@app.route('/api/ensemble/<selectedOption>', methods=['GET'])
@jwt_required()
def get_ensemble_data(selectedOption):
    current_user = get_jwt_identity()
    if current_user['role'] not in ['admin', 'client', 'employee']:
        return jsonify({'message': 'Unauthorized'}), 403
    try:
        response = s3.get_object(Bucket="front-end1", Key=selectedOption)
        file_content = response['Body'].read()
        return Response({"ensembleData": file_content})
    except Exception as e:
        return jsonify({"error": str(e)}), 500
Frontend Code:
const response = await axios.get(
  `${baseUrl}/api/ensemble/${selectedOption}`,
  {
    headers: {
      Authorization: `Bearer ${localStorage.getItem("token")}`,
    },
  }
);
const data = response.data.ensembleData;
console.log("ensemble_data----", data);
And then I would do the exact same parsing or other processing of this 'data' too.
But the thing is, these "data" values are not the same. I am attaching photos of the console logs for both.
1st pic: Old Code
2nd pic: New Code
Answered by B33fb0n3
you shouldn't store data inside an Excel file and parse it and do whatever to retrieve it. S3 is a service for storing files, not data. It's also not the place to serve data efficiently. Of course you can do all this, but you will either get bugs (like you see now) or a huge bill (you might see this in the future).
Store data where it wants to be stored, and that's inside a database. There the data can be efficiently created, read, updated and deleted (CRUD).
So create a database and put your data in there. Then use an ORM like drizzle to do the CRUD operations.
AWS itself has a service for databases as well ("Amazon RDS"): https://aws.amazon.com/de/rds/postgresql/
It can also be seamlessly integrated with drizzle: https://orm.drizzle.team/docs/connect-aws-data-api-pg
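A minimal sketch of what that could look like with drizzle against Postgres (the "ensemble_rows" table and its columns are made up for illustration; the real schema depends on what's inside the Excel files):
import { pgTable, serial, text, doublePrecision } from "drizzle-orm/pg-core";
import { drizzle } from "drizzle-orm/node-postgres";
import { eq } from "drizzle-orm";
import { Pool } from "pg";

// Hypothetical table: one row per data point instead of one big file.
const ensembleRows = pgTable("ensemble_rows", {
  id: serial("id").primaryKey(),
  dataset: text("dataset").notNull(), // plays the role of the old S3 key / selectedOption
  value: doublePrecision("value").notNull(),
});

const db = drizzle(new Pool({ connectionString: process.env.DATABASE_URL }));

// Fetch only the rows of the selected dataset instead of downloading
// and parsing a whole .csv/.xlsx file.
async function getEnsembleData(selectedOption: string) {
  return db.select().from(ensembleRows).where(eq(ensembleRows.dataset, selectedOption));
}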
16 Replies
Dwarf CrocodileOP
actually, I am not storing the data; the files are already present.
And there are 50+ files in total. Each is very long.
But they all have the same format.
They are basically in .csv and .xlsx format.
So, I am fetching a specific file based on the user's selected option in the frontend.
And then I am mapping them and doing other processing.
So I need those files to be on S3, not in a DB
well... data is data. So get the data out of your Excel files and into a DB. That's the only solution I want to give you, and yes, I am talking from experience about the problems that you will face:
Of course you can do all this, but you will either get bugs (like you see now) or a huge bill (you might see this in the future)
Dwarf CrocodileOP
Ok, will try to transfer the data to a DB, but probably next month. Right now, many more files are being generated in SageMaker and transferred to S3. Once that is done, I will try to move the data over.
But, for the time being, can you help with what's causing this problem?
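(A likely culprit, for reference: Response({"ensembleData": file_content}) does not JSON-encode the dict; Flask treats the dict as the response body iterable, so the original bytes never reach the browser. And even with jsonify, raw .xlsx bytes can't be sent as JSON text without corruption. A sketch of the frontend side, assuming the Flask endpoint is changed to return the raw bytes, e.g. return Response(file_content, mimetype="application/octet-stream"):)
const response = await axios.get(
  `${baseUrl}/api/ensemble/${selectedOption}`,
  {
    headers: { Authorization: `Bearer ${localStorage.getItem("token")}` },
    // Ask axios for raw binary; the default text/JSON handling mangles .xlsx bytes.
    responseType: "arraybuffer",
  }
);
// Equivalent of the old data.Body: bytes that XLSX.read can parse directly.
const data = new Uint8Array(response.data);
const workbook = XLSX.read(data, { type: "array" });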
that sounds great. I am pretty sure that SageMaker also offers a way to export directly to your RDS database, as both are AWS services
Dwarf CrocodileOP
wow, are there any tutorials or docs that I can follow?
and I think I will be able to read and process/parse the data much faster if the data is in a DATABASE than by fetching it from S3
and maybe use CloudFront to make it even faster?
you are right: CloudFront is a CDN and will serve your data. When serving it directly from the origin (your S3), it can get very expensive
Dwarf CrocodileOP
Oh..no worries.
Dwarf CrocodileOP
My webpage takes a bit of time when it first loads a protein structure.
The thing is:
Every protein has a .csv, a .pdb, and some other .xlsx files on S3. A user selects a protein from a dropdown, and then I fetch all the files of that protein.
These are large files (for some bigger proteins, they are about 5k+ lines).
Protein structures are generated from data coming from the .pdb files. Atom-level summaries, shown when a user clicks on a specific atom in the 3D structure, come from the .csv file. And so on.
How can I improve my website's latency? And overall performance and speed?
you are experiencing one of the many problems of saving data inside a file: the whole file needs to be downloaded. And that takes time. So instead of downloading the whole file, you can fetch only the data that you really need.
And now we are on the Database side: use a database to fetch only the data that you need.
@Dwarf Crocodile solved?
Dwarf CrocodileOP
For some .csv or .pdb files, I need to fetch the whole file, however long it is. So what about those?
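(For the files that really do have to be fetched whole, one common pattern, not mentioned above, is to keep them on S3 but let the browser download them directly via a short-lived presigned URL, optionally fronted by CloudFront, instead of streaming the bytes through Flask. A sketch, assuming a hypothetical /api/ensemble-url/<key> endpoint that returns { "url": ... }, generated on the backend e.g. with boto3's generate_presigned_url:)
// Hypothetical endpoint that returns a short-lived presigned URL for the S3 object.
const { data } = await axios.get(`${baseUrl}/api/ensemble-url/${selectedOption}`, {
  headers: { Authorization: `Bearer ${localStorage.getItem("token")}` },
});
// The browser downloads straight from S3/CloudFront, so the large file never
// passes through the Flask server. No Authorization header here: the presigned
// query string already carries the credentials.
const file = await axios.get(data.url, { responseType: "arraybuffer" });
const workbook = XLSX.read(new Uint8Array(file.data), { type: "array" });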